Workload-Optimized Hyperconvergence for VDI

Last year’s EVO:RAIL specification from VMware marked the commoditization of hyperconverged infrastructure appliances (HCIAs). In the months that followed, seven new HCIAs were launched, all sharing a common hardware and software specification, with only minor differentiation to distinguish one product from the next. However, while EVO:RAIL has marked the commoditization of hyperconverged infrastructure platforms for general-purpose server workloads, it has not done the same for VDI. In creating EVO:RAIL, VMware has overlooked the growing importance of support for GPU virtualization in VDI. This has left the market open for innovative appliance vendors to build new high-performance VDI appliances, for which the hardware matters just as much as the software.

To recap: the EVO:RAIL appliance specification is based on a single 2U high enclosure hosting four rail-mounted (hence the name) slide-in server nodes. Each server node has the following specification:

  • Two Intel Xeon E5-2620 v2 six-core CPUs
  • 192 GB of memory
  • One SLC SATADOM or SAS HDD as the ESXi™ boot device
  • Three SAS 10K RPM 1.2 TB HDDs for the VMware Virtual SAN™ datastore
  • One 400 GB MLC enterprise-grade SSD for read/write cache
  • One Virtual SAN–certified pass-through disk controller
  • Two 10 GbE NIC ports
  • One 1 GbE IPMI port for remote (out-of-band) management
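Rolled up across the four nodes, those per-node figures show what a single 2U EVO:RAIL box delivers in aggregate. A quick sketch of the arithmetic (Python used purely as a calculator; storage figures are raw capacity, before Virtual SAN replication overhead):

```python
# Capacity roll-up for a four-node EVO:RAIL appliance,
# using the per-node figures listed above.

NODES = 4

per_node = {
    "cores": 2 * 6,       # two six-core E5-2620 v2 CPUs
    "ram_gb": 192,        # 192 GB of memory
    "hdd_gb": 3 * 1200,   # three 1.2 TB SAS HDDs for the Virtual SAN datastore
    "ssd_gb": 400,        # one 400 GB MLC SSD for read/write cache
    "nic_10gbe": 2,       # two 10 GbE ports
}

# Multiply each per-node figure by the node count.
appliance = {resource: amount * NODES for resource, amount in per_node.items()}

for resource, total in appliance.items():
    print(f"{resource}: {total}")
# cores: 48, ram_gb: 768, hdd_gb: 14400 (i.e. 14.4 TB raw), ssd_gb: 1600, nic_10gbe: 8
```

Note that the 14.4 TB of HDD is raw capacity; the default Virtual SAN policy mirrors each object, so the usable figure is roughly half that before other overheads.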

There’s nothing wrong with EVO:RAIL as a general-purpose appliance specification, and the explosive growth in the HCIA market fully validates VMware’s decision to take this path. My only criticism is that the hardware platform specification is a little pedestrian—a little lacking in ambition. The E5-2620 v2 CPU with just six cores running at 2.1 GHz was an entry-level server processor, and it was a year old when EVO:RAIL was announced. With a recommended price of just $406, it looks a little out of place in an appliance costing $200,000. Had VMware not needed products matching the EVO:RAIL specification ready in time for VMworld 2014, it would have been well advised to wait and use the Haswell-generation E5-2620 v3 instead. Waiting for the newer v3 processor to ship would have given customers time to search down the back of the sofa for another $11 to cover the increased cost of the newer processor. As it is, EVO:RAIL is a 2013-era specification announced in 2014 and competing in 2015. For comparison purposes, at the time VMware announced EVO:RAIL, hyperconverged infrastructure leader Nutanix was already running the faster eight-core E5-2660 v2 in its 2U four-node NX-3000 workhorse, and it has since updated it to support a range of Haswell-generation processors.

Lakeside GPU Use

Although the processor spec means that EVO:RAIL virtual desktops aren’t going to win any speed records, EVO:RAIL’s biggest problem as a VDI platform is that in defining the specification, VMware overlooked the growing importance of GPU support. At the recent GPU Technology Conference in San Jose, Florian Becker from Lakeside Software led a breakout session that highlighted the importance of hardware-based graphics acceleration in delivering even everyday applications such as Microsoft Office.

Very few Windows applications today fail to take advantage of hardware-based graphics acceleration. Trying to improve performance using soft GPU virtualization only steals compute resources away from where they are needed.

By the time the EVO:RAIL specification was announced at VMworld in August 2014, VMware had already made it clear that vGPU support was on its way; it could have taken this into account in the EVO:RAIL specification. As it was, the physical footprint chosen closed the door to vGPUs: there is simply no way to shoehorn NVIDIA’s full-height, full-length, double-width GRID boards into an EVO:RAIL node. Only Fujitsu offers a solution here, and it succeeds only by bending the rules with a double-height EVO:RAIL node with space for a single NVIDIA GRID card.

VMware’s unofficial response regarding GPU support is to tell customers to wait for EVO:RACK. While EVO:RACK-compliant hardware isn’t shipping yet, we can make some educated guesses about what it will look like. VMware has confirmed that EVO:RACK will handle the full vCloud Suite with integrated virtual and physical networking and will manage servers, JBOD or DAS storage, and top-of-rack network switching. Anyone visiting Quanta’s stand at VMworld 2014 could see its EVO:RACK prototype based on its Rackgo X F03A server systems. Quanta’s Rackgo systems use the same 2U four-node packaging format that EVO:RAIL uses, but they follow the Open Rack 1.0 specification rather than the standard 19-inch rack. Open Rack uses larger, 21-inch rack spacing. This gives Quanta a little more room to play with, and it has used it to squeeze two PCIe slots into each node. Packaging constraints limit these slots to PCIe x8; they are too small to accommodate the NVIDIA boards.

Don’t assume, though, that all EVO:RACK systems will use the Open Rack specification. In introducing EVO:RACK in a blog post last August, VMware’s Raj Yavatkar, Chief Architect for Hyper-Converged Infrastructure, indicated that “EVO:RACK…can run on a range of pre-integrated hardware configurations ranging from Open Compute Project–based hardware designs to industry-standard OEM servers and converged infrastructure.” Allowing standard OEM servers into the EVO:RACK specification opens the door for implementation partners to incorporate full-size servers into an EVO:RACK stack, with space for onboard GPUs. The only caution here is VMware’s positioning of EVO:RACK as an enterprise platform. vCloud Suite is overkill for VDI and could effectively price anything based on EVO:RACK out of the picture, if implementation partners are boxed into delivering systems that are geared for delivering thousands of VDI instances rather than the hundreds that many smaller businesses need.

Potential appliance users fare a little better outside of the EVO world. Pivot3 supports a single NVIDIA GRID card in its vSTAC R2S P Cubed appliance, as do the current-generation Sphere 3D V3 appliances. However, there’s not much you can do with a single card beyond accelerating basic business graphics workloads, and with only one slot available, VMware Horizon customers need to weigh the benefits of a GPU board against those of a Teradici APEX board. Nutanix looks to have gotten it right with its single-node NX-7000 platform. With three PCIe expansion slots, it can support two GRID K1 or three GRID K2 cards, giving it the makings of a good general-purpose business graphics platform. The ten-core Xeon E5-2680 v2 processor in the NX-7000 may be starting to look a little dated among today’s offerings, but it still delivers significantly better desktop performance than EVO:RAIL’s E5-2620 v2.
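To put those card counts in user terms, here is a back-of-the-envelope density sketch. The per-card user maximums are indicative figures drawn from NVIDIA’s early GRID vGPU profile tables (the light K100 profile on the K1, the heavier K240Q profile on the K2) and should be treated as illustrative assumptions rather than current sizing guidance:

```python
# Illustrative vGPU user-density arithmetic for an NX-7000-class node.
# Assumed per-card maximums (early NVIDIA GRID vGPU profiles):
#   GRID K1: 4 entry-level GPUs per board, up to 32 users/card on the K100 profile
#   GRID K2: 2 high-end GPUs per board, up to 8 users/card on the K240Q profile

USERS_PER_K1_CARD = 32   # assumption: K100 profile, 8 vGPUs per GPU x 4 GPUs
USERS_PER_K2_CARD = 8    # assumption: K240Q profile, 4 vGPUs per GPU x 2 GPUs

# The two configurations the NX-7000's three PCIe slots allow, per the text:
two_k1_users = 2 * USERS_PER_K1_CARD     # two K1 cards, light business graphics
three_k2_users = 3 * USERS_PER_K2_CARD   # three K2 cards, heavier 3-D work

print(f"2x GRID K1: up to {two_k1_users} light-profile users per node")
print(f"3x GRID K2: up to {three_k2_users} heavier-profile users per node")
```

The asymmetry is the point: the same three slots yield either a few dozen knowledge workers on light profiles or a couple of dozen power users on heavier ones, which is why slot count matters so much for a VDI appliance.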

Nutanix has recently updated its NX-3000 and NX-6000 series appliances with a choice of Haswell-generation processors, and I expect it to announce new Haswell-based NX-7000 series appliances by its .NEXT conference in June. Now, if Nutanix can increase the number of PCIe slots at the same time, it should have a winner.

At the GPU Technology Conference, NVIDIA detailed a potential market of 625 million professional graphics users with unmet needs. There are excellent business reasons to adopt VDI for professional graphics, and these reasons are largely immune to the price sensitivity that stalled mainstream VDI adoption for years. The price premium of converged infrastructure appliances is less of a concern. Early adopters of 3-D graphics on VDI are reporting direct business benefits of $30,000 per annum per employee: this highlights a market laden with opportunity. Currently, only one HCIA vendor has a hardware platform that approaches this lucrative market’s needs. Neither EVO:RAIL nor EVO:RACK is going to commoditize hyperconvergence for VDI, which leaves the market open for someone to step in and deliver a hyperconverged platform optimized for professional graphics workloads.