Evaluator Group Launches New VDI Performance Benchmark

Storage analyst firm Evaluator Group has announced a new storage-specific benchmark for VDI that takes an interesting and innovative approach to the inherent complexity of benchmarking the storage infrastructure needed to support VDI workloads.

I have been championing the cause of transparent benchmarking of desktop virtualization workloads ever since the days of the almost weekly spats between Citrix and VMware arguing over the performance of XenDesktop and View, until I was eventually able to announce that Citrix had adopted Login VSI as its “standard” benchmarking tool for XenDesktop. This was, at the time, the smartest thing that Citrix could have done. Not because Login VSI was good (although it was), nor because it was independently developed (although that helped), but because it was well documented, transparent, and freely available; as a result it had already been adopted by the majority of independent desktop virtualization subject matter experts as the de facto standard for judging the performance of Remote Desktop Services and VDI platforms. Citrix’s support of Login VSI ensured that the results Citrix published could be independently verified, and many hoped that VMware would follow Citrix’s lead and adopt it as well.

Disappointingly, VMware chose instead to stick with its own internally developed benchmark, Reference Architecture Workload Code (RAWC), which it followed up with the more accessible if less well-known VMware View Planner. From a technical perspective RAWC had some advantages over Login VSI, most notably in the way it could randomize workloads to more realistically simulate variations in user activity. Regardless, it would have been good to see VMware engage the broader community by partnering with the Virtual Reality Check team and contributing its expertise to the development of Login VSI. Unfortunately, competitive concerns meant that VMware was unwilling to abandon its own solution at the time, and the opportunity was lost.

Now, 18 months later, Evaluator Group is looking to introduce a new standard. So what is it, and what should you make of it?

Announced as VDI-IOmark, the new benchmark does away with much of the infrastructure needed by Login VSI and RAWC to drive VDI workloads (the virtual desktops themselves, along with supporting Active Directory, e-mail, and Web servers) in favor of a far simpler model that bypasses the virtual desktops altogether and instead replays “pre-recorded” I/O transactions directly against the storage infrastructure under test. The attractions of this approach are readily apparent: the cost and complexity of setting up a test environment are dramatically reduced. Evaluator Group claims that it is possible to perform a comparable test using one-tenth of the resources a conventional test would require, putting it within reach of organizations with even the most limited resources. But does it compare favorably with either Login VSI or RAWC? Well, it really depends on what you want to do.

Before going any further, it is worth looking more closely at the different types of benchmarking solutions.

In general, there are three different types of benchmarks that might be used in this context:

  • Application Benchmarks – Generate load using real-world applications driven by an automated functional testing tool. Examples of application benchmarking tools include Login VSI, RAWC, and View Planner.
  • Synthetic Benchmarks – Generate load by combining basic computer functions in proportions that the developers feel represent an indicative measure of the performance capabilities of the system under test.
  • Workload Replay – This approach combines the best of both worlds, capturing the essential characteristics of a real application workload and then reproducing it on demand in the same way that a synthetic benchmark generates load.
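To make the synthetic category concrete, here is a minimal sketch in Python: it issues random reads and writes against a file in a fixed proportion chosen by the author, which is precisely the kind of arbitrary, developer-chosen mix that a synthetic benchmark embodies. Every name and parameter below is an illustrative assumption, not taken from any real benchmarking tool.

```python
import os
import random
import tempfile
import time

def synthetic_io(path, ops=200, read_fraction=0.7, block=4096, seed=42):
    """Issue a fixed mix of random reads and writes against a file.
    The read/write proportion is chosen by the benchmark author,
    not derived from any real application workload."""
    rng = random.Random(seed)          # seeded for repeatable runs
    size = os.path.getsize(path)
    counts = {"read": 0, "write": 0}
    start = time.perf_counter()
    with open(path, "r+b") as f:
        for _ in range(ops):
            f.seek(rng.randrange(0, size - block))
            if rng.random() < read_fraction:
                f.read(block)
                counts["read"] += 1
            else:
                f.write(os.urandom(block))
                counts["write"] += 1
    elapsed = time.perf_counter() - start
    return counts, ops / elapsed       # rough operations-per-second figure

if __name__ == "__main__":
    # Create a 1 MiB backing file to stand in for the storage under test.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(1 << 20)
        target = tmp.name
    counts, iops = synthetic_io(target)
    print(counts)
    os.unlink(target)
```

The weakness the article points to is visible here: the 70/30 read/write mix is simply asserted, so the result only measures what the author guessed a workload looks like.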

VDI-IOmark actually starts off by using VMware RAWC to drive application load against a VMware View environment, capturing the disk I/O activity generated by the virtual infrastructure for subsequent replay. By replaying only the disk I/O activity, VDI-IOmark overcomes the shortcomings of synthetic benchmarks, since real applications generate the I/O activity, whilst eliminating the complexity of building out a full virtual infrastructure to generate the workload. It also ensures that tests are repeatable, with minimal variation of the kind that differences in the virtual desktop infrastructure’s configuration (e.g., desktop operating system build or patch level) might otherwise introduce. However, if used inappropriately, this strength can also be a weakness.
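As a rough illustration of the replay idea (not VDI-IOmark’s actual implementation, which is not publicly documented at this level of detail), a trace replayer simply walks a list of captured operations and reissues them against the target, honouring the recorded pacing. The trace format and records below are invented purely for the example.

```python
import os
import tempfile
import time

# Hypothetical trace format: (delay_s, op, offset, size) tuples.
# A real capture would come from the hypervisor or a block-level
# tracing tool; these records are invented for illustration only.
TRACE = [
    (0.00, "write", 0,    4096),
    (0.01, "read",  0,    4096),
    (0.01, "write", 8192, 4096),
    (0.02, "read",  8192, 4096),
]

def replay(trace, path):
    """Replay a captured I/O trace against a target file,
    reproducing the recorded inter-operation delays."""
    stats = {"read": 0, "write": 0}
    with open(path, "r+b") as f:
        for delay, op, offset, size in trace:
            time.sleep(delay)          # honour the captured pacing
            f.seek(offset)
            if op == "write":
                f.write(b"\0" * size)
            else:
                f.read(size)
            stats[op] += 1
    return stats

if __name__ == "__main__":
    # A 1 MiB backing file stands in for the storage under test.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(1 << 20)
        target = tmp.name
    print(replay(TRACE, target))
    os.unlink(target)
```

Because the trace is fixed, every run issues an identical sequence of operations, which is what makes replay results so repeatable; but by the same token, changing anything about the desktops that produced the trace requires capturing a new one.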

From an enterprise IT perspective a benchmark can assist in identifying products with the best price/performance ratio, but is of little value when it comes to system sizing or capacity planning. Contrasted with a full application benchmark driving an end-to-end system, where it is easy to change individual elements of the entire stack to perform what-if analyses, workload replay is frustratingly inflexible. With a full application benchmark, understanding the storage impact of the transition from Windows XP to Windows 7 requires no more than replacing one virtual desktop image with another; assessing the performance benefits to be obtained in moving from View 4.6 to View 5.0 would require little more work; and scripted workflows can readily be customized to meet specific testing needs, assessing how a unique business-critical application will impact disk performance. A pre-recorded trace permits none of this. On the other hand, storage vendors should derive significant value from the ability to offer potential customers a simple-to-understand benchmark figure. However, for any standard benchmark to be accepted by its intended audience, it must be both trusted and relevant.

Compared to established benchmarks such as those published by the Standard Performance Evaluation Corporation (SPEC) or the Storage Performance Council (SPC), VDI-IOmark still has work to do. The VDI-IOmark Theory of Operation document (registration required) is worth reading if you are looking to gain a basic understanding of how VDI-IOmark operates; however, it does not go into sufficient depth and consequently fails to provide the level of assurance needed to establish appropriate trust in the results. Having said that, I am confident that this is no more than a temporary documentation deficit. I spoke at length with Russell Fellows (Senior Partner, Evaluator Group), who led the development of VDI-IOmark. There is no doubt that he fully understands the challenges that VDI places on storage systems and has tailored VDI-IOmark to ensure that it delivers meaningful results, and I would strongly recommend that anyone interested in exploring VDI-IOmark contact him directly. A bigger challenge lies in the current development path that hypervisors are taking. Although VDI-IOmark is today only implemented for VMware View, and hence only works with VMware vSphere, it is by design hypervisor agnostic. However, technologies such as Citrix IntelliCache for XenServer and its vSphere 5 equivalent blur the boundaries between hypervisor and storage. This may not directly impact the raw performance numbers gathered through the benchmark, but it will introduce variations in results that could be exploited for marketing advantage. More importantly, innovators such as V3 Systems and Nutanix that offer combined compute/storage appliances may find that VDI-IOmark does not accurately represent the overall performance that their tight integration of components creates.

The one area that causes me significant concern is the approach Evaluator Group has taken in establishing a separate organization, VDI-IOmark.org, to publish this benchmark. For starters, the bona fides of VDI-IOmark.org are less than clear. Although its landing page clearly states “Our organization was created as a group of industry organizations, technology vendors and IT users, for the purpose of developing and reporting storage performance in a vendor neutral setting”, there is no published list of the industry organizations and technology vendors that make up VDI-IOmark.org. Given that one of the primary factors legitimizing a standards group is the depth of its membership, the lack of ready access to this information does not inspire confidence. At the same time, the VDI-IOmark Theory of Operation is labeled “Developed by Evaluator Group” rather than VDI-IOmark.org and includes the statement “All rights reserved; patents pending. VDI-IOmark is a trademark of Evaluator Group, Inc.” This falls a long way short of the openness of other benchmarking standards organizations such as SPEC or SPC. Evaluator Group needs to decide what it wants to do with VDI-IOmark. If it really feels that it is necessary to establish a new standards body to support this one very specific use case, then it needs to go all the way and either establish it as a fully independent entity or hand it over to SPC to manage. Alternatively, Evaluator Group might be better served collapsing VDI-IOmark.org back into its core business whilst remaining true to the principles of open communication and vendor-neutral platform development, perhaps inviting participation from independent experts to ensure that objectivity is seen to be maintained. There are enough for-profit open source businesses to show that this approach can be of benefit to all participants.

As things stand, the combination of uncertainty regarding the bona fides of VDI-IOmark.org and the limited information Evaluator Group has published on the operation of the benchmark makes it difficult to offer a blanket recommendation in favor of VDI-IOmark just yet. However, with some fine-tuning of the organizational structure, coupled with improved documentation, I would suggest that VDI-IOmark has the potential to become the preferred means of benchmarking storage solutions for VDI environments.