DRS is one of the most useful and interesting features of VMware vSphere (more specifically, of the vSphere editions from Enterprise on up). DRS is useful because it prevents workloads (VMs) that consume more than their expected share of resources from harming the performance of their neighbors on the same host. DRS is interesting because dynamically balancing the load of a system to protect the performance of its critical workloads was taken for granted in the days of the mainframe, but has not yet been well implemented on distributed Intel architecture systems.
Scott Drummonds, formerly one of the key vSphere performance experts and now at EMC, has a new blog with a post, “Alternative to DRS”, that is worth reading for Scott’s always valuable perspective. Scott makes two points: 1) that VMware has left a large price gap between the Enterprise Edition ($2,875 per socket) and the cheapest edition that supports vMotion ($995 per socket), a gap that would seem exploitable but for the fact that many features besides DRS justify it; and 2) that a vendor, VMTurbo, seems to have provided some interesting improvements upon DRS, but that this seems like a questionable strategy given what VMware is investing in the long-term capabilities of DRS.
VMware can always adjust the composition and pricing of its offerings so as to package competitors out of existence (after all, VMware is run by some very smart people who perfected that art while at Microsoft). So let’s just assume that exploiting the price gap is not the reason to consider either building (on the part of a vendor) or buying (on the part of an enterprise) a replacement for DRS. What, then, could the reasons be for a “third party DRS”?
- Balancing load across more than vSphere. Sophisticated enterprises worldwide have bought into the notion of dynamic and agile IT infrastructures, because this makes IT more agile in support of business objectives. This is in fact the next generation payoff for virtualization (after you get the hard dollar ROI from tactical server consolidation). But many enterprises look at agile IT as an initiative larger than vSphere and/or vSphere compatible clouds. What if you need to balance load across virtual and physical resources? What if you would like to balance load across multiple hypervisors? What if you would like to balance load across your internal data center and one or more public clouds, and do not want to be constrained to clouds that are VMware compatible?
- More intelligent load balancing. When VMware released vSphere 4.1 we wrote about Dynamic Resource Load Balancing as a capability that is not just resource contention based (as DRS is today), but that leverages the new Network I/O Control (NIOC) and Storage I/O Control (SIOC) features of vSphere 4.1. VMware now delivers a number of highly functional and granular controls over a broad set of potentially constrained resources (CPU, memory, storage I/O and network I/O), but has yet to tie these features together so that all of the constraints, and the policies surrounding them, can be managed together in a truly intelligent load balancing solution.
- Applications aware load balancing. DRS today considers resource utilization by VMs and equates VMs to workloads. It assumes that if a VM is using too much CPU or memory then this must be bad, and it finds a place where that VM can get that amount of CPU and memory without causing resource constraints for other VMs. For dynamic load balancing to be really effective, applications systems need to become manageable entities within the virtualization platform, including the idea that one applications system has a higher priority than another. The load balancing system then needs a true understanding of applications performance, i.e., response time and variability in response time. DRS does not know whether a resource constraint is causing a response time problem, or whether trying to fix a resource constraint could in fact create a response time problem where none exists.
- Integration with IT as a Service initiatives. If IT is going to put up a service catalog backed by an automated orchestration and provisioning system, then the load balancing system will need to deal with the constant arrival of new dynamically provisioned workloads and the departure of expired workloads, in addition to balancing what has been running in the environment for some time. This means that within the VMware world, DRS is going to have to become aware of, and in some way integrated with, VMware vCloud Director. It means that in the larger world of cross-platform dynamic and agile IT, products that look at performance and want to take performance based actions are going to need to become integrated with products that orchestrate and provision resources in response to actions taken in a service catalog.
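To make the "more intelligent" and "applications aware" points in the list above concrete, here is a minimal Python sketch. It is not how DRS or any vendor product works. Every name in it (`VM`, `App`, `peak_load`, `pick_move`, the sample hosts and numbers) is hypothetical and invented for illustration. It assumes each VM's demand is known as a percentage of one host's capacity for all four resources, and that each applications system has a measured response time, a target, and a priority.

```python
from dataclasses import dataclass

# The four potentially constrained resources named in the text.
RESOURCES = ("cpu", "memory", "storage_io", "network_io")
CAPACITY = 100  # assumption: every host resource is expressed as a percentage


@dataclass
class VM:
    name: str
    app: str      # applications system this VM belongs to
    demand: dict  # resource name -> percent of one host's capacity


@dataclass
class App:
    name: str
    priority: int       # higher means more important to the business
    response_ms: float  # measured response time
    target_ms: float    # response time the business expects


def peak_load(vms):
    """Return the utilization of the most constrained resource for a set of VMs."""
    return max(sum(vm.demand[r] for vm in vms) for r in RESOURCES)


def pick_move(hosts, apps):
    """Suggest one migration as (vm_name, source, destination), or None.

    hosts maps host name -> list of VMs; apps maps app name -> App.
    """
    # Only applications that miss their response time target trigger a move,
    # and the highest priority application is served first.
    slow = sorted(
        (a for a in apps.values() if a.response_ms > a.target_ms),
        key=lambda a: a.priority,
        reverse=True,
    )
    for app in slow:
        for src, src_vms in hosts.items():
            for vm in src_vms:
                if vm.app != app.name:
                    continue
                best = None
                for dst, dst_vms in hosts.items():
                    if dst == src:
                        continue
                    after = peak_load(dst_vms + [vm])
                    # The destination must stay within capacity on all four
                    # resources and be less constrained than the source is now.
                    if after <= CAPACITY and after < peak_load(src_vms):
                        if best is None or after < best[1]:
                            best = (dst, after)
                if best is not None:
                    return (vm.name, src, best[0])
    return None


# Hypothetical sample data: h3 looks idle on CPU and memory but is network heavy.
hosts = {
    "h1": [
        VM("vm_a", "web", {"cpu": 50, "memory": 30, "storage_io": 20, "network_io": 60}),
        VM("vm_b", "batch", {"cpu": 40, "memory": 30, "storage_io": 70, "network_io": 30}),
    ],
    "h2": [VM("vm_c", "db", {"cpu": 30, "memory": 30, "storage_io": 30, "network_io": 20})],
    "h3": [VM("vm_d", "backup", {"cpu": 10, "memory": 10, "storage_io": 10, "network_io": 60})],
}
apps = {
    "web": App("web", priority=10, response_ms=800.0, target_ms=500.0),
    "batch": App("batch", priority=1, response_ms=100.0, target_ms=1000.0),
    "db": App("db", priority=8, response_ms=20.0, target_ms=50.0),
    "backup": App("backup", priority=1, response_ms=0.0, target_ms=1000.0),
}

move = pick_move(hosts, apps)
print(move)
```

In this sample, a balancer that looked only at CPU and memory would favor h3, yet placing vm_a there would push network I/O past capacity, so the sketch suggests h2 instead. If the web application were meeting its response time target, the sketch would suggest no move at all, even though h1 is heavily utilized.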
So if the vision is a dynamic and agile IT infrastructure of which VMware vSphere is certainly a strategic component, but which potentially extends to physical platforms and other virtualization platforms, by whom might dynamic load balancing be provided? It is important to realize that while many fine solutions address parts of this problem, the larger problem as stated above requires integration among several of the vendors mentioned below, and that integration has only just started to happen:
- VMTurbo (mentioned in Scott’s blog post) is interesting for two reasons. The first is that VMTurbo has worked very hard to identify and collect data in its product that represents constraints to performance. This means that the 99% of garden variety resource utilization data that is just noise in the system is largely ignored, and the focus is on the things that really indicate that resource issues are highly likely to be impacting performance. The second is that VMTurbo then does something that no other vendor has done. VMTurbo prices these resources based upon their scarcity (scarce resources cost more), and allows the customer to assign budgets (in terms of virtual dollars) to workloads. The system then does what an economic free market system does at equilibrium: it allocates resources to their highest and best use. This “allocation” can take the form of a recommendation for the admin to execute manually, or it can be implemented automatically by the VMTurbo product.
- Platform Computing. Platform Computing ISF is able to dynamically balance resources and provide guaranteed resource reservations across multiple virtualization platforms, physical and virtual systems, and multiple clouds. Platform has many years of experience providing analogous functionality in the high performance computing realm for grids, and has used its experience and technology to build the new ISF product.
- The IT as a Service vendors. Quest Software acquired Surgient and markets the resulting product as Quest Cloud Automation Platform, a full self-service and orchestrated provisioning system that spans VMware and physical resources. Quest also has market leading applications performance management functionality in the form of Quest Foglight. The integration of these two product portfolios would provide precisely the kind of suite of functionality alluded to above. ManageIQ, DynamicOps, Embotics, Eucalyptus, and newScale all offer IT as a Service functionality that expands upon what is offered in VMware vCloud Director in some way. Integration of any of these IT as a Service solutions with any of the leading edge virtualization aware APM solutions (covered in this post) would provide the kind of applications performance aware load balancing alluded to earlier.
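The scarcity pricing idea described for VMTurbo in the list above can be illustrated with a toy market in Python. This is only a sketch of the general idea and not VMTurbo's actual algorithm, which the post does not describe. The price curve, the function names (`price`, `cost`, `place`), and all sample numbers are assumptions made up for the example.

```python
def price(utilization):
    """Hypothetical price curve: virtual dollars per unit of a resource.

    The price rises steeply as utilization (0.0 to 1.0) nears saturation,
    so scarce resources cost more.
    """
    if utilization >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - utilization) ** 2


def cost(host_util, demand):
    """Virtual dollars a workload would pay to run on a host at current prices."""
    return sum(amount * price(host_util[r]) for r, amount in demand.items())


def place(workloads, hosts):
    """Let each workload buy the cheapest host it can afford.

    workloads is a list of (name, budget, demand) tuples; hosts maps a host
    name to its current utilization per resource. Workloads with larger
    budgets shop first. Returns workload name -> host name, or None when the
    workload is priced out of every host.
    """
    util = {h: dict(u) for h, u in hosts.items()}  # copy so the input is untouched
    placement = {}
    for name, budget, demand in sorted(workloads, key=lambda w: w[1], reverse=True):
        offers = [
            (cost(u, demand), h)
            for h, u in util.items()
            if all(u[r] + demand[r] <= 1.0 for r in demand)  # must physically fit
        ]
        affordable = [offer for offer in offers if offer[0] <= budget]
        if not affordable:
            placement[name] = None
            continue
        _, chosen = min(affordable)
        for r, amount in demand.items():
            util[chosen][r] += amount  # buying capacity makes it scarcer
        placement[name] = chosen
    return placement


# Hypothetical sample data: utilization and demand are fractions of host capacity.
hosts = {
    "busy": {"cpu": 0.6, "memory": 0.5},
    "idle": {"cpu": 0.2, "memory": 0.3},
}
workloads = [
    ("batch", 1.0, {"cpu": 0.3, "memory": 0.2}),
    ("critical", 5.0, {"cpu": 0.3, "memory": 0.2}),
]

placement = place(workloads, hosts)
print(placement)
```

In this sample the critical workload shops first and buys the cheap capacity on the idle host. That purchase raises prices there, and the low-budget batch workload can then afford neither host, which is the sense in which budgets steer scarce resources toward the workloads that matter most.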
The question of whether and how to replace DRS is really part of the question of what is in the virtualization platform and what is not. Clearly the virtualization platform consists of much more than the hypervisor. VMware would like to define the virtualization platform as all of vSphere Enterprise Plus, and then suggest that vCloud Director and its own performance management solutions are logical extensions of that platform. Enterprises need to be careful about where they draw their own lines in this regard. As VMware is a clear market leader both in terms of product functionality and enterprise installations, VMware needs to be given full credit for the quality of vSphere and its success. However, full credit does not imply that one is 100% locked in to VMware solutions, as there is room to pursue third party IT as a Service, Performance Management, and Service Assurance strategies, as well as to replace or augment components in vSphere.