In “CA Starts the Race To Self-Destruction Among the “Big Four” in Virtualization Management” we explained why the big four are not a good choice for managing your virtual infrastructure (or, for that matter, your private, hybrid, or public cloud). There are two top-level reasons for this. The first is that virtualization both breaks how legacy management solutions work and introduces a new set of requirements that legacy solutions cannot address. The second is that the management vendors who are finding success in the virtualization market have focused on an “easy to try, easy to buy, and affordable to own” business strategy that is the opposite of how the big four do business.
Why Is Managing Your Virtual Infrastructure an Important and Different Problem?
As we all know, infrastructure monitoring and management solutions have existed for years. These tools got their start in the mainframe era. The modern generation of legacy tools got its start with the emergence of local area networks as computing platforms, with first client/server and then n-tier systems replacing many mainframe and mini-computer based systems as preferred deployment platforms. Legacy infrastructure management platforms were all characterized by two fundamental principles. The first was that they used some method to know whether the servers or network devices were up or down. The second was that they used some method to collect resource utilization data from the servers and network devices in order to infer whether or not the performance of the environment was normal.
Unfortunately for IBM, CA, and HP, the world has changed in significant ways. The following changes have created new requirements for how infrastructure management solutions must work, and what they must deliver:
- Management agents (for infrastructure management) have become a really bad idea, so bad that many organizations remove them from physical servers as part of the process of virtualizing those servers. This is because infrastructure management agents are worse than useless in the virtualized world: they consume resources, add complexity, and deliver no value (more on this below).
- A predictable and static environment has become very dynamic. The rate of change in the normal operation of a virtualized environment creates so many configuration change events that legacy approaches to keeping up with configuration data completely fall down. This means that the CMDB (the underlying configuration data store for legacy infrastructure management systems) has itself become worse than useless (again, because it adds complexity, costs a lot of money and time to maintain, and adds no value).
- Dynamic environments and high rates of change require near real-time data collection. The legacy systems management industry was built around the notion of hourly, 15-minute, or 5-minute polls of the environment to collect data. The problem in a virtual environment is that a lot of things can go horribly wrong in 4 minutes and 59 seconds. Leading-edge tools have emerged that focus on getting as close to real-time and continuous as possible in their data collection. Legacy tools can neither collect data at this rate nor accept and store such a stream of data in their back-end architectures.
- The combination of rapid changes to applications, dynamic runtime environments, and distributed runtime environments creates a need for tools that constantly discover the environment and self-configure to deal with the new reality of the system. Legacy tools cannot keep up with the rate of change, nor can they auto-adapt to it.
- The manner in which resources are abstracted and shared in a virtualized environment means that one can no longer infer that a system and its applications are performing properly based upon a normal resource utilization profile. This breaks one of the core operating assumptions of the legacy management solutions, and is the principal reason why it no longer makes sense to deploy infrastructure management agents inside guest virtual machines.
- As mentioned at the top of this article, the people who own virtualized environments in large, medium, and small enterprises have little patience for how legacy management vendors want to do business. One-year evaluation cycles, expensive purchase prices, expensive maintenance policies, and ongoing requirements for vendor-provided consulting services are completely out of step with how the virtualization team wants to evaluate and procure software solutions.
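The polling-interval point in the list above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than data from any real product: the metric (`contention_percent`), the three-minute spike, the 20-second "near real-time" interval, and the 20% alert threshold are all made up for the example.

```python
# Hypothetical illustration: a short contention spike that a 5-minute poll
# never observes but a 20-second collector does. All numbers are invented.

def contention_percent(second: int) -> float:
    """Synthetic metric: a 3-minute spike starting 330 seconds into the window."""
    return 45.0 if 330 <= second < 510 else 5.0

def sample(interval_s: int, duration_s: int = 900) -> list[tuple[int, float]]:
    """Collect (timestamp, value) pairs at a fixed polling interval."""
    return [(t, contention_percent(t)) for t in range(0, duration_s, interval_s)]

def breaches(samples: list[tuple[int, float]], threshold: float = 20.0) -> list[int]:
    """Return the timestamps at which the sampled value exceeded the threshold."""
    return [t for t, value in samples if value > threshold]

five_minute_poll = sample(300)   # polls at t = 0, 300, 600
near_real_time = sample(20)      # polls every 20 seconds

print("5-minute poll saw breaches at:", breaches(five_minute_poll))
print("20-second collection saw breaches at:", breaches(near_real_time))
```

In this synthetic window the spike falls entirely between two 5-minute polls, so the coarse poll reports nothing, while the finer-grained collection records it from start to finish. Finer sampling multiplies the volume of data the back end must store, which is the second half of the problem described above.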
To meet these challenges, it is critical to select an infrastructure management solution for your environment that has the following capabilities:
- Cross Virtualization Platform Support. The first question to ask yourself when selecting a product in this space is whether you run, or are going to run, more than one virtualization platform. If you are, then it makes no sense to build separate and different management stacks for each of those platforms. Doing that will simply cause you to repeat the mistakes in management tool proliferation that made the physical environment so costly and time consuming to manage. This one will either matter to you or not, but figure it out before you start your tool evaluation process.
- Self-Learning Analytics. VMware forever changed this space with the acquisition of Integrien and the subsequent integration of the Integrien technology into vCenter Operations. Self-learning analytics address some of the core limitations of legacy solutions. The first is that no one has the time or the expertise to set thresholds manually, notwithstanding the fact that in a dynamic environment manual thresholds are almost certain to be wrong, if not at the moment they are set, then within a few days. The second is that no one has the time to crawl through hundreds of metrics to work out how and why the degradation in X is caused by metric Y associated with infrastructure element Z. The virtualization admin as a human metric correlation engine is simply a failed model.
- System Topology Discovery and Mapping. So what is talking to what, and what is dependent upon what? Keeping on top of this is where the CMDB abjectly fails and becomes worse than useless (again, because it consumes time and money and adds no value). Understanding the performance and capacity of individual hosts and VMs is of little value. You need to understand their performance and capacity in the context of the application systems that they support. VMware has made important strides here with the integration of VMware Infrastructure Navigator into vCenter Operations Enterprise.
- Performance Management. Are resource constraints in your environment (CPU, memory, network, I/O operations) impacting the performance of your workloads? This is one of the central questions involved in effectively managing your virtualized environment. This capability is joined at the hip with the Capacity Management capability below (it is hard to have one without the other).
- Capacity Management. In the physical world, capacity planning was done on a periodic basis. In a dynamic virtualized environment, capacity management, which means understanding how temporary constraints in capacity impact workload performance, is an essential capability. This also includes the ability to “right-size” your VMs and eliminate the over-provisioning that costs you money.
- Configuration Management. The most common causes of performance and availability issues are misconfiguration, inconsistency in configuration, and configuration changes that are made by the wrong people or improperly implemented. For this reason, marrying a real-time understanding of configuration change events to your understanding of workload performance and capacity is essential.
- Infrastructure Performance Management. Understanding the actual performance of your infrastructure means much more than understanding resource utilization. It means understanding the end-to-end latency of requests made to the infrastructure. This is an emerging area of focus for several leading-edge vendors in the market.
- Automated Remediation. This capability is the cornerstone of VMware’s promise for how it is going to reinvent infrastructure management and move everyone from a “monitor, notify, manually fix” model to a “monitor, automatically fix, notify the human” model. While this is a most compelling vision, and one that carries with it the promise of dramatically reduced OPEX for enterprises, it is not yet implemented in any of VMware’s solutions in this area.
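To illustrate why the list above treats manually set thresholds as a limitation, here is a minimal sketch that compares one static threshold with a band learned from each VM's own history. It is a generic mean-plus-or-minus-k-standard-deviations baseline, chosen only for brevity; it is not the algorithm used by Integrien, vCenter Operations, or any other product named in this article, and the histories, readings, and threshold are invented.

```python
from statistics import mean, stdev

STATIC_THRESHOLD = 75.0  # hypothetical one-size-fits-all CPU % alert level

def learned_band(history: list[float], k: float = 3.0) -> tuple[float, float]:
    """Derive a normal range from a metric's own recent history."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma

def is_anomalous(value: float, history: list[float], k: float = 3.0) -> bool:
    """Flag a reading that falls outside the learned band."""
    low, high = learned_band(history, k)
    return value < low or value > high

# Invented CPU utilization histories (percent) for two very different VMs.
quiet_vm_history = [10, 12, 11, 9, 10, 11, 12, 10]
busy_vm_history = [70, 75, 72, 78, 74, 71, 76, 73]

for name, history, reading in [
    ("quiet VM", quiet_vm_history, 30.0),
    ("busy VM", busy_vm_history, 78.0),
]:
    static_alert = reading > STATIC_THRESHOLD
    learned_alert = is_anomalous(reading, history)
    print(f"{name}: reading={reading} static={static_alert} learned={learned_alert}")
```

With these made-up numbers the static threshold stays silent when the quiet VM triples its usual load and fires on the busy VM's ordinary behavior, while the learned band does the opposite in both cases. A production system would also need to handle seasonality and re-learn as workloads move, which this sketch does not attempt.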
With the above criteria in mind, here is a comparison of some modern and very capable virtualization infrastructure management solutions:
| Vendor/Product | Virtualization Platforms | Product Focus | Self-Learning Analytics | System Topology Discovery | Resource Based | Infrastructure Performance Management | Automated Remediation |
|---|---|---|---|---|---|---|---|
| Cirba V 7 | VMware vSphere, RHEV, IBM LPARs, Solaris Zones | Continuous capacity management driven by policies and analytics | | | | | |
| Hotlink SuperVISOR | vSphere, Hyper-V, KVM, XenServer | Allows you to use vCenter to manage Hyper-V, KVM, and XenServer | | | | | |
| Liquidware Labs Stratusphere | VMware View and Citrix XenDesktop on vSphere platforms | Monitoring VMware View and Citrix XenDesktop in production on vSphere platforms | | | | | |
| ManageEngine OPManager | vSphere, Hyper-V | Broad monitoring of networks, servers, and virtualization environments in one product. | | | | | |
| Microsoft SCOM/SCVMM | Hyper-V natively, others through management packs | Management of Windows servers and Windows Hyper-V. Extensible through a wide range of management packs to be able to monitor and manage most hardware, and most non-Microsoft environments including vSphere. | | | | | |
| Netuitive | vSphere | Self-learning performance management analytics at scale | | | | | |
| PHD Virtual Virtual Monitor | vSphere, Citrix XenServer, Citrix XenDesktop, VMware View | Broad monitoring of networks, servers, and virtualization environments in one product. | | | | | |
| Quest Software vFoglight | vSphere, Hyper-V | Comprehensive monitoring of the virtual and underlying physical infrastructure | | | | | |
| Reflex Systems | vSphere | Highly scalable, near real-time performance, capacity, and configuration management | | | | | |
| Solarwinds Virtualization Manager | vSphere, Hyper-V | vSphere capacity planning, sprawl control, performance management and configuration management integrated into one solution. | | | | | |
| Veeam nworks and Monitor | vSphere, Hyper-V | Easy to buy and implement vSphere monitoring, as well as the ability to integrate vSphere monitoring with Microsoft SCOM and HP Operations Manager | | | | | |
| Virtual Instruments Virtual Wisdom | vSphere | Understanding the real-time and continuous performance (latency) of storage transactions from the perspective of the SAN, and how that latency impacts the performance of the virtual infrastructure. | | | | | |
| VKernel (Quest) | vSphere, Hyper-V | Resource constraint based performance and capacity management | | | | | |
| VMTurbo | vSphere, Hyper-V, Xen | Automatically assigns the needed resources to workloads based upon workload priority, preventing performance and capacity issues from occurring. | | | | | |
| VMware vCenter Operations | vSphere | Self-learning analytics integrated with vSphere focused performance, capacity and configuration management | | | | | |
| Xangati | vSphere, View and XenDesktop on vSphere | Infrastructure performance management with a focus upon network and storage latencies that impact desktop and server virtualization initiatives | | | | | |
| Zenoss | Broad range of physical hardware, vSphere and Hyper-V | Comprehensive end-to-end management system for your physical and virtual storage, network and server systems. | | | | | |
Virtualization is such a profound change to how systems operate that it not only creates new management requirements but also breaks legacy management solutions. For these reasons, enterprises should look outside of traditional legacy management vendors for their virtualization performance and capacity management solutions. The focus should be on the richness of the virtualization-aware functionality in the solutions, their support of multiple platforms, and how easy the solution is to try, buy, and implement.