VMware customers who need to manage availability, resource utilization, capacity, and configuration, along with the impact of these areas on system performance, have had a rich set of vendors to choose from. Until today, a comprehensive solution in this area has not been available from VMware itself. VMware has now announced vCenter Operations, which comes in three editions intended to address these issues.
vCenter Operations (vC OPS) is a new member of the vSphere product line that integrates availability, resource utilization, capacity management, and configuration management, with self-learning analytics from the Integrien acquisition thrown in for good measure. There are three editions of vC OPS:
vCenter Operations Standard Edition
vC OPS Standard includes resource utilization monitoring, capacity management, configuration management, and the automated analytics that came from Integrien. The idea behind vC OPS Standard is that the key vCenter resource utilization metrics that can impact performance are automatically analyzed, and scores for Workload, Health, and Capacity are then automatically calculated. This promises to be a great time saver for VMware admins, because the self-learning analytics in the product automate both the process of crawling through vCenter to find the metric that indicates the problem and the setting of manual thresholds for these metrics. vC OPS Standard is implemented as a vCenter plug-in and is limited to 500 VMs per Virtual Center.
vCenter Operations Advanced Edition
vC OPS Advanced is a bundle of vC OPS Standard and CapacityIQ. This bundle is most useful for environments that have highly variable workloads and therefore need continuous capacity management, or for environments where the load is growing fast enough that adding hosts in time to keep up is a concern. vC OPS Advanced is implemented as a vCenter plug-in and is also limited to 500 VMs per Virtual Center.
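To make the contrast with manual thresholds concrete, here is a minimal sketch of a learned baseline. It is not the Integrien algorithm, whose internals are not described here. It assumes a simple per-hour mean and standard deviation, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def learn_baseline(samples):
    """Learn a per-hour baseline from (hour_of_day, value) history.

    Returns {hour: (mean, standard deviation)}. Hours with fewer than
    two samples are skipped because stdev needs at least two points.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}

def is_anomalous(baseline, hour, value, k=3.0):
    """Flag a value more than k standard deviations from that hour's mean."""
    if hour not in baseline:
        return False  # no learned behavior for this hour yet
    mu, sigma = baseline[hour]
    return abs(value - mu) > k * sigma

# Hypothetical CPU utilization history (percent) for 3 AM and 9 AM.
history = [(3, 10), (3, 12), (3, 9), (3, 11),
           (9, 70), (9, 72), (9, 68), (9, 71)]
baseline = learn_baseline(history)

print(is_anomalous(baseline, 3, 65))  # far above what 3 AM normally looks like
print(is_anomalous(baseline, 9, 72))  # within the normal 9 AM range
```

A static threshold of, say, 80% CPU would miss the 65% reading at 3 AM, while the learned baseline flags it because it is far outside what that hour normally looks like.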
vCenter Operations Enterprise Edition
vC OPS Enterprise is a bundle of vC OPS Standard, CapacityIQ, and vCenter Configuration Manager (the former ConfigureSoft product). Targeted at large enterprise installations, this adds the robust configuration management abilities of vCenter Configuration Manager to the solution. It also includes the version of the Integrien technology that supports adapters to other monitoring solutions. Adapters exist for a fairly wide range of products, from things like Keynote, which generates synthetic transactions for measuring response time, to Hyperic, a broad scale monitoring solution (from VMware) that can monitor the entire physical infrastructure underlying a vSphere installation. There is also an SDK that allows additional adapters to be developed with professional services when there is a need to integrate custom or application-specific monitoring data. vC OPS Enterprise includes its own standalone management console and does not come with an upper limit on the size of the environment that it can manage.
So What Does This Mean?
VMware has on numerous occasions stated that virtualization creates a need for a new management stack, and stated its intention to deliver that management stack. However, up until now, other than on the security front with vShield, the delivery has been spotty. In particular, customers who wanted and needed a robust availability, performance, and capacity solution had to look to third party tools like NetApp (Akorri) BalancePoint, Quest vFoglight, Veeam Monitor, vKernel Capacity Analyzer, VMTurbo, Netuitive, Xangati, or Zenoss. While vC OPS does not do everything that all of the tools listed above do, it certainly does enough in the realm of availability, performance, capacity, and configuration management to warrant consideration. The inclusion of the Integrien self-learning analytics is a strong differentiator in vC OPS. The only other vendor of self-learning analytics is Netuitive. If self-learning analytics becomes a required feature due to its inclusion in vC OPS, then Netuitive, as the leading independent vendor of self-learning analytics, will be in a strong position to partner with other vendors in the VMware ecosystem in order to create a best of breed alternative bundle.
Pricing for vC OPS starts at $50 per VM. This is the price for vC OPS Standard. So if you have 20 VMs per server, vC OPS Standard will cost you $1,000 per server. If you have the 500 VMs that vC OPS Standard supports, this would be $25,000 for your 25 hosts. Obviously this price will come down in quantity, and will also go up for the higher end editions of vC OPS. One issue with this pricing is that one of the benefits of managing your vSphere environment with a product like this is that you ought to be able to increase VM density on a host through proper management of resource utilization. As you increase that density, your price per host is going to go up. This is in direct contrast to most of the vendors in the third party ecosystem, who charge by host, CPU socket, or core, which means that the price per host of their solutions does not go up as you increase VM density per host.
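The arithmetic behind this point can be sketched as follows. The $50 per VM figure and the 500 VM example come from the pricing above. The $1,000 per host price for a third party tool is a hypothetical number, chosen only so that both models cost the same at 20 VMs per host.

```python
PRICE_PER_VM = 50               # vC OPS Standard list price quoted above (USD)
HYPOTHETICAL_PER_HOST = 1000    # assumed third party per-host price (USD), illustration only

def hosts_needed(total_vms, vms_per_host):
    """Number of hosts required, rounding up to a whole host."""
    return -(-total_vms // vms_per_host)  # ceiling division

def per_vm_cost_per_host(vms_per_host, price_per_vm=PRICE_PER_VM):
    """License cost attributed to one host under per-VM pricing."""
    return vms_per_host * price_per_vm

def per_vm_total(total_vms, price_per_vm=PRICE_PER_VM):
    """Total bill under per-VM pricing; independent of VM density."""
    return total_vms * price_per_vm

def per_host_total(total_vms, vms_per_host, price_per_host=HYPOTHETICAL_PER_HOST):
    """Total bill under per-host pricing; falls as density rises."""
    return hosts_needed(total_vms, vms_per_host) * price_per_host

for density in (20, 30, 40):
    print(density,
          hosts_needed(500, density),
          per_vm_cost_per_host(density),
          per_vm_total(500),
          per_host_total(500, density))
```

For 500 VMs, going from 20 to 40 VMs per host reduces the host count from 25 to 13. Under per-VM pricing the total stays at $25,000 and the cost attributed to each host doubles from $1,000 to $2,000, while under the assumed per-host pricing the total falls from $25,000 to $13,000.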
The Strategy and the Stack
It is important to understand the totality of the VMware strategy and product set. If you need to monitor the physical environment that underlies your vSphere installation, VMware offers Hyperic, a broad scale monitoring solution that has support for a very wide range of hardware and software. vC OPS is an attractive combination of resource utilization, capacity, and configuration monitoring for vSphere that includes self-learning analytics. VMware has not talked about AppSpeed much in the last six months, but if you have a web/Java/.NET/SQL based application, AppSpeed is a perfectly capable applications performance management solution, and if there is not already an Integrien adapter for it, there will likely shortly be one – allowing for the integration of response time data into the vC OPS Enterprise self-learning model. So VMware has a fully functional “monitoring stack” starting at the physical layer and extending all of the way up to applications response time.
So Where are the Holes?
While VMware has done an absolutely fantastic job with the feature set, performance, scalability, and stability of vSphere, ESX, and ESXi, this has not to date been the case with VMware’s management offerings. The best example of this is the recently released vCloud Director and vCloud Request Manager, which are widely viewed as typical version 1.0 products (not fully complete, and with some bugs left to fix). vC OPS is going to compete with some products that have had years of seasoning, that are installed in some very large environments, and that have thousands (and in some cases tens of thousands) of customers using the product in production.
VMware has also not been enormously successful selling any management product except for vCenter which is really the management console for vSphere with some management functionality thrown in. For example when VMware rolled out AppSpeed it discovered that this was a product that gets bought by applications owners, not the owners of the virtualized infrastructure, requiring an entirely different sales and marketing approach. The lack of this approach severely crimped the sales of AppSpeed as VMware did not arm its sales teams with the information required to compete in the APM realm. Right now there are two audiences for a product like vC OPS. The team that owns and runs vSphere in the enterprise is an audience, one that VMware knows extremely well how to sell and market to. However, enterprise management solutions like those from CA, HP, IBM (Tivoli), BMC, Quest, and Netuitive are sold to the enterprise management team, an audience with which VMware has little to no experience.
From a product functionality standpoint there are several areas where products in the third party management tool ecosystem have advantages over vC OPS. The first is in the Infrastructure Performance Management category. This category consists of vendors who focus upon the infrastructure response time (IRT) or latency as a core part of their offerings. NetApp (Akorri) BalancePoint pioneered this category with a product that measured IRT (latency) from the server that contained the HBA to the spindle on the array and back. Virtual Instruments uses a TAP on the SAN to measure exchange completion times between VMware hosts and storage arrays independently of what hardware and software resides on either end of the SAN. Xangati measures IRT for the entire network that supports a VMware environment, something that is particularly useful in VDI environments where so much of the end user’s experience is dependent upon network issues. CA Virtual Performance also uses a network TAP to measure TCP/IP response time for both the physical and virtual network. No IRT functionality of this type is present in vC OPS.
The next area where VMware still has some gaps is the lack of tight integration across the monitoring stack. Yes, Hyperic, vC Configuration Manager, vC OPS, vC CapacityIQ, and vC AppSpeed together constitute an entire stack, but these are five separate products whose only integration comes via the use of adapters in vC OPS Enterprise. Quest Software’s Foglight includes all of the virtualization specific functionality of vFoglight, but also includes deep network monitoring and deep server hardware monitoring, and is a fully functional APM solution with response time features and deep dive Java and .NET code analysis features. VMware still has an enormous amount of development to do in order to take the three consoles and databases of, at a minimum, vC Configuration Manager, vC OPS, and vC CapacityIQ and integrate them into one.
Both vC OPS Standard and Advanced implement their consoles in vCenter. This means that in order to use these consoles you have to have access to vCenter. In many companies the people supporting the environment are not the same people as the virtualization admins who have day-to-day access to vCenter. Giving vCenter access to people whose primary role is problem troubleshooting and resolution also gives them access to other parts of vCenter and the opportunity to accidentally create problems. vC OPS Enterprise has a separate management console, which means that for enterprise customers this will not be an issue.
There will likely be substantial room for third party vendors underneath the $50 per VM pricing umbrella of vC OPS. Many customers will simply choose more affordable solutions like SolarWinds (which acquired Hyper9), Veeam Monitor, Zenoss, and vKernel. It is important to remember that VMware has 250,000 customers, the vast majority of whom have no third party monitoring solution in place, and for whom $50 per VM might be too much money.
vC OPS relies heavily on the Integrien Alive technology for automated self-learning analytics. Prior to the acquisition of Integrien by VMware, Integrien and Netuitive were the two independent vendors of self-learning performance analytics. Netuitive is in use by over 350 companies including 48 large enterprises. Integrien did not achieve this kind of customer success prior to the acquisition by VMware, and the jury is therefore still out on how seasoned this technology is in terms of operational success in large scale enterprises.
Although AppSpeed is not part of vC OPS and AppSpeed is an applications performance management solution, not an infrastructure management solution, when one looks at the VMware monitoring stack, one has to look at AppSpeed in comparison with other APM vendors. On the APM front vendors like CA (Wily), dynaTrace, Quest Software, AppDynamics, BlueStripe, New Relic, and Coradiant all have significant product advantages over AppSpeed and more importantly significantly more progress on the front of customer adoption than does AppSpeed.
One of the very interesting frontiers in monitoring for virtualized and cloud based environments is to take advantage of the dynamic and elastic nature of the virtualized infrastructure to take actions based upon monitoring data to automatically heal the environment and ensure applications performance. We call this Service Assurance which is the idea that one should measure the service level (response time) of the applications running on the environment and then automatically take configuration or resource allocation actions to ensure those service levels. There is no fully complete Service Assurance solution on the market today, but VMTurbo has a great start with a product that can take automated actions based upon resource consumption data in the VMware environment. It would be very interesting to see VMware take the response time data from AppSpeed and integrate it with vC OPS and vCenter Orchestrator to address this opportunity.
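The Service Assurance idea described above can be sketched as a simple decision step: measure response time against a service level target, then pick a resource action only when the target is missed. This is an illustration under stated assumptions, not how VMTurbo, AppSpeed, or vCenter Orchestrator work. The `Observation` type, the `plan_action` function, the action names, and the 85% CPU cutoff are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    response_time_ms: float   # measured application response time
    cpu_utilization: float    # VM CPU utilization, 0.0 to 1.0

def plan_action(obs, sla_ms, cpu_high=0.85):
    """Choose an action from one observation and a response time target.

    Returns "none" when the service level is met, "add_cpu" when it is
    missed and the VM is CPU bound, and "investigate" when it is missed
    for a reason this sketch cannot attribute to CPU.
    """
    if obs.response_time_ms <= sla_ms:
        return "none"
    if obs.cpu_utilization >= cpu_high:
        return "add_cpu"
    return "investigate"

print(plan_action(Observation(180.0, 0.95), sla_ms=200.0))
print(plan_action(Observation(450.0, 0.95), sla_ms=200.0))
print(plan_action(Observation(450.0, 0.30), sla_ms=200.0))
```

The key design point is that the trigger is the service level, not resource consumption: a busy VM that still meets its response time target is left alone.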
The matrix below compares vCenter Operations against alternative solutions. This comparison matrix, along with detailed explanations of the criteria used in the matrix, is also available in the newly updated Virtualization Practice Performance and Capacity Management for Virtualized Environments White Paper. It should be noted that while VMware’s vC OPS products do not contain Chargeback and Application Topology Discovery features, VMware does have separate products that perform these functions. It should also be noted that both vC OPS Enterprise and Netuitive have a wide range of adapters for other monitoring solutions. Therefore both products can integrate with just about anything that produces management or monitoring data, including business data like revenue per minute. The color of the dots below reflects only what these products integrate with natively, not what they can integrate with via third party products.
The single most important part of vC OPS is the inclusion of the Integrien technology. This means that VMware believes that in a dynamic environment manual and deterministic approaches to root cause do not work, and that interpretation of monitoring data needs to be automated with self-learning analytics. We agree. If you are an enterprise class VMware customer and you do not have a monitoring solution in place, then you should start by reading Virtualization Performance and Availability Monitoring – A Reference Architecture, and realize that you will end up with more than one monitoring solution, and that self-learning analytics is the only way to integrate them. You might also want to read our Performance and Capacity Management White Paper to get a complete picture of the issues and approaches for monitoring dynamic infrastructures like VMware vSphere.
vC OPS Enterprise certainly belongs on your short list as you embark upon your evaluation and investigation process. However, if you are going to look at VMware as a strategic vendor of systems management products, you should evaluate the entire stack (Hyperic, vC OPS Enterprise, AppSpeed), and then decide if you want to use the entire stack or make substitutions at the bottom and top layers of the stack. VMware has said that they are open to working with any vendor of monitoring data from the perspective of integrating a third party product via a vC OPS Enterprise Adapter. While there are no integrations with products in the VMware ecosystem today, an excellent test of VMware’s openness will be to see how fast these adapters get created in response to customer demand.
You should also take a look at constructing an alternative stack out of best of breed third party solutions. This will be the subject of an entirely separate post, but it would be entirely feasible to construct an alternative stack out of, for example, Reflex Systems for configuration management, an Infrastructure Performance Management (IPM) solution like the ones mentioned above, an APM solution like Quest Foglight, AppDynamics, BlueStripe, dynaTrace, New Relic, or Coradiant, and Netuitive for self-learning analytics. If IPM is not your cup of tea, then something like Zenoss, Netuitive, and the APM solution of your choice would work as well.