Toward Converged Virtualization Management Suites

The infrastructure monitoring, application monitoring, cloud management, and image provisioning sectors of the virtualization management space have evolved extremely rapidly over the last few months. This makes it useful to look at these four aspects of virtualization management (leaving out security and data protection) in the context of each other.

The short answer is that virtualization and cloud are driving significant changes in how systems and applications should be provisioned and managed, and agile, forward-thinking management vendors are building some of those changes into their products.

Where Are We Going to End Up?

When describing a long and winding road, it is useful to first describe the destination, because with the destination in mind we can better plan the journey. To this end, here is a proposed destination:

  • A user chooses a raw OS image (IaaS), a development environment (PaaS) or a complete n-tier application from a service catalog. This could be a new instance of any of the above, or a new user of any of the above (the distinction is very important). This choice is made in a portal created by a Cloud Management solution.
  • The images required to deliver whatever the user has ordered are provisioned on the fly by an Image Provisioning System. This of course requires integration between the Image Provisioning System and the Cloud Management solution, such as the integration Abiquo has recently announced with Opscode Chef.
  • When the user chooses the service, the user is also presented with several price/performance options. For example, if response time has to be 500 milliseconds on average, with 99% of transactions completing in less than 700 ms, this comes at one price. A less stringent response time SLA is less expensive. This is implemented in a modern virtualization- and cloud-aware APM solution that understands the tradeoffs between price, performance, and transaction load. The APM product feeds these tradeoffs to the Cloud Management solution. The APM solution also enforces compliance with the SLA by integrating with monitoring products at the infrastructure layer and directly with the virtualization or cloud platform.
  • Infrastructure Performance Management solutions allow the IT Operations team to continuously monitor every infrastructure request for end-to-end latency. These latency numbers are then fed to Operations Management products and to APM products so that SLAs can be automatically enforced.
  • Resource and Availability Management solutions continuously ensure that resource bottlenecks and configuration changes are not causing issues with application performance. This requires integration of these solutions with the Cloud Management solutions, the APM solutions, and the Infrastructure Performance Management solutions.
  • Real Time Self-Learning Performance Analytics are a crucial layer in this stack, as the sources of data about the performance of these systems are too numerous and varied in their behavior to allow for manual thresholding (which leads to a blizzard of false alarms). The long term key will be for these solutions to be able to detect emerging anomalies, and then automatically signal an Operations Management solution to take the correct preventative action.
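
To make two of the ideas above concrete, here is a minimal sketch in Python. It is not how any of the vendors mentioned implement these features. The function names, parameters, and the three-standard-deviation rule are assumptions chosen for illustration. The first function checks a batch of response times against the example SLA (500 ms average, 99% of transactions under 700 ms). The second shows the simplest possible form of a learned baseline: instead of a hand-set threshold, a value is flagged when it strays too far from what recent history says is normal.

```python
from statistics import mean, stdev


def meets_sla(response_times_ms, avg_limit_ms=500.0,
              fraction=0.99, fraction_limit_ms=700.0):
    """Hypothetical SLA check: the average must not exceed avg_limit_ms and
    at least `fraction` of the samples must be under fraction_limit_ms."""
    if not response_times_ms:
        raise ValueError("no response time samples supplied")
    avg_ok = mean(response_times_ms) <= avg_limit_ms
    under = sum(1 for t in response_times_ms if t < fraction_limit_ms)
    fraction_ok = under / len(response_times_ms) >= fraction
    return avg_ok and fraction_ok


def is_anomaly(history, value, k=3.0):
    """Hypothetical baseline check: flag `value` when it lies more than k
    sample standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to learn a baseline yet
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


if __name__ == "__main__":
    samples = [400.0] * 99 + [650.0]
    print("SLA met:", meets_sla(samples))
    print("Anomalous:", is_anomaly([100, 102, 98, 101, 99], 150))
```

A real self-learning analytics product would account for seasonality, trends, and correlations across many metrics. The point of the sketch is only that the threshold is derived from the data rather than typed in by hand.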

Who is Who in the Converged Management Ecosystem?

The diagram below shows who the key vendors are in these categories. It is notable that VMware has products in each of the categories except Infrastructure Performance Management, a gap that is addressed in the VMware strategy section below.

Virtualization Management Overview

VMware’s Strategy

The great news for customers who use virtualization, private cloud and public cloud platforms in their businesses is that VMware has a compelling strategy driving its efforts on all of these fronts. That strategy is to automate as much of IT Operations as possible, leading to dramatic OPEX savings, and dramatic increases in IT and business agility. Let’s look at each of the five categories above from the perspective of VMware’s IT Operations Automation strategy:

  • App Director will automate the process of building images, using predefined application templates to do so.
  • vCloud Director will automate the provisioning of services ordered by users.
  • vSphere App Discovery will automatically discover applications that show up in the infrastructure and automatically map their topology.
  • vFabric APM will instantiate monitoring of these applications.
  • vCenter Operations will automatically bubble the thousands of metrics collected by vSphere up into Health, Risk and Efficiency scores, and automatically ensure that infrastructure service levels around these scores are met. When vFabric APM is integrated into vCenter Operations, automated compliance with APM-level SLAs (response time) will be possible.

So does VMware have any gaps? There are two conspicuous ones. The first is that VMware’s view of the performance of its infrastructure is not based upon a comprehensive understanding of continuous, real-time latency. This is what the Infrastructure Performance Management vendors specialize in, and it is a notable gap in VMware’s product offerings. The second is that VMware’s management offerings are tied to its vSphere virtualization platform.

VMware’s focus upon just its own virtualization platform is a real problem, for two reasons. The first is that management is so screwed up in the physical world precisely because customers have purchased overlapping management products for each environment that they own. A full set of different tools was often purchased for Solaris, Linux and Windows, and further sets were purchased for web servers, Java servers, .NET servers and database servers. This led to a proliferation of tools by platform. Virtualization represents an opportunity to unify this mess and end up with far fewer tools. The second problem is related to the first. Most customers believe that they are going to end up with more than one virtualization platform, and do not want to repeat the mistakes made in the physical world by implementing separate management stacks for each virtualization platform that they deploy.

Constructing a Best of Breed Alternative to VMware’s Management Stack

VMware has certainly taken a position of strategic leadership in Virtualization Management and Cloud Management. However, for the reasons mentioned above (primarily its focus upon just its own vSphere platform), some enterprises might choose to construct their own cross-platform management stack out of best of breed alternatives. They would then either rely upon these vendors to integrate their solutions with adjacent solutions, or do that integration themselves via APIs and scripts.

In effect what you will then be trying to build for yourself is the diagram below.

Reference Architecture for Converged Virtualization Management

Virtualization Management Layers and Functions

When constructing your own Converged Virtualization Management solution it is critical to focus upon the following considerations:

  • If lack of support for other virtualization platforms on the part of VMware is the primary reason to go down this route, you need to take that line of thinking to its logical conclusion. That conclusion is that not everything is going to be virtualized, and that the new converged management platform has to support provisioning, automated SLA compliance, and self-service across virtual, cloud and physical deployment scenarios.
  • While “framework” is a dirty word in the management business, the fact of the matter is that you will need one. You will in fact probably need three. You will need one to manage the overall availability of the entire hardware and software stack. This is your chance to throw out the legacy frameworks you are saddled with now, and start over with something like Zenoss. You will need something to manage performance from your applications to the spindles on your arrays, something that focuses upon response time and latency, not just resource utilization.
  • While you can get an availability management framework to deal with all of the blue layers (Storage through Applications) in the above diagram (Zenoss), and you can get a Cloud Management solution that handles physical, virtual, and public clouds (Platform Computing, DynamicOps, Abiquo, and Gale Computing), you are not going to get real-time, continuous and comprehensive performance management across all of those layers in one solution. This is where you will need to focus upon multiple solutions and integrate their data via self-learning performance analytics.
  • Since these are the early days of assembling best of breed component products into an automated operations suite, you will need to pay very careful attention to who partners with and integrates with whom. Netuitive integrates with just about every source of performance management data from just about every product that is widely used to monitor business critical applications. The new VMware vCenter Operations Enterprise Plus product has this capability as well, but recent pricing and packaging actions on the part of VMware make it seem that VMware is going to focus just on vSphere with vCenter Operations Enterprise, and not focus upon Enterprise Plus. On the cloud management front, Abiquo has announced an integration with Opscode Chef, and there will likely be further partnerships between the Image Provisioning and Cloud Management vendors in the near term.

VMware has articulated and is starting to deliver on a compelling strategy of Automated Operations for its virtualization and cloud platforms. This will precipitate profound changes in the vendor ecosystem as third party vendors partner up and make acquisitions in order to match the depth of functionality that VMware is offering, but on a broader set of platforms (Quest buying VKernel is just the start of this process).