VMware has made it known for quite some time that virtualization, private clouds (IT as a Service), hybrid clouds, and public clouds will create the need for a new management stack, and that VMware intends to be an aggressive supplier of that stack. What VMware had never said before, however, is precisely how this new management stack would differ (other than by explicitly supporting vSphere) from all of the other management stacks that have existed for all of the other computing platforms in the world.
Well, in his keynote at VMworld 2011, the CEO of VMware, Paul Maritz, articulated a simple and profound strategy for re-inventing how computer systems are managed:
- The Old Way – First Monitor, then when something goes wrong, Alert, and then manually Respond to the issue.
- The New Way – Monitor, then when something goes wrong, automatically Respond (fix it automatically), and then Alert to notify the humans that it has been fixed.
- The problem must be accurately identified. For example, a degradation in the response time of one application is a different problem from a degradation in the response time of many or all applications. A degradation in average response time is a different problem from an increase in the variation of response times while the average stays the same. An increase in the error rate for the transactions in the application is likely a completely different problem from anything having to do with response time.
- Once the problem has been identified, its most likely cause needs to be determined. Problems roughly fall into two categories: those that are in the application itself, and those that are in the infrastructure that supports the application. Application-level problems are easily found for modern applications with tools that instrument code in production, like AppDynamics, New Relic, and dynaTrace (now part of Compuware). However, finding the infrastructure issues that are degrading application response time is very difficult, because it is hard to tie specific application issues to specific behaviors in the infrastructure.
- Monitoring application performance is easily done by a modern APM solution that is built for dynamic environments. Good choices include AppDynamics, New Relic, BlueStripe, ExtraHop, and dynaTrace (now part of Compuware).
- Taking action automatically is not hard. VMware has already built substantial automation features into vSphere that are accessible via APIs. Enterprise-focused IT as a Service vendors like DynamicOps, Embotics, Platform Computing, Nimbula, and Gale Technologies all have substantial orchestration capabilities in their solutions.
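The distinctions drawn in the two points above (average versus variation, response time versus error rate, one application versus many) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names, the symptom labels, and the 25% tolerance are hypothetical and do not come from any vendor's product.

```python
from statistics import mean, pstdev


def classify(baseline, current, base_err, cur_err, tol=0.25):
    """Return the symptoms seen for one application.

    baseline/current are lists of response times (same unit);
    base_err/cur_err are transaction error rates between 0 and 1.
    tol is an assumed tolerance: a metric must worsen by more than
    25% over its baseline to count as a symptom.
    """
    symptoms = []
    if mean(current) > mean(baseline) * (1 + tol):
        symptoms.append("mean_response_time_degraded")
    if pstdev(current) > pstdev(baseline) * (1 + tol):
        symptoms.append("response_time_variance_increased")
    if cur_err > base_err * (1 + tol):
        symptoms.append("error_rate_increased")
    return symptoms


def scope(symptoms_by_app):
    """Tell a single-application problem apart from a widespread one."""
    affected = [app for app, found in symptoms_by_app.items() if found]
    if not affected:
        return "none"
    if len(affected) == 1:
        return "single_application"
    return "multiple_applications"
```

A problem scoped to a single application points first at the application itself, while the same symptom across many applications makes the shared infrastructure the more likely suspect.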
- Notification is not a hard problem and is solved in a variety of ways in virtually every product that plays a role here.
- It is the Decision Engine where the rubber meets the road. This is where the decision is made as to what automated action should be taken to try to fix the problem at hand. If you notice the loop that starts with the first “No, the problem is not solved”, you can probably envision the disastrous consequences of a bad decision (ever heard of “vMotion sickness”?).
- VMware deserves an enormous amount of credit for simply stating that IT Operations needs to be reinvented around automation, for building the ability to programmatically control vSphere so as to guarantee resource levels to certain workloads, for committing to identify the workloads running in VMs, and for releasing vCenter Operations with a root cause capability based upon the stochastic technology acquired from Integrien.
- Netuitive is the only independent vendor with a self-learning performance management capability that can be applied to Service Assurance. A logical way to construct an independent Service Assurance solution would be to combine the APM solution that fits your applications with Netuitive and possibly your choice of an enterprise-focused private cloud management vendor.
- In order for APM to play a role in Service Assurance, APM solutions need to be significantly modernized with respect to what most enterprises have installed today. Legacy APM solutions that are expensive, hard to install, and require constant manual re-configuration simply do not fit the bill here. Look to vendors like AppDynamics, AppFirst, New Relic, ExtraHop, BlueStripe, and dynaTrace/Compuware for modern solutions that provide fast time to value and low cost and effort of ownership.
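The "respond first, then alert" loop, and the risk of a Decision Engine that keeps retrying, can be sketched as follows. This is a minimal illustration, not VMware's design: `assure` and its four callables are hypothetical names, and the cap on attempts is an assumption added here to show one way of keeping a bad decision from looping indefinitely.

```python
def assure(is_healthy, decide, act, notify, max_attempts=3):
    """Monitor, respond automatically, then alert the humans.

    is_healthy() -> bool      : the monitoring check
    decide(tried) -> action   : the decision engine; returns None
                                when it has nothing left to try
    act(action)               : carries out the automated response
    notify(message)           : the alert sent to people
    """
    if is_healthy():
        return "healthy"
    tried = []
    for _ in range(max_attempts):
        action = decide(tried)
        if action is None:
            break
        act(action)
        tried.append(action)
        # "Is the problem solved?" If not, go around the loop again.
        if is_healthy():
            notify(f"fixed automatically after: {tried}")
            return "fixed"
    # Out of attempts or ideas: fall back to the old way and escalate.
    notify(f"not fixed, escalating to humans; tried: {tried}")
    return "escalated"
```

Passing the list of already-tried actions to `decide` is a deliberate choice in this sketch: it lets the decision engine avoid repeating an action that has already failed to help.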
- Several vendors of private cloud (IT as a Service) management solutions have already implemented significant service assurance functionality in their solutions. Platform Computing, Abiquo and Gale Technologies all fall into this category.
- In a category of its own is VMTurbo, which is the only management solution for vSphere that attempts to package up a full Service Assurance capability in one product. VMTurbo is able to identify the constrained resources in a vSphere environment, and is able to ensure that the applications that are the most important workloads get priority access to those constrained resources. VMTurbo does not yet have the ability to act upon application response time, but this will likely occur in partnership with one or more APM vendors.
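The idea of giving the most important workloads priority access to a constrained resource can be illustrated with a simple strict-priority allocation. This is a sketch of the general idea only, not a description of how VMTurbo or any other product actually works; the `allocate` function, the workload names, and the numeric priorities are all hypothetical.

```python
def allocate(capacity, demands):
    """Grant a constrained resource to workloads in priority order.

    demands is a list of (name, priority, demand) tuples; a higher
    priority number means a more important workload. Each workload
    receives its full demand if capacity remains, otherwise whatever
    is left (possibly zero).
    """
    grants = {}
    remaining = capacity
    for name, _priority, demand in sorted(demands, key=lambda d: -d[1]):
        grant = min(demand, remaining)
        grants[name] = grant
        remaining -= grant
    return grants
```

With a capacity of 10 units and demands of 5 (priority 3), 4 (priority 2), and 6 (priority 1), the two most important workloads are fully served and the least important one receives the single remaining unit.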