One of the important questions that we should all frequently ask ourselves is, “How will virtualization and cloud computing be different this year and next year than they have been in the past?” Part of the answer involves the kinds of applications that you are virtualizing and/or putting in clouds (public or private). The short version of the answer is that the applications left to virtualize are, for the most part, very different from the applications that have been virtualized to date.
This is illustrated in the diagram below, which shows how far VMware has progressed in virtualizing certain kinds of applications in its customer base with its flagship product, vSphere. VMware deserves a lot of credit for the progress shown between January 2010 and April 2011 (this diagram was provided as part of the vSphere 5 launch materials).
But it is also instructive to look at what has not been virtualized yet. Over half of the MS SQL database servers in VMware’s customer base have not been virtualized, and 66% of the Oracle database servers have not been virtualized. Missing from this graph are custom developed applications – many of which are performance critical, and are probably tied to those database servers that remain on physical hardware. So it is fair to say that in order for a substantial part of what is not yet virtualized (or running in a cloud) to get there, the question of how to virtualize performance critical applications must be addressed.
The Importance of Application Performance Management
In order to successfully virtualize performance critical applications, you will have to overcome the political resistance of the owners of those applications. The resistance is political in nature: it is based upon the perception that on physical hardware the application has more control and less risk, and will therefore encounter fewer application issues than if it were running in a shared and virtualized environment.
In order to combat this perception, the team that owns and manages the virtualized environment must step up and guarantee the performance of these applications. And let’s be clear: performance means response time, not how much resource those applications are using. Large enterprises are going to have to organize around this need and create a whole new function, Applications Operations, whose sole purpose is to operate the applications in production.
Criteria for a Virtualization and Cloud Aware APM Solution
In order to guarantee the true performance (again, response time) of applications in production, you will have to measure the end-to-end response time of those applications in production. And you will have to do this with an Application Performance Management (APM) solution that is built for virtualization and the cloud. Specifically, your APM solution should have the following attributes:
- Automatic Application and Transaction Topology Discovery and Mapping. In a virtualized or cloud based world, where the various components of your applications run may change rather frequently. They may move between servers in your data center, or from one of your data centers to a cloud based upon load. Your APM solution must be able to keep up with these changes automatically, without your having to do any manual configuration.
- End-to-End and Hop-by-Hop response time measurement, down to the transaction level. This is the single most important feature of an APM solution, and it needs to work just as automatically as the topology discovery above. The bottom line is that your application owners are going to think about application performance in response time terms and complain about application performance in response time terms, and you had better be able to measure it in response time terms.
- Cloud Friendly Communications Architecture. This means that if there are multiple components (for example, agents and a back end) that comprise the APM system, those components can communicate across data centers, and from clouds back into data centers, without undue firewall work. Specifically, in the case of agents, the agent should open the connection to the back end over ports like 80/443 so that it is unnecessary to open ports inbound into your application. Management systems that poll agents over local subnets are “old school” and should be discarded.
- Appropriately trade off Depth of Analysis vs Breadth of Applications Support. Appropriate in this case means appropriate to your set of applications. If all of your applications are custom developed in Java or .NET then a deep dive agent based solution is appropriate. If most of your applications are purchased and written in who knows what, then an approach that is based upon an OS agent, or the network would work best.
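To make the response time and cloud-friendly communications criteria above concrete, here is a minimal sketch of the agent side of such a system. Everything here is hypothetical for illustration: the `ResponseTimeTracker` class, the `apm.example.com` host, and the `/ingest` endpoint are invented names, and a real agent would batch, compress, and authenticate its uploads.

```python
import json
from statistics import quantiles

class ResponseTimeTracker:
    """Collects per-transaction response times (in seconds) and summarizes
    them the way an application owner thinks about them: averages and
    high percentiles, not CPU or memory utilization."""

    def __init__(self, transaction):
        self.transaction = transaction
        self.samples = []

    def record(self, seconds):
        self.samples.append(seconds)

    def summary(self):
        # p95 captures the slow outliers that generate complaints.
        p95 = quantiles(self.samples, n=100)[94] if len(self.samples) > 1 else self.samples[0]
        return {
            "transaction": self.transaction,
            "count": len(self.samples),
            "avg_s": sum(self.samples) / len(self.samples),
            "p95_s": p95,
        }

def push_outbound(summary, host="apm.example.com"):
    # Cloud-friendly: the agent opens the connection *outbound* over 443,
    # so no inbound firewall ports need to be opened into the application.
    import http.client
    conn = http.client.HTTPSConnection(host, 443, timeout=5)
    conn.request("POST", "/ingest", body=json.dumps(summary),
                 headers={"Content-Type": "application/json"})
    return conn.getresponse().status

tracker = ResponseTimeTracker("checkout")
for s in (0.120, 0.135, 0.910, 0.125):   # simulated end-to-end timings
    tracker.record(s)
print(tracker.summary())
```

Note that the agent never listens on a port; it only dials out, which is what makes this architecture workable across data center and cloud boundaries.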
The Different Approaches to Application Performance Management for Virtualization and the Cloud
Depending upon your environment (just virtualization, virtualization and cloud, or just cloud), and depending upon the mix of applications whose performance you will need to guarantee, different approaches will be appropriate. The table below compares and contrasts these approaches, and a brief summary of each solution follows the table:
| Approach | Measures | Applications Supported | Example Vendors |
| --- | --- | --- | --- |
| Legacy agents that just measure resource utilization | Resource utilization | All applications that run on the supported OS’s of the products | CA, IBM, HP, BMC |
| Legacy APM solutions | In-depth code level diagnostics | Java/.NET | CA, IBM, HP, BMC |
| Agent-less deep packet inspection | TCP/IP networks via tap or mirror port | All applications that communicate over TCP/IP | VMware vFabric APM |
| Agent in the application or container | In-depth code level diagnostics | Typically Java and .NET | AppDynamics, New Relic |
| Transaction aware OS agents | End-to-end response time | Typically all Windows and Linux based applications | BlueStripe |
Legacy Resource Utilization Monitoring Agents. This is the “old school” of APM and is definitely not recommended for any modern virtualized or cloud based systems. The bottom line is that this approach does not measure response time, and since it does not measure what your application owners and end users care about, it is useless.
Legacy APM Solutions. The first generation of true APM solutions were acquired by CA, IBM, HP, and BMC and are still being sold by these companies. These solutions can do true APM for Java and .NET applications in production, but they are typically very expensive to buy, very hard to implement and maintain, and have not been modernized for dynamic and multi-location application systems.
Agent-less Network Based APM. This is one of the most promising new developments in the industry, as this approach is pretty much guaranteed to work for all of your applications with little to no application level configuration or customization required. This approach gives you almost all of the advantages of a modern agent inside of the application, but since it is agent-less can be implemented far more easily. This approach is especially well suited for situations where you have to support a diverse set of applications – some purchased and some custom developed (with the custom developed ones being written in many other things besides Java and .NET).
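The core idea behind the network-based approach can be sketched in a few lines. This is a toy illustration, assuming the tap or mirror port traffic has already been reduced to per-flow `(timestamp, direction)` events (first byte of a request, first byte of the matching response); real products reassemble TCP streams and decode application protocols to get this pairing right.

```python
def response_times(events):
    """Pair each request with the first subsequent response and return
    (request_timestamp, response_time_seconds) tuples.

    events: list of (timestamp, direction) tuples, where direction is
    'req' (first byte of a client request seen on the wire) or
    'resp' (first byte of the server's reply). Assumes one request
    outstanding at a time, as on a simple HTTP/1.1 connection.
    """
    results = []
    pending = None
    for ts, direction in sorted(events):
        if direction == "req":
            pending = ts
        elif direction == "resp" and pending is not None:
            results.append((pending, ts - pending))
            pending = None
    return results

# Timestamps as they might be captured from a mirror port (simulated here).
events = [(10.00, "req"), (10.18, "resp"), (11.00, "req"), (11.92, "resp")]
print(response_times(events))
```

The appeal is that nothing runs inside the application or its OS: the measurement is entirely passive, which is why it generalizes across purchased and custom applications alike.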
Modern Agents in Application Containers. These are the modern Java and .NET oriented tools. They feature extremely deep code level diagnostics, work very well in modern virtualized and cloud based environments, and are increasingly branching out to support more than just .NET and Java. These tools are very appropriate for performance critical, in-house developed custom applications.
Transaction Aware OS Agents. This is a completely new category of APM solutions that insert an agent between the OS and the application or its container. These solutions give you great end-to-end response time visibility across a very broad range of applications. The tradeoff is that they do require an agent, but they support nearly every Windows, Linux, and sometimes even Unix based application.
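A minimal sketch of the interposition idea behind this category: the agent sits at the boundary between the application and the calls it makes into the OS, timestamping each round trip without any change to the application code. Here a hypothetical `send_request` function stands in for a socket send/receive, and a decorator stands in for the agent’s hook; real products interpose at the system call or library level.

```python
import functools
import time

def transaction_timer(records):
    """Stand-in for an OS-level agent hook: wraps a call at the app/OS
    boundary and records its name and wall-clock response time."""
    def wrap(fn):
        @functools.wraps(fn)
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                records.append((fn.__name__, time.perf_counter() - start))
        return timed
    return wrap

records = []

@transaction_timer(records)
def send_request(payload):
    # Hypothetical stand-in for a socket send/recv round trip.
    time.sleep(0.01)
    return "ok"

send_request("GET /")
print(records)
```

Because the hook sees only the call boundary, it works regardless of what language or framework the application above it was written in, which is exactly the breadth-over-depth tradeoff this category makes.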
If you are going to try to virtualize performance critical applications in 2012, you should arm yourself with a tool that can measure how those applications perform in the eyes of their end users – which is their end-to-end response time. The approach you take should be a function of the mix of applications you have to support – including whether they are purchased or custom developed, and, if custom developed, in what language or framework.