It is now very clear that VMware vSphere 4.0 and 4.1 have demonstrated the robustness and performance necessary to be trusted virtualization platforms for many business critical applications. It is also very clear that many organizations are well down the road toward putting business critical applications on vSphere. We may not yet be at the point where the most response time critical applications (like online trading) are on vSphere, but we are certainly at the point where line of business applications like SAP and enterprise resident CRM applications are being virtualized.
We are also at the point where many of the custom built applications that most enterprises rely upon to run large numbers of their business processes are going onto the vSphere platform. With purchased applications like SAP, the enterprise can yell at its SAP consultant or at SAP when something does not work. With custom developed applications, the enterprise applications support team has to rely upon its own expertise and the applications performance management tools that it has purchased to manage the applications.
The single most important question that needs to be addressed when these types of applications get virtualized is whether or not the way the application is managed needs to change because the application has moved from physical hardware to a virtualization platform like vSphere. To address this question we need to understand what is different about managing an application on a virtual platform like vSphere as opposed to a physical set of hardware. The principal differences are:
- The management agent that used to be in the OS to monitor resource utilization may no longer be present. Whether it is present, or whether its function has been replaced by a product that gets the same data from vCenter, is really irrelevant, as in a virtualized system one cannot infer the performance of an application in a VM from how that application is using resources in the VM.
- The application will no longer be running on its own dedicated hardware. It will be sharing hardware with either other instances of the application, other applications or both.
- What is running with the application at any moment in time on the same hardware can change and be changed automatically by things like DRS, HA, and DPM.
- The application may well become elastic with instances of the application being rapidly added and subtracted as load shrinks and grows.
- The topology of the application will not be fixed, but will change as VMs are moved between hosts and as the application scales up and down.
- It might even be the case that while most of the application runs inside of the four walls of the enterprise’s data center, portions of it may run (perhaps only at moments of time) in public clouds.
- The staff to manage the application in production is not going to grow just because the application’s execution environment has become more complex, dynamic, elastic, and abstracted from its hardware.
- If your operations are agile and dynamic, it is entirely possible that your development is Agile as well, which means that your application is likely undergoing a high rate of enhancement and change. This alone can call for a new approach to applications performance management.
To meet these challenges it is critical to select an Applications Performance Management (APM) solution that has the following capabilities:
- Focus on Response Time: The single most important metric when measuring applications performance, and especially applications performance for applications running in virtual or cloud environments, is applications response time. The reason for this is that this metric accurately reflects the service level that the application is delivering to its users, and it is therefore a metric that the applications owners will easily buy into as one that represents their concerns. It is therefore essential that you choose an APM solution that can measure response time for your applications in a manner that is as realistic (as close as possible to what the user actually sees) and as comprehensive (across all tiers of the applications system) as possible.
- Breadth of Applications Support: The APM solution has to work with and support your applications architectures. If you just have web based applications with Java middle tiers and database back ends, there are many good tools to choose from. The more you diverge from HTTP/.NET/Java/SQL as the applications architecture, the fewer tools there are to choose from. If your application has a proprietary front end (a Win32 client), a proprietary middle tier (maybe something written in COM+ or C++ ten years ago), and a database that no one supports, then you need to look at a tool that operates at the TCP/IP layer, since instrumenting the application itself will likely be impossible. However, in so doing you will give up the insights into the business logic that Java and .NET aware tools provide.
- Application Topology Discovery: As your application will now be “dynamic”, you will need a tool that can keep up with your application and its topology no matter how it is scaled out, and no matter where a component of the application is moved. This means that if the APM tool relies upon an agent, then that agent must travel with (inside of) the application so that as the application is replicated and moved, the agent comes up and finds its management system. It is also critical that these agents map the topology of the application system from the perspective of what is talking to what. Otherwise it will be impossible to troubleshoot a system with so many moving parts.
- Private/Hybrid/Public Cloud Ready: If you are thinking about putting all or a part of an application in a public cloud, then you need an APM solution that works when there is no LAN connection or VPN between the agent in the application and the management system. Polling agents that live in clouds will not work, as you cannot assume the existence of an inbound port to poll through. Therefore the agent needs to initiate the connection by opening an outbound connection back to the management system, which in turn needs to be able to catch the incoming traffic in your DMZ.
- Little to No Configuration Required: If you are an Agile Development shop, then it is essential that you choose an APM solution that can keep up with your rate of enhancement and change in the application. Essentially this means that you need a “zero-config” APM tool, as with a rapid rate of change in the application you will have no time to update the tool every time you do a release into production.
- Transaction Tracing: For some really performance critical applications, being able to trace transactions through the layers of an application system can be invaluable when it comes to understanding end-to-end performance, but this capability is traded off against breadth of platform support. The bottom line is that you cannot have both the deepest possible visibility and the broadest possible support for applications architectures in one product.
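The cloud readiness requirement above (the agent initiates the connection, and the management system only has to accept inbound traffic in the DMZ) can be illustrated with a minimal sketch. This is not how any of the vendors discussed here implement their agents. The `/samples` path, the JSON payload fields, and the names `CollectorHandler`, `push_sample`, and `run_demo` are all hypothetical, and a small local HTTP server stands in for the management system.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # samples the stand-in collector has accepted


class CollectorHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a management system endpoint in the DMZ."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(204)  # accepted, nothing to return
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def push_sample(collector_url, sample, timeout=5.0):
    """Agent side: open an outbound connection and send one sample."""
    request = urllib.request.Request(
        collector_url,
        data=json.dumps(sample).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Ignore proxy environment variables so the loopback demo is predictable;
    # a real agent would probably honor them.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return response.status


def run_demo():
    """Start the stand-in collector on a free port and push one sample to it."""
    server = HTTPServer(("127.0.0.1", 0), CollectorHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        sample = {"app": "orders", "response_ms": 182}  # made-up values
        return push_sample(f"http://127.0.0.1:{port}/samples", sample)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo(), received)
```

The point of the sketch is the direction of the connection: nothing ever connects in to the agent, so the VM hosting it needs no inbound port, and only the collector has to be reachable.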
With the above criteria in mind, here is a comparison of some virtualization and cloud aware APM solutions:
|On Premise||All TCP/IP on Windows or Linux|
|Optier BTM||On Premise||Broad Range of|
AppDynamics is a Java/.NET APM solution based upon an agent that does byte code instrumentation for Java and .NET based applications. AppDynamics is different from the first generation of Java APM solutions in that it installs and works out of the box, it is designed and priced for applications scaled out across a large number of commodity servers, and it includes cloud orchestration features designed to automate the process of adding instances of the application in a public cloud based upon sophisticated and intelligent rules.
BlueStripe FactFinder is based upon an agent that lives in the Windows or Linux OS that supports the application. This agent watches the network flow between that OS and everything that it is talking to. Through this process FactFinder discovers the topology map of the applications running on each set of monitored servers, and calculates an end-to-end and hop-by-hop response time metric for each application. Since FactFinder is in the OS and not in the application, FactFinder is able to calculate these response time metrics for any application that runs on a physical or virtual instance of Windows or Linux. This makes FactFinder the only product that provides this level of out of the box functionality for such a breadth of applications.
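To make the ideas of a topology map and hop-by-hop response time concrete, here is a minimal sketch of how both could be derived from observed calls between components. It is an illustration only and does not describe FactFinder's actual implementation. The record format `(caller, callee, response time in ms)`, the sample values, and the function names are hypothetical. The `own_time` calculation additionally assumes that each request makes one serial call to each downstream component.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (caller, callee, response time in ms).
OBSERVED_CALLS = [
    ("client", "web", 240.0),
    ("web", "app", 200.0),
    ("app", "db", 150.0),
    ("client", "web", 260.0),
    ("web", "app", 220.0),
    ("app", "db", 170.0),
]


def build_topology(calls):
    """Map each component to the sorted list of components it was seen calling."""
    topology = defaultdict(set)
    for caller, callee, _ in calls:
        topology[caller].add(callee)
    return {caller: sorted(callees) for caller, callees in topology.items()}


def hop_response_times(calls):
    """Average response time for each (caller, callee) hop."""
    samples = defaultdict(list)
    for caller, callee, elapsed_ms in calls:
        samples[(caller, callee)].append(elapsed_ms)
    return {hop: mean(values) for hop, values in samples.items()}


def own_time(hops, topology, caller, callee):
    """Time attributable to `callee` itself: the hop time minus the time
    of the hops that `callee` makes downstream (serial calls assumed)."""
    downstream = sum(hops[(callee, nxt)] for nxt in topology.get(callee, []))
    return hops[(caller, callee)] - downstream


if __name__ == "__main__":
    topology = build_topology(OBSERVED_CALLS)
    hops = hop_response_times(OBSERVED_CALLS)
    print(topology)
    print(hops)
    print(own_time(hops, topology, "client", "web"))
```

In this sketch the end-to-end figure is simply the response time of the first hop (`client` to `web`), since that hop includes everything downstream. The per-hop figures show where that time is spent.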
dynatrace is a Java and .NET APM solution that is differentiated in its ability to trace individual transactions through complex systems of servers and applications. This is a different level of tracing than just understanding which process on a server is talking to which process on another server. It truly means that individual transactions can be traced from when they hit the first Java or .NET server until they leave the last one in the system (usually to hit the database server). This tracing, combined with in depth code level diagnostics via byte code instrumentation, is what distinguishes dynatrace.
New Relic pioneered the Monitoring as a Service category by being the first APM vendor to offer robust APM functionality on a SaaS (or more accurately MaaS) basis. The product is truly plug and play: all you do is sign up, install the agent in your Ruby, Java, .NET or PHP application, and then log onto a web console that points back to New Relic’s hosted back end of the monitoring system.
OPNET App Xpert is a product line that includes end user experience monitoring, deep dive analysis into Java and .NET applications, and transaction analysis. The product includes virtual appliances that use the promiscuous port on virtual switches to collect response time data from applications running in VMs, and the Java/.NET agents in the product use fully public cloud aware methods to phone home to their management systems.
Optier is also focused upon tracing individual transactions, but across a broader range of operating systems and middleware than any other vendor. Optier is the only vendor that can trace transactions through complex systems that might involve web front ends, Java middle tiers, Tuxedo or other enterprise service bus middleware, IBM MQ or other messaging middleware, and even CICS mainframes.
Quest Foglight is a broad and deep APM solution that uses an appliance to capture HTTP transactions as they enter the data center and combines this with deep byte code instrumentation of Java and .NET middle tiers and the market leading tools for analyzing the performance of Oracle and Microsoft SQL Server databases.
VMware AppSpeed is based upon a virtual appliance that collects data from a promiscuous port on the vSwitch (or Nexus 1000v) in the VMware host. AppSpeed then uses protocol decoding to understand applications level performance for HTTP, Java, .NET and database applications. AppSpeed is a VMware vSphere specific product and is therefore limited to understanding the performance of applications components running on vSphere, as well as their interactions with adjacent physical resources. Since AppSpeed does not use an agent in the Java or .NET application, AppSpeed cannot provide the level of diagnostics that agent based solutions can provide, but this is traded off against the advantages of not having a management agent running in the VM to begin with.
As business critical applications move into production virtualized environments, the need arises to ensure their performance from a response time perspective. Legacy Applications Performance Management tools are in many cases not well suited to make the jump from static physical systems to dynamic virtual and cloud based systems. For these reasons enterprises need to consider new tools from vendors that have virtualization aware and cloud aware features in their APM solutions. Vendors like AppDynamics, BlueStripe, dynatrace, New Relic, OPNET, Optier, Quest, and VMware (AppSpeed) are currently leading this race to redefine the market for APM solutions.