In “Comparing the Different Approaches to Application Performance Management for Virtualized and Cloud based Environments” we compared the different approaches to implementing APM for applications residing in virtualized and cloud based environments. In this post we take a deeper look at the key vendors in the space, and compare their offerings. The purpose of this deeper look is to help you decide which ones to put on your short list for further evaluation.
Why is APM for Virtualization and the Cloud an Important and Different Problem?
As we all know, APM solutions have existed for years. The first generation of APM solutions focused upon measuring how much CPU and memory the processes that comprise the applications were using, and creating normal bands of usage under the theory that everything was fine if the resource consumption was normal. We learned through bitter experience that this was not the case, and focus turned towards knowing what the experience was of the users of the transactions. Script based synthetic transaction solutions from vendors like Keynote and Gomez (later acquired by Compuware) became an important part of the APM arsenal for business critical and performance critical applications. But this was also not enough.
It turned out that it was necessary to understand the behavior of the application for all of its users and all of its transactions. For certain classes of applications (ones custom developed to Java or .NET), a new generation of tools was brought to market by vendors like Wily Technology (later acquired by CA) which could actually monitor code running in production. This generation of technology is now the basis of products like CA Wily, IBM ITCAM, and HP Diagnostics.
Unfortunately for IBM, CA, and HP, the world has changed in significant ways. The following changes have created new requirements for how APM solutions must work, and what they must deliver:
- Applications have become much more distributed, even within a data center. As opposed to having 10 very large J2EE servers running very large, complicated and monolithic Java applications, many applications are now split into self-contained modules that run in their own JVMs on their own servers.
- This distribution of applications has been facilitated by the power of commodity hardware, the functionality of open source deployment platforms like Tomcat and vFabric, and the low price of these attractive hardware/software platforms.
- The modularization of applications has been further driven by Agile Development techniques that focus upon having small dedicated teams working on each module and driving its progress forward with frequent releases (often weekly or monthly).
- Many applications are now deployed on virtualized platforms (vSphere) or in public clouds which means that their deployment environment is both dynamic and potentially distributed.
- The combination of rapid changes to applications, dynamic run time environments and distributed run time environments creates a need for tools that self-configure and self-instrument the applications as there is no time for the humans to keep the tools up to date as the applications and their environments change.
- Distributing applications across data centers and clouds creates the need for APM products to work even if the agent and the management system live in entirely different networks belonging to entirely different organizations.
- The decreasing cost and increasing capability of applications platforms (commodity hardware + virtualization + public clouds) has fueled the demand to solve more and more problems with software. Therefore we are seeing an astounding proliferation of applications. In fact new applications are being put into production at a faster rate than management solutions are being deployed to manage them.
- The above point leads to a need for the overall procurement, deployment, ownership, management, and configuration process for APM tools to be much cheaper and simpler than is the case with legacy first generation tools.
To meet these challenges it is critical to select an Applications Performance Management (APM) solution that has the following capabilities:
- Focus on Response Time: The single most important metric when measuring applications performance, and especially applications performance for applications running in virtual or cloud environments, is applications response time. The reason for this is that this metric accurately reflects the service level that the application is delivering to its users, and it is therefore a metric that the applications owners will easily buy into as one that represents their concerns. It is therefore essential that you choose an APM solution that can measure response time for your applications in a manner that is as realistic (as close as possible to what the user actually sees) and as comprehensive (across all tiers of the applications system) as possible. All of the solutions profiled in this article focus upon response time, so there is no column for this criterion in the table below, as it is met by all of the vendors.
- Deployment Method. This is where you have to make some difficult tradeoffs. The first tradeoff is an on premise solution vs a SaaS delivered solution (Monitoring as a Service). The advantage of MaaS is that you do not have to maintain the back end, and as the vendor adds features to the product, they just upgrade the back end and you get the new features. The advantage of an on-premise solution is that data about the performance of your business critical applications is not sent over the Internet to someone else’s data center.
- Data Collection Method. This and Supported Application Types (directly below) are where you make your tradeoff between the breadth of the applications that you can manage with your APM solution and the depth of the analysis. You basically have three sources of data to choose from in a modern APM solution. The first choice is to collect the data from the network via a physical or virtual appliance that sits on a physical or virtual mirror port. The virtue of this approach is that it works for every application that you have, irrespective of how it was built, or whether it was built or purchased. The next choice is a modern transaction oriented agent inside the operating system. These are very different agents from the legacy agents that just capture resource utilization statistics. These agents capture the detail of how the applications interact with the OS, and how the processes that comprise the application communicate over the network that connects all of the servers that host the application. The last choice is to use an agent that lives in the application run time environment. This provides for the deepest level of diagnostics and transaction tracing, but only works for applications that are written to the specific run times supported by the APM vendor (you get depth, but you give up breadth).
- Supported Application Types: The APM solution has to work with and support your applications architectures. If you just have web based applications with Java middle tiers and database back ends there are many good tools to choose from. The more you diverge from HTTP/.NET/Java/SQL as the applications architecture the fewer tools there are to choose from. If your application has a proprietary front end (a Win32 client), a proprietary middle tier (maybe something written in COM+, or C++ ten years ago) and a database that no one supports then you need to look for a tool that operates at the TCP/IP layer since instrumenting the application itself will likely be impossible. However, in so doing you will give up the insights into the business logic that Java and .NET aware tools provide.
- Application Topology Discovery: As your application will now be “dynamic” you will need a tool that can keep up with your application and its topology no matter how it is scaled out, and no matter where a component of the application is moved. This means that if the APM tool relies upon an agent, then that agent must travel with (inside of) the application so that as the application is replicated and moved, the agent comes up and finds its management system. It is also critical that these agents map the topology of the application system from the perspective of what is talking to what. Otherwise it will be impossible to troubleshoot a system with so many moving parts.
- Private/Hybrid/Public Cloud Ready: If you are thinking about putting all or a part of an application in a public cloud, then you need an APM solution that works when there is no LAN connection or VPN between the agent in the application and the management system. Polling agents that live in clouds will not work, as you cannot assume the existence of an inbound port to poll through. Therefore the agent needs to initiate the connection by opening an outbound port back to the management system, which then needs to be able to catch the incoming traffic in your DMZ. You also need a system that is able to map the topology of your application system across the data centers that it executes in.
- Zero Configuration Required: If you are an Agile Development shop, then it is essential that you choose an APM solution that can keep up with your rate of enhancement and change in the application. Essentially this means that you need a “zero-config” APM tool, as with a rapid rate of change in the application you will have no time to update the tool every time you do a release into production.
- Deep-Dive Java/.NET Diagnostics: For some really performance critical applications, being able to trace transactions through the layers of an application system can be invaluable when it comes to understanding end-to-end performance, but this capability is traded off against breadth of platform support. The bottom line is that you cannot have both the deepest possible visibility, and the broadest possible support for applications architectures in one product.
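To make the cloud-readiness criterion above more concrete, here is a minimal Python sketch of an agent that pushes data outbound rather than waiting to be polled. Everything in it is an illustrative assumption and not any vendor's actual protocol: the collector URL, the payload fields, and the function names `build_report` and `push_report` are all hypothetical.

```python
import json
import urllib.request

# Hypothetical collector endpoint in the management system's DMZ (an assumption).
COLLECTOR_URL = "https://apm-collector.example.com/v1/metrics"


def build_report(app_name, instance_id, response_times_ms):
    """Package one batch of response-time samples as an outbound HTTPS POST."""
    payload = {
        "app": app_name,
        "instance": instance_id,
        "samples_ms": list(response_times_ms),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        COLLECTOR_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_report(request, opener=urllib.request.urlopen, timeout=5):
    """Agent-initiated send: the agent never listens on an inbound port."""
    with opener(request, timeout=timeout) as response:
        return response.status
```

Because the agent only ever makes outbound requests, it should keep working when the application instance sits behind a cloud provider's firewall with no inbound access. The `opener` parameter exists only so that the send step can be exercised without a real network.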
With the above criteria in mind, here is a comparison of some virtualization and cloud aware APM solutions:
|Vendor|Deployment Method|Data Collection Method|Supported Application Types|
|---|---|---|---|
|AppDynamics|On Premise/SaaS|Agent inside of the Java JVM or the .NET CLR||
|BlueStripe|On Premise|Agent inside the Windows, Linux, AIX or Sun Operating System|All TCP/IP on Windows, Linux, AIX, or Sun OS|
|AppEnsure|On Premise|Agent inside the Windows and Linux Operating Systems|All applications that run on Windows or Linux|
|Correlsense|On Premise|Agent inside the Windows, Linux, AIX or Sun Operating System|All TCP/IP on Windows, Linux, AIX, or Sun OS|
|dynaTrace (Compuware)|On Premise|Agent inside of the Java JVM or the .NET CLR|Java/.NET, WebSphere Message Broker, CICS, C/C++|
|ExtraHop Networks|On Premise|From a mirror port on a physical switch or the vSphere vSwitch|All TCP/IP regardless of platform|
|New Relic|SaaS|Agent inside of the Java JVM, .NET CLR, or the PHP/Python runtime||
|VMware vFabric APM|On Premise|Mirror port on the vSphere vSwitch and an agent inside the Java JVM||
AppDynamics is a Java/.NET APM solution based upon an agent that does byte code instrumentation for Java and .NET based applications. AppDynamics is different from the first generation of Java APM solutions in that it installs and works out of the box, it is designed and priced for applications scaled out across a large number of commodity servers, and it includes cloud orchestration features designed to automate the process of adding instances of the application in a public cloud based upon sophisticated and intelligent rules.
BlueStripe FactFinder is based upon an agent that lives in the Windows or Linux OS that supports the application. This agent watches the network flow between that OS and everything that it is talking to. Through this process FactFinder discovers the topology map of the applications running on each set of monitored servers, and calculates an end-to-end and hop-by-hop response time metric for each application. Since FactFinder is in the OS and not in the application, FactFinder is able to calculate these response time metrics for any application that runs on a physical or virtual instance of Windows or Linux. This makes FactFinder the only product that provides this level of out of the box functionality for such a breadth of applications.
Correlsense also makes use of agents that live in the operating system, and which use interactions between the application and the OS to map application topologies and time transaction performance. This is another solution that provides excellent end-to-end and hop-by-hop application response time and transaction response time across a very wide range of applications and runtime environments.
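As a rough illustration of how OS-level flow observations can yield both a topology map and hop-by-hop response times, consider the Python sketch below. The flow record format, the host names, and the function `build_topology` are hypothetical and are not taken from FactFinder or Correlsense.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flow records an OS-level agent might emit:
# (client_host, server_host, server_port, response_time_ms)
flows = [
    ("web01", "app01", 8080, 120.0),
    ("web01", "app01", 8080, 80.0),
    ("app01", "db01", 5432, 30.0),
    ("app01", "db01", 5432, 50.0),
]


def build_topology(flow_records):
    """Group observed flows into edges and average the response time per edge."""
    samples = defaultdict(list)
    for client, server, port, rt_ms in flow_records:
        samples[(client, f"{server}:{port}")].append(rt_ms)
    return {edge: mean(times) for edge, times in samples.items()}


topology = build_topology(flows)
for (client, server), avg_ms in topology.items():
    print(f"{client} -> {server}: {avg_ms:.1f} ms average")
```

Each key in `topology` is one "what is talking to what" edge, and each value is that hop's average response time. If app01 calls db01 synchronously while serving web01 (an assumption of this sketch), the web01 to app01 figure already includes the time spent waiting on the database, so it approximates the end-to-end number from the web tier's point of view.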
dynaTrace is a Java and .NET APM solution that is differentiated in its ability to trace individual transactions through complex systems of servers and applications. This is a different level of tracing than just understanding which process on a server is talking to which process on another server. It truly means that individual transactions can be traced from when they hit the first Java or .NET server until they leave the last one in the system (usually to hit the database server). This tracing, combined with in-depth code-level diagnostics via byte code instrumentation, is what distinguishes dynaTrace. dynaTrace is also the only vendor that can trace individual transactions from their inception in the user’s browser through the entire application system.
ExtraHop Networks uses a mirror port on either the physical network or a mirror port on the VMware vSwitch to see all of the network traffic that flows between physical and virtual servers. This source of data means that ExtraHop can see application topologies and measure end-to-end response time for every TCP/IP based application on your network without requiring the installation of any agents in applications, JVM, virtual servers, or physical servers.
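The general idea of measuring response time passively from mirrored traffic can be sketched in Python as follows. This is a simplified illustration and not ExtraHop's implementation: the packet summary format and the function `response_times` are hypothetical, and the sketch assumes at most one outstanding request per connection.

```python
# Hypothetical packet summaries as a capture appliance might decode them:
# (timestamp_s, source, destination, kind), where kind is "request" or "response".
packets = [
    (10.000, "10.0.0.5:51000", "10.0.0.9:80", "request"),
    (10.020, "10.0.0.6:51001", "10.0.0.9:80", "request"),
    (10.150, "10.0.0.9:80", "10.0.0.5:51000", "response"),
    (10.320, "10.0.0.9:80", "10.0.0.6:51001", "response"),
]


def response_times(packet_log):
    """Pair each response with the pending request on the same connection."""
    pending = {}   # (client, server) -> timestamp of the outstanding request
    results = []   # (client, server, response_time_ms)
    for ts, src, dst, kind in packet_log:
        if kind == "request":
            pending[(src, dst)] = ts
        elif (dst, src) in pending:
            # A response travels server -> client, so swap the pair to look it up.
            start = pending.pop((dst, src))
            results.append((dst, src, round((ts - start) * 1000, 1)))
    return results


for client, server, rt_ms in response_times(packets):
    print(f"{client} -> {server}: {rt_ms} ms")
```

Nothing in this approach runs on the monitored servers, which is why it can cover any TCP/IP application. The tradeoff is that it sees only what crosses the wire, not what happens inside the application code.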
New Relic pioneered the Monitoring as a Service category by being the first APM vendor to offer robust APM functionality on a SaaS (or more accurately MaaS) basis. The product is truly plug and play: all you do is sign up, install the agent in your Ruby, Java, .NET or PHP application, and then log onto a web console that points back to New Relic’s hosted back end of the monitoring system.
VMware vFabric APM is based upon a virtual appliance that collects data from a promiscuous port on the vSwitch (or Nexus 1000v) in the VMware host and a new agent that lives inside of the Java virtual machine that hosts your web/Java/database application. vFabric APM therefore combines some breadth in application support, since the virtual appliance can see all TCP/IP traffic on the virtual networks, with depth, since the Java agent can see deeply into the performance of the actual applications. VMware will also be building automatic remediation into vFabric APM so that when issues occur they can be automatically addressed. The issue with vFabric APM is that it only works for applications running on the vSphere platform, which means of course that it does not support applications running on physical hardware.
As business critical applications move into production virtualized environments, the need arises to ensure their performance from a response time perspective. Legacy Applications Performance Management tools are in many cases not well suited to make the jump from static physical systems to dynamic virtual and cloud based systems. For these reasons enterprises need to consider new tools from vendors that have virtualization aware and cloud aware features in their APM solutions. Vendors like AppDynamics, BlueStripe, Correlsense, ExtraHop Networks, dynaTrace, New Relic, and VMware (vFabric APM) are currently leading this race to redefine the market for APM solutions.