When VMware announced its new management strategy (monitor – fix automatically – notify the humans) at VMworld Las Vegas, that strategy was incomplete. It was incomplete because what needs monitoring to ensure service quality is the applications that deliver those services. At VMworld Europe, VMware completed the strategy by announcing vFabric Application Performance Manager (APM) and clearly tying issues with applications to automated remediation in the infrastructure.
It is impossible to overstate the long-term consequences of this strategy by VMware (assuming it is successfully executed over time) upon the management software industry, and upon how customers use management products. VMware has already succeeded in getting large numbers of companies to adopt its virtualization platform, which replaced static and dedicated systems with dynamic and shared systems that are inherently more manageable.
The balance of this post will focus upon how these changes will impact the APM industry, and how customers will use, and therefore should evaluate, APM solutions. As we work our way through these changes, the way in which APM products get used will change in the following ways:
- Today, most companies only use APM products for their most important applications. In “Why is Application Performance Management so Screwed Up?”, we explained the reasons for this, most of which are caused by legacy APM vendors making their products too hard to buy, deploy, customize, and maintain. The first change is that a new generation of APM vendors has arrived (detailed below), determined to address these deployability and usability issues.
- The point above means that we are going to move from APM being something that most companies only use for the most important 5% of their applications, to something that they use for all of their applications. This is because once applications move into dynamic and shared environments (virtualization, IT as a Service, and Private Cloud), it will become essential to manage their performance in terms of end-to-end response time.
- APM tools will become the “canary in the coal mine” as the trigger for dynamic operations. The importance of this cannot be stressed enough. Monitoring resource utilization as a proxy for application performance has proven to be woefully inadequate for the purposes of ensuring application performance. The only way to truly understand application performance is to measure it in end-to-end response time terms.
- Once application response time becomes the metric that makes APM tools into the canary in the coal mine, and the trigger for automated actions, the hard work of tying response-time-based SLA violations to automated remediation will start.
- The linkage between a response time based SLA violation and an automated action will come in one of two forms. Some of these linkages will be deterministic or rule based (if a response time based SLA gets violated immediately after a new image goes into production, roll back to the previous image automatically). The more difficult linkages will have to be derived via the automated self-learning technology that VMware acquired with Integrien, and that Netuitive offers as a market leading self-learning performance analytics solution.
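The deterministic, rule-based form of linkage described in the last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Deployment` and `SlaViolation` records, the `rollback_target` function, and the 15-minute window are all hypothetical names and values chosen for the example, not part of any vendor's product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of the most recent image put into production."""
    image: str
    previous_image: str
    deployed_at: datetime


@dataclass
class SlaViolation:
    """Hypothetical record of a response-time SLA violation."""
    application: str
    response_time_ms: float
    threshold_ms: float
    observed_at: datetime


def rollback_target(
    violation: SlaViolation,
    last_deployment: Optional[Deployment],
    window: timedelta = timedelta(minutes=15),  # assumed value for "immediately after"
) -> Optional[str]:
    """Return the image to roll back to, or None if the rule does not apply."""
    if last_deployment is None:
        return None
    elapsed = violation.observed_at - last_deployment.deployed_at
    # Only attribute the violation to the deployment if it follows it closely.
    if timedelta(0) <= elapsed <= window:
        return last_deployment.previous_image
    return None
```

A violation that arrives hours after the last deployment returns `None` here, which is where the self-learning form of linkage would have to take over.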
As APM solutions evolve to meet these new use cases, the criteria by which to evaluate them will change. How various products meet these criteria will be the subject of numerous follow-on posts. However, the list below is a good starting point. The one thing that is abundantly clear at this point is that legacy APM solutions from CA, IBM, HP, and BMC are completely unable to meet these criteria today, and are highly unlikely to undergo the significant changes required to meet these new challenges. To meet these challenges it is critical to select an Applications Performance Management (APM) solution that has the following capabilities:
- Focus on Response Time: The single most important metric when measuring applications performance, and especially applications performance for applications running in virtual or cloud environments, is application response time. The reason for this is that this metric accurately reflects the service level that the application is delivering to its users, and it is therefore a metric that the applications owners will easily buy into as one that represents their concerns. It is therefore essential that you choose an APM solution that can measure response time for your applications in the most realistic (as close as possible to what the user actually sees) and comprehensive (across all tiers of the applications system) manner possible.
- Breadth of Applications Support: The APM solution has to work with and support your applications architectures. If you just have web based applications with Java middle tiers and database back ends there are many good tools to choose from. The more you diverge from HTTP/.NET/Java/SQL as the applications architecture the fewer tools there are to choose from. If your application has a proprietary front end (a Win32 client), a proprietary middle tier (maybe something written in COM+, or C++ ten years ago) and a database that no one supports, then you need to look at a tool that operates at the TCP/IP layer, since instrumenting the application itself will likely be impossible. However, in so doing you will give up the insights into the business logic that Java and .NET aware tools provide.
- In-depth Code Analysis: If you trade off an ability to monitor every application in production for an ability to monitor applications written to a specific application framework or run-time environment, you will get in return a product that can provide deep-dive diagnostics which will help your development team dramatically improve code quality in production.
- Application Topology Discovery: As your application will now be “dynamic” you will need a tool that can keep up with your application and its topology no matter how it is scaled out, and no matter where a component of the application is moved. This means that if the APM tool relies upon an agent, then that agent must travel with (inside of) the application so that as the application is replicated and moved, the agent comes up and finds its management system. It is also critical that these agents map the topology of the application system from the perspective of what is talking to what. Otherwise it will be impossible to troubleshoot a system with so many moving parts.
- Private/Hybrid/Public Cloud Ready: If you are thinking about putting all or a part of an application in a public cloud, then you need an APM solution that works when there is no LAN connection or VPN between the agent in the application and the management system. Polling agents that live in clouds will not work, as you cannot assume the existence of an inbound port to poll through. Therefore the agent needs to initiate the connection by opening an outbound port back to the management system, which then needs to be able to catch the incoming traffic in your DMZ.
- Little to No Configuration Required: If you are an Agile Development shop, then it is essential that you choose an APM solution that can keep up with your rate of enhancement and change in the application. Essentially this means that you need a “zero-config” APM tool, as with a rapid rate of change in the application you will have no time to update the tool every time you do a release into production.
- Transaction Tracing: For some really performance critical applications, being able to trace transactions through the layers of an application system can be invaluable when it comes to understanding end-to-end performance, but this capability is traded off against breadth of platform support. The bottom line is that you cannot have both the deepest possible visibility, and the broadest possible support for applications architectures in one product.
- Automation Triggers: This is the key capability that will allow APM tools to transition from being a “monitor” to being the start of an automated problem resolution process. This means that you can create a set of conditions in the APM solution that, when met or violated, trigger a set of actions either inside of the APM tool, or a set of calls to an external environment (like VMware vSphere).
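To make the automation trigger idea concrete, here is a minimal Python sketch of a response-time condition that fires a list of registered actions when it is violated. Everything in it is an assumption made for illustration: the `ResponseTimeTrigger` class, the choice of the 95th percentile as the SLA metric, and the callback signature are hypothetical, and the actions are plain Python callables standing in for whatever call an external environment would actually require.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical action signature: (application name, observed response time in ms).
Action = Callable[[str, float], None]


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty set of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


class ResponseTimeTrigger:
    """Fires registered actions when a response-time percentile exceeds a threshold."""

    def __init__(self, threshold_ms: float, pct: float = 95.0) -> None:
        self.threshold_ms = threshold_ms
        self.pct = pct
        self.actions: List[Action] = []

    def on_violation(self, action: Action) -> None:
        """Register an action to run when the condition is violated."""
        self.actions.append(action)

    def evaluate(self, application: str, samples_ms: Sequence[float]) -> bool:
        """Check one window of samples; return True if the trigger fired."""
        observed = percentile(samples_ms, self.pct)
        if observed <= self.threshold_ms:
            return False
        for action in self.actions:
            action(application, observed)
        return True
```

In use, an operator would register one or more callables with `on_violation` and call `evaluate` once per measurement window. Using a percentile rather than an average is a design choice made for this sketch, so that a handful of slow requests is not hidden by many fast ones.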
With the above criteria in mind, here is a comparison of some virtualization and cloud aware APM solutions:
|Vendor/Product|Deployment|Supported Applications|
|---|---|---|
|AppDynamics|On Premise or SaaS|Java/.NET|
|BlueStripe FactFinder|On Premise|All TCP/IP based Applications|
|Confio IgniteVM|On Premise|Any app based on Oracle, SQL Server,|
|ExtraHop|On Premise|All TCP/IP based Applications|
|VMTurbo|On Premise|Any app with a measurable|
|VMware vFabric APM|On Premise|Java for deep Diagnostics; All TCP/IP for basic response time|
AppDynamics is a Java/.NET APM solution based upon an agent that does byte code instrumentation for Java and .NET based applications. AppDynamics is different from the first generation of Java APM solutions in that it installs and works out of the box, it is designed and priced for applications scaled out across a large number of commodity servers, and it includes cloud orchestration features designed to automate the process of adding instances of the application in a public cloud based upon sophisticated and intelligent rules.
BlueStripe FactFinder is based upon an agent that lives in the Windows or Linux OS that supports the application. This agent watches the network flow between that OS and everything that it is talking to. Through this process FactFinder discovers the topology map of the applications running on each set of monitored servers, and calculates an end-to-end and hop-by-hop response time metric for each application. Since FactFinder is in the OS and not in the application, FactFinder is able to calculate these response time metrics for any application that runs on a physical or virtual instance of Windows or Linux. This makes FactFinder into the only product that provides this level of out of the box functionality for such a breadth of applications.
Confio IgniteVM focuses upon the database layer of the application system. IgniteVM provides unique visibility into the performance of your database transactions and then cross-correlates that with visibility into the latency of the vSphere environment all of the way down to the storage layer. This allows the DBA to answer one of the hardest questions: “Is it my database, or is it the storage array?”
dynatrace is a Java and .NET APM solution that is differentiated in its ability to trace individual transactions through complex systems of servers and applications. This is a different level of tracing than just understanding which process on a server is talking to which process on another server. It truly means that individual transactions can be traced from when they hit the first Java or .NET server until they leave the last one in the system (usually to hit the database server). This tracing, combined with in-depth code-level diagnostics via byte code instrumentation, is what distinguishes dynatrace.
ExtraHop has reinvented the category of looking at the performance of applications from a network perspective. It combines a virtual appliance that can collect data from a mirror port on the vSwitch and a physical appliance that connects to the mirror port on physical switches with deep decodes of many application level protocols and a unique ability to reassemble TCP/IP requests and responses into transactions and flows. This gives ExtraHop the best capability on the market to understand true application performance among the solutions that do not rely upon an agent in the application itself. Using the network instead of an agent in the application as the point of data collection has the benefit of allowing ExtraHop to support all TCP/IP based applications in the environment, whether they are purchased or custom developed.
New Relic pioneered the Monitoring as a Service category by being the first APM vendor to offer robust APM functionality on a SaaS (or more accurately MaaS) basis. The product is truly plug and play: all you do is sign up, install the agent in your Ruby, Java, .NET, Python, or PHP application, and then log onto a web console that points back to New Relic’s hosted back end of the monitoring system. New Relic is notable for its support of the broadest set of applications platforms, and its inclusion of end-user-experience monitoring and system level monitoring in one simple to buy and deploy product.
Quest Foglight is a broad and deep APM solution that uses an appliance to capture HTTP transactions as they enter the data center and combines this with deep byte code instrumentation of Java and .NET middle tiers and the market leading tools for analyzing the performance of Oracle and Microsoft SQL Server databases.
VMTurbo gets added to this list by virtue of the fact that they have added application workload monitoring to their tool for managing the performance and capacity of vSphere environments in an automated manner. VMTurbo has unique technology that allows for workloads (now applications) to be assigned budgets and for resources to be assigned prices (based upon scarcity), and for the highest priority workload (the one with the most budget) to get the scarce resources. VMTurbo is therefore further down the road in terms of automated remediation than any other vendor in the field. All that is needed is a couple of alliances between the true APM vendors that focus upon response time and VMTurbo and we will have our first true automated end-to-end management solution for vSphere.
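The budget-and-price idea can be illustrated with a toy allocation loop in Python. This is emphatically not VMTurbo's algorithm, which the post does not describe in detail. The `Workload` record, the `price_per_unit` pricing curve, and the `allocate` function are all hypothetical, chosen only to show how scarcity-based prices and per-workload budgets can steer a scarce resource toward the highest-priority workload.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Workload:
    """Hypothetical workload: a larger budget means a higher priority."""
    name: str
    budget: float
    demand: int  # units of the resource the workload wants


def price_per_unit(capacity: int, free: int, base_price: float = 1.0) -> float:
    """Assumed pricing curve: the price rises as free capacity shrinks."""
    if free <= 0:
        return float("inf")
    return base_price * capacity / free


def allocate(workloads: List[Workload], capacity: int) -> Dict[str, int]:
    """Let workloads buy units in budget order until they cannot afford more."""
    free = capacity
    granted = {w.name: 0 for w in workloads}
    for w in sorted(workloads, key=lambda w: w.budget, reverse=True):
        remaining = w.budget
        while granted[w.name] < w.demand and free > 0:
            price = price_per_unit(capacity, free)
            if price > remaining:
                break  # the resource is now too scarce for this budget
            remaining -= price
            granted[w.name] += 1
            free -= 1
    return granted
```

With four units of capacity, a workload holding a budget of 10 buys the three units it wants, and a workload holding a budget of 2 gets nothing because the last unit has become too expensive for it.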
VMware vFabric APM is the newest entry in the field. vFabric APM is a combination of the old AppSpeed product and some brand new Java monitoring technology that came from the SpringSource acquisition and which has been significantly enhanced in the process of becoming part of vFabric APM. By having both the virtual network view of the application (via a tap on the vSwitch), and the inside of the JVM view of the application (via the new byte code instrumentation technology), vFabric APM offers a unique combination of depth and breadth. The topology and response time of many different kinds of applications can be discovered at a network level, and in-depth code analysis is available for Java based applications.
As business critical applications move into production virtualized environments, the need arises to ensure their performance from a response time perspective. Legacy Applications Performance Management tools are not well suited to make the jump from static physical systems, to dynamic virtual and cloud based systems. For these reasons enterprises need to consider new tools from vendors that have virtualization aware and cloud aware features in their APM solutions. Vendors like AppDynamics, BlueStripe, Confio, dynatrace, ExtraHop, New Relic, Quest, VMTurbo and VMware (vFabric AppInsight) are currently leading this race to redefine the market for APM solutions.