We group the vendors that provide performance and availability monitoring for virtualization and cloud computing into four categories:
- Resource and Availability Monitoring – This is primarily about taking data from the hypervisor vendor (most often the VMware vCenter API data), storing it, trending it, reporting on it, analyzing it, and alerting on it.
- Infrastructure Performance Management – This is primarily about understanding the end-to-end response time of the virtual/physical infrastructure from the guests to the spindles and back again as requests for work are placed upon the infrastructure by applications.
- Applications Performance Management – This is about understanding how applications are performing from a response time perspective. Some of these solutions go deep and provide code-level root cause information; others focus on a broader set of applications from end to end.
- Transaction Performance Management – This is about following the response times of transactions through a complex n-tier application system.
There are five challenges that arise in virtualized and cloud environments that create new requirements for Application Performance Management (APM) products in this space:
- The Timekeeping Problem – specifically the fact that time based measurements taken within a guest (by a guest agent, or by the OS itself) become randomly shifted by the degree to which guests are scheduled in and out by the hypervisor.
- Resource Sharing – specifically the issue that measurements of how much resource is being used by a guest or by an application within a guest can be warped by the degree to which those resources are shared among multiple guests by the hypervisor.
- Invalidity of Resource Utilization as an Index of Performance – as a result of the first two challenges above, one cannot infer application performance from a normal or abnormal resource utilization profile for virtualized applications, as one used to be able to do with applications running on physical servers. The definition of performance for a virtualized application therefore needs to change from a resource-based definition to a response-time-based definition.
- Dynamic Application Topology Discovery – since applications move around in resource pools in a virtual environment, an up-to-date map of what is running where, and what is talking to what, is necessary in order for an end-to-end view of application performance to exist.
- Communications Challenges for Cloud Bursting Scenarios – if part of the application is running in the Cloud, and part is running in the four walls of your data center (behind different firewalls) then the agents must initiate one-way communications over HTTPS back to a web service in the DMZ in order for Cloud APM data to be integrated with internal APM data for the application system in question.
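To make the shift from a resource-based to a response-time-based definition of performance concrete, here is a minimal sketch. It assumes response time samples are already available from a clock that is not distorted by guest scheduling; the function names, the sample values, and the 250 ms threshold are all hypothetical and do not come from any product discussed here.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def is_healthy(response_times_ms, threshold_ms, pct=95):
    """Judge health from response time alone.

    CPU and memory utilization are deliberately not consulted, because
    in a shared, virtualized environment they are a poor proxy for what
    the user experiences.
    """
    return percentile(response_times_ms, pct) <= threshold_ms


# Hypothetical response time samples in milliseconds.
samples_ms = [120, 135, 110, 480, 125, 130, 118, 122, 127, 140]

print(percentile(samples_ms, 95))                # 480
print(is_healthy(samples_ms, threshold_ms=250))  # False
```

The point of the sketch is only that the health decision takes response times as its sole input; a real product would add per-transaction baselines and far more careful sampling.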
Today, two vendors have announced major new releases in the APM category that significantly advance the state of the art in virtualization-aware and cloud-aware APM. These solutions, along with the existing cloud-aware solution from New Relic, are summarized below.
BlueStripe FactFinder 3.1
BlueStripe FactFinder is based upon an agent that runs in the guest. The agent discovers the components of the applications in the guest, discovers which components are communicating with which other components in other guests (dynamic application topology mapping), and then measures the total and hop-by-hop response time for the entire application system. FactFinder has previously supported Windows 2003 guests and Linux guests. Today it adds support for AIX V5, AIX V6 (with WPAR and micropartitioning support), and Windows 2008, and expands its support for Solaris Zones. FactFinder works across physical and virtual deployments of all of its supported operating systems.

FactFinder is therefore the only solution on the market that can provide an end-to-end response time number for an application system that spans physical and virtual (irrespective of hypervisor) deployments across multiple flavors of Windows and Linux/Unix. This is a critical step forward for the APM industry, as the back ends for many business-critical tier 1 applications run on Unix, and FactFinder will now provide accurate response time measurement across those application systems as portions of them move from physical to virtual environments. FactFinder is also the only APM solution in the marketplace that supports all TCP/IP-based applications irrespective of their application architecture.
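The idea of combining topology discovery with hop-by-hop timing can be illustrated with a small sketch. This is not FactFinder's data model or algorithm; the observation tuples, component names, and function names below are hypothetical, chosen only to show how a call map and per-hop averages can be derived from the same stream of observed connections.

```python
from collections import defaultdict

# Hypothetical observations:
# (caller, callee, response time in ms as seen by the caller)
observations = [
    ("web", "app", 180.0),
    ("app", "db", 95.0),
    ("app", "cache", 12.0),
    ("web", "app", 220.0),
    ("app", "db", 130.0),
]


def build_topology(obs):
    """Map each component to the sorted list of components it calls."""
    topology = defaultdict(set)
    for caller, callee, _ in obs:
        topology[caller].add(callee)
    return {caller: sorted(callees) for caller, callees in topology.items()}


def mean_hop_times(obs):
    """Average response time for each (caller, callee) hop."""
    totals = defaultdict(lambda: [0.0, 0])
    for caller, callee, ms in obs:
        totals[(caller, callee)][0] += ms
        totals[(caller, callee)][1] += 1
    return {hop: total / count for hop, (total, count) in totals.items()}


print(build_topology(observations))
print(mean_hop_times(observations))
```

In this sketch the time seen at the entry hop (web to app) stands in for the end-to-end figure, and the downstream hops show where that time is spent.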
AppDynamics v2.0
AppDynamics is a new company, founded by some ex-Wily people, that is delivering a “next generation” APM solution. The release of AppDynamics v2.0 raises the bar for J2EE/.NET application performance monitoring and management products by delivering unmatched breadth, depth, and ease of use in a single, integrated product that can be up and running within minutes. AppDynamics has advanced the state of the art in APM in the following respects:
- Built for Distributed Applications. AppDynamics provides a visual map of the distributed application to illuminate the application topology for IT Operations. This is based upon an Application Mapping feature to dynamically discover all application tiers and back-end services, even when agile development introduces new code.
- Business Transaction Centric. AppDynamics focuses upon the business transaction. AppDynamics Transaction Flow Monitoring provides visibility into how each transaction performs as it journeys along the distributed environment, a technique that enables IT Operations to be extremely precise in troubleshooting application problems.
- Deep Diagnostics. AppDynamics delivers deep diagnostics in an “always on” capacity, delivering class and method-level detail without introducing excess overhead in production environments. AppDynamics enables rapid root cause analysis with no more than 2 percent overhead.
- Policy Driven. To avoid false alarms, AppDynamics distinguishes between consistent patterns of poor performance and one-time anomalies. It also assesses business transaction health by learning each business transaction’s historical performance pattern and comparing it to current performance.
- Cloud Ready. AppDynamics monitors cloud-deployed applications and enables leveraging elastic computing to create capacity on demand. AppDynamics features cloud orchestration to enable companies to intelligently scale up and scale down capacity as needed.
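The policy-driven idea of learning a baseline and then separating sustained slowdowns from one-time anomalies can be sketched as follows. This is an illustrative assumption, not AppDynamics' actual algorithm: the mean-plus-three-standard-deviations limit, the three-in-a-row rule, and all names and sample values are hypothetical.

```python
import statistics


def baseline_limit(history_ms, sigmas=3.0):
    """Upper limit learned from history: mean + sigmas * standard deviation."""
    return statistics.mean(history_ms) + sigmas * statistics.pstdev(history_ms)


def sustained_violation(recent_ms, limit_ms, min_consecutive=3):
    """True only if at least min_consecutive samples in a row exceed the limit."""
    run = 0
    for value in recent_ms:
        run = run + 1 if value > limit_ms else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical historical response times (ms) for one business transaction.
history = [100, 105, 98, 102, 110, 95, 101, 99, 104, 106]
limit = baseline_limit(history)

one_off = [101, 400, 99, 103, 100]     # a single spike
sustained = [101, 400, 420, 390, 100]  # three slow samples in a row

print(sustained_violation(one_off, limit))    # False
print(sustained_violation(sustained, limit))  # True
```

Requiring several consecutive violations is one simple way to trade a little detection latency for fewer false alarms.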
New Relic RPM
While New Relic announced its version 2 in October of last year, it is important to look at New Relic relative to these two new announcements due to the unique features, architectures and capabilities of New Relic. New Relic is also a next generation APM solution with the following unique attributes and capabilities:
- New Relic is itself a cloud-hosted offering. Unlike BlueStripe or AppDynamics, New Relic is not an on-premises solution. You simply distribute the New Relic Ruby on Rails or Java agent with your application to its target server and then log on to the New Relic Web Portal to get your APM console. New Relic is therefore an extremely easy solution for an application development team to use to instrument an application of interest, as no internal infrastructure is required to get the APM solution up and running.
- New Relic has built a significant customer base of over 3,500, with over 1,500 of these customers using New Relic for cloud-hosted applications. This makes New Relic a clear leader in the emerging area of APM for cloud-hosted applications.
- Pricing for New Relic is a simple $50 per month per server.
The Virtualization Practice has previously done a detailed Product Review of the New Relic solution. Please read this Product Review for a very detailed look at the capabilities of the New Relic RPM solution.
A Brief Comparison of these Solutions