For years, Gartner has insisted that if an APM tool does not cover each of its “Five Dimensions of APM,” one of which is deep code analysis, then it is not an APM tool. Gartner has therefore defined APM to be relevant only to custom-developed applications. Well, it has finally woken up and realized that 70% of the applications that enterprises run are in fact purchased and that maybe the performance of these applications might be important as well. So, Gartner has created a new category, application-aware infrastructure performance management.

Gartner’s New Categories

In its blog post “The Three Topologies in Applications and Infrastructure,” Gartner identifies three categories of tools:

  • Application Performance Management (APM) focuses just on finding bugs in the code of custom-developed applications. The target task here is to support rapidly changing applications in production, which means finding problems in code quickly and accurately. The target audience for APM tools is the developers who support these applications in production.
  • Network Performance Management (NPM) focuses on the operation of the network in support of the applications running in the environment. The target task here is to quickly find issues in the network that are affecting business-critical applications. The target audience is the network engineers who support the local and wide-area networks in the company.
  • Application-Aware Infrastructure Performance Management (AA-IPM) focuses on the response time and throughput of every application (custom developed and purchased) in production and on how interactions between the applications and infrastructure are affecting application performance. The target audience is the IT operations teams that have to support every application in production.

Application-Aware Infrastructure Performance Management

If you are in IT operations, you should rejoice. Gartner has finally realized that the existing IT operations tools that you have been using for decades are not adequate for virtualized data centers, software-defined data centers, or clouds. Gartner has further realized that since IT operations gets yelled at when applications are slow, maybe IT operations should have some visibility into how all of the applications are actually performing (with performance being defined as response time and throughput—NOT resource utilization). In Gartner’s research notes, it focuses on AppEnsure, AppFirst, BlueStripe Software, Boundary, Correlsense, Neebula, ExtraHop, INETCO, and Virtual Instruments. We would add Riverbed Cascade and Xangati to Gartner’s list and we have done so below.

Application-Aware Infrastructure Performance Management Tools

| Vendor/Product | Product Focus | Deployment Method | Data Collection Method | Supported App Types |
|---|---|---|---|---|
| AppEnsure | Manage response time and throughput of every Windows and Linux application, whether purchased or custom-developed, physical or virtual, or remote or local, and whether in hybrid or public clouds. Includes automated application discovery, topology mapping, and root cause analysis. | On-premises/SaaS | Agent inside the Windows or Linux operating system | All TCP/IP on Windows or Linux |
| AppFirst | Monitor every application in production irrespective of source and deployment | On-premises/SaaS | Agent inside the Windows or Linux operating system | All TCP/IP on Windows or Linux |
| BlueStripe FactFinder | Monitor every application in production irrespective of source and deployment | On-premises | Agent inside the Windows, Linux, AIX, or Sun operating system | All TCP/IP on Windows, Linux, AIX, or Sun OS |
| Boundary | Monitor the impacts of network flows upon the application | SaaS | Agent inside the Windows or Linux operating system | All Linux TCP/IP applications |
| Correlsense | Monitor transactions of complex enterprise architectures that are based on a large variety of platforms, languages, and middle tiers | On-premises | Agent inside the Windows, Linux, AIX, or Sun operating system | All TCP/IP on Windows, Linux, AIX, or Sun OS |
| ExtraHop Networks | Monitor every application in production irrespective of source and deployment | On-premises | From a mirror port on a physical switch or the vSphere vSwitch | All TCP/IP regardless of platform |
| INETCO Insight | Monitor every application in production irrespective of source and deployment | On-premises | From a mirror port on a physical switch | All TCP/IP regardless of platform |
| Neebula | Discover and map the topology of all web-based applications in the enterprise | On-premises/SaaS | From standard management interfaces in the environment | All TCP/IP web-based regardless of platform |
| Riverbed Cascade | Monitor application-level packet data and flow data to determine application performance from the perspective of the network | On-premises | Flow collector, physical appliance on a physical mirror port, or virtual appliance on the VMware vSwitch | All TCP/IP regardless of platform |
| Virtual Instruments | Collect subsecond storage latency data and throughput data for Fibre Channel–attached storage | On-premises | A TAP on the SAN, allowing every Fibre Channel transaction to be observed from the outside in | All applications that are dependent on Fibre Channel–attached storage |
| Xangati | Monitor the end-to-end latency of the virtual infrastructure with a focus on VDI | On-premises | Collects storage latency data and resource utilization data from vSphere and network performance data from NetFlow | All applications running in a vSphere environment |

[Columns without recoverable values: Application Identification, Application Topology Discovery, Cloud Ready, “Zero-Config”]

AppEnsure is an on-premises and SaaS-delivered APM solution that focuses on identification of each application by name, application topology discovery, end-to-end response time monitoring, and automated root cause analysis across all applications (custom developed or purchased) deployed across any mixture of physical, virtual, and cloud-based environments. AppEnsure is based on an agent in the Windows and Linux OSes that sees the interaction of processes with the operating system, and then sees the interaction, over the network, of those processes with the adjacent components of the application system.

AppFirst is an on-premises and SaaS-delivered APM solution that is most frequently used by SaaS software vendors or delivered through cloud vendors to customers. AppFirst focuses on the collection of a comprehensive set of metrics including OS metrics, performance metrics, log files, and StatsD metrics. It is a perfect complement to New Relic, as it monitors all of the layers of the application system that an agent in the application runtime cannot see.

BlueStripe FactFinder is based on an agent that lives in the Windows, Linux, AIX, and Solaris operating systems and supports all applications running on those operating systems. This agent watches the network flow between the OS and everything that it is talking to. Through this process, FactFinder discovers the topology of the applications running on each set of monitored servers and calculates an end-to-end and hop-by-hop response-time metric for each application. Since FactFinder is in the OS and not in the application, FactFinder is able to calculate these response-time metrics for any application that runs on a physical or virtual instance of Windows or Linux. This makes FactFinder the only product that provides this level of out-of-box functionality for such a breadth of applications.

Boundary is a SaaS-delivered solution for both public and private cloud deployments that focuses on using deep and real-time analysis of the infrastructure and network to understand how issues are affecting application performance. Boundary provides DevOps and IT operations teams with a tool for matching operations visibility with modern application change frequency. This is complementary to code-focused APM solutions deployed to the cloud and on-premises, since these deployments tend to result in the kinds of complicated interactions between the components of the application system that are Boundary’s focus.

Correlsense uses an agent that lives in the operating system. It observes the interactions between the application and the OS, maps application topologies, and times transaction performance. This is another solution that provides excellent end-to-end and hop-by-hop application response time and transaction response time across a very wide range of applications and run-time environments.

ExtraHop Networks uses a mirror port on either the physical network or the VMware vSwitch to see all of the network traffic that flows between physical and virtual servers. This source of data means that ExtraHop can see application topologies and measure end-to-end response time for every TCP/IP-based application on the network without requiring installation of any agents in applications, JVMs, virtual servers, or physical servers.

INETCO Insight uses a mirror or span port on physical switches to collect detailed data about the flows between the components of applications. INETCO Insight relies on its Unified Transaction Model framework to reconstruct multi-tier and multi-hop transactions, mining relevant transaction information and business context from the decoded fields. INETCO Insight provides network performance data, application payload intelligence, and detailed transaction response times and completion metrics all in one view, for every transaction. INETCO specializes in payment applications used by large financial institutions.

Neebula ServiceWatch focuses on mapping the infrastructure that supports critical business services. Once these maps are automatically created, ServiceWatch can also monitor the key resource utilization statistics in the infrastructure components that comprise each business service.

Riverbed Cascade collects layer 2 through 7 TCP/IP data from physical taps on physical switches, from virtual taps on the virtual mirror port of virtual switches, or via agents inserted into the network stack of the Windows or Linux operating system. Cascade includes layer 7 decodes of popular protocols like HTTP, which allows it to identify applications that use unique ports and protocols and to measure their end-to-end response time.

Virtual Instruments VirtualWisdom focuses on collecting real-time (subsecond) storage latency and transaction completion information for Fibre Channel–attached storage arrays. This data is collected via a tap that is inserted into the Fibre Channel SAN and sees every transaction that is flowing to and from Fibre Channel–attached storage arrays. VirtualWisdom provides real-time, comprehensive, and deterministic latency information on every storage transaction for a Fibre Channel–attached storage array.

Xangati is based on an in-memory database that can process a large amount of real-time data. This allows Xangati to accept, process, and then display end-to-end infrastructure performance information for VDI, server virtualization, and network virtualization scenarios.

How to Choose an Application-Aware Infrastructure Performance Management Solution

The most important thing to do when choosing a performance management tool is to focus on what problem you are trying to solve and for whom you are trying to solve it. This leads to the following process:

  • Is this a custom-developed application for which the tool’s job is to support the application development team by finding bugs in production? If so, consider an APM tool (a tool that instruments custom code). If instead the tool’s job is to support IT operations staff who have to support every application in production and the infrastructure that supports those applications, then pick an AA-IPM tool.
  • Make sure that the tool you select can automatically identify (by name) the applications you care about, map their topology, measure end-to-end response time and throughput, and provide actionable diagnostics when response time or throughput degrades.
  • Press every vendor you consider on its definition of performance. Select only vendors that can prove to you that their tools can measure performance in terms of response time and throughput.
  • Despite some of the marketing and sales messaging that goes back and forth between vendors, APM tools do not compete with AA-IPM tools. In fact, you might be well-served buying one of each.
  • Pursue a strategy of instrumenting every important application in production with an appropriate solution. If you are virtualizing business-critical applications (for many organizations, they are all that is left to virtualize), then baselining the performance of the application with an AA-IPM solution on physical hardware and then using that baseline as the SLA for its virtualized instance is really the only sensible way to overcome application owners’ objections to the virtualization process. Therefore, choosing the right set of solutions is a critical part of your virtualization initiatives for 2014 and an absolutely essential part of your strategy for migrating to a software-defined data center.
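
The baselining step in the last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which statistic to baseline or how much degradation is acceptable, so the 95th-percentile response time, the 10% tolerance, and the names `Baseline`, `summarize`, and `meets_sla` are all hypothetical choices, not features of any product listed above.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class Baseline:
    p95_response_ms: float    # 95th-percentile response time (assumed statistic)
    throughput_per_s: float   # completed transactions per second


def summarize(response_times_ms, window_seconds):
    """Reduce one measurement window to a response-time/throughput pair."""
    if len(response_times_ms) < 2 or window_seconds <= 0:
        raise ValueError("need at least two samples and a positive window")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(response_times_ms, n=100, method="inclusive")[94]
    return Baseline(p95, len(response_times_ms) / window_seconds)


def meets_sla(baseline, candidate, tolerance=0.10):
    """True if the candidate is at most `tolerance` worse than the baseline
    on both response time and throughput (the tolerance is an assumption)."""
    response_ok = candidate.p95_response_ms <= baseline.p95_response_ms * (1 + tolerance)
    throughput_ok = candidate.throughput_per_s >= baseline.throughput_per_s * (1 - tolerance)
    return response_ok and throughput_ok


# Baseline measured on physical hardware, then compared with the virtualized instance.
physical = summarize([100] * 20, window_seconds=10)
virtual = summarize([105] * 20, window_seconds=10)
print(meets_sla(physical, virtual))
```

Note that the check uses only response time and throughput, the definition of performance the article insists on, and ignores resource utilization entirely.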

Summary

Application-aware infrastructure performance management tools will be a critical part of ensuring that the applications that matter to your business are highly available and perform well in your software-defined data center. Running rapidly changing applications on a highly dynamic software infrastructure will lead to intractable problems unless proper APM tools are deployed for your developers and AA-IPM tools are deployed for your IT operations staff.

Bernd Harzog (326 Posts)

Bernd Harzog is the Analyst at The Virtualization Practice for Performance and Capacity Management and IT as a Service (Private Cloud).

Bernd is also the CEO and founder of APM Experts, a company that provides strategic marketing services to vendors in the virtualization performance management and application performance management markets.

Prior to these two companies, Bernd was the CEO of RTO Software, the VP Products at Netuitive, a General Manager at Xcellenet, and Research Director for Systems Software at Gartner Group. Bernd has an MBA in Marketing from the University of Chicago.


8 comments for “Application-Aware Infrastructure Performance Management”

  1. virt
    April 3, 2014 at 2:56 PM

    Interesting and insightful article as always. So where would VMTurbo’s APM or even a vCOPS from VMware fit in the Application-Aware Infrastructure Performance Management space?

  2. Sean
    April 4, 2014 at 5:32 PM

    Also wondering the same… Recently saw a great demo from Blue Medora working with VCOPS (Oracle integration).

    There seems to be some great benefit in using VCOPS for application aware trending which is useful for capacity/availability management, but still yet to see useful tracking/timing of specific transactions.

    Overall really impressed with VCOPS but keen to see if anyone yet using that as a platform for APM (or per this article AA-IPM.)

  3. Bharzog
    April 6, 2014 at 11:20 AM

    You ask a really interesting question regarding VCOPS and VMTurbo. I view both of those products as being in the Virtual and Cloud Operations Management space. The issue that this raises is that Gartner retired its IT Operations Magic Quadrant (ITOM) last year, and said that it was not going to revive it. I think that this is because if it were to publish an ITOM MQ with a focus upon data center virtualization, the software-defined data center, and the cloud, IBM, BMC, HP, and CA would all be in the lower left-hand quadrant, and Gartner might have some reluctance doing that to four vendors that spend a lot of money with it.

    I personally do not think Operations Management products like VCOPS, VMTurbo, Zenoss, Cirba, etc. belong in a quadrant that is all about performance management, which is defined as response time and latency. I think we need a new category of tools that are focused upon capacity management, capacity planning, optimal workload resource configuration, optimal workload placement, and configuration management, and that target the virtualized data center, SDDC, and cloud use cases. I will probably create such a category and call it Virtual and Cloud Operations Management or something like that.

  4. Bharzog
    April 6, 2014 at 11:26 AM

    The platform question is a really interesting one. I think that the frameworks from IBM, BMC, HP, and CA have failed. I think that the only thing that stands a chance of replacing them for the SDDC and the cloud is a set of management platforms wherein an ecosystem of vendors collectively solves the entire problem for customers. Splunk is the best example of such an ecosystem, which is based upon the idea that every vendor puts its data into Splunk and every vendor gets to reuse the data from everyone else. This is the only way that cross-domain problems (security affecting application performance) are ever going to get found and fixed quickly.

  5. GP
    April 8, 2014 at 8:32 AM

    Bernd – thanks for your post. You have done an excellent job of helping create a list of great solutions. Another interesting area of exploration is the use of open source solutions in this space. Gartner has also started covering them and highlighting some of the interesting ones; e.g., see Jonah Kowall’s recent post on Torch and Graylog2.

    There is a school of thought that believes that the recent emergence of real-time analytics, streaming, and the ability to process large amounts of data using tools like Apache Flume and stream processors like Storm will provide an additional layer of actionable intelligence related to APM, especially in situations where there is a COTS solution.

    Thanks again for a great post.

  6. TrueAPM
    April 9, 2014 at 10:19 AM

    Instead of looking at a vendor that does many things or has its focus on other things like databases, OSes, or VMs, why not look at the vendors that do only APM?
    The reason is that they are focused on this field, and if they can’t answer the questions, they will disappear. So in that sense I agree with Bharzog that the BIG FOUR have failed.
    No sensible Ops house should have tools that require more resources (time, CPU, disk, network, people) to monitor an application than the application itself.
    The philosophy has to focus on the users, as this is why the application exists in the first place. If the user is unhappy, the solution has to at least be able to tell where in the chain of devices the origin of the deviation is.
