
Managing Application Performance with Log Analysis


There are three main use cases for log analysis: security, IT operations, and application management. In this post, we drill down into how log analysis can be used to manage applications and their performance, along with the complementary apps and the integrations from application performance management (APM) and application-aware infrastructure performance management (AA-IPM) partners that support it.

The Log Analysis Landscape

Before we get into how well one can manage application performance with log analysis, let’s briefly review the vendors in this space:

Splunk

Splunk is the clear leader in the log analysis arena in both market share and product functionality. Splunk has more customers than any other log analysis vendor; it is used in some of the most demanding situations (ingesting nearly a petabyte of data a day); it has the broadest and deepest ecosystem of vendor partners; and it is the only vendor in the space that is available both on premises and as a cloud-hosted service. Splunk shines in security log analysis, which requires real-time ingestion of massive amounts of data combined with near real-time availability of that data to queries. It delivers this functionality through a set of compelling user interfaces and is rated a leader by Gartner in the security information and event management (SIEM) market.

The only criticism widely levied at Splunk is that, because it charges by the quantity of data ingested per day, it can become very expensive at high volumes, especially for enterprises that want to use it not just for security data but also for IT operations data, and perhaps for some business data as well.
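
Because the bill scales with daily ingest volume, it is worth estimating that volume before extending a log analysis deployment beyond security data. The following is a minimal sketch, assuming you can measure or sample each source's average event rate and average raw event size; the `LogSource` type, the source names, and the rates in the example are hypothetical, and the binary-gigabyte unit is an assumption to check against how your vendor actually meters licenses.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1024 ** 3  # binary GB (assumption); confirm the unit your vendor licenses against


@dataclass
class LogSource:
    """One log source, described by averages you measure or sample yourself."""
    name: str
    events_per_second: float  # average sustained event rate
    avg_event_bytes: float    # average raw (uncompressed) event size

    def daily_bytes(self) -> float:
        return self.events_per_second * self.avg_event_bytes * SECONDS_PER_DAY


def daily_ingest_gb(sources):
    """Return (total GB/day, {source name: GB/day}) for a list of LogSource objects."""
    per_source = {s.name: s.daily_bytes() / BYTES_PER_GB for s in sources}
    return sum(per_source.values()), per_source


if __name__ == "__main__":
    # Hypothetical rates: a busy firewall plus a tier of application servers.
    sources = [
        LogSource("firewall", events_per_second=2_000, avg_event_bytes=300),
        LogSource("app-servers", events_per_second=500, avg_event_bytes=600),
    ]
    total, breakdown = daily_ingest_gb(sources)
    print(f"total: {total:.1f} GB/day")
    for name, gb in breakdown.items():
        print(f"  {name}: {gb:.1f} GB/day")
```

Substituting measured rates for the hypothetical ones and multiplying by your per-GB price gives a first-order cost estimate; compression, indexing overhead, and license-counting rules vary by vendor.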

The main open question about Splunk is how it will position itself with respect to its ecosystem. With the announcement of the Splunk App for Stream, Splunk essentially signaled that it was going to compete with a partner, ExtraHop. Splunk produced an agent that, when installed on the operating system, can collect much of the same data that ExtraHop collects. This has left every vendor whose agents collect unique data about application behavior, especially vendors in the APM space, wondering what Splunk’s next move will be.

VMware Log Insight

Log Insight is VMware’s on-premises competitor to Splunk. It features an English-language search engine and a variety of connectors relevant to VMware customers. Pricing is either per virtual machine (per operating system instance) or per CPU socket. Long-term differentiation between Log Insight and Splunk will come from bidirectional integration of Log Insight with vCenter Operations and the rest of VMware’s recently rebranded vRealize Suite of management software.

Logentries, Sumo Logic, and Loggly

These vendors are all cloud-hosted log management services that focus on being easy to adopt. They are widely used for cloud-hosted applications, often complementing New Relic.

The major advantage of these solutions is that since they are cloud services, the customer can avoid all of the effort associated with setting up, maintaining, and managing the back-end infrastructure required to run log management on premises. This makes it much faster to get logging set up and requires much less ongoing effort to maintain the back end of the logging system.

The only issue with these solutions is the one that affects all cloud-hosted logging solutions: it takes time to ship the logs over the Internet and then to query the back end over the Internet. As a result, cloud-hosted log management solutions cannot be used for the low-latency, real-time security log analysis for which on-premises Splunk is widely used.

Elasticsearch

The Elasticsearch offering combines three open-source projects: Logstash for data collection, Elasticsearch for the data store and search engine, and Kibana for dashboarding. The combination is commonly referred to as the “ELK stack” (for Elasticsearch, Logstash, Kibana). The ELK stack is widely downloaded, but it appears to be used in production only by organizations that have the skills required to deploy and manage open-source software. If you do not have developers who can work with the ELK stack, you do not have the skill set to run ELK in production. This helps explain why packaged commercial solutions like Splunk dominate the on-premises enterprise market.

Managing Application Performance with Log Analysis

The first thing to understand about using log analysis to manage application performance is that, when most people think about application performance, they think about the Application Performance Management category of products as defined by Gartner. These products focus on instrumenting custom-developed code in production. The newer APM market is led by AppDynamics, New Relic, and AppNeta; the market for managing Java applications that include legacy components such as the mainframe is led by Compuware. Log analysis is definitely not an APM solution, because an APM solution must have an agent that sits inside the runtime of the application (for example, inside the JVM), and the leading log analysis solutions have no such agent. So, if you stick with the formal definition of APM, log analysis is not APM, and you can stop reading right here.

That said, there are several options for managing application performance with log analysis, the viability and desirability of which depend on your situation.

The One Custom Application Case

If you only have one (or a few) custom-developed applications, and you are one of those companies for which the application is the company and the company is the application, then that single application is very important to you. In fact, you probably know an awful lot about it and how it works in production. Further, it might well be the case that all of your company’s revenue is tied to the stable operation and great performance of this application. In this case, you have several good options for using log analysis to manage your application’s performance:

  • Use log analysis simply to collect the logs and operational metrics from your hardware, your operating systems, and your application runtimes. Putting all of that information into Splunk will give you a better picture of how your environment is working than most legacy operations management solutions will.
  • If you know how you expect the application to behave on the network, and you are a Splunk customer, you can use the new Splunk App for Stream to collect just the subset of network data that pertains to that application. If you want more comprehensive coverage of your network data, look at ExtraHop, which has integrations with Splunk, Log Insight, and Elasticsearch. Note that network data is the most voluminous type of data, so before you go down this road, check how your log analysis vendor prices ingestion and how much data your network actually produces.
  • The next step after that would be for you to instrument your code so that logs are generated whenever anything happens in your code that is unexpected or undesirable. Note that this assumes that you can know ahead of time what all of the undesirable events could be, and that you have the development time to implement all of this logging in your applications.
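
The third option can be sketched in Python with the standard `logging` module, emitting one JSON object per line so that a log analysis tool can index fields such as `order_id` without custom parsing. This is a minimal illustration under stated assumptions: `place_order`, the `charge` callback, and the two-second threshold are hypothetical, and JSON lines is one common output format rather than a requirement of any particular product. Note that it only reports the failure modes its author anticipated.

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        entry = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name="orders", stream=None):
    logger = logging.getLogger(name)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


SLOW_THRESHOLD_S = 2.0  # hypothetical service-level threshold


def place_order(logger, order_id, charge):
    """Hypothetical business function, instrumented only for the undesirable
    events its developer anticipated: failed and slow payment calls."""
    start = time.perf_counter()
    try:
        charge(order_id)
    except Exception:
        logger.error("payment_failed", exc_info=True,
                     extra={"fields": {"order_id": order_id}})
        raise
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_THRESHOLD_S:
        logger.warning("payment_slow",
                       extra={"fields": {"order_id": order_id,
                                         "elapsed_s": round(elapsed, 3)}})
```

Anything the developer did not foresee, such as a slow database query outside `charge`, produces no log line at all.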

The downside of this approach is that you can do all of the logging of your applications that you want, and instrument your applications as deeply as you want, and you are still not going to approach what a true APM solution can do for you. Why? If you rely on the developer who wrote the application to instrument the logging for that application, that developer is going to instrument the application for how he or she expects it to be used. We all know that in the real world, users never use applications in the way in which developers intended (you cannot make a piece of software foolproof, because fools are so ingenious). Therefore, something that watches everything the application does is going to be called for if you really care about the operation of that application. That “something” would be a true APM solution.
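
The difference can be illustrated with a toy Python analogue of in-process instrumentation: instead of relying on hand-placed log statements, every function in a module is wrapped so that each call is timed automatically, including code paths nobody anticipated. Real APM agents do this inside the runtime (for example, through bytecode instrumentation in the JVM) and capture far more than timings; `instrument_namespace` and the `timings` store are hypothetical names used only for this sketch.

```python
import functools
import time
import types
from collections import defaultdict

# Hypothetical in-memory store: function qualified name -> list of call durations (seconds).
timings = defaultdict(list)


def instrument(func):
    """Wrap func so that every call is timed, whether it returns or raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__qualname__].append(time.perf_counter() - start)
    return wrapper


def instrument_namespace(namespace):
    """Replace every plain Python function in a module (or a globals-style dict)
    with a timed wrapper, so no per-function logging code has to be written."""
    items = vars(namespace) if isinstance(namespace, types.ModuleType) else namespace
    for name, obj in list(items.items()):
        if isinstance(obj, types.FunctionType):
            items[name] = instrument(obj)
```

Calling `instrument_namespace(some_module)` once at startup would time every top-level function in `some_module`, including calls between its own functions, because those calls are resolved through the module's globals at call time. Methods inside classes, references imported before instrumentation ran, and non-Python code are not covered, which hints at how much more a real agent has to do.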

The Many Custom Applications Case

If you have many custom applications (tens, hundreds, or thousands), then the question becomes one of available developer time. Would you rather have your developers spend their time implementing logging in an application, or adding functionality that makes you more money or improves your market share? In this situation, anything beyond high-level logging in the application is a giant waste of time, and a true APM solution is mandatory.

The Many Custom and Many Purchased Applications Case

This is the situation in which most enterprises find themselves. They have hundreds or thousands of applications, no one knows enough about any of them to implement custom instrumentation per application, and there is certainly not enough time to do so. In this case, logging needs to be augmented by two different kinds of solutions:

  • An APM solution (from a vendor like AppDynamics, New Relic, AppNeta, or the Dynatrace division of Compuware) to automatically instrument and manage the custom applications in production.
  • An application-aware infrastructure performance management solution (from a vendor like AppEnsure, AppFirst, BlueStripe, Boundary, Correlsense, ExtraHop, Riverbed, Virtual Instruments, or Xangati) that can discover all of the applications in production, map their topology, and measure end-to-end response time and throughput for those applications.
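
To make the three measurements concrete, here is a minimal sketch of computing response time, throughput, and error rate for one time window from per-request records. It assumes each record carries an end-to-end duration and an HTTP-style status code, and that a status of 500 or above counts as an error; the `Request` type and `summarize` function are hypothetical, and AA-IPM products derive these figures from network, OS, or hypervisor data rather than from a list like this.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request: end-to-end response time and an HTTP-style status code."""
    duration_s: float
    status: int


def percentile(sorted_values, p):
    """Nearest-rank percentile for an integer p in 1..100 over an ascending list."""
    n = len(sorted_values)
    rank = -(-p * n // 100)  # ceil(p * n / 100) using integer arithmetic
    return sorted_values[max(rank, 1) - 1]


def summarize(requests, window_s):
    """Response time, throughput, and error rate for requests observed in one window."""
    if not requests:
        return {"count": 0, "throughput_rps": 0.0}
    durations = sorted(r.duration_s for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    return {
        "count": len(requests),
        "throughput_rps": len(requests) / window_s,
        "p50_s": percentile(durations, 50),
        "p95_s": percentile(durations, 95),
        "error_rate": errors / len(requests),
    }
```

Percentiles are reported rather than an average because a small number of very slow requests can hide behind a healthy-looking mean.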

The modern APM and AA-IPM solutions are covered in Who’s Who in Application Performance Management for the SDDC and Cloud. Since this post is about managing application performance with logs, there are a few vendors who deserve special mention:

  • AppEnsure has just delivered the Splunk App for AppEnsure. This is a breakthrough in the area of managing application performance with Splunk, since AppEnsure puts all of its data directly into the Splunk data store, making it directly and immediately available for queries by Splunk users. This product also ships with its own app, which is an extension to the Splunk Web Framework that shows you all of your applications and their topologies, response time, throughput, and error rates across those topologies. You can download the Splunk App for AppEnsure from the Splunk website, and you can download the AppEnsure product (required to collect the data) from the AppEnsure website.
  • ExtraHop has integrations with Splunk, VMware Log Insight, and Elasticsearch. It is also notable for having stated that it intends to make its data available broadly in big data stores. This was covered in News: ExtraHop Announces Open Data Stream — Sets Its Data Free.
  • AppDynamics has an integration with Splunk and has also announced its support for open big data back ends. This was covered in News: AppDynamics Goes Big Data with Its Summer 2014 Release.

Summary

Log analysis solutions cannot manage applications and application performance by themselves. They need to be complemented by an APM solution, an AA-IPM solution, or both, in order to get the right data into the back-end data store.

Bernd Harzog
Bernd Harzog is the Analyst at The Virtualization Practice for Performance and Capacity Management and IT as a Service (Private Cloud). Bernd is also the CEO and founder of APM Experts, a company that provides strategic marketing services to vendors in the virtualization performance management and application performance management markets. Prior to these two companies, Bernd was the CEO of RTO Software, the VP Products at Netuitive, a General Manager at Xcellenet, and Research Director for Systems Software at Gartner Group. Bernd has an MBA in Marketing from the University of Chicago.


1 Comment on "Managing Application Performance with Log Analysis"

Guest

Great article. I was surprised, though, that you missed Stackify (www.stackify.com, I believe). We’ve been using their error and log management solution for quite some time, and it has been invaluable. It helped us find memory leaks and track how new releases are behaving.
