Over at readwrite.com, Matt Asay published a blog post titled “In A World Of Open Source Big Data, Splunk Should Not Exist.” In the post itself, he does a pretty good job of debunking his own thesis and explaining why customers continue to pay Splunk big bucks to do what it does. However, since there is so much noise around open-source big data tools as alternatives to Splunk, the question deserves further exploration.
What Is Splunk?
Some people just look at Splunk as a proprietary big data back end for log data, one with a nice user interface for searching, querying, and dashboarding that data. These people are missing a couple of very important points about what Splunk is and what its value is to its customers:
- Splunk is not just a big data back end with a search engine and a user interface. It is a big data back end tuned to ingest large amounts of structured, semistructured, and unstructured data in near real-time and to make that data available almost immediately to searches and queries. This real-time write with simultaneous real-time access is a use case that most open-source big data back ends designed for batch queries against historical business data cannot meet.
- Splunk is actually a solution to the Security Event Management problem, and many customers buy it precisely for that purpose. In fact, Gartner rates Splunk a leader in the Security Information and Event Management (SIEM) Magic Quadrant. Many customers find that it is both less expensive and more functional than traditional SIEM offerings, which should give pause to those who decry Splunk as “expensive” (see the Gartner Magic Quadrant, below).
- In the security space, Splunk offers tremendous value-added products in addition to its core platform. The Splunk App for Enterprise Security and the Splunk App for PCI Compliance both give meaning to security-specific data and make that data actionable for users in ways that simply dumping it into a big data back end would not.
- Splunk is not just about logs. For example, the Splunk App for VMware collects all of the same operational data that VMware vCenter Operations collects from the vSphere API (a screenshot of the app is below).
- Splunk has a robust ecosystem of vendors who either pass their log data into Splunk or provide bidirectional API integration between the Splunk datastore and the vendor datastore.
- Splunk is a tested, mature, deployable, and fully supportable enterprise-grade software solution. The data collectors are robust and easily deployable, and the entire back end is scalable and manageable.
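To make the first point in the list above concrete, here is a minimal Python sketch of the difference between indexing at ingest time and batch loading: each event is tokenized and indexed in the same step that stores it, so a search issued right afterward can already see it. This is a toy illustration under stated assumptions, not a description of how Splunk is implemented. `LiveEventIndex`, `ingest`, and `search` are hypothetical names invented for this example, and the sample log lines are made up.

```python
import re
import threading
import time
from collections import defaultdict


class LiveEventIndex:
    """Toy in-memory store: events are searchable as soon as they are ingested."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []               # (timestamp, raw line), in arrival order
        self._index = defaultdict(set)  # token -> positions of events containing it

    def ingest(self, raw_line, timestamp=None):
        """Store one raw line and index its tokens in the same step."""
        tokens = set(re.findall(r"[a-z0-9_.]+", raw_line.lower()))
        stamp = time.time() if timestamp is None else timestamp
        with self._lock:
            position = len(self._events)
            self._events.append((stamp, raw_line))
            for token in tokens:
                self._index[token].add(position)
        return position

    def search(self, *terms):
        """Return the raw lines that contain every term, oldest first."""
        wanted = [term.lower() for term in terms]
        with self._lock:
            if not wanted:
                return [line for _, line in self._events]
            hits = set.intersection(*(self._index.get(t, set()) for t in wanted))
            return [self._events[i][1] for i in sorted(hits)]


if __name__ == "__main__":
    index = LiveEventIndex()
    # Unstructured and semistructured lines go in unchanged (hypothetical samples).
    index.ingest("sshd: Failed login for user=root from 10.0.0.5")
    index.ingest("vpxd: host esx01 cpu.usage=91")
    # No separate load or batch job runs between the write and the read.
    print(index.search("failed", "root"))
```

The lock lets writers and readers share the index safely, which is the essence of the write-while-querying use case. A production system would also need persistence, time-range queries, and horizontal scaling, and those are exactly the parts that are hard to get right.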
The Open-Source Story
While there are a number of open-source alternatives to Splunk, none of them solves the problems that Splunk solves in a manner that an enterprise can readily consume and manage. Elasticsearch, combined with Logstash (to collect the data) and Kibana (to dashboard the data), works at small scale. But this stack is built from three different open-source projects that will take some time to merge into a cohesive set of functionality and user experience. Elasticsearch has historically relied on its own datastore and only recently announced support for Hadoop. The jury is still out on whether Hadoop can handle the real-time ingest and query use cases, as well as the variety of data types, that Splunk can handle. However, Elasticsearch shows signs of sharpening its focus on Splunk, such as recently hiring a top Splunk product executive as VP of product management.
The other side of the open-source story is that there is a tremendous amount of impressive innovation going on at the datastore level. Hadoop, MongoDB, NuoDB, Couchbase, Cassandra, Druid, InfluxDB, and others are progressing at a rate that no single vendor with a proprietary datastore will be able to match over the long term. So it is only a matter of time before one or more open datastores catch up with what the proprietary datastore from Splunk can do. But Splunk already signaled how it will respond when it delivered Hunk (Splunk for Hadoop) last year. The point is that even if the datastore gets commoditized, filtering the data, structuring the data, relating the data to other data, and making the data consumable to users with various use cases will remain hugely valuable to enterprises worldwide.
Open source alone is not going to kill Splunk. Innovation at the datastore layer may cause Splunk to shift to Hunk over a period of years. Open-source search and dashboarding are not going to kill Splunk either, at least not until those solutions solve the actual problems that Splunk solves for enterprises worldwide.