Talk Data to Me


At Zenoss GalaxZ 16, there was a button that read “Talk Data to Me.” That got me thinking about the nature of data or, more importantly, what we keep, what we use, and the future of data altogether. Do we throw away data because we have no way to store or analyze it, or because we consider data to be a renewable resource? Are enterprises embracing data? Or is this just a next-generation application concept?

Infrastructure, application, and security monitoring tools are becoming data driven. Each of these tools consumes huge amounts of data every minute to tell IT professionals about infrastructure problems, along with probable (or, more likely, possible) causes. Why only possible? Because root cause analysis only goes as deep as the data does. I have seen storage subsystems affected by bad network settings, yet the tools pointed to storage as the “probable” cause, not networking. Why? Because the data was incomplete.

Perhaps that is the bane of data: it is always incomplete. It is a bit like Zeno’s dichotomy paradox, in which a traveler covers half the distance to a destination, then half of what remains, and so on. In essence, you are always some fraction of the way short of your target. But as the engineer says, six inches is close enough. Better yet, data that covers 80 to 90% of the issues is often good enough. However, when it isn’t good enough, we think data is a four-letter word.

Perhaps the bane of data is also the mass of data that we collect, use once, and throw away. Or is it the data we intentionally never collect? For years, I tried to collect more and more data for a high-performance computing project. On later examination, we discovered that we simply did not need all of it; much of it was not useful to us. So we knowingly threw away terabytes of data a day, keeping only a subset for daily use. Should we instead collect all the data? That is an ongoing debate. What would we do with it? Would it even be usable today? We believe the data is usable; we just do not yet know enough about how to query it for information. So this organization is starting small: it is introducing Elasticsearch as an easier platform for querying its existing data, and it will then begin correlating across data streams.

In the security world, we are seeing more and more analytics involved in making sense of network, event, and log data. We keep what we need for the time being and store the rest for the future. Is this the way to go? How far into the future should we store data?

We hear that many companies are using big data platforms to gain insights from their data. However, I wonder how many companies are actually using Hadoop, Elasticsearch, and the like. Yes, big-name brands are using them, but what about everyone else? Is there an uptick in use? I think it is starting. It is not a huge uptick, but an increasing number of companies are turning to these products as they become easier to use.

Ease of use is the real crux of the matter. The early adopters already know how to use these tools to manage data. The next big issue is managing all the data that is copied among various tools, clouds, and so on. That is where copy data management platforms come into play. It is not just about files, but about repositories of data, clouds of data, and everything in between. The volume of data grows daily, and what we do with that data is changing as well.

It is no longer IT operations, but IT operations analytics (ITOA), built upon the data produced by IT operations. It is no longer intrusion detection, but behavioral analytics. It is no longer log files and SIEM, but fast ways to search data, save queries, and build more complex ones. It is about analytics.

So yes, talk data to me, but do not forget that your current data is incomplete. Remember to apply data protection to that data, make sure that access to it is secure, and know where all of your data is. In the end, it is all about the data, and today’s ITOA, performance and application management, and security companies know this.

Edward Haletky
Edward L. Haletky, aka Texiwill, is the author of VMware vSphere(TM) and Virtual Infrastructure Security: Securing the Virtual Environment as well as VMware ESX and ESXi in the Enterprise: Planning Deployment of Virtualization Servers, 2nd Edition. Edward owns AstroArch Consulting, Inc., which provides virtualization, security, and network consulting and development, and The Virtualization Practice, where he is also an Analyst. Edward is the Moderator and Host of the Virtualization Security Podcast as well as a guru and moderator for the VMware Communities Forums, providing answers to security and configuration questions. Edward is working on new books on Virtualization.