Big Data

Have you noticed lately that the term “big data” is being used with increasing frequency? It seems that working with big data is one of the more desired and in-demand skill sets in the technology space. What do you think “big data” is, and what do you think it represents? One definition to consider is this one from Wikipedia: “Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.” So, who benefits the most from its use? Have you stopped to consider just what makes up big data? Let’s explore that question a little deeper.

One of the biggest beneficiaries of the use of big data is the field of science. The Large Hadron Collider experiment is just one project among many that are reaping the benefits. Astronomers use big data techniques to analyze the petabytes of data generated daily by the world’s most powerful telescopes. Near and dear to me as a resident of the great state of Florida are advancements in the use of big data to help predict, monitor, and track the weather and meteorological events like hurricanes. And of course, you cannot mention science without getting into a discussion about what big data has brought to the medical community. Consider that decoding the human genome originally took ten years. Now it can be achieved in less than a day. DNA sequencers have divided the sequencing cost by ten thousand over the last decade, a reduction one hundred times greater than the one predicted by Moore’s Law. There is no question that big data provides benefits for humanity, whether by helping eradicate disease, providing earlier warning of catastrophic events, or offering opportunities yet unimagined. This can give us all hope for a better tomorrow.
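For anyone who wants to check the Moore’s Law comparison above, here is a rough back-of-the-envelope sketch. It assumes Moore’s Law is modeled as a doubling every 18 months (my assumption; a 24-month doubling is also commonly quoted and gives a different ratio), and it takes the ten-thousand-fold drop over ten years directly from the paragraph above. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the sequencing-cost comparison.
# Assumption: Moore's Law is modeled as one doubling every 18 months (1.5 years).
DOUBLING_PERIOD_YEARS = 1.5
YEARS = 10
SEQUENCING_COST_REDUCTION = 10_000  # factor quoted in the text


def moores_law_factor(years, doubling_period_years=DOUBLING_PERIOD_YEARS):
    """Improvement factor after `years` if capability doubles every period."""
    return 2 ** (years / doubling_period_years)


moore_factor = moores_law_factor(YEARS)
ratio = SEQUENCING_COST_REDUCTION / moore_factor

print(f"Moore's Law factor over {YEARS} years: about {moore_factor:.0f}x")
print(f"Sequencing improved about {ratio:.0f} times more than that")
```

Under the 18-month assumption, the Moore’s Law factor comes out at roughly one hundred, so a ten-thousand-fold cost drop is on the order of one hundred times larger, which is consistent with the figure quoted above.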

Yet, you cannot have the good without the bad. Outside of the scientific fields, in my humble opinion, big data is really people data. Even in medicine, big data is still data on people. Companies like Google collect huge amounts of data from search engines and many other sources. Let’s face it, Google has its fingers in just about everything we do when we are on the Internet—and in the twenty-first century, just about everything is connected to the Internet. We see this data collection in use when we see those targeted ads geared specifically toward our own personal interests. This is marketing’s holy grail: the ability to target marketing messages at the specific people who have an interest in the marketed products or services.

Now Google is working on a driverless car. How long do you think it will be before billboards present marketing messages geared toward the passengers of particular groupings of cars driving by? Take that a step further: how long will it be before science fiction becomes reality and billboards greet you by name with those specialized targeted ads? This seems to be the way of the future. I have not heard from anyone who seems overly concerned about targeted ads so far, but where will the line be drawn between acceptable targeting and targeting considered too intrusive? The answer depends entirely on the individual, but in all honesty, I believe it won’t really matter where the individual believes that line to be.

There are some obscure sources of information out there that try to offer insight into which data and how much of it is gathered on each of us. However, you really have to dig deep to find them, and they don’t cover everything. I am sure companies at least attempt to inform users about their data collection in the legal terms of use that most of us quickly glance over while looking for the “Next” button. What will it take for the public to demand that the information being gathered be clearly spelled out for all to see? Do you think there will ever be an option, when you stop using a service, to be shown all the information that has been gathered from you and to have that data purged? Facebook has already basically stated that it does not have the ability to completely delete a person and the data on that person from its systems. Has Facebook established the precedent on data retention that other companies will follow? You would think there would be more of a push from the public to understand this type of data gathering, but it does seem that we as people have been programmed to accept it without too much question.

Now for the king of centralized big data collection and the data mining that comes with it: the government (though not any government specifically, considering that they are all collecting data on their people). Governments have built most of the world’s biggest and fastest computer systems for this very purpose. The data they collect is not just what we do on the Internet. How many of you have an E-ZPass to automatically pay the tolls as you fly by in the express lanes? In the most highly congested areas, this is something that helps make life easier and saves our most valuable commodity, our time. If you look around while you are driving around town, you might notice all the transponders that are nowhere near any toll roads or bridges. (Things that make you go “hmm.”) And now, E-ZPasses are being accepted in areas all around the country, far beyond the boundaries of the area you reside in. This is just one example. I think Snowden has done plenty to bring other systems and tools to light.

Although big data helps us all, I also wanted to point out that for the most part, information on us, the people—our interests, our shopping habits, our driving habits, and so much more—is what is being collected for this “big data.” Maybe a better term to use would be “people data.” At least then, people might open their eyes to see and understand what that data is.

One final thought: Since companies like Google are selling the information on us that they have collected, shouldn’t we be getting a cut of those profits? After all, it is our proprietary data. Maybe we should be getting a piece of the pie.
