How much is your data worth? How much does it cost to store your data? I doubt that you have numbers for either of these things, but maybe it is time to start thinking carefully about both. If the cost of storing your data exceeds its value, then you probably shouldn’t be storing the data. The trigger for this thought is the oncoming Internet of Things (IoT) data tsunami. A few guesstimates I’ve seen suggest that we will see around 50 zettabytes (ZB) of data generated in the next five years. That is 50 billion terabytes. One place to store 50ZB of data is on AWS’s cheapest storage, Glacier. At the cheapest published price for Glacier, your monthly bill would run to hundreds of billions of dollars. I wonder whether knowing the temperature inside my refrigerator every minute of the day is worth that much money.
So where is all this data coming from? Historically, data has arrived in waves. As compute power and network bandwidth grew, the amount of data we created grew with them. When processing and bandwidth were very limited, we were careful with data and kept only small amounts of structured data. With the rise of Intel processors and the Internet, we started storing emails, PDF documents, and web pages. Over time, we added pictures, sound, and video. Recently, mobile devices have multiplied our ability to create these media files. The sensors in our devices have added quite a bit of machine-generated data to the media. Photos now come with GPS coordinates embedded. Parking infringement reports include a photo of the infringing vehicle. The coming tsunami is of machine-generated data that is created without any human involvement.
The Internet of Things is about sensors embedded in everyday appliances that report back to some central service. The appliances could be conventional fridges and washing machines, but many of them will be more commercial. Vending machines have long been able to phone home. Street lights can report when their bulb is getting hot. A new office block might have sensors built in throughout. Many everyday objects are getting an embedded computer and reporting back to base. These things are all getting connected to the Internet and creating lots of new data. This IoT data will need to be stored somewhere if its value is ever to be released.
Naturally, the value of data varies a great deal. Data that enables my business to make a sale might be worth the value of the sale. Data that enables me to avoid a large legal settlement is worth the value of the legal settlement. But we don’t know for sure that the sale will occur even if we have the data, or that the legal case will ever take place. So there is a probabilistic part to the value of the data. What is the probable value of my fridge’s temperature and power consumption data? What is the value of the photo of my car parked in the wrong place? The value of this data also changes over time. Once I’ve accepted that I parked in the wrong place and paid the fine, the picture is far less valuable. I suspect that many companies who are putting their things on the Internet have little idea of the value of the data. I suspect many have been promised that value will be found if only enough data is collected. The fear is that any data not collected is value that has been lost forever. Today the data is being grabbed on the promise that its value will come only after enough of it is stored. How valuable does that make the data?
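The probabilistic part can be made concrete with a back-of-the-envelope calculation: multiply the payoff the data might unlock by the chance that the payoff ever happens, then compare that with what it costs to keep the data around. The sketch below is only an illustration. The function names and every number in it (the fine, the probabilities, the storage cost) are made-up assumptions, not measurements.

```python
def expected_value(payoff: float, probability: float) -> float:
    """Probable value of a piece of data: payoff weighted by its likelihood."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return payoff * probability


def worth_keeping(payoff: float, probability: float,
                  monthly_cost: float, months: int) -> bool:
    """True if the probable value exceeds the total cost of storing the data."""
    return expected_value(payoff, probability) > monthly_cost * months


# Hypothetical parking photo: a 100 dollar fine, stored for 12 months
# at an assumed 2 cents per month.
FINE = 100.0
MONTHLY_COST = 0.02

# Before the fine is paid, assume a 5% chance the photo is needed in a dispute.
print(worth_keeping(FINE, 0.05, MONTHLY_COST, 12))

# After the fine is paid, assume the chance drops to 0.1%.
print(worth_keeping(FINE, 0.001, MONTHLY_COST, 12))
```

With these invented figures the photo is worth keeping while the fine is unpaid and not worth keeping afterwards, which is the change in value over time described above.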
The cost of storing data also varies. Staying with AWS as the location for storing 50ZB of data, there are different performance options. Glacier is great for storing a lot of data, but accessing the data is extremely slow. Storing my data on an accessible platform, like EBS, costs ten times as much as storing it on Glacier, so my monthly bill just to have the data accessible grows tenfold. Then there is the cost to extract value from the data. It is easy to get the value out of a Word document or an invoice. Getting value from the temperature data on a million refrigerators and thermostats and whatever else IoT means is much harder. Extracting value from the data will probably involve data scientists. If they are smart (they are), these data scientists will be charging a lot for their time, particularly when thousands of companies need to extract lots of value from IoT data.
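The storage side of the bill is simple arithmetic: gigabytes stored multiplied by the price per gigabyte per month. Here is a minimal sketch for a single petabyte. The two prices are illustrative assumptions (about a cent per GB per month for a Glacier-like archive tier, and ten times that for an EBS-like tier), not current AWS list prices, and `monthly_storage_cost` is a hypothetical helper.

```python
# Assumed prices in dollars per GB per month. Replace them with the
# figures from your provider's current pricing pages.
ASSUMED_PRICE_PER_GB_MONTH = {
    "archive tier (Glacier-like)": 0.01,
    "accessible tier (EBS-like)": 0.10,
}

BYTES_PER_GB = 10**9  # decimal gigabyte
ONE_PETABYTE = 10**15  # bytes


def monthly_storage_cost(size_bytes: int, price_per_gb_month: float) -> float:
    """Monthly bill in dollars for holding size_bytes at a flat per-GB rate."""
    return size_bytes / BYTES_PER_GB * price_per_gb_month


for tier, price in ASSUMED_PRICE_PER_GB_MONTH.items():
    cost = monthly_storage_cost(ONE_PETABYTE, price)
    print(f"{tier}: ${cost:,.0f} per month for 1 PB")
```

The bill scales linearly with volume, so an exabyte costs a thousand times the petabyte figure and a zettabyte a million times. Retrieval, request, and data transfer charges are left out of this sketch and would only add to the total.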
If you are generating and storing zettabytes, or even petabytes, of IoT data, then you need to know what real potential value might be extracted. If there proves to be no value, then a lot of money will have been wasted on gathering and storing this data. Make sure you are not buying overhyped promises like a bunch of tulip bulb futures or international reply coupons.