Google, VMware, Hadoop, and Nutanix have all voted at least in part for a scale-out architecture that does not rely on enterprise storage. VMware and Nutanix are important because they apply these concepts to traditional enterprise workloads. Hadoop represents the future of how data-intensive computing will be done.
At the recent Misti Big Data Security conference many forms of securing big data were discussed from encrypting the entire big data pool to just encrypting the critical bits of data within the pool. On several of the talks there was general discussion on securing Hadoop as well as access to the pool of data. These security measures include RBAC, encryption of data in motion between hadoop nodes as well as tokenization or encryption on ingest of data. What was missing was greater control of who can access specific data once that data was in the pool. How could role based access controls by datum be put into effect? Why would such advanced security be necessary?
As we look at privacy of big data within any cloud, on premise, or mixed, we need to realize that the amount of data could be so large that retroactively redacting data may be itself a big data problem and that redacting well defined PII is a possibility on ingest as well as using tools like DataGuise to redact, encrypt, tokenize, etc. such data retroactively can be accomplished as another big data task, but that only handles well known PII. How do we handle derived PII?
At EMCworld 2013, one of the big stories was Pivotal and it’s importance to the EMC2 family and the future of computing. Pivotal is geared to provide the next generation of computing. According to EMC2 have gone past the Client-Server style to a scale-out, scale-up, big data, fast data Internet of Things form of computing. The real question however, is how can we move traditional business critical applications to this new model, or should we? Is there migration path one can take?
We recently had a conversation with DataStax regarding their DataStax Enterprise product, which got us to thinking a little about the nature of Big Data and Cloud. DataStax is the company behind the Open Source Cassandra NoSQL database. It provides technical direction and the majority of committers to the Apache Cassandra project.