At the recent Misti Big Data Security conference, many approaches to securing big data were discussed, ranging from encrypting the entire data pool to encrypting only the critical pieces of data within it. Several talks covered securing Hadoop itself as well as access to the pool of data. These security measures include role-based access control (RBAC), encryption of data in motion between Hadoop nodes, and tokenization or encryption of data on ingest. What was missing was finer-grained control over who can access specific data once it is in the pool. How could role-based access controls be enforced per datum, and why would such advanced security be necessary?
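To make the idea of per-datum RBAC concrete, here is a minimal sketch in Python. The roles, classification labels, and policy mapping are hypothetical, invented for illustration: each record carries labels describing its sensitivity, and a role may read a record only if its policy covers every label on that record.

```python
# A minimal sketch of per-datum role-based access control.
# The roles, labels, and policy below are hypothetical examples,
# not a real product's configuration.

# Map each role to the set of data classifications it may read.
POLICY = {
    "analyst": {"public", "internal"},
    "compliance": {"public", "internal", "pii"},
}

def can_read(role: str, record_labels: set) -> bool:
    """Allow access only if the role's policy covers every label on the record."""
    allowed = POLICY.get(role, set())
    return record_labels <= allowed

# Example: a record tagged as PII.
record = {"value": "123-45-6789", "labels": {"pii"}}
print(can_read("analyst", record["labels"]))     # False
print(can_read("compliance", record["labels"]))  # True
```

The key design point is that the access decision is attached to the datum's labels rather than to the file or table that contains it, which is what distinguishes per-datum control from the coarser directory- or table-level RBAC typically discussed for Hadoop.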
As we look at the privacy of big data in any environment, whether cloud, on-premise, or mixed, we need to recognize that the volume of data may be so large that retroactively redacting it becomes a big data problem in its own right. Redacting well-defined PII is possible on ingest, and tools like DataGuise can redact, encrypt, or tokenize such data retroactively as another big data task. But that handles only well-known PII. How do we handle derived PII?
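Redaction of well-defined PII on ingest can be sketched as a simple pattern-matching pass over each record before it lands in the pool. The patterns below (US SSN and email address) are illustrative examples only, not an exhaustive PII catalogue, and real tools use far richer detection than regular expressions:

```python
import re

# Illustrative patterns for well-known PII; a production system would
# use a much larger, validated catalogue.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace each recognized PII match with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789"))
# → Contact [EMAIL], SSN [SSN]
```

This kind of pass works precisely because the PII is well defined, which is the limitation the paragraph above raises: derived PII, inferred by combining otherwise innocuous fields, has no fixed pattern to match on ingest.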