As we look at privacy of big data within any cloud, on premise, or mixed, we need to realize that the amount of data could be so large that retroactively redacting data may be itself a big data problem and that redacting well defined PII is a possibility on ingest as well as using tools like DataGuise to redact, encrypt, tokenize, etc. such data retroactively can be accomplished as another big data task, but that only handles well known PII. How do we handle derived PII?
On the May 30th Virtualization Security Podcast, Michael Webster (@vcdxnz001) joined us Live from HP Discover to discuss what we found at the show and other similar tools around the industry. The big data security news was a loosely coupled product named HAVEn which is derived from several products: Hadoop, Autonomy, Vertica, Enterprise Security, and any number of Apps. HAVEn’s main goal is to provide a platform on top of which HP and others can produce big data applications using Autonomy for unstructured data, Vertica for structured data, Enterprise Security for data governance and hadoop. HP has already built several security tools upon HAVEn, and I expect more. Even so, HAVEn is not the only tools to provide this functionality, but it may be the only one to include data governance in from the beginning.