Data and Metadata Everywhere


In a recent conversation, I was considering the data about data that abounds for any business, organization, or person. A great deal of data is stored and classified as public information. The metadata around that data is becoming increasingly valuable. Is this data maintained, curated, monitored, and controlled in any fashion? The answer varies among people and organizations. Yet, the real question concerns not the data we know about, but the data we do not know about: the “unknown unknowns.” Is this data a risk to our business, to our family, or to our livelihoods?

First, how do you find all the unknown unknowns? There are a number of services that scan the web for your data. You can “Google Hack” yourself, your organization, and your family. Google Hacking allows you to find previously hidden information by making carefully crafted search requests. You used to be able to see this data in real time. Google Hacking will find unknown unknowns about your organization, whether that organization is a business or a family. If the data is publicly available and searchable, then Google can find it, and so can you. The simplest hack is to place your organization name in the search and see what comes up. Some of the data that should show up includes:

  • Your organization’s home page
  • Social media sites (are there any you didn’t know about?)
  • Mentions on public forums (are there complaints on any forums?)
  • Mentions in press releases

This information can be seen by others and therefore used to find out more about your organization. A complaint that is not responded to could lead to a missed sale. Worse, you might see exposed data that could lead to a direct attack. Additional search tools can lead you down the path of finding all your internet-facing devices, and still others can discover information about the people within your organization. Some of these search services require a fee.
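As a rough illustration of the self-search described above, the following Python sketch builds a handful of search queries for an organization and turns them into Google search URLs you can open in a browser. The function names, the sample organization "Example Corp", and the domain "example.com" are all hypothetical; the queries use only standard search operators (quoted phrases, `OR`, `site:`, `-site:`, and `filetype:`). It is a starting point under those assumptions, not a complete audit.

```python
from urllib.parse import quote_plus


def build_queries(org_name, domain=None):
    """Return search queries for a basic self-audit (illustrative only)."""
    quoted = f'"{org_name}"'
    queries = [
        quoted,                              # the simplest hack: just the name
        f'{quoted} "press release"',         # mentions in press releases
        f'{quoted} complaint OR review',     # mentions on public forums
    ]
    if domain:
        queries += [
            f"site:{domain}",                # pages indexed on your own site
            f"{quoted} -site:{domain}",      # mentions anywhere else
            f"site:{domain} filetype:pdf",   # documents exposed on your site
        ]
    return queries


def search_url(query):
    """Encode a query as a Google search URL to open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # "Example Corp" and "example.com" are placeholders; use your own.
    for q in build_queries("Example Corp", "example.com"):
        print(search_url(q))
```

Opening each URL by hand keeps a person in the loop, which matters here because the point is to notice results you did not expect.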

Given this abundance of data, how could it be used against you? This is where you need to put on your security hat and look at the data from a risk perspective. You need to think “How will this data impact my organization directly?” You also should look at how the data can impact your organization indirectly. Consider the following:

An attacker finds public records, such as press releases, about the organization, including a growing list of partners.

How is this risky? In many of the Verizon Data Breach Investigations Reports, partner-based attacks are a major way into an organization. Should you, then, limit listing partners publicly, ask your partners to prove their security, or limit what partners can access? Should you review how they access internal data and how you share data with them? Could attackers use the partner list to spear phish for particular information, or perhaps to get approvals for large sums of money to be transferred?

An attacker finds public records, such as on social media, about the travel of key people within the organization.

How is this risky? Could hackers use this to spear phish targets regarding their trip and destination, and then use that foothold to infiltrate the organization or discover classified information about a deal, intellectual property, or something else?
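One way to keep track of scenarios like the two above is to record each finding with its direct and indirect impact and review the worst ones first. The Python sketch below is only an illustration: the `Finding` class, its field names, the 1-to-5 scale, and the sample scores are all assumptions made for this example, not an established scoring method.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One piece of public data found about the organization (hypothetical)."""
    description: str
    source: str
    direct_impact: int    # assumed scale: 1 (low) to 5 (high)
    indirect_impact: int  # same assumed scale

    @property
    def score(self):
        # Rank by the worse of the two impacts, so an indirect risk
        # (for example, through a partner) is not overlooked.
        return max(self.direct_impact, self.indirect_impact)


def triage(findings):
    """Return findings ordered from highest to lowest score."""
    return sorted(findings, key=lambda f: f.score, reverse=True)


# Sample entries; the scores are made up for illustration.
sample_findings = [
    Finding("Organization home page", "search engine", 1, 1),
    Finding("Growing list of partners", "press releases", 2, 4),
    Finding("Travel plans of key people", "social media", 3, 5),
]

if __name__ == "__main__":
    for f in triage(sample_findings):
        print(f.score, f.description, f"({f.source})")
```

Taking the maximum of the two impacts is a deliberately simple choice; an organization with an existing risk model would substitute its own scoring.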

There is a growing amount of data and metadata about data for each organization. Whereas in the past it took ages to do the research, it now takes minutes. The speed of attacks and the speed of data have increased. Speed has helped us for legitimate uses, but it also has helped those wishing to gain by illicit acts: the bad actors. The law is also not helping here.

A number of freedom of information laws, transparency directives, and similar measures have been passed. These require government organizations to make public information available in an easy-to-consume way. This in itself has been opening up new data sources daily, even hourly in some cases. These sources of data often used to be very hard to find, but today that is not the case.

Protecting yourself, your organization, and your family will require you to understand these sources of data, how to search for new data, and how such data could be used against your organization through side-channel, social engineering, and other attacks. The goal is to come up with methods to minimize your risk for data outside your direct control.

Edward Haletky
Edward L. Haletky aka Texiwill is an analyst, author, architect, technologist, and out of the box thinker. As an analyst, Edward looks at all things IoT, Big Data, Cloud, Security, and DevOps. As an architect, Edward creates peer-reviewed reference architectures for hybrid cloud, cloud native applications, and many other aspects of the modern business. As an author he has written about virtualization and security. As a technologist, Edward creates code prototypes for parts of those architectures. Edward is solving today's problems in an implementable fashion.