Scale and Engineering

When we scale things up to handle ever-larger quantities of data, we also scale up the number of issues that come with the increasing pace. We’re dealing with this with fewer tools and, quite frankly, less knowledge. We’ve seen changes in security (visit our latest podcasts on security and scale). We have seen changes in operations. We have also seen changes in development. Scale changes everything. But how so?

Let us define scale as roughly 40 billion queries a day (~1.7 billion queries per hour, ~28 million queries a minute, and ~460 thousand queries a second). These could be API calls, sessions, or any other web-based service. The service in question is an API broker. At this scale, a problem shows up very quickly and has a drastic impact if it is not caught within one to five minutes. That impact can affect the bottom line. At this scale, the business is heavily involved in the success of the product. It is driving new changes and even innovations, all to meet demand and to scale up as required.
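The arithmetic behind these figures can be checked with a short sketch. It assumes load is spread evenly across the day, which real traffic rarely is, and the function names are illustrative rather than anything the service itself uses.

```python
# Back-of-the-envelope rates for a service handling 40 billion queries a day.
# Assumption: load is perfectly uniform across the day (real traffic has peaks).

QUERIES_PER_DAY = 40_000_000_000


def rate_breakdown(queries_per_day: float) -> dict:
    """Convert a daily query count into hourly, per-minute, and per-second rates."""
    return {
        "per_hour": queries_per_day / 24,
        "per_minute": queries_per_day / (24 * 60),
        "per_second": queries_per_day / (24 * 60 * 60),
    }


def queries_in_window(queries_per_day: float, minutes: float) -> float:
    """Estimate how many queries arrive while a problem goes unnoticed."""
    return queries_per_day / (24 * 60) * minutes


if __name__ == "__main__":
    for unit, value in rate_breakdown(QUERIES_PER_DAY).items():
        print(f"{unit}: {value:,.0f}")
    for minutes in (1, 5):
        affected = queries_in_window(QUERIES_PER_DAY, minutes)
        print(f"{minutes} min window: {affected:,.0f} queries")
```

Under that uniform-load assumption, a problem that goes unnoticed for five minutes touches on the order of 139 million queries, which is why the detection window matters so much.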

When this service first appeared, it was lucky to do half a million queries in a day and was comparatively unknown. As word of the service propagated, demand grew steadily. The first milestone was 1 billion queries per day. It grew from there. Engineering is constantly looking at a cycle of improvements, one dictated by the scale required by the growing business. Scale has impacted the development cycle quite a bit. We often cycle through the following as we aim to increase scalability. Eventually, you reach the limits of technology and are forced to either change technologies or change how you do things.

[Figure: Scale impact on Dev Cycle]

As you can see, we have to look at the most basic of things: code and databases. We also need awareness of how the network, file system, and underlying security impact our scale. Your mindset shifts the more you look at things. Your understanding also shifts. The best way to consider such a scale is to change from thinking about the application as a singleton working alone to thinking about it as a high-performance computing environment in which many parts are doing the same thing over and over and over again.

Once you make that fundamental shift in perspective, other engineering principles come into play as well, such as small perturbation theory. Small perturbation theory looks at small changes, as opposed to large ones, within a system. A small change in any one of the areas of our development cycle above can lead to massive changes to the entire project. Some of those changes are positive; others are negative. If you make too many changes at once, you have no way to figure out what has happened.

All in all, this directly impacts how code is put out. In this case, code is only put out to one subsystem at a time, which minimizes impact to the entire system. This is exactly what Agile development is designed to do. It is designed to take a nibble out of the byte of changes required by the business. The end goal is to limit the impact of any one change by limiting the change actually made to just the one item. Granted, during any sprint, there could be dozens of changes, but hopefully to just one subsystem at a time.
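One way to picture "one subsystem at a time" is a rollout loop that deploys to a single subsystem, checks it, and only then moves on. This is a minimal sketch, not the process used by the service described here: the `deploy` and `healthy` callables and the subsystem names are hypothetical stand-ins for whatever deployment and monitoring tooling is actually in place.

```python
from typing import Callable, Iterable, List


def rollout(
    subsystems: Iterable[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy to one subsystem at a time, stopping at the first unhealthy one.

    Returns the subsystems that were deployed and passed their health check.
    """
    completed: List[str] = []
    for name in subsystems:
        deploy(name)  # hypothetical hook: push the change to this subsystem only
        if not healthy(name):  # hypothetical hook: query monitoring for this subsystem
            # Stop here so the cause is isolated to the subsystem just changed.
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    deployed = []
    result = rollout(
        ["auth", "routing", "billing"],  # illustrative subsystem names
        deploy=deployed.append,
        healthy=lambda name: name != "routing",  # simulate a failure in "routing"
    )
    print("deployed to:", deployed)
    print("passed checks:", result)
```

Because the loop stops at the first failed check, the change that caused the problem is always the most recent one, which is the point of keeping perturbations small.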

Scale also leads to testing issues. It is hard to test any product at the scale we’re discussing without using production systems. The main problem is cost. The second is knowledge; the third is resources. Even if the cost is acceptable, the resources or knowledge may not exist, and vice versa. So, for some organizations, it is best to sacrifice one-twentieth or one-hundredth of their capacity to run a live test of all changes. This is the traditional A/B testing done by many companies.
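Setting aside a fixed slice of traffic for a live test can be sketched as a deterministic split on a request key. This is an illustration under assumptions, not how any particular company does it: the key (a user or session ID), the route names, and the bucket count are all hypothetical.

```python
import hashlib

BUCKETS = 10_000  # assumed granularity: fractions down to 0.01% can be expressed


def bucket(key: str) -> int:
    """Map a key to a stable bucket in [0, BUCKETS) using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route(key: str, test_fraction: float = 0.01) -> str:
    """Send roughly `test_fraction` of keys to the version under test.

    The same key always lands on the same side, so a user does not
    flip between versions from one request to the next.
    """
    threshold = round(test_fraction * BUCKETS)
    return "candidate" if bucket(key) < threshold else "stable"


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(100_000)]
    for fraction in (0.01, 0.05):  # one-hundredth and one-twentieth of traffic
        hits = sum(route(k, fraction) == "candidate" for k in keys)
        print(f"fraction={fraction}: {hits / len(keys):.4f} routed to candidate")
```

Hashing the key rather than picking at random keeps the assignment repeatable, which makes it easier to compare the two groups and to trace a problem back to the version that served it.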

Taking nibbles out of your problems allows you to focus on the solution. While understanding that component, you will understand how it is used and interacted with. You will find that a nibble can span lots of subsystems. This is where Slack, Moogsoft, and other ChatOps-style tools come into play; they allow you to document issues and concerns so as to retain institutional knowledge.

IoT is forcing us to consider scale. It is forcing us to rethink how we look at security, data, code, and the network. It is forcing us to move through our development cycles at an ever-increasing pace in order to deliver products faster and faster. Those nibbles are coming at us faster, but we cannot do the entire byte until we finish the nibbles first.

Just as an aside, the customer that does 40 billion queries a day is a mid-cap company. Scale is no longer the realm of the Googles, Facebooks, or Netflixes of the world. It is becoming more common. Have you considered how scale will impact your business? Processes? Security? Monitoring? Data protection?
