Pets vs Cattle Is Not Reality

Yes, the title is a bit caustic, but I have been giving some serious thought to the attitude of pets vs cattle within a hybrid cloud environment, and every time, it boils down to the conclusion that we shoot cattle because the underlying infrastructure is just not robust enough to treat our cattle like a herd. Instead, we treat them as singletons. I do not know a rancher today who will just shoot their cattle because they strayed into the wrong pasture, or because they ate the wrong thing and got sick. They herd the cattle back to where they belong and often call the veterinarian first. Yet, our clouds do not seem resilient enough to handle this type of behavior.

So, why do we just kill our containerized apps when they do something wrong or when the underlying infrastructure does something wrong? Why are we more interested in death than in resiliency? A resilient environment implies we can herd our cattle from here to there to fix the underlying infrastructure. Ranchers do not look at their cattle and decide to kill them all just because part of their fence is broken. No, they herd the cattle to a new pasture, repair the fence, and herd them back. This is not what third-generation developers are really talking about. They would rather have cattle everywhere, and if one falls ill, not treat it at all but simply kill it. No rancher can survive this, as they depend on the cattle to bring them money to survive.

The same is true for modern software development. In order to have a resilient system, we either need to have resources everywhere or use limited resources in a better way. One of the things I look for in a cloud is a resilient infrastructure, one that will keep my workloads alive while it undergoes normal maintenance, such as repairing a system. It would politely herd my cattle to a new pasture, then work on the existing system and politely herd them back. Instead, what we get are systems that are shut down or outright destroyed to fix things. My hybrid cloud instances run on a cloud that will just shut systems down without first herding my workloads off to another system. As a result, I suffer availability and trust issues. But if the provider just herded my cattle, then availability would be unaffected, and trust would not be impacted.
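The herd, repair, herd back sequence can be written down as a small sketch. This is only an illustration of the idea, not how any particular cloud does it: `Host`, `herd`, and `maintain` are hypothetical names, and moving a workload is modeled as moving a name between two lists rather than as a real live migration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    """A hypothetical 'pasture': one system that runs workloads."""
    name: str
    workloads: List[str] = field(default_factory=list)
    in_maintenance: bool = False


def herd(source: Host, target: Host) -> List[str]:
    """Move every workload from source to target; return the names moved."""
    moved = list(source.workloads)
    target.workloads.extend(moved)
    source.workloads.clear()
    return moved


def maintain(host: Host, spare: Host, repair: Callable[[Host], None]) -> None:
    """Herd workloads off a host, repair it, then herd the same workloads back."""
    moved = herd(host, spare)
    host.in_maintenance = True
    # If repair raises, the herd simply stays on the spare host and the
    # broken host stays flagged as in maintenance.
    repair(host)
    host.in_maintenance = False
    # Bring back only what was moved; the spare keeps its own workloads.
    for workload in moved:
        spare.workloads.remove(workload)
        host.workloads.append(workload)
```

Nothing is shut down or destroyed at any step: the workloads exist on one host or the other the whole time, which is the property the shut-it-down approach lacks.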

However, are you saying that I should have that resiliency in my application and that I am at fault for not thinking about this case? No, I am not. My cattle are not pets, but I want to treat my cattle as a herd, not as singletons. Modern distributed applications are coming slowly. As they are developed, they will eventually replace the more traditional applications, but data centers are stuffed full of traditional applications that depend on resiliency within the infrastructure and not within the application itself. Are these applications considered to be pets? Not really. We may look after the infrastructure, but that is what any good rancher does. A broken fence gets repaired; a sick cow gets quarantined until the veterinarian can take a look.

If we just kill the application and restart a new one, are we restarting a new application with a supply chain security issue within it? Or would it be better to quarantine the bad cow, have a security specialist look at it, decide the root cause, and then kill or reload as necessary? This seems to be the piece that is missing from the pets vs cattle approach. It assumes that the cow on the ranch died in some perfect way or that we killed it on purpose, and that no review of the death is required. An autopsy should be performed. Recreating a system that will corrupt itself once more is really pretty silly.
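As a rough sketch of that quarantine-first flow: the `Instance` class, the `triage` function, and the `find_root_cause` callback below are all hypothetical, and the callback stands in for whatever review a security specialist would actually perform.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instance:
    """A hypothetical running application instance (one cow)."""
    name: str
    image: str
    state: str = "running"  # running | quarantined | destroyed


def triage(instance: Instance,
           find_root_cause: Callable[[Instance], Optional[str]]) -> str:
    """Quarantine first, decide second.

    find_root_cause is an assumed callback returning "image",
    "infrastructure", or None when the cause is still unknown.
    """
    # Isolate the instance but keep it around for the autopsy.
    instance.state = "quarantined"
    cause = find_root_cause(instance)
    if cause is None:
        # No root cause yet: do not kill, do not redeploy.
        return "hold"
    instance.state = "destroyed"
    # A fault in the image would be recreated by a plain restart,
    # so the image has to be fixed before anything is redeployed.
    return "rebuild-image" if cause == "image" else "redeploy"
```

The point of the ordering is that the instance is only destroyed after a cause has been found, so the same flawed image is never blindly relaunched.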

Ah, but you say, continuous integration (CI) and continuous deployment (CD) are helpful here. And you may be correct, but ONLY if the reason for the failure (either by administrator or by nature) is understood before you propagate even more issues into your application suite.

Then there is the cost of having cattle spread all over everywhere instead of within a resilient cloud. It tends to be expensive to set up cloud instances. If you can afford multiple AWS availability zones or multiple clouds for your application and have worked out every failover issue related to your data, then please use this method. However, if you do not have the funds for a fully resilient application within the cloud, then you may wish to find a cloud that does provide resiliency, one that treats your cattle as a herd and not as singletons.

In many ways, IT as a Service, CI, and CD are not only about the application but also about the infrastructure as code developed around that application. If there is limited infrastructure as code due to cost and other considerations (such as a traditional application), then the resiliency needs to be found elsewhere, generally within the cloud or within the tools used to manage the hybrid cloud, such as HotLink, CloudVelox, Virtustream, and others. These tools improve the resiliency of your hybrid cloud either by moving data between clouds or by letting you use tools that understand herding your cattle vs. killing your cattle.

Or it may be possible to gain the ability to herd your cattle by employing clouds that understand that modern data centers are no longer about killing systems but about migrating workloads, such as VMware vCloud Air, any other cloud based on VMware vSphere, or ones that have Live Migration enabled within Xen, KVM, or Hyper-V. While the infrastructure is not a pet, it is a fence that occasionally needs to be repaired. Herding your cattle from pasture to pasture is far better than killing them all and starting over.

I wish to herd my cattle, not kill them just because a node is about to fail, an update will be made, or something else is happening to the infrastructure. This would solve one of the hidden dependencies of using the cloud. It would also allow developers to concentrate on workloads and not on reinventing the infrastructure within a distributed environment. That reinvention is happening because there is a lack of resiliency in the cloud. Work on distributed applications needs to continue, of course, but how do you handle traditional applications within a cloud?
