In “Beware of the Franken-Monitor,” we outlined why systems management frameworks have become Franken-Monitors and the dangers of building your own Franken-Monitor. Unfortunately, the reality for most enterprises is that they have probably done some combination of the two. They have bought a framework that became a Franken-Monitor because the vendor of the framework never integrated its acquired components. And then they layered more “Franken-ness” on top of their framework by buying, in some cases, hundreds of point monitoring solutions, none of which are integrated with each other or with the framework.
Getting Rid of Your Franken-Monitor: The Why
Frameworks failed because they made a promise that could not be kept. They promised that one product could monitor everything. This promise could not be kept because the world kept changing so fast that the vendor of the framework could not keep it up to date. They could not keep it up to date with internal software development. And they could not keep it up to date by acquiring small companies that met new requirements and integrating the new product into the old.
However, in the last five years, three things have occurred that have increased the pace of change in IT and that now create an imperative to replace your Franken-Monitor with a new approach:
- Data Center Virtualization: Pioneered and successfully commercialized by VMware, data center virtualization allowed IT organizations to become fundamentally more agile. Since each server was now just a file, and many different kinds of servers could be managed in a similar manner, it became possible to introduce new technologies into the data center more quickly. For example, Veeam was able to back up and restore many different kinds of virtual machines with one backup product, something that would have been much more difficult to do with disparate physical systems. VMTurbo was able to solve the problem of workload prioritization, something that was impossible to do in the physical world because the interfaces to control how shared resources were allocated across workloads simply did not exist.
- Distribution of Workloads: The IT environment has become fundamentally more distributed. Commodity hardware has allowed for workloads to be scaled out across hundreds and thousands of servers. Servers are distributed across multiple data centers owned by enterprises. Servers are distributed across internal data centers in private clouds, and shared data centers across hybrid and public clouds.
- Explosion in Demand for Business Software: Enterprises worldwide in many different industries have realized that if they do not automate in software and compete online in software, they face extinction at the hands of the competitors who master these skills. This has led to an explosion in the demand for custom-built software. This in turn has led to a proliferation of languages and tools as well as new methodologies like Agile Development and DevOps, designed to deal with the rapid development and rapid iteration of application systems.
The bottom line is that for most enterprises, most if not all of the management software in use today is not equipped to deal with the three dynamics listed above. In order to cope with these dynamics, monitoring products must meet the requirements below:
- Products whose job it is to measure and assess infrastructure performance should approach this problem by measuring infrastructure latency. Any product that attempts to infer infrastructure performance from resource utilization statistics is useless and should be retired. Resource utilization metrics are essential for capacity planning, but they are useless for performance management of the infrastructure.
- Similarly, products whose job it is to measure and assess application performance should approach this problem from the perspective of application response time. You cannot infer the performance of an application running in a virtualized environment or a cloud by looking at resource utilization metrics; products that use this approach should be thrown out.
- Monitoring products need to work in scaled out and distributed environments. They need to discover those environments with zero to minimal configuration, and they need to work in environments that are distributed across internal data centers and hybrid/public clouds. If your management solution cannot treat a virtual machine in your data center and an image running at Amazon in the same way in the same console, throw it out.
- Monitoring products need to be built from the ground up for highly dynamic infrastructure and rapidly changing applications. That means that data needs to be collected frequently and with high fidelity. Monitoring products need to automatically adjust to changes in the infrastructure and the application. For example, if an application moves from an internal data center to a cloud, the application performance management solution should just continue to work without a requirement for a massive new deployment or set of configuration changes.
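To make the first two requirements concrete, here is a minimal Python sketch of judging performance by latency rather than by resource utilization. The sample values, the 200 ms threshold, and the function names are illustrative assumptions, not taken from any particular product.

```python
from statistics import mean, quantiles


def p95(samples):
    """Return the 95th percentile of a list of numeric samples."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def latency_breach(latencies_ms, threshold_ms=200.0):
    """True when the 95th-percentile latency exceeds the (assumed) threshold."""
    return p95(latencies_ms) > threshold_ms


# Hypothetical samples from one collection window.
cpu_util_pct = [35, 38, 36, 40, 37, 39, 36, 38, 37, 35]
latency_ms = [20, 22, 21, 25, 480, 510, 23, 495, 22, 21]

if __name__ == "__main__":
    # Utilization looks unremarkable while latency is clearly degraded.
    print("mean CPU utilization (%):", mean(cpu_util_pct))
    print("latency breach:", latency_breach(latency_ms))
```

In this made-up window the average CPU utilization stays well under 50 percent, so a utilization-based check would report nothing, while the latency check flags the problem that users would actually feel.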
Getting Rid of Your Franken-Monitor: The How
Now that you know why you need to get rid of your legacy Franken-Monitor-based management solutions, let’s address how. First of all, let’s state the obvious. For many enterprises, frameworks are like asbestos in the walls of the data center. It would be cheaper to tear down the data center and build a new one than to try to remove the asbestos. This is the case because many legacy management solutions are embedded in so many processes that it seems like a hopelessly complex task to replace all of those processes and the tools that go along with them (in fact, the vendors of legacy Franken-Monitors are banking on this complexity being a barrier to the replacement of their products).
With the above realities in mind, the suggested process for replacing your legacy management frameworks and Franken-Monitors is:
- You are in a hole. Stop digging. This means stop making the Franken-Monitor problem worse by continuing to add to the list of point tools that address point problems but that are not integrated with anything.
- Stop digging by putting in place a monitoring strategy, a monitoring architecture, and an architectural review board that all new monitoring solutions must pass before purchase is approved. Of course, you should allow exceptions when warranted, but the idea is to define your future state and allow the procurement of solutions that move you along the road toward that future state.
- Pick a green-field environment where you can start fresh. For many enterprises, the virtualization team declared that their VMware environment was such a green-field opportunity to do things differently that they uninstalled all of the legacy monitoring and management tools and started over. This jump-started a new management software industry. For enterprises that did not start over when they virtualized servers, perhaps the hybrid cloud is the place to start. Certainly when you put production workloads in a public cloud like Amazon, you should start with a clean sheet of paper when it comes to your entire management and monitoring tool set.
- Hollow out the legacy management environment over time. If you can declare your VMware and Hyper-V environments as “new” and your physical environments as “old,” then build a new management stack for the new environment and relegate the frameworks and the Franken-Monitors to the old environment. Over time, as the new environment replaces the old environment, you can get rid of the old management tools.
- Demonstrate the benefits of the new environment to management. Show them the improved availability that comes from frequent and high-fidelity monitoring. Show them the improved ability to manage infrastructure and application performance through a focus on latency and response time. Show them the reduction in the number and duration of blamestorming meetings.
- Recognize that no single tool from any single vendor will ever be able to monitor or manage everything. Pick an ecosystem-based approach where you know you can buy tools from multiple vendors that work together.
- Let people buy the management tools that they want, IF they conform to your management tool reference architecture. For example, if you declare that Splunk is going to be the back-end data store for ALL of your monitoring data, then let groups and departments buy any tool they want as long as it puts its data into Splunk.
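The review-board and reference-architecture steps above can be sketched as a simple conformance check. This is a hypothetical illustration only: the class names, the capability labels, and the tool names are invented for the example, and a real review would weigh far more than two criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceArchitecture:
    # The single back-end data store every tool must write to (assumed rule).
    required_data_store: str
    # Capabilities every new tool must have (hypothetical labels).
    required_capabilities: frozenset = frozenset()


@dataclass(frozen=True)
class ToolProposal:
    name: str
    data_stores: frozenset
    capabilities: frozenset = frozenset()


def review(proposal, arch):
    """Return the reasons a proposal fails review; an empty list means approved."""
    problems = []
    if arch.required_data_store not in proposal.data_stores:
        problems.append(f"does not write to {arch.required_data_store}")
    for cap in sorted(arch.required_capabilities - proposal.capabilities):
        problems.append(f"missing capability: {cap}")
    return problems


# Hypothetical reference architecture and two hypothetical tool proposals.
arch = ReferenceArchitecture("splunk", frozenset({"auto_discovery"}))
good = ToolProposal("tool-a", frozenset({"splunk"}), frozenset({"auto_discovery"}))
bad = ToolProposal("tool-b", frozenset({"local-db"}))

if __name__ == "__main__":
    for tool in (good, bad):
        issues = review(tool, arch)
        print(tool.name, "approved" if not issues else issues)
```

The point of the design is that the rule is explicit and small: a department can bring any tool it likes, and the only question the board asks is whether the tool fits the declared future state.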
The steps above are designed to accomplish two objectives. The first is to put frameworks and Franken-Monitors on a glide path to oblivion. The second is to give everyone in the organization a way to participate in the new world that is immediately and obviously better than the old world. This will allow you to slowly remove all of the oxygen from the room that contains the framework, while feeding the new environment that really meets your needs.
Getting rid of your Franken-Monitor right now is both possible and desirable if you take an incremental approach. Such a method should slowly phase out the Franken-Monitor while replacing it in new environments with an approach that is flexible and scalable enough to meet your future needs.