Applications Performance Equals Response Time, not Resource Utilization

When VMware announced vCenter Operations, it combined performance management, capacity management, and configuration management with self-learning analytics into one product (right now this is achieved by bundling three VMware products, vC OPS, vC CapacityIQ, and vC Configuration Manager, but integration over time will likely reduce three databases and three consoles into one). VMware now joins the ranks of many vendors who can monitor virtual (and, through integration adapters, physical) environments, and who provide performance and capacity management features.

The purpose of this article is to explore a very basic question: what are (or what should be) the definitions of performance and capacity in virtualized and cloud-based environments? In the physical world the following view has been widely accepted for years:

  • Performance = F(Resource Utilization)
  • Capacity = F(Resource Utilization)

In other words, the performance of a system and the applications running on it is a function of whether the resources on the system are being over-used. The capacity of a system is a function of whether total resources are in danger of running out, either during a load spike or within the procurement cycle for new hardware. This view underlies how the vast majority of the performance and capacity tools at the infrastructure layer work, in both physical and virtualized/cloud-based environments. And this is a valid view, because it is certainly true that if and when a hardware bottleneck occurs, there will be impacts upon the performance of the applications running on that infrastructure. This is true whether that infrastructure is supporting a purely dedicated software stack or a virtualized software stack.

Performance and Capacity Management for Virtualized Infrastructures

But does virtualization change how performance and capacity management at the infrastructure layer should be done? Let’s tackle Capacity first. In the physical world, Capacity Planning was done mostly on a quarterly basis because workloads were growing rather slowly and predictably, and the focus was really on making sure that the application systems stayed aggressively over-provisioned as load grew. So Capacity Planning was really about planning future hardware purchases. In a virtualized environment we need to replace Capacity Planning with Capacity Management. This is essential because virtualized environments are sufficiently dynamic that capacity constraints can arise in moments, as opposed to months or quarters. Capacity Management therefore has to be done on a continuous basis.

The fact that Capacity Management has to be done continuously in virtualized environments links it inextricably with performance management for those environments. The two become different sides of the same coin. "Are we running out of capacity right now?" and "Is the infrastructure causing a performance issue for the applications?" become the same question. The real question is then whether applications performance is suffering due to resource constraints in the infrastructure.

Answering this question through the collection and analysis of resource utilization metrics (for example the vCenter API data used by every monitoring vendor in the VMware ecosystem as well as by vC OPS itself) is an extremely challenging exercise. It is challenging due to the existence of false positives and false negatives. A false positive is a situation where a resource appears to be constrained, but the constraint is not really having any impact upon the application. For example, an application could be using 95% of the memory allocated to it, and this could generate a resource utilization alarm, yet the application may be running quite happily with that amount of memory, so there really is no problem. A false negative is a much more challenging problem: an applications performance issue (meaning degraded response time) that never shows up as a resource utilization metric going out of bounds. The open question is whether every such issue is visible in the utilization data at all. Even knowing every abnormality in resource utilization and every change in configuration is not enough to know whether you have caught every applications performance issue.
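To make the two failure modes concrete, here is a minimal sketch that compares a utilization alarm against measured response time for a few samples. The thresholds, the sample values, and the names `Sample` and `classify` are all hypothetical and chosen only for illustration; they do not come from vCenter or any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    memory_utilization: float  # fraction of allocated memory in use (0.0 to 1.0)
    response_time_s: float     # measured end-to-end response time, in seconds


def classify(sample, util_threshold=0.90, response_threshold_s=1.0):
    """Compare what a utilization alarm says with what the user actually experienced.

    Both thresholds are assumed values for this sketch.
    """
    alarm = sample.memory_utilization >= util_threshold
    slow = sample.response_time_s > response_threshold_s
    if alarm and slow:
        return "true positive"
    if alarm:
        return "false positive"   # alarm fired, application was fine
    if slow:
        return "false negative"   # application was slow, no alarm fired
    return "true negative"


# Made-up samples: (memory utilization, response time in seconds)
samples = [Sample(0.95, 0.4), Sample(0.55, 2.3), Sample(0.97, 1.8), Sample(0.40, 0.3)]
labels = [classify(s) for s in samples]
for s, label in zip(samples, labels):
    print(f"util={s.memory_utilization:.0%} rt={s.response_time_s}s -> {label}")
```

Note that the classification is only possible because response time was measured in the first place. With utilization data alone, the second sample would simply look healthy.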

The Response Time View

Rather than trying to infer the performance of an application running on VMware vSphere by looking at granular resource utilization and configuration change data, there is a far more straightforward approach: directly measure the actual performance (response time), on an end-to-end basis, of every application running in your virtualized infrastructure. The benefit is that instead of inferring what the performance (or lack thereof) might be, you know exactly what the performance of the application is. To achieve this you need an applications performance management solution that meets the following criteria:

  • End-to-End and hop-by-hop measurement of every action within an application system on a comprehensive (you get all of the transactions), continuous (you do not sample), and deterministic (you get the actual response time for each one, not an average of N of them) basis.
  • Broad Applications Support. This measurement needs to be available for every application in your environment. You will certainly have specialized tools for applications built to certain application frameworks. But you need a more general capability that tells you the end-to-end response time for every application and that works independently of how the application is built (what language), and what it runs on (what framework, application run time, and operating system).
  • Zero Configuration. In a dynamic environment where parts of an application come and go, get moved from host to host, and where applications are added on the fly by ITaaS initiatives, the performance management solution just needs to work by itself. If a new web server or a new application comes up, monitoring for that new component or application system just needs to auto-instantiate.
  • Dynamic and Continuous Application Topology Discovery. It is critical to know what is talking to what, where those pieces are running, what the volume of traffic is between the pieces, and what the response times are as the pieces move around and get created and destroyed.
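
The "deterministic" criterion in the list above is easy to underestimate, so a small sketch may help. It shows how an average over N transactions can sit comfortably under a threshold while an individual transaction badly misses it. The response times and the 1-second threshold are made up, and `summarize` is a hypothetical helper, not part of any product.

```python
from statistics import mean


def summarize(response_times_s, threshold_s=1.0):
    """Contrast the averaged view with the per-transaction view of the same data."""
    slow = [t for t in response_times_s if t > threshold_s]
    return {
        "average_s": mean(response_times_s),
        "slow_count": len(slow),
        "worst_s": max(response_times_s),
    }


# Nine fast transactions and one very slow one (made-up values, in seconds).
times = [0.2] * 9 + [4.0]
summary = summarize(times)
print(summary)
```

Here the average stays well under the threshold even though one user waited four seconds, which is why per-transaction measurement is listed as a requirement rather than a nice-to-have.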

The Applications Centric View of Performance and Capacity

The single most important issue that virtualization teams must address in order to further the penetration of virtualization to include business critical and performance critical applications is how to assure the performance of these applications once virtualized. The incumbent method of inferring the performance of applications from how the resources are utilized on the servers and networks that support them falls down for the reasons mentioned above. Therefore what is needed is a new view of performance and capacity based upon the one metric that matters – applications response time and how load (transaction rate) affects response time.

Therefore, the new formulas are:

  • Performance = F(response time, transaction rate)
  • Capacity = F(response time, transaction rate)

Consider the graph below. In this graph the required response time for the application of interest is 1 second. This response time threshold is reached when the transaction rate reaches 8900 transactions per second. This is the way in which both performance and capacity need to be assessed in both virtualized and cloud-based environments. This approach takes a top-down view of applications performance: rather than trying to infer applications performance from resource utilization metrics, measure it directly, and then use abnormalities in both resource utilization and configuration change to point to the probable cause.

Response Time vs Capacity

The graph above points out two essential and new ways to look at performance and capacity. Performance is equal to the response time for the application of interest at the required transaction rate. Capacity is equal to how many transactions can be supported on the infrastructure at the required response time level.
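
As a rough sketch of how these two definitions could be computed from measurements, the code below takes a set of (transaction rate, response time) points and derives both numbers by linear interpolation. The curve is made-up data, shaped so that the 1-second threshold is crossed near the 8900 transactions per second used in the example above. The function names, the interpolation method, and the assumption that response time does not decrease as load rises are all choices made for this illustration.

```python
def response_time_at(curve, rate):
    """Performance: interpolated response time (seconds) at a given transaction rate.

    `curve` is a list of (transactions_per_second, response_time_s) pairs,
    sorted by rate.
    """
    for (r0, t0), (r1, t1) in zip(curve, curve[1:]):
        if r0 <= rate <= r1:
            return t0 + (t1 - t0) * (rate - r0) / (r1 - r0)
    raise ValueError("rate is outside the measured range")


def capacity_at(curve, required_response_s):
    """Capacity: highest transaction rate that still meets the required response time.

    Assumes response time is non-decreasing as the rate increases.
    """
    if curve[0][1] > required_response_s:
        return 0.0  # even the lightest measured load misses the requirement
    for (r0, t0), (r1, t1) in zip(curve, curve[1:]):
        if t0 <= required_response_s < t1:
            return r0 + (r1 - r0) * (required_response_s - t0) / (t1 - t0)
    return float(curve[-1][0])  # requirement never exceeded within measured range


# Hypothetical measurements: (transactions per second, response time in seconds)
curve = [(1000, 0.20), (4000, 0.25), (7000, 0.40),
         (8000, 0.55), (9000, 1.05), (10000, 2.50)]

capacity = capacity_at(curve, required_response_s=1.0)
performance = response_time_at(curve, rate=7500)
print(f"capacity at 1.0 s: about {capacity:.0f} tps")
print(f"response time at 7500 tps: about {performance:.3f} s")
```

Both numbers come from the same curve, which is the point: once response time is measured against load, performance and capacity are two readings of one relationship.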

This is the only feasible way in which performance and capacity can be calculated and assessed if virtualization and cloud computing are to grow into addressing business critical and performance critical applications. In fact, SLAs will have to be rewritten around this exact concept. Applications owners will (correctly) not consent to the virtualization of their applications unless the team that supports the virtualized infrastructure can guarantee an SLA in the terms outlined above.

The next step will be to define new SLA metrics that incorporate the concepts above, the variability of response time, and the allowed variation therein. An average response time goal and an allowable upper bound, both stated at a given transaction rate, will have to come to define performance and capacity. Finally, these new metrics will get monetized, allowing applications owners to shop for price/performance curves across internal and external service providers on a rational and economic basis.
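
One way such an SLA might be expressed, purely as a sketch: an agreed transaction rate, an average response time goal, an upper bound, and the share of transactions allowed to exceed that bound. The field names, the three-way result, and the rule that load above the agreed rate falls outside the SLA are assumptions made for this illustration, not an established standard.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ResponseTimeSla:
    agreed_rate_tps: float            # the terms apply up to this transaction rate
    average_goal_s: float             # average response time goal
    upper_bound_s: float              # allowable upper bound per transaction
    allowed_over_bound_fraction: float  # share of transactions allowed over the bound


def evaluate(sla, observed_rate_tps, response_times_s):
    """Return 'out_of_scope', 'met', or 'violated' for one measurement interval."""
    if observed_rate_tps > sla.agreed_rate_tps:
        return "out_of_scope"  # assumption: load beyond the agreed rate is not covered
    if not response_times_s:
        return "violated"      # assumption: no measurements means no proof of compliance
    over = sum(1 for t in response_times_s if t > sla.upper_bound_s)
    within_average = mean(response_times_s) <= sla.average_goal_s
    within_bound = over / len(response_times_s) <= sla.allowed_over_bound_fraction
    return "met" if within_average and within_bound else "violated"


# Hypothetical terms: up to 8900 tps, 0.5 s average, 1.0 s bound, 1% may exceed it.
sla = ResponseTimeSla(8900, 0.5, 1.0, 0.01)
good = evaluate(sla, 8000, [0.3] * 99 + [1.4])
bad = evaluate(sla, 8000, [0.3] * 90 + [1.4] * 10)
overloaded = evaluate(sla, 9500, [0.3] * 100)
print(good, bad, overloaded)
```

A structure like this is also what would make price comparison possible, since two providers quoting against the same four numbers can be compared directly.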