In general, you cannot infer the performance of an application running in a VM from the resource utilization profile of that application or of the VM itself. The many reasons for this are documented in the Performance and Capacity Management White Paper available on this site.
However, there is one well-understood exception to this rule: the CPU Ready metric, which measures how long a VM that is ready to run waits in a queue before the hypervisor schedules it onto physical CPU. If CPU Ready rises, the VM and the applications running in it are waiting for their turn to execute, and making VMs wait for CPU time has a fairly dramatic effect on application performance. Now, if you are a sophisticated VMware admin, you are probably saying “tell me something that I do not already know”.
Well, this post is not about telling you something you did not already know, but about helping you prove to application owners something that you understand and they find counter-intuitive: with virtual CPUs (vCPUs), less can often be more. In a physical environment, application owners are biased toward throwing as much CPU horsepower at their applications as their budgets allow. However, carrying that habit over to vCPU configuration creates problems.
The problems arise when you configure an application to use more than one vCPU. In a VMware environment, getting one vCPU scheduled onto a physical CPU is typically very easy, but getting four vCPUs scheduled at the same time can be a lot harder, especially if the way load is configured and spread across hosts leaves any prospect of CPU overcommitment.
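To see why four at once is harder than one, consider a deliberately simplified model, which is an assumption and not a description of how ESXi actually schedules (ESXi uses a relaxed form of co-scheduling). Time is divided into scheduling slots; in each slot, each of a host's physical cores is independently free with probability `p_free`; and a VM with `k` vCPUs can only run in a slot where at least `k` cores are free at once. The chance of that is a binomial tail, and the expected number of slots spent waiting follows from the geometric distribution. The function names and the example host are hypothetical.

```python
from math import comb


def prob_k_cores_free(n_cores: int, p_free: float, k: int) -> float:
    """P(at least k of n_cores are free in one slot), assuming each core is
    free independently with probability p_free (simplified strict co-scheduling)."""
    if not 1 <= k <= n_cores:
        raise ValueError("k must be between 1 and n_cores")
    if not 0.0 < p_free <= 1.0:
        raise ValueError("p_free must be in (0, 1]")
    # Binomial tail: sum over j = k..n of C(n, j) * p^j * (1 - p)^(n - j)
    return sum(
        comb(n_cores, j) * p_free**j * (1.0 - p_free) ** (n_cores - j)
        for j in range(k, n_cores + 1)
    )


def expected_wait_slots(n_cores: int, p_free: float, k: int) -> float:
    """Expected number of slots a k-vCPU VM waits before it can run, treating
    each slot as an independent trial (geometric distribution)."""
    q = prob_k_cores_free(n_cores, p_free, k)
    return (1.0 - q) / q


if __name__ == "__main__":
    # Hypothetical busy 8-core host where each core is free 25% of the time.
    for vcpus in (1, 2, 4):
        print(vcpus, round(expected_wait_slots(8, 0.25, vcpus), 2))
```

On this made-up host, the expected wait grows from roughly 0.11 slots for 1 vCPU to about 0.58 for 2 and about 7.79 for 4. The exact numbers mean nothing for a real host; the point is how quickly the wait grows with the number of cores that must be free simultaneously.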
The nice folks at VMTurbo provided the graph below, which illustrates exactly this point: on a reasonably busy host, the more vCPUs you configure for an application, the longer that application is likely to wait for enough physical CPUs to become free to run its whole set of vCPUs. In the example below, an application configured with 4 vCPUs waited 8 seconds out of every 20 for a 4-vCPU allocation to become available (divide the Y-axis values by 2 for the 2-vCPU case and by 4 for the 4-vCPU case).
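vCenter reports CPU Ready as a summation in milliseconds over each sample interval (20 seconds in the real-time charts), and at the VM level that value is summed across all of the VM's vCPUs, which is why the Y-axis values have to be divided by the vCPU count. A minimal sketch of the conversion to a per-vCPU percentage follows. The function name is hypothetical, and the 32,000 ms input is not read from the graph: it is simply the summed value that corresponds to the 8-seconds-out-of-20 figure quoted for the 4-vCPU case.

```python
def ready_percent_per_vcpu(ready_ms_summed: float, interval_s: float, num_vcpus: int) -> float:
    """Convert a VM-level CPU Ready summation (milliseconds, summed across all
    vCPUs) into the average percentage of the interval each vCPU spent ready."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if num_vcpus < 1:
        raise ValueError("num_vcpus must be at least 1")
    per_vcpu_ms = ready_ms_summed / num_vcpus  # undo the summation across vCPUs
    interval_ms = interval_s * 1000.0          # sample interval in milliseconds
    return per_vcpu_ms * 100.0 / interval_ms


if __name__ == "__main__":
    # 4-vCPU VM, 20 s real-time sample, 32,000 ms summed ready (8 s per vCPU)
    print(ready_percent_per_vcpu(32_000, 20, 4))  # 40.0
```

In other words, the 4-vCPU VM in the example spent about 40% of its time ready to run but not running.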
So how can this help you with your friends the application owners? In the physical world, application owners fight to get as much physical resource for their applications as they can. They buy the largest servers they can get away with from a CPU and memory perspective, because they believe this is cheaper than risking intermittent performance problems caused by load spikes and temporary capacity constraints.
In a virtual system, the tendency to translate over-provisioning of physical CPUs into over-provisioning of virtual CPUs can be very harmful, as the graph above shows. Assigning four vCPUs to a VM makes it harder for that VM to get scheduled, because the hypervisor has to wait for four physical CPUs to be free at the same time. Configuring fewer vCPUs for an application can therefore increase the CPU resource it actually receives and improve its performance. Investing in tools (like VMTurbo) that do this work for you automatically can help you convince application owners of this, and thereby help their applications perform better.
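The rightsizing logic that commercial tools apply is far more sophisticated, but the core idea can be sketched under simple assumptions: if you know a VM's observed peak CPU demand in MHz and the clock speed of one physical core on its host, choose the smallest vCPU count that covers peak demand plus some headroom. The function name, the 20% default headroom, and the example figures are all hypothetical, and this is not VMTurbo's algorithm.

```python
import math


def recommend_vcpus(peak_demand_mhz: float, core_mhz: float,
                    headroom: float = 0.20, min_vcpus: int = 1) -> int:
    """Smallest vCPU count whose combined clock covers peak demand plus a
    safety margin. Hypothetical rule of thumb, not a vendor algorithm."""
    if core_mhz <= 0:
        raise ValueError("core_mhz must be positive")
    if peak_demand_mhz < 0 or headroom < 0:
        raise ValueError("demand and headroom must be non-negative")
    # Demand inflated by headroom, expressed in units of one core's capacity
    needed = peak_demand_mhz * (1.0 + headroom) / core_mhz
    return max(min_vcpus, math.ceil(needed))


if __name__ == "__main__":
    # Hypothetical 4-vCPU VM on a 2,600 MHz host, observed peak of 3,000 MHz
    current = 4
    suggested = recommend_vcpus(3000, 2600)
    print(f"configured={current}, suggested={suggested}")
```

Here the hypothetical VM would be cut from 4 vCPUs to 2, which by the argument above should make it easier to schedule. A real tool would also weigh CPU Ready, demand trends, and host contention rather than a single peak figure.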