VMware (and Microsoft) continue to make excellent progress driving the penetration of their data center virtualization offerings. Over half of the servers run by VMware customers are now virtualized. The progress has been so good that it is now time to ask two important questions. Is what is left to virtualize different than what has already been virtualized? And, if what is left is business critical applications, will running them on the virtualization platform be any different from what we experience today?
VMware’s Virtualization Progress
At its VMworld conference in the Fall of 2012, VMware presented the following graph depicting its progress to date and its expected progress. This graph shows that VMware is close to having virtualized 60% of the servers in its customer base, and expects to get to 73% in two years.
% of Workloads Virtualized
Progress Virtualizing Business Critical Applications
Even more impressive than overall virtualization penetration approaching 60% is that VMware has started to accelerate the virtualization of business critical applications. Notice that progress in virtualizing Oracle databases accelerated from 3% year over year to 7% year over year, and progress in virtualizing SAP accelerated from 10% year over year to 12% year over year. It is fair to say that SAP is a business critical application, and that in most cases an Oracle database is part of a business critical application. This demonstrates tremendous progress with the most important applications, which are also the ones where the business owners have the greatest concern about the impact of virtualization upon the operation of the applications.
% of Workload Instances that are Virtualized
Does the P2V Process Have to Change for Business Critical Applications?
The progress in virtualizing business critical applications, and the obvious push to continue that progress, raises a very important question: how should the P2V process be modified to ensure that the applications deliver acceptable performance once they are moved from dedicated physical hardware to shared virtualized hardware?
Let’s briefly review how the performance and capacity management part of the P2V process is done today. The most common process is to use the VMware Capacity Planner to assess how the workloads use physical resources and to then map those physical resources into how virtual resources are allocated to those workloads.
VMware Capacity Planner
The approach of measuring how much CPU, memory, network I/O and disk I/O a workload generates or consumes on physical hardware, and then translating that into virtual resource allocations, worked reasonably well for tactical, low-hanging-fruit applications. However, when it comes to assuring the performance of a business critical application on a virtual infrastructure, the resource utilization inference process has the following flaws:
- Virtual resources are not the same as physical resources. A virtual CPU is not exactly the same thing as a physical CPU. The key difference is that the virtual CPU is managed by the hypervisor, which imposes some load upon the CPU.
- A virtual resource may not perform in the exact same manner as its physical counterpart. If your workload is doing 100 IOPS and you size the virtual environment so that the capacity for that many I/O operations is there for that workload, there is no guarantee that the end-to-end latency of those I/O operations will be the same in the virtual environment as it was in the physical environment.
- Sharing of resources introduces serious problems. Let’s assume that a workload runs on a physical server with four cores. The natural step would be to assign that workload four virtual CPUs. But if that workload is then run on a virtualized host with eight cores, and there are four other workloads that each want two virtual CPUs (the server is over-committed), then the hypervisor will find slots with two available virtual CPUs much more frequently than it will find a slot with four virtual CPUs available at the same time. Therefore the right thing to do might well be to assign fewer virtual CPUs so that the workload gets scheduled more frequently.
- Finally and most importantly, all of the resource allocation in the world is not going to guarantee acceptable performance. The only way to guarantee acceptable performance is to measure it over on the physical side, use that measurement as the baseline for what is expected on the virtual side and proceed accordingly.
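The scheduling argument in the third bullet above can be illustrated with a deliberately simplified toy model. The sketch below assumes strict co-scheduling (a workload runs only when all of its virtual CPUs can be placed at once) and assumes each physical core is independently busy with a fixed probability in any time slot. Both assumptions, the function name `prob_at_least_free`, and the 70% busy figure are hypothetical choices for illustration. Real hypervisor schedulers are more sophisticated, so this shows only the shape of the argument, not actual scheduler behavior.

```python
from math import comb


def prob_at_least_free(total_cores: int, needed: int, busy_prob: float) -> float:
    """Probability that at least `needed` of `total_cores` are idle in a slot.

    Toy assumption: each core is independently busy with probability
    `busy_prob`, so the number of idle cores follows a binomial distribution.
    """
    free_prob = 1.0 - busy_prob
    return sum(
        comb(total_cores, idle) * free_prob**idle * busy_prob ** (total_cores - idle)
        for idle in range(needed, total_cores + 1)
    )


if __name__ == "__main__":
    # Hypothetical host: eight cores, each busy 70% of the time.
    for vcpus in (1, 2, 4):
        chance = prob_at_least_free(total_cores=8, needed=vcpus, busy_prob=0.7)
        print(f"{vcpus} vCPUs: enough idle cores in {chance:.1%} of slots")
```

Under these assumptions the chance of finding four idle cores at once is always lower than the chance of finding two, which is why a smaller virtual CPU allocation can end up being scheduled more often.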
The Role of APM in the Virtualization Process
Measuring Performance and Throughput of Business Critical Applications
Here is an important assertion. If you are going to successfully run business critical and performance critical applications in a shared and dynamic virtualized environment, you will need to instrument those applications for response time and throughput before you virtualize them, use the results of that pre-virtualization assessment as the baseline for the definition of a successful virtual deployment of that application, and then continue to monitor that application with the very same tool in production. If you do not do this, then you are setting yourself up for a painful virtualization process (with application owners acting as server huggers) and, even if you overcome those objections, for a substantial amount of your time being spent in blamestorming meetings.
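As a rough sketch of what using a baseline as the definition of success could look like, the Python below summarizes a set of response time samples into a 95th percentile and a throughput figure, then checks a post-virtualization measurement against the physical baseline. The names `Baseline`, `summarize` and `meets_baseline`, the choice of the 95th percentile, and the 10% tolerance are all assumptions made for illustration. An APM tool would collect the samples, and the acceptable tolerance is something to agree with the application owner.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Baseline:
    p95_response_ms: float   # 95th percentile response time
    throughput_per_s: float  # completed transactions per second


def summarize(response_times_ms: list[float], window_seconds: float) -> Baseline:
    """Reduce raw response time samples from one measurement window to a baseline."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(response_times_ms, n=100)[94]
    return Baseline(p95, len(response_times_ms) / window_seconds)


def meets_baseline(physical: Baseline, virtual: Baseline, tolerance: float = 0.10) -> bool:
    """True if the virtual deployment stays within `tolerance` of the physical baseline."""
    response_ok = virtual.p95_response_ms <= physical.p95_response_ms * (1 + tolerance)
    throughput_ok = virtual.throughput_per_s >= physical.throughput_per_s * (1 - tolerance)
    return response_ok and throughput_ok
```

The check looks only at response time and throughput. Resource utilization does not appear in it at all, which is the point of the assertion above.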
The good news is that there is a great set of new APM tools to choose from. These tools focus upon being easy to implement and easy to operate, which gets around the problems with the previous generation of APM tools. They fall into two categories, which map to the kinds of applications you have. If you have custom developed applications and you need to rapidly find problems in your code, then you want a DevOps focused solution. If you have a mixture of purchased, custom developed and compound applications, then you are going to want an AppOps focused tool.
Vendors in the two categories of tools are profiled below. Remember that the right time to start with these tools is while the application is still running on physical hardware. Only then will you be able to establish a baseline that will keep you out of blamestorming meetings once the application has been virtualized.
The DevOps Category of APM Tools
| Vendor/Product | Product Focus | Deployment Method | Data Collection Method | Supported App Types | Application Topology Discovery | Cloud Ready | “Zero-Config” | Deep Code Diagnostics |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AppDynamics | Monitor custom developed Java and .NET applications across internal and external (cloud) deployments | On Premise/SaaS | Agent inside of the Java JVM or the .NET CLR | Java/.NET | | | | |
| dynaTrace (Compuware) | Monitoring of complex enterprise applications that are based on Java or .NET but which may include complex enterprise middleware like IBM MQ and CICS | On Premise | Agent inside of the Java JVM or the .NET CLR | Java/.NET, WebSphere Message Broker, CICS, C/C++ | | | | |
| New Relic RPM | Monitor custom developed Java, .NET, Ruby, Python, and PHP applications across internal and external (cloud) deployments | SaaS | Agent inside of the Java JVM, .NET CLR, or the PHP/Python runtime | Ruby/Java/.NET/PHP/Python | | | | |
| VMware vFabric APM | Monitor custom developed Java applications in production. Strong integration with the rest of the VMware product line including automated remediation and scaling. | On Premise | Mirror port on the vSphere vSwitch and an agent inside the Java JVM | HTTP/Java/.NET/SQL | | | | |
The AppOps Category of APM Tools
| Vendor/Product | Product Focus | Deployment Method | Data Collection Method | Supported App Types | Application Topology Discovery | Cloud Ready | “Zero-Config” | Deep Code Diagnostics |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AppEnsure | Monitor every application in production irrespective of source and deployment | SaaS | Agent inside of the Windows or Linux Operating System | All TCP/IP on Windows or Linux | | | | |
| AppFirst | Monitor every application in production irrespective of source and deployment | SaaS | Agent inside of the Windows or Linux Operating System | All TCP/IP on Windows or Linux | | | | |
| BlueStripe FactFinder | Monitor every application in production irrespective of source and deployment | On Premise | Agent inside the Windows, Linux, AIX or Sun Operating System | All TCP/IP on Windows, Linux, AIX, or Sun OS | | | | |
| Boundary | Monitor the impacts of network flows upon the application | SaaS | Agent inside of the Windows or Linux operating system | All Linux TCP/IP applications | | | | |
| Confio Software IgniteVM | Monitor database performance especially in conjunction with the performance of the underlying storage | On Premise | Agentless collection of detailed database data and storage latency data from vSphere | DB2, Oracle, and SQL Server Database applications running on vSphere | | | | |
| Correlsense | Monitor every application in production irrespective of source and deployment | On Premise | Agent inside the Windows, Linux, AIX or Sun Operating System | All TCP/IP on Windows, Linux, AIX, or Sun OS | | | | |
| ExtraHop Networks | Monitor every application in production irrespective of source and deployment | On Premise | From a mirror port on a physical switch or the vSphere vSwitch | All TCP/IP regardless of platform | | | | |
| Splunk | Collection of logs and many other metrics into an easily searchable “big data” database | On Premise/SaaS | A wide variety of collectors that interface to log sources and other sources of data | An application for which a log of some type is generated | | | | |
Conclusion
The process of virtualizing business critical applications should start with using a modern APM tool to establish a response time and throughput baseline for the application while it is still on physical hardware. That baseline should then serve as the reference SLA once the application is virtualized. Performance needs to be defined as response time and throughput, not resource utilization.













