VMware (and Microsoft) continue to make excellent progress driving the penetration of their data center virtualization offerings. Over half of the servers run by VMware customers are now virtualized. The progress has been so good that it is now time to ask two important questions. Is what is left to virtualize different from what has already been virtualized? And, if what is left is virtualizing business critical applications, will running them on the virtualization platform be any different from what we experience today?

VMware’s Virtualization Progress

At its VMworld conference in the fall of 2012, VMware presented the following graph depicting its progress to date and its expected progress. The graph shows that VMware is approaching virtualization of 60% of the servers in its customer base and expects to reach 73% within two years.

% of Workloads Virtualized

Progress Virtualizing Business Critical Applications

Even more impressive than the fact that overall virtualization penetration is approaching 60% is that VMware has started to accelerate its progress in virtualizing business critical applications. Notice that VMware has accelerated the rate at which Oracle databases are being virtualized from 3% year over year to 7% year over year, and accelerated the rate for SAP from 10% year over year to 12% year over year. It is fair to say that SAP is a business critical application, and that in most cases an Oracle database is part of a business critical application. This demonstrates tremendous progress with both the most important applications and the ones where business owners have the greatest concerns about the impact of virtualization upon the operation of their applications.

% of Workload Instances that are Virtualized

Does the P2V Process Have to Change for Business Critical Applications?

The progress virtualizing business critical applications, and the obvious push to continue that progress, raise a very important question: how should the P2V process be modified to ensure that these applications deliver acceptable performance once they are moved from dedicated physical hardware to shared virtualized hardware?

Let’s briefly review how the performance and capacity management part of the P2V process is done today. The most common process is to use the VMware Capacity Planner to assess how the workloads use physical resources and to then map those physical resources into how virtual resources are allocated to those workloads.

VMware Capacity Planner
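To make that inference step concrete, here is a minimal sketch of the kind of arithmetic involved: measured peak utilization on the physical server, plus some headroom, is translated into a candidate vCPU and memory allocation. The `PhysicalProfile` fields and the headroom factors are assumptions chosen for the example; this is not the Capacity Planner's actual algorithm.

```python
import math
from dataclasses import dataclass

@dataclass
class PhysicalProfile:
    """Peak utilization measured on the physical server (illustrative fields)."""
    cores: int                 # physical cores in the source server
    peak_cpu_util: float       # peak CPU utilization, 0.0 - 1.0
    peak_memory_gb: float      # peak memory in use, GB
    peak_iops: int             # peak disk I/O operations per second
    peak_network_mbps: float   # peak network throughput, Mbps

def size_virtual_machine(p: PhysicalProfile, cpu_headroom=1.25, mem_headroom=1.2):
    """Translate measured physical peaks into a candidate VM allocation.

    This is the inference step described above: it sizes for resource
    consumption only and says nothing about response time or throughput.
    """
    # CPU: convert peak utilization of N cores into "cores actually needed",
    # add headroom, and round up to whole vCPUs.
    vcpus = max(1, math.ceil(p.cores * p.peak_cpu_util * cpu_headroom))
    memory_gb = math.ceil(p.peak_memory_gb * mem_headroom)
    return {
        "vcpus": vcpus,
        "memory_gb": memory_gb,
        "required_iops": p.peak_iops,                 # must be honored by the datastore
        "required_network_mbps": p.peak_network_mbps,
    }

if __name__ == "__main__":
    oracle_db = PhysicalProfile(cores=8, peak_cpu_util=0.45, peak_memory_gb=48,
                                peak_iops=3500, peak_network_mbps=400)
    print(size_virtual_machine(oracle_db))
    # -> {'vcpus': 5, 'memory_gb': 58, 'required_iops': 3500, 'required_network_mbps': 400}
```

Note that everything in this calculation is about resource consumption; nothing in it says anything about response time or throughput, which is exactly the gap discussed next.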

The approach of measuring how much CPU, memory, network I/O, and disk I/O a workload generates or consumes on physical hardware and then translating that into virtual resource allocations worked reasonably well for tactical, low-hanging-fruit applications. However, when it comes to assuring the performance of a business critical application on a virtual infrastructure, the resource utilization inference process has the following flaws:

  1. Virtual resources are not the same as physical resources. A virtual CPU is not exactly the same thing as a physical CPU. The key difference is that the virtual CPU is managed by the hypervisor, which imposes some additional load on the CPU.
  2. A virtual resource may not perform in exactly the same manner as its physical counterpart. If your workload is doing 100 IOPS and you size the virtual environment so that capacity for that many I/O operations is available to that workload, there is no guarantee that the end-to-end latency of those I/O operations will be the same in the virtual environment as it was in the physical environment.
  3. Sharing of resources introduces serious problems. Let's assume that a workload runs on a server with four cores. The natural step would be to assign that workload four virtual CPUs. But if that workload is running on a host with eight cores, and there are four other workloads that each want two virtual CPUs (the host is over-committed), then the hypervisor will find two free physical cores much more frequently than it will find four free cores at the same time. Therefore the right thing to do might well be to assign fewer virtual CPUs so that the workload gets scheduled in more frequently (the simulation after this list illustrates the effect).
  4. Finally, and most importantly, all of the resource allocation in the world is not going to guarantee acceptable performance. The only way to guarantee acceptable performance is to measure it on the physical side, use that measurement as the baseline for what is expected on the virtual side, and proceed accordingly.
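The co-scheduling effect described in point 3 is easy to see with a small simulation. The sketch below is a deliberately crude model (strict co-scheduling, fixed busy probability), not the behavior of the actual vSphere scheduler, which uses relaxed co-scheduling; it simply counts how often enough physical cores are simultaneously free for a 4-vCPU workload versus a 2-vCPU workload on an over-committed eight-core host.

```python
import random

def schedulable_fraction(host_cores=8, my_vcpus=4, other_vms=(2, 2, 2, 2),
                         busy_prob=0.7, trials=100_000, seed=42):
    """Fraction of scheduling opportunities where my_vcpus free cores exist
    at the same instant, under a strict co-scheduling assumption.

    Each competing VM is modeled as occupying all of its vCPUs with
    probability busy_prob at any given instant (a deliberately crude model).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cores_in_use = sum(v for v in other_vms if rng.random() < busy_prob)
        if host_cores - cores_in_use >= my_vcpus:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for vcpus in (4, 2):
        frac = schedulable_fraction(my_vcpus=vcpus)
        print(f"{vcpus}-vCPU workload can be co-scheduled {frac:.0%} of the time")
```

Under these assumptions the two-vCPU configuration finds enough free cores roughly twice as often as the four-vCPU configuration, which is why assigning fewer vCPUs can actually make a workload more responsive.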

The Role of APM in the Virtualization Process

First of all, let's define success when it comes to the virtualization of a business critical or performance critical application. Success means that the application owner is satisfied with the response time and throughput of the application. Success means that the users of the application are getting their jobs done without the application getting in their way. Success means that users are not calling the application owner to complain about slowness in the application or stalled transactions. Success means that the application owner is not pulling expensive and busy IT operations staff and architects into blamestorming meetings (the objective of a blamestorming meeting being to assign the blame for an application problem). The important point is that none of these success criteria can be met by ensuring that the application is getting a sufficient or normal amount of resources.
The success criteria can only be met by measuring the response time and throughput of the application and using those metrics (not resource utilization) as the SLA baselines for performance. So the critical first step is to define performance as response time and throughput, not as resource utilization. Once you have redefined performance as response time and throughput, you then need to measure it. Modern APM tools include response time vs. load scalability analysis as a feature. The graph below, from the scalability analysis that New Relic performs on the TVP site, shows that as load increases, response time remains stable below the desired threshold of 1,000 ms. This is an example of a web site that has to always be up, always deliver acceptable response time, and runs in a production VMware vSphere environment.
Response Time vs. Requests Per Minute Scalability Analysis
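If your APM tool does not produce this analysis directly, a comparable baseline can be approximated from access logs. The sketch below assumes a list of (timestamp, response time in milliseconds) samples; it buckets requests per minute and reports the 95th percentile response time at each load level, which is the kind of curve you want to capture on physical hardware before the migration and again on vSphere afterwards.

```python
from collections import defaultdict
from statistics import quantiles

def scalability_baseline(samples):
    """samples: iterable of (unix_timestamp_seconds, response_time_ms).

    Returns a sorted list of (requests_per_minute, p95_response_time_ms)
    so response time can be judged against load, not against CPU or memory.
    """
    per_minute = defaultdict(list)
    for ts, rt_ms in samples:
        per_minute[int(ts // 60)].append(rt_ms)

    points = []
    for minute, times in per_minute.items():
        rpm = len(times)
        # 95th percentile needs at least two samples; fall back to the single value.
        p95 = quantiles(times, n=20)[-1] if len(times) >= 2 else times[0]
        points.append((rpm, p95))
    return sorted(points)

if __name__ == "__main__":
    import random
    random.seed(1)
    # Synthetic demo data: 10 minutes of traffic, one request per second.
    demo = [(minute * 60 + i, random.gauss(400, 80))
            for minute in range(10) for i in range(60)]
    for rpm, p95 in scalability_baseline(demo):
        print(f"{rpm:4d} req/min -> p95 {p95:6.1f} ms")
```

Running the same analysis against the physical deployment and the virtual deployment gives you two curves that can be compared directly in an SLA discussion.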

Measuring Performance and Throughput of Business Critical Applications

Here is an important assertion: if you are going to successfully run business critical and performance critical applications in a shared and dynamic virtualized environment, you need to instrument those applications for response time and throughput before you virtualize them, use the results of that pre-virtualization assessment as the baseline for the definition of a successful virtual deployment of each application, and then continue to monitor the application with the very same tool in production. If you do not do this, you are setting yourself up for a painful virtualization process (with application owners acting as server huggers), and even if you overcome those objections, a substantial amount of your time will be spent in blamestorming meetings.
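As a minimal illustration of that workflow, the check below compares a pre-virtualization baseline against measurements taken after the move and flags any regression beyond an agreed tolerance. The metric names and the 10% tolerance are assumptions made for the example, not values from any particular APM product.

```python
def compare_to_baseline(physical, virtual, tolerance=0.10):
    """physical, virtual: dicts of metric name -> measured value.

    Response-time metrics (ending in "_ms") may grow by at most `tolerance`;
    throughput metrics may shrink by at most the same amount. Returns a list
    of human-readable violations; an empty list means the SLA baseline holds.
    """
    violations = []
    for metric, base in physical.items():
        current = virtual.get(metric)
        if current is None:
            violations.append(f"{metric}: not measured after virtualization")
            continue
        if metric.endswith("_ms"):             # latency: lower is better
            if current > base * (1 + tolerance):
                violations.append(f"{metric}: {current:.0f} ms vs baseline {base:.0f} ms")
        else:                                   # throughput: higher is better
            if current < base * (1 - tolerance):
                violations.append(f"{metric}: {current:.0f} vs baseline {base:.0f}")
    return violations

if __name__ == "__main__":
    baseline = {"p95_response_ms": 850, "requests_per_minute": 1200}
    after_p2v = {"p95_response_ms": 980, "requests_per_minute": 1150}
    for v in compare_to_baseline(baseline, after_p2v):
        print("SLA violation:", v)
    # 980 ms exceeds 850 ms * 1.1 = 935 ms, so the response time regression is flagged.
```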

The good news is that there is a great set of new APM tools to choose from. These tools focus on being easy to implement and easy to operate, which gets around the problems with the previous generation of APM tools. They fall into two categories, which map to the kinds of applications you have. If you have custom developed applications and you need to rapidly find problems in your code, then you want a DevOps focused solution. If you have a mixture of purchased, custom developed, and compound applications, then you are going to want an AppOps focused tool.

The vendors in these two categories of tools are profiled below. Remember that the right time to start with these tools is while the application is still running on physical hardware. Only then will you be able to establish a baseline that will keep you out of blamestorming meetings once the application has been virtualized.

The DevOps Category of APM Tools

Each vendor below is characterized by product focus, deployment method, data collection method, and supported application types; they can also be evaluated on application topology discovery, cloud readiness, "zero-config" deployment, and deep code diagnostics.

AppDynamics
  Product focus: Monitor custom developed Java and .NET applications across internal and external (cloud) deployments
  Deployment method: On Premise/SaaS
  Data collection method: Agent inside of the Java JVM or the .NET CLR
  Supported application types: Java/.NET

dynaTrace (Compuware)
  Product focus: Monitoring of complex enterprise applications that are based on Java or .NET but may include complex enterprise middleware like IBM MQ and CICS
  Deployment method: On Premise
  Data collection method: Agent inside of the Java JVM or the .NET CLR
  Supported application types: Java/.NET, WebSphere Message Broker, CICS, C/C++

New Relic RPM
  Product focus: Monitor custom developed Java, .NET, Ruby, Python, and PHP applications across internal and external (cloud) deployments
  Deployment method: SaaS
  Data collection method: Agent inside of the Java JVM, the .NET CLR, or the PHP/Python runtime
  Supported application types: Ruby/Java/.NET/PHP/Python

VMware vFabric APM
  Product focus: Monitor custom developed Java applications in production; strong integration with the rest of the VMware product line, including automated remediation and scaling
  Deployment method: On Premise
  Data collection method: Mirror port on the vSphere vSwitch and an agent inside the Java JVM
  Supported application types: HTTP/Java/.NET/SQL

The AppOps Category of APM Tools

AppEnsure
  Product focus: Monitor every application in production irrespective of source and deployment
  Deployment method: SaaS
  Data collection method: Agent inside of the Windows or Linux operating system
  Supported application types: All TCP/IP on Windows or Linux

AppFirst
  Product focus: Monitor every application in production irrespective of source and deployment
  Deployment method: SaaS
  Data collection method: Agent inside of the Windows or Linux operating system
  Supported application types: All TCP/IP on Windows or Linux

BlueStripe FactFinder
  Product focus: Monitor every application in production irrespective of source and deployment
  Deployment method: On Premise
  Data collection method: Agent inside the Windows, Linux, AIX, or Sun operating system
  Supported application types: All TCP/IP on Windows, Linux, AIX, or Sun OS

Boundary
  Product focus: Monitor the impacts of network flows upon the application
  Deployment method: SaaS
  Data collection method: Agent inside of the Windows or Linux operating system
  Supported application types: All Linux TCP/IP applications

Confio Software IgniteVM
  Product focus: Monitor database performance, especially in conjunction with the performance of the underlying storage
  Deployment method: On Premise
  Data collection method: Agentless collection of detailed database data and storage latency data from vSphere
  Supported application types: DB2, Oracle, and SQL Server database applications running on vSphere

Correlsense
  Product focus: Monitor every application in production irrespective of source and deployment
  Deployment method: On Premise
  Data collection method: Agent inside the Windows, Linux, AIX, or Sun operating system
  Supported application types: All TCP/IP on Windows, Linux, AIX, or Sun OS

ExtraHop Networks
  Product focus: Monitor every application in production irrespective of source and deployment
  Deployment method: On Premise
  Data collection method: From a mirror port on a physical switch or the vSphere vSwitch
  Supported application types: All TCP/IP regardless of platform

Splunk
  Product focus: Collection of logs and many other metrics into an easily searchable "big data" database
  Deployment method: On Premise/SaaS
  Data collection method: A wide variety of collectors that interface to log sources and other sources of data
  Supported application types: Any application for which a log of some type is generated

Conclusion

The process of virtualizing business critical applications should start with using a modern APM tool to establish a response time and throughput baseline for the application while it is still on physical hardware. That baseline should then serve as the reference SLA once the application is virtualized. Performance needs to be defined as response time and throughput, not resource utilization.

Bernd Harzog

Bernd Harzog is the Analyst at The Virtualization Practice for Performance and Capacity Management and IT as a Service (Private Cloud).

Bernd is also the CEO and founder of APM Experts, a company that provides strategic marketing services to vendors in the virtualization performance management and application performance management markets.

Prior to these two companies, Bernd was the CEO of RTO Software, the VP Products at Netuitive, a General Manager at Xcellenet, and Research Director for Systems Software at Gartner Group. Bernd has an MBA in Marketing from the University of Chicago.
