There are many lines and silos in an IT organization. The people who care about servers, networks, and storage are often three different teams that try hard not to talk to each other. There is usually an OS team (one for each major OS), which has to talk to the teams that provide the hardware supporting their OS. Virtualization has served as a forcing function to get many of these teams to talk to each other. But what about the applications teams?

In the physical (pre-virtualized) data center, the server, network, storage, and OS teams may not have worked closely together. Once you drop virtualization into the picture, you will find that you really cannot get virtualization to work well unless you get what used to be separate infrastructure fiefdoms to communicate well with each other.

Virtualization also has a profound impact upon the management of applications, for the following reason: in the pre-virtual, physical world, the teams (and their business units) that own the applications usually have the budget to go buy more hardware as they need it. Once an application is virtualized (or is threatened with virtualization), the ability to “throw hardware at the problem” goes away as the application starts to run in a shared virtual data center.

This causes many applications owners to actively resist the virtualization of their applications. In fact it results in the “stall” of many virtualization projects, as differing and irreconcilable political interests are thrown into the mix. IT Operations wants to run every application on a homogeneous shared virtual infrastructure. The owner of Application A demands a certain amount of guaranteed resource, and a guaranteed response time for his critical transactions. Other applications owners place similar but competing demands upon the virtualization team.

Once this occurs, whoever is running the virtualization project needs to call a timeout and ask two very simple questions:

  1. How is the performance of this application that you care about so much being measured in the physical environment?
  2. Does that method of measuring and therefore ensuring the performance of that application translate over into its new virtualized environment?

Now what you are likely to find is this. For all of the yelling and screaming about applications performance, very few of the applications being yelled about are in fact truly instrumented for real performance, meaning the response time of the application system as delivered at its edge (setting aside the network between the edge of the application system and the end user, and the end user's screen paint time). What you will find instead is that the performance of most applications is being inferred by looking at whether or not they are using normal amounts of resources.
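
To make the distinction concrete, below is a minimal sketch of what measuring response time at the edge of the application system looks like, as opposed to inferring it from resource counters. It assumes a Python/WSGI application; the middleware class and the record callback are purely illustrative, not any particular vendor's product.

```python
# A hypothetical WSGI middleware that timestamps each request as it enters
# the edge of the application system and records how long the application
# takes to begin its response. Resource counters are never consulted.
import time


class EdgeResponseTimer:
    def __init__(self, app, record):
        self.app = app        # the wrapped WSGI application
        self.record = record  # callback, e.g. write to a metrics store

    def __call__(self, environ, start_response):
        start = time.perf_counter()

        def timed_start_response(status, headers, exc_info=None):
            # Elapsed time from the edge of the application system to the
            # point the response begins; the network beyond the edge and the
            # browser's screen paint time are deliberately out of scope.
            self.record(environ.get("PATH_INFO", "/"),
                        time.perf_counter() - start, status)
            return start_response(status, headers, exc_info)

        return self.app(environ, timed_start_response)
```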

The reason for all of the yelling and screaming that goes on when the discussion of virtualization occurs is that the applications owners know that they really do not have the question of true performance (response time) under control before the application gets virtualized, and they (rationally) expect the problem to get worse once it is virtualized.

The Gordian Knot

A Gordian Knot is a mythical knot that cannot be untied. Trying to virtualize business critical applications in the absence of real data about their performance before and after virtualization can feel like a Gordian Knot: an intractable problem with no apparent solution.

There is in fact a solution. The solution is for the IT team that wishes to virtualize an application to take responsibility for the response time of that application end-to-end and hop-by-hop. In fact, the Virtualization Team should offer to take responsibility for the end-to-end and hop-by-hop performance of EVERY application running on the virtual infrastructure.

Since the siloed IT organization over on the physical side never offered to do this, doing so would constitute a major benefit of virtualizing these applications and offer a path to the untying of that Gordian Knot.

But How?

The key to this process is to find a tool that can do the following things:

  1. The tool works in physical and virtual environments. Why physical as well, you ask? Because if response time is not being measured today, you will want to baseline the response time profile of the application before you virtualize it, so that you can prove that you are delivering equal or better service once it is virtualized.
  2. The tool works across the operating systems that your applications use (Windows, Linux, and maybe a few others).
  3. The tool works for all of your applications no matter how they are procured (purchased or developed), and if developed works irrespective of how they were developed (C, C++, VB, ASP, Java, .NET, PHP, Ruby, Perl, Python, etc., etc.) – as long as the application runs on a supported operating system.
  4. The tool automatically discovers your applications and automatically discovers how each part of an application is talking to other parts of the application. This is essential for the IT use case, as no one in IT has the time to configure tools for each application.
  5. Finally, of course, the tool figures out end-to-end and hop-by-hop response time automatically, with no manual per-application configuration required (see the sketch just after this list).
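
To make the last two requirements concrete, here is a rough sketch of the kind of hop-by-hop and end-to-end output such a tool should produce with no per-application configuration. The tier names and latencies are made up for illustration, and the discovery mechanism itself (watching which components talk to which) is the product-specific part that is only implied here.

```python
# A sketch of discovered topology data: for one transaction, a hop-by-hop
# breakdown (web tier -> app tier -> database) whose per-hop latencies sum
# to the end-to-end response time the applications owner cares about.
from dataclasses import dataclass


@dataclass
class Hop:
    tier: str          # e.g. "web", "app", "db" -- discovered, not configured
    latency_ms: float  # time spent in this tier for one transaction


def end_to_end(hops: list[Hop]) -> float:
    """End-to-end response time is the sum of the per-hop latencies."""
    return sum(h.latency_ms for h in hops)


checkout = [Hop("web", 12.0), Hop("app", 48.0), Hop("db", 35.0)]
print(f"end-to-end: {end_to_end(checkout):.1f} ms")
for h in checkout:
    print(f"  {h.tier}: {h.latency_ms:.1f} ms")
```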

The P2V Process

Once you have such a tool, the process for using it is simple. Deploy it on the physical servers that are running an application that you want to virtualize. Profile the response time, transaction load, network load, and storage load of the application in its physical environment (this will be invaluable information for the virtualization engineering team, as it will inform the design of the virtualized resources supporting the application).
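
As a rough illustration of what that baseline might look like, the sketch below reduces raw per-request response times (however your chosen tool exports them) to a request rate plus response time percentiles. The function and field names are made up for this example; the point is that this is the profile the engineering team designs against and that you compare to after the migration.

```python
# A minimal sketch of the baselining step, assuming raw response times from
# the physical environment are already available as a list of milliseconds.
import statistics


def baseline_profile(response_times_ms: list[float], window_seconds: float) -> dict:
    """Summarize raw response times into the profile used for comparison."""
    ordered = sorted(response_times_ms)

    def pct(p: float) -> float:
        # nearest-rank percentile; good enough for a baseline sketch
        return ordered[min(len(ordered) - 1, int(p / 100 * len(ordered)))]

    return {
        "requests_per_sec": len(ordered) / window_seconds,
        "median_ms": statistics.median(ordered),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
    }
```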

Now comes the most important part. Show the response time and resource utilization profile to the applications owner. Come to an agreement about what is required for them to be happy with the performance of the application. You might even want to go so far as to commit to a response time based SLA for the application. Note that since no one in IT has probably ever talked to this applications owner about response time before, IT will be perceived as being very enlightened and forward thinking for framing the conversation in these terms.
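
If you do commit to a response time based SLA, the check itself can be very simple. The sketch below is illustrative rather than a standard: it assumes the agreement is that the 95th percentile response time of the virtualized application stays within 10% of the physical baseline captured before the migration.

```python
# A sketch of a response-time SLA check against the pre-virtualization
# baseline. The 10% allowance is an assumed, negotiated number.
def sla_met(physical_p95_ms: float, virtual_p95_ms: float,
            allowed_regression: float = 0.10) -> bool:
    return virtual_p95_ms <= physical_p95_ms * (1 + allowed_regression)


# Example: physical baseline p95 of 180 ms, virtualized p95 of 170 ms.
print(sla_met(180.0, 170.0))  # True -- equal or better service delivered
```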

Now you can effectively untie that Gordian knot and virtualize that application – without getting screamed at.

Managing Performance Critical Applications in Production

Everything up until now was the appetizer; this is the main course. The goal is to be able to manage the performance of business critical and performance critical applications on your virtualized infrastructure, promise better response time profiles (the trade-off between response time and load), and deliver them. This means using your new tool to automatically discover applications as they arrive in your environment, automatically discover their topology (and keep this up to date as things move around), and measure end-to-end and hop-by-hop response times.
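
One way to think about promising a response time profile, sketched below with assumed numbers (the load bands and ceilings are illustrative, not recommendations), is as a 95th percentile response time ceiling for each band of load. The check is then explicitly about the trade-off between response time and load, not about resource utilization.

```python
# A sketch of a promised response time profile: a p95 ceiling per load band.
# An alert fires when the observed p95 breaches the ceiling for the current
# load, regardless of how busy or idle the underlying hosts look.
PROMISED_P95_MS = {            # load band (requests/sec) -> p95 ceiling (ms)
    (0, 100): 150.0,
    (100, 500): 250.0,
    (500, float("inf")): 400.0,
}


def profile_breached(requests_per_sec: float, observed_p95_ms: float) -> bool:
    for (low, high), ceiling in PROMISED_P95_MS.items():
        if low <= requests_per_sec < high:
            return observed_p95_ms > ceiling
    return False


print(profile_breached(220, 310.0))  # True -- time to alert, and to act
```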

In many enterprises virtualization is stuck at 35% or 40%. Moving beyond this set of applications requires that the virtualization team step up to managing the actual response time profile of the applications that are yet to be virtualized. This will create a benefit to virtualization that is relevant to the applications owners and facilitate the virtualization process.

If you are wondering which tools to choose from, come back next week, when a review of the tools that can do this for you will be the performance management post of the week.

Summary

Applications Performance Profiling is an essential step in the process of virtualizing business critical and performance critical applications. In this case “performance” means response time, not resource utilization. The virtualization team should go even further and commit to meeting response time based SLAs for business and performance critical virtualized applications. This process will dramatically accelerate the virtualization of these applications and remove many of the current political objections to it.

 


Bernd Harzog is the Analyst at The Virtualization Practice for Performance and Capacity Management and IT as a Service (Private Cloud).

Bernd is also the CEO and founder of APM Experts, a company that provides strategic marketing services to vendors in the virtualization performance management and application performance management markets.

Prior to these two companies, Bernd was the CEO of RTO Software, the VP Products at Netuitive, a General Manager at Xcellenet, and Research Director for Systems Software at Gartner Group. Bernd has an MBA in Marketing from the University of Chicago.


7 comments for “Should VMware Administrators Care About Applications Performance”

  1. June 30, 2011 at 7:21 PM

    Bernd

    I’m not sure that I agree with you on the lack of application performance monitoring. Yes, application resource monitoring is frequently used as a proxy for application performance, but every user facing application is (by definition) being monitored by its users.

    The user doesn’t know what resources an application uses, but they certainly know and are not afraid to report any performance problem that any part of the application incurs. Integrating that type of unstructured performance feedback into data center operations is difficult, but the end-user/helpdesk alerting interface is a starting point that should not be overlooked.

    FYI – Integralis (now part of VMware) have been working on solutions that can learn the inter-relationship between application performance and resource utilization (amongst other things) and may be worth looking at in this context.

    Regards

    Simon

  2. July 2, 2011 at 6:39 AM

    Simon,
    I agree that user complaints are one of the best indications of whether application performance is OK or not.

    However, relying on users to call you to report a problem has several disadvantages:
    1. For customer-facing apps that do not function properly, you risk losing customers to your competition. For example, an insurance agent who can't generate a new policy with a certain insurance company because the application does not respond is more likely to try the competition than to call the support/helpdesk.

    2. Internal users do not always pick up the phone and dial. They are busy, or have simply “lost faith,” so they don't call to report new problems.

    3. Relying on end-user reports lacks the accurate measurements that are necessary to understand whether there are real problems, what the scope of the problem is, how many users are affected, etc. Sometimes it is even a challenge to work out what application they are referring to, or what activity within the application is slowing down, and sometimes they just think there's something wrong when there isn't.

    A key solution in such complex, hybrid environments, should be composed of several major capabilities:
    1. Real user monitoring – allows you to see real-time and historical data on the performance experienced by your end users, allowing slice-and-dice by geographical location, applications used, type of user activity (a.k.a. “the business transaction”), etc.
    2. A transaction management tool that provides the capability to explain any user experience by providing accurate data on how the user experience time, for each type of user activity (“list stocks”, “get quote”, “check out”, …), is broken down across all physical, virtual, and logical elements of the application layers, through proxies, web servers, application servers, web services, databases, EAIs, etc. (as Bernd mentions, “hop by hop” metering of transactions)

    These two capabilities will assure you that:
    1. You are always aware of end-user experience problems, in real time, before many users are affected
    2. You are able to quickly isolate the element (physical server? virtual server? specific JVM? database? LDAP?) that is causing the performance degradation, whether for the entire application or for a specific type of user activity

    The “But How” section of this post mentions critical criteria for choosing such a tool and assuring a successful, low-cost and fast deployment.

    Regards,
    – Nir

  3. Bharzog
    July 2, 2011 at 12:44 PM

    Simon,

    I guess my position is that if you wait for the user to complain about the performance of the application, you have lost the battle and lost the war. You have saddled yourself with a call center to field user complaints, and with a manual and reactive process that relies upon incomplete and misleading data (information from users about the problem) and that is guaranteed to take too long and cost too much.

    So in my view, relying upon users to tell you if the application is working well or not is the worst of all alternatives. The second worst one is to try to infer the performance of the application from resource utilization metrics (like the ones collected by vCenter) – that will not work either. The only thing that will work is something that passively monitors every request for work as it comes into the edge of the application system, and times how long it takes for that request to be completed by the application and its supporting infrastructure.

    Regards,

    Bernd

  4. July 4, 2011 at 6:19 AM

    Simon, I think you mean Integrien (now powering vCenter Operations) not Integralis (a telecoms/hosting company). :)

  5. July 5, 2011 at 7:12 PM

    Rob – thanks for the correction. It was Integrien that I was thinking of; an almost forgivable mistake given the similarity of the names involved.

  6. July 5, 2011 at 7:26 PM

    Nir – I agree with all the points you are making; my comment was limited to calling attention to the contribution that end users can make to the application performance monitoring picture.

    Developing a comprehensive application performance monitoring system can take longer to implement than the application it is monitoring, and may in some cases not offer a reasonable ROI. Even where automated systems can be quickly deployed and are provably cost-effective, the value of end users in the performance monitoring picture should not be overlooked and should be actively encouraged.

    Regards

    Simon

  7. Henrik Magnusson
    July 13, 2011 at 3:11 AM

    Good article.
    Apart from the proactiveness of monitoring user performance rather than server performance, another benefit would perhaps be the ability to break this down into logical steps, each of which could have its own SLA. Then, when the helpdesk phones go warm (or better, before they do), IT Service Management would know not only that the business application is running slowly but also which part of it is. I'm thinking here of certain database queries taking too long to run, web server load balancing that has gone askew, etc.
