Resource/Availability Monitoring & Infrastructure Performance: Trends and Issues

As VMware 3.x took the enterprise virtualization market by storm in 2008, followed by the successful introduction of VMware vSphere 4.0 in 2009, many enterprises discovered that managing the utilization of the key resources on their virtualized systems posed unique challenges, especially when compared with resource management on physical servers. Early on, VMware took a significant step towards solving this problem by collecting a rich set of resource utilization data from its hypervisor and making this data available via the Virtual Center (now vCenter Server) APIs. Many new and established vendors in the resource management business built integrations with the VMware APIs.

As these products have matured, it has become clear that the problem of managing resource utilization on virtual servers is different from managing resource utilization on physical servers in the following important respects:

Capacity Planning is Important but Different

Capacity Planning is all about forecasting the growth in usage of the key resources in your system (CPU, memory, network capacity, SAN capacity, disk capacity, and I/O operations capacity) far enough into the future that, if a shortfall is forecast, additional physical resources can be procured and implemented before the shortfall occurs. The key differences created by virtualization are that:

  1. With virtualization, many previously separate resources are now aggregated into resource pools which are shared by many applications and workloads.
  2. The linkage between applications and resources is broken due to this aggregation and sharing.
  3. Finding the cause of growth in resource usage is made more difficult by this aggregation and sharing.
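As a rough illustration of the forecasting idea described above, the sketch below fits a straight-line trend to historical usage of a shared resource pool and checks whether the projected shortfall falls inside the procurement lead time. The function names, the linear-growth assumption, and the sample figures are all hypothetical; real capacity planning tools use richer models.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(samples: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of usage = slope * t + intercept for t = 0..n-1."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(samples))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def periods_until_exhausted(samples: Sequence[float],
                            capacity: float) -> Optional[float]:
    """Periods after the last sample until the trend line reaches capacity.

    Returns None when usage is flat or shrinking (no shortfall forecast).
    """
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - (len(samples) - 1))


def shortfall_inside_lead_time(samples: Sequence[float],
                               capacity: float,
                               lead_time_periods: float) -> bool:
    """True when capacity runs out before new hardware could be in place."""
    remaining = periods_until_exhausted(samples, capacity)
    return remaining is not None and remaining < lead_time_periods


if __name__ == "__main__":
    # Hypothetical monthly memory usage (GB) of one resource pool.
    usage = [100, 110, 120, 130]
    print(periods_until_exhausted(usage, capacity=200))
    print(shortfall_inside_lead_time(usage, capacity=200, lead_time_periods=8))
```

Because the pool is shared, a trend like this says nothing about which application is driving the growth. That attribution problem is exactly the difficulty described in points 2 and 3.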

Capacity Management Emerged as a Need

Capacity Planning is an activity that most organizations can afford to do on a monthly or even a quarterly basis (assuming no large unplanned growth in resource usage). Capacity Management, by contrast, is the day-to-day management of the relationship between capacity utilization and the impact of that utilization upon infrastructure and application performance. Capacity Management is crucial in a virtualized system for the following reasons:

  1. The ROI for virtualization came from consolidating N servers onto a smaller number of hosts, resulting in a higher level of average resource utilization.
  2. Boosting the average level of resource utilization makes the system more sensitive to peaks in utilization from any one instance of any one application.
  3. Due to the shared nature of the virtualized resource pool, peaks in utilization affect more than the one application experiencing the peak – the peak affects all applications which share any of the underlying resources.
  4. Very small and transitory peaks in resource utilization can have repetitive and annoying effects upon application performance and end user experience, leading to a costly and time-consuming cycle of “blamestorming” meetings focused upon hunting for the underlying issue.
  5. The dynamic nature of virtualized systems makes capacity management even more crucial, because capacity management solutions include the dynamic and continuous discovery capabilities needed to keep up with automated workload management decisions made by features like VMotion and DRS.
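The reasons above can be made concrete with a small sketch. It sums per-guest demand on one consolidated host and shows that a healthy-looking average can hide an interval in which a single guest's peak pushes the shared host past its capacity. The guest names, the demand figures, and the simple additive model are assumptions for illustration only; a real hypervisor scheduler behaves in more complex ways.

```python
from typing import Dict, List


def host_load(guest_demand: Dict[str, List[float]]) -> List[float]:
    """Sum per-guest demand interval by interval to get total host demand."""
    return [sum(values) for values in zip(*guest_demand.values())]


def average_load(guest_demand: Dict[str, List[float]]) -> float:
    """Average total host demand across all intervals."""
    load = host_load(guest_demand)
    return sum(load) / len(load)


def contended_intervals(guest_demand: Dict[str, List[float]],
                        host_capacity: float) -> List[int]:
    """Indexes of intervals where total demand exceeds host capacity.

    In these intervals every guest on the host competes for the resource,
    not only the guest that caused the peak.
    """
    return [i for i, total in enumerate(host_load(guest_demand))
            if total > host_capacity]


if __name__ == "__main__":
    # Hypothetical CPU demand per interval, in percent of one host.
    demand = {
        "guest_a": [20, 20, 80, 20],   # short peak in interval 2
        "guest_b": [30, 30, 30, 30],
        "guest_c": [25, 25, 25, 25],
    }
    print(average_load(demand))              # average over the window
    print(contended_intervals(demand, 100))  # intervals over capacity
```

In this made-up data the average stays below the host's capacity, yet guest_b and guest_c are both affected during the interval in which guest_a peaks.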

Virtualization is a New Platform

With physical servers, the questions of platform were focused upon whether it was an Intel-based server or not, and what operating system was running on the server, as these factors often determined what kinds of Capacity Management solutions were available for the platform. Virtualization has become a platform unto itself, usurping some of the roles that the operating system previously performed. For example, the virtualization platform will (in the case of VMware) provide the layer of software that interfaces to the hardware in the data center. The virtualization platform will also schedule the allocation of resources like CPU execution time and memory usage to the guests which host the applications.

The emergence of virtualization as a platform has created a new set of challenges and opportunities for vendors in this space. The opportunities arise from the fact that VMware continues to build out and deliver ever richer APIs which third party vendors in this space are taking advantage of. These APIs include:

  • The original vCenter APIs, which every vendor in this space leverages to get per host and per guest CPU, memory, network I/O and disk I/O statistics from the hypervisor.
  • The virtual mirror port (spanned port) on the VMware vSwitch and the Cisco Nexus 1000v. These ports provide management appliances with a read-only copy of all of the data flowing through the vSwitch. This provides resource and availability management vendors with a rich stream of network performance data that includes visibility into interactions between guests on a host.
  • The vStorage APIs in the hypervisor. These APIs were originally created to allow storage array vendors (EMC, NetApp, etc.) to integrate more tightly with VMware and to more easily surface their value-added features like replication and de-duplication to the VMware hypervisor. However, a great deal of storage configuration and I/O operations load data is also available through these interfaces. They will likely be leveraged by some of the resource and availability monitoring vendors to add a more robust view of storage utilization constraints to their capacity planning and capacity management models.
  • The VMSafe APIs. These APIs were designed so that security vendors could plug their firewalls, firewall management, and configuration management products into VMware. However, the attractiveness of these APIs is not limited to the security vendors. Due to the incredibly rich set of configuration and configuration change data which is available through these interfaces, and due to the frequency with which configuration is the root cause of performance issues, it is highly likely that these interfaces will be utilized by resource, availability and performance management vendors as well.
  • As VMware fully integrates the SpringSource technology, in particular the Java run time platforms, it is likely that a new application-level performance management opportunity will arise. The opportunity for performance measurement to be built into the Java run time is so great that it is hard to believe that the smart folks at SpringSource will let this one pass for very long.

Cross Platform Issues

The degree to which third party vendors take advantage of VMware-specific interfaces for which clear analogues do not exist in other platforms has the effect of tying these vendors ever more closely to VMware. There are two problems with this from the vendors’ perspective. The first is that these vendors all would like to, or plan to, support other virtualization platforms, starting at a minimum with Microsoft Hyper-V. The more they build VMware-specific functionality into their products, the more difficult it will be to deliver the same functionality on platforms that do not provide the same interfaces as VMware does. The second issue from a vendor perspective is that VMware is actively competing with the third party vendors in this space with its Hyperic (acquired via SpringSource), CapacityIQ and AppSpeed products. Therefore the third party vendors need to support non-VMware virtualization platforms to establish a point of significant differentiation with respect to the VMware products, and to position themselves effectively with customers looking for cross virtualization platform performance management solutions.

Resource/Availability Management and Infrastructure Performance Management are Fundamentally Different

In this post, we explain how virtualization has split up what was previously one category into two different categories with different vendors in each. The fundamental difference between the two categories is that Resource/Availability Management looks at how resources are used in hosts and guests. Infrastructure Performance Management is an offshoot of this category whose vendors all collect the same resource data as the resource management vendors, but who also collect additional data that allows infrastructure response time (IRT) to be calculated. IRT is a critically important metric since in a virtualized system it can be relied upon to accurately represent the performance of the virtual infrastructure, while in that same virtual infrastructure resource utilization can no longer be relied upon to be a credible proxy for performance.
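The post does not define how IRT is computed, so the following is purely an illustration of the argument that utilization and response time can disagree. It assumes, hypothetically, that IRT is summarized as a high percentile of per-request infrastructure latency samples, and it flags the case where average utilization looks healthy while that percentile is degraded. The thresholds, names, and sample values are invented for the sketch.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of the samples, for p between 0 and 100."""
    if not samples:
        raise ValueError("no samples")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def utilization_hides_slowdown(cpu_util_pct: Sequence[float],
                               latencies_ms: Sequence[float],
                               util_limit_pct: float = 80.0,
                               irt_limit_ms: float = 50.0,
                               p: float = 95.0) -> bool:
    """True when average utilization is under its limit but the chosen
    latency percentile (our stand-in for IRT) exceeds its limit."""
    avg_util = sum(cpu_util_pct) / len(cpu_util_pct)
    irt = percentile(latencies_ms, p)
    return avg_util < util_limit_pct and irt > irt_limit_ms


if __name__ == "__main__":
    # Hypothetical samples: modest CPU use, but two very slow requests.
    cpu = [40, 45, 50, 42]
    latency = [5, 7, 6, 250, 8, 6, 7, 5, 6, 300]
    print(percentile(latency, 50), percentile(latency, 95))
    print(utilization_hides_slowdown(cpu, latency))
```

A utilization-only monitor would report nothing unusual for this data, which is the sense in which resource utilization stops being a credible proxy for performance.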

Who is Who in Resource/Availability Management and Infrastructure Response Time Management

The table below lists the vendors that participate in these segments. Note that all of the vendors who participate in the Infrastructure Response Time segment also participate in the Resource/Availability Management segment. This is not to suggest that the IRT vendors are a superset of the Resource/Availability Management vendors, merely that they use some of the same data used by the Resource/Availability Management vendors.

Vendor               Product              Resource/Availability Mgmt.   Infrastructure Performance Mgmt.
Akorri               BalancePoint         Yes                           Yes
Netuitive            SA                   Yes                           No
NetQoS               Performance Center   Yes                           Yes
up.time Software     Uptime 5             Yes                           No
Veeam                Monitor              Yes                           No
Veeam                nworks               Yes                           No
Virtual Instruments  NetWisdom            Yes                           Yes
Vizioncore           vFoglight            Yes                           No
VKernel              Capacity Analyzer    Yes                           No
VMware               Hyperic              Yes                           No
VMware               CapacityIQ           Yes                           No


Capacity Planning and Capacity Management are essential activities for any production virtualization deployment, and should be backed by appropriate tools that support the target hypervisor(s). However, the emerging need in this area is for true Infrastructure Performance Management. These solutions give the IT Operations staff the information they need to confidently support Tier 1 applications in production, while being able to demonstrate the performance of the virtualized system to the application owners and business constituencies.
