All posts by Bernd Harzog

Bernd Harzog is the Analyst at The Virtualization Practice for Performance and Capacity Management and IT as a Service (Private Cloud). Bernd is also the CEO and founder of APM Experts, a company that provides strategic marketing services to vendors in the virtualization performance management and application performance management markets. Prior to these two companies, Bernd was the CEO of RTO Software, the VP Products at Netuitive, a General Manager at Xcellenet, and Research Director for Systems Software at Gartner Group. Bernd has an MBA in Marketing from the University of Chicago.

Virtualization and Cloud Management Upend the Traditional Management Software Business

Big changes are afoot in the management software business. First, Quest Software agrees to be acquired by a private equity firm, usually a sign that some cuts need to be made that would trash a company’s stock if it were publicly held. Then rumors crop up that Dell is going to acquire Quest, which would certainly transform the virtualization management business as discussed in this post. Now comes news from Bloomberg that the Dell/Quest deal is off, at least for the time being. So what is really going on here? Continue reading Virtualization and Cloud Management Upend the Traditional Management Software Business

Dell Transforms Virtualization Management with Quest Buy

In “Dell a Virtualization Management Leader?”, posted almost a year ago, we explored how Dell might combine the product assets that it has licensed from DynamicOps (sold by Dell as VIS Creator). The basic idea was that the monitoring of the virtualized environment would be combined with the ability of VIS Creator to dynamically provision services, so that dynamically provisioned services could be offered with performance and availability assurances. The idea that Dell could bring the entire portfolio of Quest assets to bear fundamentally transforms both the notion of automated service assurance of dynamically provisioned services and the entire systems management business. Continue reading Dell Transforms Virtualization Management with Quest Buy

The IaaS Cloud Performance Management Problem

Infrastructure as a Service (IaaS) clouds allow you to quickly provision and scale up operating system images (that you can then do with what you want). However, the nature of IaaS offerings is that the cloud provider purposely obscures what is really going on in his hardware environment from his customers. This leads to the “noisy neighbor” problem, as well as many other problems in which the customer of the cloud provider is left guessing as to what is really going on.

The Cloud Customer’s View of IaaS Cloud Performance

Using Amazon as an example, the only real view into “performance” that the customer of Amazon has is through the CloudWatch services provided by Amazon to its customers. At first glance, CloudWatch seems to provide a wealth of information. The basic list of what is provided is below:

  • Basic Monitoring for Amazon EC2 instances: seven pre-selected metrics at five-minute frequency, free of charge.
  • Detailed Monitoring for Amazon EC2 instances: seven pre-selected metrics at one-minute frequency, for an additional charge.
  • Amazon EBS volumes: eight pre-selected metrics at five-minute frequency, free of charge.
  • Elastic Load Balancers: ten pre-selected metrics at one-minute frequency, free of charge.
  • Amazon RDS DB instances: thirteen pre-selected metrics at one-minute frequency, free of charge.
  • Amazon SQS queues: eight pre-selected metrics at five-minute frequency, free of charge.
  • Amazon SNS topics: four pre-selected metrics at five-minute frequency, free of charge.
  • Amazon ElastiCache nodes: twenty-nine pre-selected metrics at one-minute frequency, free of charge.
  • Amazon DynamoDB tables: seven pre-selected metrics at five-minute frequency, free of charge.
  • AWS Storage Gateways: eleven pre-selected gateway metrics and five pre-selected storage volume metrics at five-minute frequency, free of charge.
  • Amazon Elastic MapReduce job flows: twenty-three pre-selected metrics at five-minute frequency, free of charge.
  • Auto Scaling groups: seven pre-selected metrics at one-minute frequency, optional and charged at standard pricing.
However, there are three huge flaws with the entire approach and structure of CloudWatch:
  • In general, what CloudWatch is providing is a view into the resource consumption and activity of your instances. This view is virtual in nature, and is abstracted from the underlying physical reality of supporting the environment. In other words, when CloudWatch tells you how much CPU your instance is using, that is a function of how much the instance is using divided by how much has been allocated to that instance. The actual amount of CPU that is available on the physical server that your instance is running on never factors into the equation at all. The same is true for all of the CloudWatch metrics – they are from a virtual perspective, and do not surface any contention that may be occurring at the physical layer in the infrastructure.
  • As you can see above, you can get some metrics for free at five-minute intervals, and more metrics at one-minute intervals if you pay for them. In “The Real-Time Big Data vSphere Management Problem”, we discussed the need for real-time metrics, not one-minute or five-minute metrics (as way too much can go wrong in 59 seconds, or in 4 minutes and 59 seconds). The same need presents itself here. Running performance-critical applications in a shared-tenant public cloud and only getting visibility every minute, in the best case, is just not going to work for a lot of people.
  • The focus upon resource utilization metrics as a proxy for infrastructure performance is fatally flawed. In “Timekeeping in VMware Virtual Machines”, VMware gets credit for being brutally honest about what happens to time-based metrics collected from the perspective of the virtual machines (hint: opening Task Manager in a virtualized instance of Windows Server is an exercise in futility). The only way around this is to measure end-to-end Infrastructure Latency and to surface it as the metric that definitively demonstrates the Quality of Service that the cloud provider is delivering to his or her customer. The absence of such an approach by cloud vendors to date (including Amazon) will limit the adoption of public cloud services, as the metrics provided by CloudWatch are a poor substitute for measuring what is really going on and how long it is taking.
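To make the sampling-interval point concrete, here is a minimal Python sketch using made-up per-second CPU samples for one instance. The numbers and the `window_averages` helper are hypothetical illustrations, not CloudWatch data or a CloudWatch API. The sketch shows how a 20-second saturation burst looks when the same samples are averaged into one-minute and five-minute buckets.

```python
from statistics import mean

def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical per-second CPU utilization (%) for one instance over five minutes:
# a steady 20%, except for a 20-second burst at 100% (seconds 130-149).
per_second = [20.0] * 300
for t in range(130, 150):
    per_second[t] = 100.0

# Compare the worst value each reporting interval would show.
for label, window in [("1-second", 1), ("1-minute", 60), ("5-minute", 300)]:
    worst = max(window_averages(per_second, window))
    print(f"{label} samples: worst reading {worst:.1f}%")
```

With these made-up numbers, the burst reads as 100% at one-second resolution, about 46.7% at one-minute resolution and about 25.3% at five-minute resolution. Note that even the per-second view is still a virtual, per-instance measure and says nothing about contention on the physical host.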

The Cloud Provider’s View of IaaS Cloud Performance

Now, from the perspective of the cloud vendor, all is well and good. Supposedly the cloud vendor is making sure that resource bottlenecks are not impacting workload performance, but we really do not know whether they are or not. There is no SLA that commits the cloud provider to this, and no data is forthcoming as to how well it is being done, if at all. Furthermore, it is well understood that the way the cloud provider makes money is by sharing his underlying physical hardware to a greater degree (and not disclosing this fact) than the enterprise customer would be comfortable with.

Summary

The IaaS Cloud Performance Management Problem will continue to be one of two major factors impeding the adoption of public cloud services (multi-tenant security being the other one). Inferring performance from resource utilization metrics does not work in a simple single-tenant virtualized environment (vSphere in your data center). It is worse than useless in multi-tenant public cloud environments that are built upon a virtualization platform. The only known fix for this issue is for the cloud vendors to embrace end-to-end infrastructure latency as the quality of service metric and to surface this metric on a per-tenant and per-image basis to their customers.
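As a rough illustration of what surfacing latency per tenant and per image might look like, here is a minimal Python sketch. It assumes the provider already collects end-to-end infrastructure latency samples tagged with a tenant and an image identifier; the tenant names, image names, latency values, the `latency_qos` helper and the choice of a 95th percentile are all hypothetical, not something any cloud vendor exposes.

```python
import math
from collections import defaultdict

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def latency_qos(samples, pct=95):
    """Group (tenant, image, latency_ms) samples and report one percentile per tenant and image."""
    groups = defaultdict(list)
    for tenant, image, latency_ms in samples:
        groups[(tenant, image)].append(latency_ms)
    return {key: percentile(values, pct) for key, values in groups.items()}

# Hypothetical end-to-end infrastructure latency samples, in milliseconds.
samples = [
    ("tenant-a", "web-01", 2), ("tenant-a", "web-01", 3), ("tenant-a", "web-01", 2),
    ("tenant-a", "web-01", 4), ("tenant-a", "web-01", 40),  # one hypothetical contention spike
    ("tenant-b", "db-01", 5), ("tenant-b", "db-01", 5), ("tenant-b", "db-01", 6),
    ("tenant-b", "db-01", 5), ("tenant-b", "db-01", 6),
]

for (tenant, image), p95 in sorted(latency_qos(samples).items()):
    print(f"{tenant} {image}: p95 latency {p95} ms")
```

A high percentile is used rather than a mean so that occasional noisy-neighbor spikes stay visible in the reported figure instead of being averaged away.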

The Real-Time Big Data vSphere Management Problem

A very interesting thing happens as your vSphere environment scales up. That very interesting thing is that the larger your environment gets, the more frequently you need data about its performance, capacity and configuration state. This is simply because the more things there are in the environment, the more likely it is that something is wrong with one of them at any moment in time. Continue reading The Real-Time Big Data vSphere Management Problem
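The scaling argument is essentially a probability calculation. As a rough illustration, assume (hypothetically) that every object in the environment, whether a VM, host or datastore, independently has the same small chance p of having a problem at any instant. The chance that at least one of n objects has a problem is then 1 - (1 - p)^n, which climbs quickly as n grows. The independence assumption and the value p = 0.001 below are illustrative only.

```python
def prob_any_problem(num_objects: int, p_per_object: float) -> float:
    """Probability that at least one of num_objects is misbehaving at a given instant,
    assuming each one independently has probability p_per_object of a problem."""
    return 1.0 - (1.0 - p_per_object) ** num_objects

# Hypothetical: each object has a 0.1% chance of being in trouble right now.
for n in (100, 1_000, 10_000):
    print(f"{n:>6} objects: {prob_any_problem(n, 0.001):.3f}")
```

Under these assumptions, roughly one environment in ten with 100 objects has something wrong at a given moment, while at 10,000 objects something is almost always wrong somewhere.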

News: VMTurbo Delivers End-To-End Automated Service Assurance

VMTurbo is the only vendor offering automated service assurance in the virtualization ecosystem today. Automated service assurance means that you identify the applications that are the most important to you (and the ones that are not), you assign them budgets of virtual resources, and VMTurbo ensures that the service level of the most important applications is not negatively impacted by the resource requirements of less important applications or workloads. Continue reading News: VMTurbo Delivers End-To-End Automated Service Assurance

On Premise vs. Monitoring as a Service – Considerations and Tradeoffs

We all pretty much know that we can buy Infrastructure as a Service (IaaS), Development/Runtime Platforms as a Service (PaaS), Software as a Service (SaaS), Security as a Service, and Cloud Storage as a Service, among other things, but we can also buy monitoring as a service, at both the infrastructure level and the application level. This is an intriguing idea, and one that is rapidly gaining traction. Monitoring as a Service (MaaS) carries with it some unique benefits, but it also carries with it some trade-offs, especially when evaluated against on-premise solutions. Continue reading On Premise vs. Monitoring as a Service – Considerations and Tradeoffs
