Cloud Performance – Learning from Salesforce

One of the concerns that many organizations have about putting business-critical applications into public clouds is how to ensure the performance and delivered end-user experience of those applications. This is an area where the cloud providers need to show some real leadership, and with the exception of Salesforce, they have not done so.

The issue with applications hosted in the cloud, or delivered to customers via a SaaS model, is that the traditional method of inferring performance from resource utilization statistics simply does not work. It does not work because the resource utilization statistics that a public cloud vendor surfaces to its customers are often worse than useless. The reason is simple. If your cloud vendor tells you that you are using X% of your memory or X% of your CPU, what does that number really mean?

What it means is that you are using X% of what you have been allocated, not X% of what is truly available in terms of hardware resources. The cloud vendor is not going to share with you how it allocates its hardware resources among its customers, because this is how cloud vendors make their money: by sharing those resources to a degree that the customers do not realize. The data provided by the Amazon CloudWatch service is probably the best example of data that is misleading to the point of being worse than useless in this regard.
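To make the arithmetic concrete, here is a minimal sketch in Python. All numbers and function names are hypothetical, and it assumes for simplicity that one virtual CPU maps to one physical core, which real hypervisors do not guarantee. It shows that the same reported utilization figure can correspond to very different shares of the underlying hardware, and that the figure says nothing about how heavily the host is oversubscribed.

```python
def share_of_host(reported_pct, allocated_vcpus, host_cores):
    """Convert 'percent of my allocation' into 'percent of the physical host'.

    Assumes (for illustration only) that one vCPU equals one physical core.
    """
    used_vcpus = reported_pct / 100 * allocated_vcpus
    return used_vcpus / host_cores * 100


def oversubscription(total_allocated_vcpus, host_cores):
    """Ratio of vCPUs promised to all tenants versus cores that exist."""
    return total_allocated_vcpus / host_cores


# Hypothetical tenant: 4 vCPUs allocated, vendor reports 50% CPU utilization.
on_large_host = share_of_host(50, allocated_vcpus=4, host_cores=32)
on_small_host = share_of_host(50, allocated_vcpus=4, host_cores=8)

# Hypothetical host: 96 vCPUs sold to tenants on 32 physical cores.
ratio = oversubscription(total_allocated_vcpus=96, host_cores=32)

print(f"Reported 50% is {on_large_host}% of a 32-core host")
print(f"Reported 50% is {on_small_host}% of an 8-core host")
print(f"Oversubscription ratio: {ratio}:1")
```

The customer sees only the 50% figure in both cases. The host size and the oversubscription ratio, which determine whether the remaining allocation is actually available, are exactly what the vendor does not disclose.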

It is also the case that customers really should not care how the cloud vendor is allocating resources. All customers should care about is the level of performance that the application system delivers, measured at the edge of the cloud vendor’s environment. A crucial insight here is that performance means response time, not any other metric having to do with load or activity. So what is needed in the Infrastructure as a Service, Platform as a Service, and Software as a Service cloud businesses is for more vendors to follow the lead of Salesforce, which is being admirably open about the service that it actually delivers to its customers. Salesforce publishes a daily summary of the transaction load and response time for its application system on its web site. A screen shot of the summary status screen is shown below.

[Screen shot: Salesforce summary status screen]

When a particular server has an incident, Salesforce is even open about the nature of the incident, what caused it, and how it was fixed. Note that you can see which of the servers you are using by looking at the URL in your browser and matching it to the corresponding row on the web site.

Since we are industry analysts, it is our job never to be satisfied with anything, so here are some areas with room for improvement:

  • Showing the aggregate number of transactions and the response time per day is nice. Showing it for each server, so that each customer knows how “their” server has performed, would be better.
  • Taking this one step further, a proper application of performance management technology should allow Salesforce to tell each customer how the service has performed for that customer and all of its users. This would obviously be hard (but not impossible) to do in real time, but it would be easily doable in a backward-looking monthly report.
  • Average response time is one thing, but variation in response time is just as important. A consistent 0.3 seconds is not that different from a consistent 0.5 seconds. However, most users would rather have a consistent 0.5 seconds than an average of 0.3 seconds that randomly spikes up to 1 or 2 seconds.
  • Response time and its consistency are not the only measures of service quality. There is also the question of whether the application returned what the user was looking for or whether an error of some kind was encountered. This moves from service performance into service quality, which is an entirely new bridge that needs to be crossed.
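The kind of backward-looking, per-customer report suggested in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(customer, response_seconds, ok)`, the customer names, and the sample data are all hypothetical, and nothing here reflects how Salesforce actually collects or reports its data. The sketch reports the mean alongside the spread, a 95th percentile, the worst case, and the error rate, so that a low average cannot hide spikes or failures.

```python
import statistics
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[rank - 1]


def summarize(records):
    """Group (customer, response_seconds, ok) records into a per-customer report."""
    by_customer = defaultdict(list)
    for customer, seconds, ok in records:
        by_customer[customer].append((seconds, ok))

    report = {}
    for customer, rows in by_customer.items():
        times = [seconds for seconds, _ in rows]
        failures = sum(1 for _, ok in rows if not ok)
        report[customer] = {
            "transactions": len(rows),
            "mean": statistics.mean(times),
            "stdev": statistics.pstdev(times),  # population standard deviation
            "p95": percentile(times, 95),
            "worst": max(times),
            "error_rate": failures / len(rows),
        }
    return report


# Hypothetical data: one customer sees a steady 0.5 s, the other averages
# 0.3 s but with two slow transactions, one of which also failed.
records = (
    [("steady", 0.5, True)] * 20
    + [("spiky", 0.2, True)] * 18
    + [("spiky", 1.0, True), ("spiky", 1.4, False)]
)

report = summarize(records)
for customer, stats in report.items():
    print(customer, {name: round(value, 3) for name, value in stats.items()})
```

In this made-up data set the "spiky" customer has the better average but the worse 95th percentile, a nonzero standard deviation, and a nonzero error rate, which is the distinction the list above draws between average response time, its consistency, and service quality.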

The focus on sharing real response time and transaction load data by Salesforce is notable when compared with the prehistoric approach to performance used by many cloud vendors (and, as a matter of fact, by many enterprise IT organizations). Response time correlates directly to end-user experience, and at the end of the day that is all that matters. Hopefully the industry will learn from Salesforce and advance this concept further.