Understanding whether a virtualized or cloud based system or application is available and delivering acceptable response times to its end users is one of the most important tasks that any strategy for monitoring the performance of business critical applications must address. Let’s first review the nature of virtualized and cloud based systems, so that we can assess how these needs should be met:
- Virtualized and cloud based systems are dynamic. The location of guests on hosts can automatically change, the number of guests that comprise a tier of an application may rise or fall with demand, and via IT as a Service initiatives new applications may get created in a completely automated manner.
- It is possible for applications in these environments to be arbitrarily distributed. Pieces may run in one virtual data center, then in a different one, and then finally also in a public cloud.
- If the enterprise in question is implementing IT as a Service, then with this initiative will come the notion of tenants, who would most likely be internal constituents like business units.
- Although Agile Development is not necessarily a characteristic of virtualized and cloud based environments, it is a trend that is occurring concurrently with the virtualization of application systems. Agile Development means that new functionality gets delivered into production on a frequent basis – sometimes as often as every few days.
Given that we have an environment that is rapidly changing and possibly distributed, and we may have applications that are rapidly changing, how do we monitor the availability and performance of these applications? There are some important principles that should be adhered to when designing a monitoring strategy for applications in these environments:
- It is critical for the instrumentation of the application not to be affected by the dynamic nature of the infrastructure. This means first and foremost that whatever is being done to monitor these applications should not break, or have to be reconfigured, as instances of the application move around or are dynamically created and destroyed.
- It is difficult, if not impossible, to infer the performance of an application by looking at granular resource utilization metrics collected from the infrastructure. In other words, if all the monitoring product collects is the data from vCenter (which is granular resource utilization and configuration data), you cannot reliably infer application performance from this information. It is true that resource contention can and will cause application performance problems, but there will be many kinds of performance problems that will not show up in this data.
- If applications are going to be changing, and get created on the fly, then the monitoring approach needs to self configure and discover these applications as they change and are automatically provisioned by IT as a Service initiatives. Any approach that requires manual configuration is doomed to failure as there will be no time to change the monitoring solution every time an application changes or gets created.
- Given the dynamic nature of these systems, it is best to observe the performance of applications running on them from the “outside-in”. What this means is that rather than digging deep into the virtualized infrastructure, it is best to watch transactions come into the edge of the application system, and then see how long it takes for the application and the underlying infrastructure to respond to those requests for work or data.
If “outside-in” is the right approach to take, then let’s drill down into the different ways in which this can be accomplished.
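To make the principles above concrete, here is a minimal sketch of an edge-level monitor that records response times per application. It is an illustration under stated assumptions rather than any vendor's implementation: the names `EdgeMonitor`, `AppStats`, `observe`, and `slow_apps` are hypothetical, and the application is assumed to be identifiable by a logical name (for example an HTTP Host header) rather than by the guest or host that served the request.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AppStats:
    """Response-time samples for one application, in seconds."""
    samples: list = field(default_factory=list)

    def average(self) -> float:
        return mean(self.samples)


class EdgeMonitor:
    """Tracks response times per application as seen at the edge.

    Applications are keyed by a logical name (an assumption of this
    sketch), not by the guest or host that served the request, so moving
    or re-creating instances requires no reconfiguration.
    """

    def __init__(self) -> None:
        self.apps = defaultdict(AppStats)

    def observe(self, app_name: str, request_time: float, response_time: float) -> None:
        # The first sighting of app_name creates its entry, so a newly
        # provisioned application is discovered without manual setup.
        self.apps[app_name].samples.append(response_time - request_time)

    def slow_apps(self, threshold: float) -> list:
        """Names of applications whose average response time exceeds threshold."""
        return sorted(n for n, s in self.apps.items() if s.average() > threshold)
```

Because nothing in the sketch refers to hosts, guests, or data centers, the same measurements remain valid when instances move or when the number of guests in a tier changes.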
The “Network Sniffer” Approach
This approach relies upon the fact that almost all physical IP network switches support the idea of a “span port” or a “mirror port”, which is a read-only port to which all traffic on the switch is copied. Attaching a physical appliance to this port that knows how to crack open HTTP and other application layer protocols allows the appliance to observe the behavior of the applications running on the network without touching the applications or their physical/virtual servers. In the vSphere world, the VMware vSwitch supports a virtual mirror port, which can be accessed with a virtual appliance. This approach is supported by Quest Foglight, OPNET, Optier, Coradiant, and the VMware AppSpeed product.
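The core calculation behind this approach can be sketched as pairing each observed request with the response that follows it on the same connection. The sketch below assumes the traffic from the mirror port has already been decoded into simple records; the `Packet` record and the `response_times` function are hypothetical, and the pairing logic assumes one outstanding request per connection (no HTTP pipelining).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Packet:
    """Simplified record of one observed HTTP message (hypothetical format)."""
    timestamp: float                        # capture time in seconds
    connection: Tuple[str, int, str, int]   # client IP, client port, server IP, server port
    kind: str                               # "request" or "response"
    url: str = ""                           # set on requests only


def response_times(packets: List[Packet]) -> List[Tuple[str, float]]:
    """Pair each request with the next response on the same connection."""
    pending: Dict[tuple, Packet] = {}
    results = []
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        if pkt.kind == "request":
            pending[pkt.connection] = pkt
        elif pkt.kind == "response":
            req = pending.pop(pkt.connection, None)
            if req is not None:  # skip responses whose request was not captured
                results.append((req.url, pkt.timestamp - req.timestamp))
    return results
```

Note that nothing here requires an agent on the application servers: the only input is the copied traffic.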
The Client Agent Approach
Client agents reside on the PCs of the end users of the application. This is perhaps the ultimate in the outside part of “outside-in”, as the monitoring is done from the perspective of the user and does not reside in the data center or the cloud at all. The benefit of this approach is that you get a true picture of what the end user is really experiencing, including wide area network impacts and local screen paint times. This approach also has the potential to be very automatic, as all a user has to do is start using a new application for monitoring of that application to be instantiated. The obvious downsides of this approach are that if you do not own the user (the user is a consumer, not an employee) then you cannot put an agent on their PC, and as end user devices proliferate (iPads, smartphones, etc.), coverage will require a set of agents that do not exist yet. The two vendors that have mature solutions in this area are Knoa and Aternity.
The Synthetic Transaction Approach
This approach has been around since applications on the Internet started addressing commercial use cases like e-commerce and financial trading. It involves having computers spread throughout the world run scripts against the application. This approach is great for ensuring that applications are up before actual users start using them. However, synthetic transactions have severe limitations in dynamic application environments: it is impossible to mimic everything that a user does with an application in a script, there is no tie between what the script is doing and what a user who is having a problem is doing, and, most importantly, the script needs to be manually created and updated. There are many vendors of synthetic transaction products and services, but the two best known are Keynote and Gomez (now part of Compuware).
If synthetic transactions are dead as an approach for determining availability and performance from the perspective of the end users of an application, then something has to take their place. The two candidates are approaches that analyze data on the IP network, and client side agents. Both will likely rise in prominence as more applications become more dynamic.
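A synthetic transaction can be sketched as an ordered list of scripted steps that are timed one by one. This is a minimal illustration, not how any particular product works: `run_synthetic_transaction` and the step names are hypothetical, and each step is assumed to be a callable that returns True on success (in practice it would issue a real request against the application).

```python
import time
from typing import Callable, Dict, List, Tuple

# A step is a label plus a callable that returns True when the step succeeded.
Step = Tuple[str, Callable[[], bool]]


def run_synthetic_transaction(
    steps: List[Step],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict:
    """Run scripted steps in order, stopping at the first failure."""
    timings = []
    available = True
    for name, action in steps:
        start = clock()
        try:
            ok = bool(action())
        except Exception:
            ok = False  # an exception counts as a failed step
        timings.append((name, clock() - start))
        if not ok:
            available = False
            break
    return {"available": available, "timings": timings}
```

The sketch also shows the maintenance problem: the list of steps is written by hand, so every change to the application's workflow means editing the script.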