Caching throughout the Stack

One sure way to improve performance is to cache the non-dynamic data of any application. We did this to improve the overall performance of The Virtualization Practice website. However, there are many places within the stack where caching can improve performance, and this got me thinking about all the different types. At the last Austin VMUG, at least three vendors were selling caching solutions designed to improve overall performance by anywhere from 2x to upwards of 50x. That is quite a lot of improvement in application performance. Where do all these caching products fit into the stack?

First we should show the stack and the different caching layers available. Figure 1 shows those layers including a theoretical location. In essence, we can apply caching mechanisms at each level of the stack to improve performance, and there are some cache mechanisms built into the hardware we use.

Figure 1: Caching Through the Stack

If we start at the bottom of Figure 1, we notice that the bottom three items are included in almost all hardware we purchase. We could improve performance by buying devices with larger disk, controller, and switch buffers and caches, but these components are often far removed from where the data is actually being processed.

Just moving to Solid State Drives (SSDs), which is usually a huge win, may not be cost effective given the price of those drives. We may instead want to take advantage of these expensive resources in different ways.

Moving Data Closer to Processing

The goal is really to move the data closer to where it is processed. This is done at the layers above the switch fabric by using hardware caches such as EMC VFCache, FusionIO, Violin Memory, GreenBytes IO Offload Engine, etc. Each of these is a hardware add-in to the system that caches reads, writes, or both to the underlying disk, whether that disk is presented locally or over some form of storage fabric. This type of caching moves cached data into the compute hardware for faster access.

Cache Level: Drivers

As we move up the stack, we enter the realm of drivers that provide better control of SSD-style resources. They do this by using the SSD as a persistent cache while still writing to standard hard drives over the storage fabric. These SSDs could be local or remote (but you do get more bang for the buck using them locally). Several companies provide this type of single-host caching mechanism: SanDisk FlashSoft, PernixData, Proximal Data, VMware with vFlash, etc.

By managing SSD or FusionIO devices more efficiently, these drivers give your application developers the advantage of SSD without the need to purchase SSD for every byte of data to be stored. If you are operating at petabyte scale, it is far cheaper to have lots of HDDs and a few well-managed SSDs acting as a cache within each host. Big array vendors offer SSD caching tiers, but in that model the data is still removed from compute.

SanDisk FlashSoft is the only vendor that performs this type of cache management for more than just vSphere. It also supports Windows and Linux, and through its Windows support gains a form of (but not complete) Hyper-V support. Linux support likewise gives SanDisk FlashSoft some level of (but not complete) support for Xen and KVM. With SanDisk FlashSoft on anything but vSphere, live migration and other higher-order hypervisor functions would not be supported. All the other options are VMware vSphere specific at this caching level.
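To make the idea of a small SSD acting as a persistent cache in front of standard hard drives more concrete, here is a minimal sketch in Python. It is not how any of the products named above work internally; the class name WriteThroughCache, the dictionary standing in for the hard drives, and the block counts are all assumptions made up for illustration. It shows only the general pattern: reads are served from the small fast tier when possible, and every write still goes to the slow tier.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Hypothetical small fast tier (think SSD) in front of a large slow tier (think HDD)."""

    def __init__(self, backing_store, capacity):
        self.backing = backing_store   # dict standing in for the slow disks
        self.capacity = capacity       # number of blocks the fast tier can hold
        self.cache = OrderedDict()     # block id -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)   # mark as most recently used
            return self.cache[block_id]
        self.misses += 1
        data = self.backing[block_id]          # slow path: go to the backing disks
        self._insert(block_id, data)
        return data

    def write(self, block_id, data):
        self.backing[block_id] = data          # write-through: slow tier is always updated
        self._insert(block_id, data)

    def _insert(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used block


# 1000 blocks on "disk", but only 10 fit in the fast tier.
hdd = {n: f"block-{n}" for n in range(1000)}
cache = WriteThroughCache(hdd, capacity=10)

# A workload that keeps re-reading the same 10 hot blocks.
for _ in range(5):
    for block in range(10):
        cache.read(block)

print(cache.hits, cache.misses)   # prints "40 10"
```

In this toy workload only the first pass touches the slow tier, which is the point made above: a few well-managed fast devices can serve most of the traffic without holding every byte of data.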

Cache Level: vSCSI Filter

With VMware vSphere there is an additional tier, which makes use of the vSCSI introspective API to offload reads and writes to some form of cache. So instead of going directly to the hardware, we follow the orange path in Figure 1, which makes use of a per-host virtual appliance that manages the cache resources. While this is one way to implement a caching mechanism, I have not heard of anyone actually doing it yet, as it is vSphere specific.

Cache Level: OS

We move up to OS caching next. There are many tools that work within the guest operating system to provide improved disk cache and IO. However, two are fairly new and make use of the same mechanisms used lower in the stack. Could you gain a better caching experience by using two or more mechanisms? At the Austin VMUG there were two companies that fit this cache level within the stack: SanDisk FlashSoft and Condusiv Technologies. By being platform independent, these products can work across hypervisors.

Cache Level: Application

The top of the stack is the application, where you can apply many different caching algorithms to store non-dynamic and dynamic data in memory. I have seen database query caches, full output caches, partial output caches, image caches, and bytecode caches. All move the reading of data from disk (even SSD) to somewhere within memory for easy and very fast retrieval. Which mechanism to use depends on the application framework as well as the application itself. For general web applications there are tools like Varnish, which are highly configurable to meet your needs. For PHP there is XCache and other bytecode cache mechanisms.

Cache Level: Caching VSAs (cluster wide)

There are also cache mechanisms that can be used cluster wide and that live above the actual storage hardware involved. They present the storage as iSCSI, CIFS, or NFS and cache all reads and writes from and to the hardware, either by using lots of memory (such as Datacore SanSymphony) or by making better use of SSDs (such as GreenBytes vIO). In Figure 1, this method of caching is depicted by the red lines. Granted, if the Caching VSA is not running on the same host, the traffic would go out to the network fabric. These types of devices also have limitations on what type of hardware you can run: they require SSD, as well as 10G fabric whenever possible.

Concluding Thoughts

No matter where you put your caching, you get an immediate bang for your buck. In our case, implementing a simple application cache improved performance such that our response time dipped to below half the average and has stayed there. You may even want to layer your cache mechanisms to gain better performance. Eventually, however, there will be diminishing returns as you add more and more cache mechanisms. How you cache also depends on the application you want to improve. For example, if you look at virtual desktops as an application, you may want either a Caching VSA like GreenBytes vIO or some sort of driver or hardware cache on top of your existing spinning hard disks or solid state drives.
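The diminishing returns from layering caches can be seen with some back-of-the-envelope arithmetic. The sketch below is only a simplified model: every hit rate and latency in it is an invented number rather than a measurement, each layer's hit rate applies only to the requests that missed the layers above it, and the cost of a failed lookup is ignored.

```python
def average_latency(layers, backing_latency):
    """Expected latency of one request.

    layers: list of (hit_rate, latency) pairs, ordered from the cache
    nearest the application down to the one nearest the disks.
    """
    expected = 0.0
    reach = 1.0  # fraction of requests that get as far as this layer
    for hit_rate, latency in layers:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return expected + reach * backing_latency


# All figures below are assumptions for illustration only (milliseconds).
DISK_MS = 10.0
app_cache = (0.80, 0.01)   # 80% of requests answered from application memory
os_cache = (0.50, 0.05)    # half of what is left answered by the guest OS cache
ssd_cache = (0.50, 0.20)   # half of what is still left answered by a host SSD cache

no_cache = average_latency([], DISK_MS)
one_layer = average_latency([app_cache], DISK_MS)
two_layers = average_latency([app_cache, os_cache], DISK_MS)
three_layers = average_latency([app_cache, os_cache, ssd_cache], DISK_MS)

for name, value in [("none", no_cache), ("one", one_layer),
                    ("two", two_layers), ("three", three_layers)]:
    print(f"{name:>5} layer(s): {value:.3f} ms")
```

With these made-up numbers, the first layer removes roughly 8 ms from the average request, the second about 1 ms, and the third about half a millisecond. Each added layer still helps, but by less than the one before it.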