By Greg Schulz, Server and StorageIO @storageio
Keeping in mind that the best server and storage IO is the one you do not have to do, the second best is the one that has the least impact combined with the best benefit to an application. This is where SSD, including DRAM- and NAND-flash-based solutions, comes into the conversation for storage performance optimization.
The question is not if, but rather when, where, what, and how much SSD (NAND flash or DRAM) you will have in your environment, either to replace or to complement HDDs.
Having said that, let’s follow up from some previous posts that laid the framework for the above-mentioned themes. The next step is to discuss what to use when, where, and why when it comes to NAND flash SSD, including server- or storage-side approaches. Not surprisingly, depending on who you talk to or what you read and watch, you will find differing views on the best place, or in some cases the only place to deploy NAND flash SSD. Also not surprising: some of those approaches, pitches, or beliefs are tied to the limits or fit of a particular vendor’s products.
IMHO, the best place to deploy NAND flash SSD will depend on different factors, although one theme is constant. That constant theme is that the best IO is the one you do not have to do, followed by the one that has the least impact combined with the best benefit, and that location matters. While keeping it closer to the application (which means in the server) is a good answer and important, one must also keep in mind location and locality of reference in a shared or distributed environment.
Remember that cache and memory, including NAND flash SSD, are like real estate: location matters as it ties to locality of reference. Thus if IOs are local to a server, keeping them there, for example on a PCIe flash card or an SSD drive, can be a good idea. On the other hand, there may be many reads that are local to one or more servers with data stored on a shared SAN, which is where local caching comes into play when storage systems contain some SSD.
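To put locality of reference in rough numbers, here is a back-of-envelope model (all latency figures below are illustrative assumptions for the arithmetic, not measurements of any product): the effective access time of a server-side flash cache fronting a shared SAN is the hit-ratio-weighted average of local and remote latency.

```python
# Back-of-envelope model of locality of reference: effective access time
# for a server-side flash cache in front of shared SAN storage.
# All latency values are illustrative assumptions, not vendor measurements.

def effective_latency_us(hit_ratio, local_us, remote_us):
    """Hit-ratio-weighted average access time in microseconds."""
    return hit_ratio * local_us + (1.0 - hit_ratio) * remote_us

LOCAL_FLASH_US = 100.0   # assumed PCIe flash cache hit latency
REMOTE_SAN_US = 5000.0   # assumed HDD-backed SAN read latency

for hit_ratio in (0.50, 0.90, 0.99):
    avg = effective_latency_us(hit_ratio, LOCAL_FLASH_US, REMOTE_SAN_US)
    print(f"hit ratio {hit_ratio:.2f}: effective latency {avg:,.0f} us")
```

Note how the benefit is nonlinear in hit ratio: going from 90 to 99 percent hits cuts effective latency roughly 4x in this sketch, which is why workloads with strong locality favor keeping IOs in the server.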
You have many options for locating NAND flash SSD, along with different packaging options. Locations include in servers, appliances, and storage systems or a combination of those in a cooperative and complementary configuration. Packaging options include DIMMs accessible via SAS and SATA, PCIe cache and target cards, and drive form factors.
Ok, so all of the above is fine and dandy; how about deciding where and when to use what?
Your primary options for placing NAND flash SSD include in your server, in a storage system, or in an appliance (assuming you are not using a cloud service). Storage systems and appliances can be all SSD or hybrid with a mix of SSD and HDDs. Some vendors are focused on specific areas such as servers or storage, hardware or software, adapters or systems, while others have a broad portfolio of solutions. Not surprisingly, the startups tend to be more focused, while larger established vendors such as EMC, HP, HDS, IBM, NetApp, and Oracle, among others, have more diverse solution sets. It should also not be a surprise that those with broader solution portfolios will take a more consultative approach to align the applicable solution to your needs, while those with few or single offerings may take a one-type-fits-all approach.
From a server perspective, your options are to install a NAND flash SSD packaged in a 2.5-inch or 3.5-inch drive form factor, a PCIe card, or, if the server supports it, a special form factor module (e.g. see Oracle); more on these in a moment. Storage systems and appliances have options for NAND flash including as a PCIe card, drive form factor, or special form factor module that will vary by make and model. Since many storage systems leverage general-purpose servers as hardware platforms, it should not be a surprise that they also have common capabilities. For example, storage systems and appliances based on industry standard servers can use a mix of PCIe cards, drives, and other options, just like application servers.
The following is a simple, generic look at what to use where.

Server-side SSD (PCIe card or drive):
- Benefits: Performance and localized IO (closer to the application); addresses IO issues closer to the source (server locality of reference).
- Caveats: Heat and power draw; CPU and DRAM consumption; need for HA and data protection; not shared across servers without extra software.
- Considerations: How many and what type of PCIe slots are available vs. what a card needs; OS and hypervisor driver support; DRAM for caching of writes with a battery backup unit (BBU). Also integration with other management tools or drivers as well as with caching, IO optimization, and tiering software. How is HA or clustering handled? Is the cache read-only with write-through, or write-back?

Storage system or appliance SSD:
- Benefits: Shared locality of reference; shared storage across multiple servers; HA and resilience; snapshots; replication.
- Caveats: Architecture, implementations, and performance vary; keep IO performance in proper context. More benefit with larger systems or many servers.
- Considerations: How many servers and applications benefit from some amount of shared SSD? What software for tiering or caching works with the system? Some solutions are all SSD while others are hybrid with both SSD and HDDs. Keep IOP performance in proper context, including latency.

Combination of server-side and storage-side SSD:
- Benefits: Best of both worlds; leverage tiered storage. Use the server side for read cache and storage for shared writes.
- Caveats: May not be applicable to all environments; interoperability of the various caching and tiering software.
- Considerations: A good fit when some servers have high locality of reference and others can leverage shared storage. Look for solutions that also provide software integration with server-side caching and tiering.
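To make the write-through vs. write-back distinction above concrete, here is a minimal, hypothetical cache sketch (the class and method names are illustrative, not any vendor's actual API): write-through persists every write to the backing store immediately, while write-back holds dirty data in cache until it is flushed, which is why write-back caches need battery or flash protection (BBU).

```python
# Minimal sketch of write-through vs. write-back caching policy.
# Class and method names are illustrative, not any vendor's actual API.

class CachedStore:
    def __init__(self, policy="write-through"):
        assert policy in ("write-through", "write-back")
        self.policy = policy
        self.cache = {}      # fast tier (e.g. server-side flash or DRAM)
        self.backing = {}    # slow shared tier (e.g. SAN array)
        self.dirty = set()   # blocks not yet persisted (write-back only)

    def write(self, block, data):
        self.cache[block] = data
        if self.policy == "write-through":
            self.backing[block] = data      # persisted immediately
        else:
            self.dirty.add(block)           # persisted later; needs BBU/flash

    def read(self, block):
        if block in self.cache:             # cache hit: local and fast
            return self.cache[block]
        data = self.backing[block]          # miss: go to shared storage
        self.cache[block] = data            # populate cache for next time
        return data

    def flush(self):
        for block in self.dirty:
            self.backing[block] = self.cache[block]
        self.dirty.clear()
```

With write-through, a server crash loses no acknowledged writes; with write-back, anything still in `dirty` is at risk until `flush()`, which is exactly the HA question raised above.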
Some additional characteristics to consider when deciding what to use when or where:

PCIe flash card (target):
- Characteristics: Improves locality of reference; eliminates the need for external storage, keeping IOs close to the application. Extra measures needed for resiliency (HA, BC, or DR).
- Applications: IO intensive, smaller capacity; VDI, web, cloud, video, small databases.
- Usage: Where IO is isolated to a server.
- Caveats: Dedicated to the installed server.
- Examples: EMC, Intel, FusionIO, LSI, WD/Stec, Micron, Virident, others.

PCIe flash card (cache):
- Characteristics: Improves locality of reference, keeping IOs local and close to the application. Complements existing SAN, NAS, or DAS storage.
- Applications: IO cache friendly, large space capacity; VDI, legacy apps.
- Usage: Cache for external storage.
- Examples: EMC XtremSF, Micron, LSI, NetApp, and many others.

Arrays and appliances:
- Characteristics: Shared benefit; leverage HA and RAS; some are known entities while others are startups. Some have snapshots, replication, tiering, thin provisioning, and dedupe; some are new, others legacy; some are all SSD, others hybrid (HDD and SSD).
- Applications: Legacy app IO consolidation, large storage capacity; VDI, database, cloud, web, email.
- Usage: Use as an external cache and as storage.
- Caveats: Some have bottlenecks with SSD.

Drive form factor SSD:
- Characteristics: Server and storage interoperable; ease of deployment; many options; volume pricing. Can be used in servers, storage systems, and converged appliances.
- Applications: Fits in an HDD slot; web, database, email, file serving, VDI, big data.
- Usage: Use as a target or as a large cache.
- Caveats: Interface may limit full performance.
- Examples: Intel, Micron, Seagate, SanDisk, Samsung, WD, and many others.
How to avoid common mistakes pertaining to SSD selection:
- Location matters, as does locality of reference. Sometimes server-side is better; other times shared storage will be. In many situations, a complementary combination of the two will be in order.
- Avoid comparing SSD technologies and solutions strictly on a cost-per-capacity basis. Instead, look at the value or benefit, such as cost per IOP, transaction, video, or file served. This also means looking at what productivity benefit can be achieved vs. viewing SSD as a cost overhead.
- Speaking of comparing SSD, also look beyond just IOPS unless that is all your applications and environments need. This means looking at latency and bandwidth along with other metrics that matter, keeping those in their proper contexts.
- Instead of looking for problems to solve with a particular SSD solution, turn the equation around, looking at the problem and then applicable solutions along with what is best for your needs.
- Watch out for unrealistic expectations of the SSD solution, which can often be the result of metrics that do not matter or are taken out of context.
- While hardware can fix or mask many application or software problems, ultimately, to derive full benefit and value, you should go back and find and repair software or application issues.
- The best benchmark or simulation is your own environment or one that most closely resembles it, rather than the one somebody wants you to use and compare to.
- Understand what the cache coherency features are to prevent stale reads from occurring with server-side cache software and SSD devices along with shared storage systems.
- Consider what type of profiling and analysis tools are available to assess an environment as well as to provide insight into what to use, when, where, as well as how much is needed.
- What hypervisors and bare metal operating systems are supported?
- A little SSD (DRAM- or NAND-flash-based) can go a long way…
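As a hedged illustration of keeping metrics in context (all prices and performance figures below are made up for the arithmetic, not quotes for real products): Little's law ties sustainable IOPS to latency and concurrency, and a cost-per-IOP comparison often inverts a cost-per-GB comparison.

```python
# Illustrative arithmetic only: prices and performance figures are
# hypothetical, chosen to show the comparison, not actual products.

def iops_from_latency(queue_depth, latency_s):
    """Little's law: concurrency = throughput * latency, so
    throughput (IOPS) = outstanding IOs / per-IO latency (seconds)."""
    return queue_depth / latency_s

# Hypothetical devices: price, capacity in GB, sustained IOPS
ssd = {"cost": 600.0, "gb": 400,  "iops": 40000}
hdd = {"cost": 300.0, "gb": 3000, "iops": 150}

for name, d in (("SSD", ssd), ("HDD", hdd)):
    print(f"{name}: ${d['cost'] / d['gb']:.2f}/GB, "
          f"${d['cost'] / d['iops']:.3f}/IOP")

# At queue depth 16 and 0.4 ms per IO, a device sustains:
print(f"{iops_from_latency(16, 0.0004):,.0f} IOPS")
```

On cost per GB the hypothetical HDD wins 15:1; on cost per IOP the SSD wins by over 100:1, which is why the right metric depends on whether the application is capacity-bound or IO-bound, and why latency and queue depth belong next to any IOPS number.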
Speaking of SSD, regardless of where it is located, how it is packaged, and how it performs, particularly IOPS, can we get some context with those metrics?
Also, keep in mind that SSD and other storage solutions can be complemented by caching and IO optimization software including those from EMC, FusionIO, IBM, NetApp, Pernix, Sandisk and VMware among many others.
Ok, nuff said (for now).