Keeping in mind that the best server and storage IO is the one that you do not have to do, the second best is the one that has the least impact combined with the best benefit to an application. This is where SSD, including DRAM- and NAND-flash-based solutions, comes into the conversation for storage performance optimization.
One sure way to improve performance is to cache the non-dynamic data of any application. We did this to improve the overall performance of The Virtualization Practice website. However, there are many places within the stack to improve overall performance by caching, and this got me thinking about all the different types. At the last Austin VMUG, there were at least three vendors selling caching solutions designed to improve overall performance by anywhere from 2x to upwards of 50x. That is quite a lot of improvement in application performance. Where do all these caching products fit into the stack?
In part II of this series we covered some of the differences between various hard disk drives (HDDs), including looking beyond the covers at availability, cache and cost. Let us pick up where we left off on our look beyond the covers to help answer the question of which is the best HDD to use.
Form factor (physical attributes)
Physical dimensions include 2.5” small form factor (SFF) and 3.5” large form factor (LFF) HDDs. 2.5” HDDs are available in 7mm, 9mm and taller 15mm height form factors. Note that taller drives tend to have more platters for capacity. In the following image, note that the bottom HDD is taller than the others.
The tall or “thick” drive above (not to be confused with thick or thin provisioning) is an SFF 5.4K RPM 1.5TB drive that I use as an on-site backup or data protection target and buffer. The speed is good enough for what I use it for, and it provides good capacity per cost in a given footprint.
Also, note that there is a low profile 7mm device (e.g. middle) that, for example, can fit into my Lenovo X1 laptop as a backup replacement for the Samsung SSD that normally resides there. Also shown on the top is a standard 9mm height 7.2K Momentus XT HHDD with 4GB of SLC NAND flash and 500GB of regular storage capacity.
Functionality includes rebuild assist, secure erase, self-encrypting devices (SEDs) with or without FIPS, RAID assist, and support for large file copy (e.g. for cloud, object storage and dispersal or erasure code protection). Other features include intelligent power management (beyond first generation MAID), native command queuing (NCQ), and Advanced Format (AF) with 4Kbyte blocks and 512 byte emulation. Features also include those for high-density deployments such as virtualization and cloud, including vibration management, in addition to SMART (Self-Monitoring, Analysis, and Reporting Technology) reporting and analysis.
Depending on vendor, make and model, drives can also support various block or sector sizes including standard 512, 520, 524 and 528 bytes for different operating systems, hypervisors or controllers. Another feature mentioned above is the amount of volatile (DRAM) or persistent (NAND flash) cache for read and read-ahead. Some drives are optimized for standalone or JBOD (Just a Bunch of Disks) use and others for use with RAID controllers. By the way, put several SSD drives into an enclosure without a controller and you have Just a Bunch Of SSDs or JBOS. What this means is that some drives are optimized to work with RAID arrays and how they chunk or shard data, while others are for non-RAID use.
Speaking of RAID and HDDs, have you thought about your configuration settings, particularly if working with big data or big bandwidth and large files or objects? If not, you should, including paying attention to the stripe, chunk or shard size that determines how much data gets written to each device. With larger IO sizes, revisit what the default settings are to determine if you need to make some adjustments. Just as some drives are optimized for working with RAID controllers or software, there are some drives being optimized for cloud and object storage along with big data applications. The difference is that these drives are optimized for moving the larger chunks or amounts of data usually associated with distributed data dispersal, erasure coding and enhanced RAID solutions. An example of a cloud storage optimized HDD is the Seagate Constellation CS (Cloud Storage).
Moving on, some drives are designed to be spinning or in constant use, while others are designed for frequent starting and stopping such as with a notebook or desktop. Other features appearing in HDDs support high-density deployments, along with hot and humid environments, for cloud and managed service provider or big data needs. The various features and functionality can be part of the firmware enabled for a particular device along with hard features built into the device.
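To make the chunk size point a bit more concrete, here is a minimal Python sketch of how a single IO spreads across devices. It assumes a plain striped layout (RAID 0 style, no parity rotation) where chunk k lands on drive k modulo the number of drives. The function name `chunks_touched` and the sample sizes are my own illustration, not how any particular controller works.

```python
def chunks_touched(offset_bytes, io_bytes, chunk_bytes, num_drives):
    """Return {drive_index: bytes} for one IO on a simple striped layout.

    Assumption: plain striping, where chunk k lives on drive k % num_drives.
    Real RAID levels add parity placement that this sketch ignores.
    """
    per_drive = {}
    pos = offset_bytes
    end = offset_bytes + io_bytes
    while pos < end:
        chunk_index = pos // chunk_bytes
        chunk_end = (chunk_index + 1) * chunk_bytes
        span = min(end, chunk_end) - pos      # bytes of this IO in this chunk
        drive = chunk_index % num_drives
        per_drive[drive] = per_drive.get(drive, 0) + span
        pos += span
    return per_drive


KIB = 1024
MIB = 1024 * KIB

# A 1 MiB IO with 64 KiB chunks is spread over all four drives.
small_chunks = chunks_touched(0, 1 * MIB, 64 * KIB, 4)
# The same IO with 1 MiB chunks lands on a single drive.
large_chunks = chunks_touched(0, 1 * MIB, 1 * MIB, 4)
```

With small chunks, a large IO keeps every drive busy for a short time. With large chunks, it keeps one drive busy for longer. Which is better depends on your mix of IO sizes and concurrency, which is why the defaults are worth revisiting.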
Interface type and speed
The industry trend is moving towards 6Gb SAS for HDDs, similar to that for SSD drives. However, there is also plenty of 6Gb SATA activity, along with continued 4Gb Fibre Channel (4GFC) that eventually will transition to SAS. There are also prior generation 3Gb SAS and 3Gb SATA devices, and you might even have some older 1.5Gb SAS or SATA devices around, maybe even some Parallel ATA (PATA) or Ultra320 (Parallel SCSI). Note that SATA devices can plug into and work with SAS adapters and controllers, but not the other way around.
Note that if you see or hear about a storage system or controller with back-end 8Gb Fibre Channel, chances are the HDD would auto-negotiate down to 4GFC. In addition to the current 6Gb speed of SAS, there are improvements in the works for 12Gb and beyond, along with many different topology or configuration options. If you are interested in learning more about SAS, check out SAS SANs for Dummies sponsored by LSI that I helped write.
Notice I did not mention iSCSI, USB, Thunderbolt or other interfaces and protocols? Some integrators and vendors offer drives with those among other interfaces; however, they are usually SAS or SATA drives with a bridge, router or converter interface attached to them externally or as part of their packaging (see following image).
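As a rough aid for comparing those interface speeds, the following Python sketch converts a link line rate into an approximate payload bandwidth. It assumes 8b/10b encoding (10 bits on the wire for every 8 data bits), which applies to SATA and to SAS at the speeds mentioned here, and it ignores protocol and framing overhead, so treat the results as upper bounds. The function name is mine.

```python
def payload_mb_per_s(line_rate_gbps, data_bits=8, coded_bits=10):
    """Approximate payload bandwidth in MB/s for a serial link.

    Assumption: 8b/10b line encoding by default; protocol overhead ignored.
    """
    payload_bits_per_s = line_rate_gbps * 1e9 * data_bits / coded_bits
    return payload_bits_per_s / 8 / 1e6   # bits -> bytes -> megabytes


# Line rates discussed above, in Gb per second.
for rate in (1.5, 3, 6, 12):
    print(f"{rate}Gb link: about {payload_mb_per_s(rate):.0f} MB/s of payload")
```

Keep in mind this is the speed of the interface, not of the drive behind it. A single HDD will usually be limited by its mechanics long before it fills a 6Gb link.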
Performance of the device
A common high-level gauge of drive performance is the platter rotational speed. However, there are other metrics including seek time, transfer rate and latency. These in turn vary based on peak and sustained, read or write, random or sequential, large or small IOPS or transfer requests. There are many different numbers floating around as to how many IOPS an HDD can do based on its rotational speed among other factors. The challenge with using these numbers is putting them into context: what size was the IO, and was it a read or write, large or small, random or sequential, relative to your needs? Another challenge is how those IOPS are measured. For example, were they measured below a file system to negate buffering, or via a file system?
Rotational speeds include 5,400 (5.4K) revolutions per minute (RPM), 7.2K, 10K and 15K RPM. Note that while rotational speed is a general indicator of relative speed, some of the newer 10K SFF (e.g. 2.5”) HDDs provide the same or better performance than earlier generation 3.5” 15K devices. This is accomplished with a combination of smaller form factor (spiral transfer rate) and improvements in read/write electronics and firmware. The benefit is that in the same or smaller footprint, more devices, performance and capacity can be packaged, with the devices individually using less power. Granted, if you pack more devices into a given footprint, the aggregate power might increase, however so too does the potential performance, availability, capacity and economics depending on implementation. You can see the differences in performance using various HDDs including an HHDD in this post here that looked at Windows impact for VDI planning.
This wraps up this post. Up next in part IV, we continue our look beyond the covers to determine the differences and what HDD is best for your virtual or other data storage needs.
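For a feel of where those rule-of-thumb IOPS numbers come from, here is a small Python sketch of the common back-of-envelope estimate for small random IO: one IO costs roughly an average seek plus half a rotation. The average seek times passed in are assumed sample values of mine, not vendor specifications, and the estimate ignores transfer time, caching, queuing and command reordering, all of which change real results.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution."""
    return 0.5 * 60_000 / rpm


def estimated_random_iops(rpm, avg_seek_ms):
    """Rough small random IOPS estimate for a single HDD.

    Assumption: service time = average seek + average rotational latency.
    """
    service_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000 / service_time_ms


# Hypothetical average seek times (ms) chosen only for illustration.
samples = {5400: 12.0, 7200: 8.5, 10_000: 4.5, 15_000: 3.5}
for rpm, seek_ms in samples.items():
    iops = estimated_random_iops(rpm, seek_ms)
    print(f"{rpm} RPM: about {iops:.0f} IOPS with a {seek_ms} ms average seek")
```

Change the seek time or add a cache hit rate and the answer moves quite a bit, which is the point about context made above.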
Unless you are one of the few who have gone all solid-state devices (SSDs) for your virtual environment, hard disk drives (HDDs) still have a role. That role might be for primary storage of your VMs and/or their data, or as a destination target for backups, snapshots, archiving or as a work and scratch area. Or perhaps you have some HDDs as part of a virtual storage appliance (VSA), storage virtualization, virtual storage or storage hypervisor configuration. Even if you have gone all SSD for your primary storage, you might be using disk as a target for backups, complementing or replacing tape and clouds. On the other hand, maybe you have a mix of HDD and SSD for production; what are you doing with your test, development or lab systems, both at work and at home?
Despite the myth of being dead or having been replaced by SSDs (granted their role is changing), HDD as a technology continues to evolve in many areas.
General storage characteristics include:
Internal or external to a server, dedicated or shared with others
Performance in bandwidth, activity, or IOPS and response time or latency
Availability and reliability, including data protection and redundant components
Capacity or space for saving data on a storage medium
Energy and economic attributes for a given configuration
Functionality and additional capabilities beyond read/write or storing data
Capacity is increasing in terms of areal density (the amount of data stored in a given amount of space on HDD platters), as well as the number of platters stacked into a given form factor. Today there are two primary form factors for HDDs as well as SSDs (excluding PCIe cards), which are 3.5” and 2.5” small form factor (SFF) widths available in various heights.
On the left is a 2.5” 1.5TB Seagate Freeplay HDD with a USB or eSATA connection that I use for removable media. On the right are a couple of 3.5” 7,200 RPM HDDs of various capacities and, in the center back, an older early generation Seagate Barracuda. In the middle is a stack of HDD, HHDD and SSD 2.5” devices including thin 7mm, 9mm and thick 15mm heights. Note that thick and thin refer to the height of the device as opposed to thin or thick provisioned.
In addition to form factor, capacity increases and cost reductions, other improvements include reliability in terms of mean time between failure (MTBF) and annual failure rate (AFR). There have also been some performance enhancements across the various types of HDDs, along with energy efficiency and effectiveness improvements. Functionality has also been enhanced with features such as self-encrypting disks (SEDs) or full disk encryption (FDE).
Data is accessed on the disk storage device by a physical and a logical address, sometimes known as a physical block number (PBN) and a logical block number (LBN). The file system or an application performing direct (raw) I/O keeps track of what storage is mapped to which logical blocks on what storage volumes. Within the storage controller and disk drive, a mapping table is maintained to associate logical blocks with physical block locations on the disk or other medium such as tape.
When data is written to disk, regardless of whether it is an object, file, Web database, or video, the lowest common denominator is a block of storage. Blocks of storage have traditionally been organized into 512-byte units, which aligned with memory page sizes. While 512-byte blocks and memory page sizes are still common, given larger-capacity disk drives as well as larger storage systems, 4KB (e.g., 8 × 512 bytes or 4,096 bytes) block sizes, called Advanced Format (AF), are appearing. The transition to AF 4KB is occurring over time, with some HDDs and SSDs supporting it now along with emulating 512-byte sectors. As part of the migration to AF, some drives have the ability to do alignment work in the background, off-loading server or external software requirements. Also related to block size are optional format sizes such as the 528 bytes used by some operating systems or storage systems.
Larger block sizes enable more data to be managed or kept track of in the same footprint by requiring fewer pointers or directory entries. For example, using a 4KB block size, eight times the amount of data can be kept track of by operating systems or storage controllers in the same footprint. Another benefit is that with data access patterns changing along with larger I/O operations, 4KB makes for more efficient operations than the equivalent 8 × 512 byte operations for the same amount of data to be moved.
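The eight-to-one arithmetic above, along with the alignment concern that comes with 512-byte emulation, can be sketched in a few lines of Python. The helper names and the sample values (a 4TB drive, partitions starting at sector 63 and sector 2048) are my own illustration.

```python
def blocks_needed(capacity_bytes, block_bytes):
    """Number of blocks to track for a given capacity (ceiling division)."""
    return -(-capacity_bytes // block_bytes)


def is_aligned(offset_bytes, block_bytes=4096):
    """True if a byte offset falls on a physical block boundary."""
    return offset_bytes % block_bytes == 0


capacity = 4 * 10**12                     # a 4TB drive, in decimal bytes
legacy = blocks_needed(capacity, 512)     # blocks to track at 512 bytes
af = blocks_needed(capacity, 4096)        # blocks to track at 4KB
print(f"512-byte blocks: {legacy:,}  4KB blocks: {af:,}  ratio: {legacy // af}")

# A partition starting at 512-byte sector 63 does not sit on a 4KB boundary,
# while one starting at sector 2048 does.
print(is_aligned(63 * 512), is_aligned(2048 * 512))
```

When a partition is not aligned, a 4KB write from the operating system can straddle two physical blocks on a drive emulating 512-byte sectors, which is the extra work the background alignment features are meant to hide.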
At another detailed layer, the disk drive or flash solid-state device also handles bad block vectoring or replacement transparently to the storage controller or operating system. Note that this form or level of bad block repair is independent of upper-level data protection and availability features, including RAID, backup/restore, replication, snapshots, or continuous data protection (CDP), among others.
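As a toy illustration of a logical-to-physical mapping table with bad block vectoring, consider the following Python sketch. The class, its method names and the idea of a simple list of spare blocks are assumptions for illustration only. Actual drive firmware is far more involved and is not visible to the host.

```python
class BlockMap:
    """Toy logical-to-physical block map with a pool of spare blocks."""

    def __init__(self, num_logical, num_spares):
        # Start with an identity mapping: logical block n is physical block n.
        self.table = list(range(num_logical))
        # Spare physical blocks sit past the end of the user-visible range.
        self.spares = list(range(num_logical, num_logical + num_spares))

    def physical(self, lbn):
        """Look up the physical block currently backing a logical block."""
        return self.table[lbn]

    def vector_bad_block(self, lbn):
        """Point a logical block at a spare after its physical block fails."""
        if not self.spares:
            raise RuntimeError("no spare blocks left")
        self.table[lbn] = self.spares.pop(0)
        return self.table[lbn]


drive = BlockMap(num_logical=8, num_spares=2)
before = drive.physical(3)        # physical block before the remap
drive.vector_bad_block(3)         # physical block 3 has gone bad
after = drive.physical(3)         # the host still asks for logical block 3
```

The host keeps using the same logical block number throughout, which is why this level of repair is transparent to RAID, backup and the other upper-level protection features.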
There are also features to optimize HDDs for working with RAID systems, or for doing large file copies such as for use with cloud and object storage systems. Some HDDs are optimized for start/stop operations found in laptops along with vibration damping, while others support continuous operation modes. Other features include energy management with spin down to conserve power, along with intelligent power management (IPM) to vary the performance and amount of energy used.
In addition to drive capacity sizes that range up to 4TB on larger 3.5” form factor HDDs, there are also different sizes of DRAM buffers (measured in Mbytes) available on HDDs. Hybrid HDDs (HHDDs), in addition to having DRAM buffers, also have SLC or MLC NAND flash measured in GBytes for even larger buffers as either read, or read/write. For example, the HHDDs that I have in some of my laptops as well as VMware ESXi servers have 4GB SLC for a 500GB 7,200 RPM device (Seagate Momentus XT I) or 750GB with 8GB SLC (Seagate Momentus XT II) and are optimized for reads. In the case of a HHDD in my ESXi server, I used this trick I learned from Duncan Epping to make a Momentus XT appear to VMware as a SSD. Other performance optimization options include native command queuing, and target mode addressing, which in turn gets mapped into, for example, VMware device mappings (e.g. vmhba0:C0:T1:L0).
Other options for HDDs include speed, with 5,400 (5.4K) revolutions per minute (RPM) being at the low end and 15,000 (15K) RPM at the high end, with 7.2K and 10K speeds also being available. Interfaces for HDDs include SAS, SATA and Fibre Channel (FC) operating at various speeds. If you look or shop around, you might find some parallel ATA or PATA devices still available should you need them for use or nostalgia. FC HDDs operate at 4Gb, whereas SAS and SATA devices can operate at up to 6Gb with 3Gb and 1.5Gb backwards compatibility. Note that if supported with applicable adapters, controllers and enclosures, SAS can also operate in wide modes. Check out SAS SANs for Dummies to learn more about SAS, which also supports attachment of SATA devices.
Ok, did you catch that I did not mention USB or iSCSI HDDs? Nope, that was not a typo. While you can get packaged HDDs or SSDs with USB, iSCSI, Firewire or Thunderbolt attachments, they utilize either a SAS or SATA HDD. Inside the packaging will be a bridge or gateway card or adapter that converts from, for example, SATA to USB. In addition to packaging, converters are also available as docking stations, enclosures or cables. For example, I have some Seagate GoFlex USB to SATA and eSATA to SATA cables for attaching different devices as needed to various systems.
Besides drive size (form factor) and space capacity, interface and speed, along with features, there are some other differences, such as enterprise class (both high performance and high capacity) versus desktop and laptop, and internal versus external use. These drives can be available via OEMs (server and storage vendors) or systems integrators with their own special firmware, or as generic devices. What this means is that not all SATA or SAS HDDs are the same from enterprise to desktop across both 2.5” and 3.5” form factors. Even the HDDs that you can buy, for example from Amazon, will vary based on the above and other factors.
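If you need to pull apart a device name such as the vmhba0:C0:T1:L0 example above in a script, a short Python sketch might look like the following. It assumes the adapter:channel:target:LUN layout shown in that example. The function name and the returned dictionary keys are my own choices, not part of any VMware tool.

```python
import re

# Pattern for names shaped like vmhba<N>:C<channel>:T<target>:L<lun>.
_RUNTIME_NAME = re.compile(r"^(vmhba\d+):C(\d+):T(\d+):L(\d+)$")


def parse_runtime_name(name):
    """Split a vmhbaN:Cx:Ty:Lz style name into its four parts."""
    match = _RUNTIME_NAME.match(name)
    if match is None:
        raise ValueError(f"not a vmhbaN:Cx:Ty:Lz style name: {name!r}")
    adapter, channel, target, lun = match.groups()
    return {
        "adapter": adapter,
        "channel": int(channel),
        "target": int(target),
        "lun": int(lun),
    }


parsed = parse_runtime_name("vmhba0:C0:T1:L0")
print(parsed)
```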
So which HDD is best for your needs?
That will depend on what you need or want to do among other criteria that we will look at in a follow-up post.
With adoption of Virtual Desktop Infrastructure (VDI) initiatives being a popular theme associated with cloud and dynamic infrastructure environments, a related discussion point is the impact on networks, servers and storage during boot or startup activity, and how to avoid bottlenecks. VDI solution vendors include Citrix, Microsoft and VMware along with various server, storage, networking and management tools vendors.
A common storage and network related topic involving VDI is boot storms, when many workstations or desktops all start up at the same time. However, any discussion around VDI and its impact on networks, servers and storage should also be expanded from read centric boots to write intensive shutdown or maintenance activity as well.