Following up from yesterday’s post and the comments received, I thought it was worth considering how hard it would actually be to implement quality of service within a storage array.
Almost all storage arrays available today (whether physical or virtual) work on the assumption that I/O should be delivered as fast as possible. This is not an unreasonable premise, especially considering that storage has historically been the slowest component in the architecture, so why bother slowing things down deliberately? Where different service levels were required, they could be implemented by deploying cheaper hardware or different tiers (speeds/capacities) of disk. Each tier provides a set performance level in terms of IOPS and response time and, of course, comes at a different cost.
Cloud environments are different. There’s a desire to virtualise everything (network, compute, storage) because fine levels of granularity result in a more efficient service and make it possible to charge the customer for every increment in their use of resources. It would be impractical to design a private or public cloud with many fixed storage tiers, as even with economies of scale there would be significant wastage (a storage array in the cloud context could simply mean a server with multiple disks in it). It would also be poor service to expect a customer to migrate their server to another tier every time they needed another 100 IOPS to cope with growth. So, the answer is to create an infrastructure that offers variable IOPS and response times. How? Let’s start by thinking about the I/O process itself.
The I/O Process
Storage arrays have evolved to cater for the slowest component in the architecture: the hard drive. As a result, we see techniques employed to manage unpredictable response times that are an order of magnitude (or more) greater than those of the processor and memory within the array itself. A typical I/O will be received on a front-end storage port (FC, iSCSI, FCoE, it doesn’t matter) and added to a queue in cache. The cache performs a number of functions: it batches up the requests that are pending to disk; it enables I/O acknowledgements to go to the host before data is committed to disk; it stores read data so that repeat requests can be serviced faster out of memory (and may prefetch reads too); and in the latest architectures it manages compression and de-duplication before data is permanently stored. I’ve simplified the functionality here for clarity; obviously I/O processing involves many other tasks.
From end-to-end, an I/O is received into cache, queued, eventually gets to be processed, is read from or written to disk, then put back into cache for forwarding to the originating host. During that time, I/O may be processed out of sequence in order to get better disk throughput. I/O can also be delayed by local and remote replication and of course as already mentioned, deduplication and compression.
So how could we implement QoS? Firstly, for block storage, QoS could be applied to a LUN and tracked at that level. As each I/O comes in, it is delayed either before processing or before confirmation to the host. It seems more logical to delay before processing, as any pending I/O wouldn’t have been committed to disk and so wouldn’t need backing out in the case of cache failure. What that means is the QoS component would need to delay processing of the I/O until the prescribed time interval had elapsed, minus the processing time. For example, if a 5ms response time was desired and processing the I/O takes 1ms, then the I/O would be processed 4ms after being received.
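The delay arithmetic above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `LunQos` class and `hold_time_ms` function are hypothetical names, not taken from any array's firmware, and the sketch assumes the processing time of an I/O is known (or can be estimated) in advance.

```python
from dataclasses import dataclass


@dataclass
class LunQos:
    """Hypothetical per-LUN QoS policy: a target response time in ms."""
    target_ms: float


def hold_time_ms(policy: LunQos, processing_ms: float) -> float:
    """Return how long to hold an I/O before processing starts.

    The hold is the target response time minus the expected processing
    time, floored at zero: an I/O that already takes longer than the
    target cannot be sped up here, only passed through without delay.
    """
    return max(0.0, policy.target_ms - processing_ms)


policy = LunQos(target_ms=5.0)
print(hold_time_ms(policy, processing_ms=1.0))  # 4.0
print(hold_time_ms(policy, processing_ms=7.0))  # 0.0
```

The floor at zero is a design choice in this sketch: a delay-based scheme like this can only slow I/O down to a target, it cannot make slow I/O faster.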
With spinning disks, guaranteeing that an I/O could be processed in a consistent time would be difficult. Hard drives don’t deliver consistent I/O response, especially with mixed sequential and random workloads. Solid state devices, however, are more predictable and have response times significantly faster than disk media. An SSD array would be much more suited to delivering QoS, when an I/O only takes microseconds to complete and that process is 99.999% guaranteed to complete in a predictable time.
One final consideration: delaying I/O when the host is capable of overwhelming the storage array means careful management is required. Fibre Channel implements queue depth processing; no more I/O can be started to a LUN once the queue is full. The same can’t be said for iSCSI.
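As a rough illustration of the queue depth idea, the sketch below models a per-LUN queue that refuses new I/O once a fixed number of requests are outstanding. The `LunQueue` class and its methods are hypothetical and greatly simplified; this is not how any particular protocol stack or array implements it.

```python
class LunQueue:
    """Hypothetical per-LUN command queue with a fixed depth."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.outstanding = 0

    def submit(self) -> bool:
        """Admit one I/O if there is room; otherwise refuse it."""
        if self.outstanding >= self.depth:
            return False  # queue full: the host has to retry later
        self.outstanding += 1
        return True

    def complete(self) -> None:
        """Mark one admitted I/O as finished, freeing a slot."""
        if self.outstanding > 0:
            self.outstanding -= 1


q = LunQueue(depth=2)
print([q.submit() for _ in range(3)])  # [True, True, False]
q.complete()
print(q.submit())  # True
```

With a bound like this in place, I/O that is being held back for QoS reasons occupies queue slots, so a busy host is pushed back at the queue rather than piling up unlimited pending work in the array.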
As we know, SolidFire have implemented QoS. It appears that Nexgen have also implemented a QoS feature called ioControl in their all-flash arrays (thanks Arjan). QoS could be yet another good reason to move away from hard drives and implement all-flash devices.