Imagine the following scenario, played out in IT departments around the world. As the business grows, so do data volumes and application traffic. Performance suffers and the application starts creaking around the edges, so an upgrade becomes inevitable. If the application is relatively new, it may be possible to scale out the architecture; however, this isn’t an option for many traditional or legacy apps. Alternatively, the application could be scaled up onto a bigger, faster server with more memory and the latest processors – all at a cost. But in many cases the issue isn’t processor speed, it’s I/O latency.
This problem isn’t new. EMC’s introduction of flash drives into the VMAX platform was the start of a process to improve I/O density, that is, the number of IOPS available per GB of storage. Introducing flash devices into the mix increases that ratio dramatically. Depending on the application, we can now add acceleration software to the hypervisor or server, deploy PCIe SSD or conventional SSD with supporting software, or place flash into a traditional array. As the ultimate solution, we can move the application entirely to flash. But each approach has its issues.
- Cost – Is it cost effective to move my application entirely to flash? As part of a project I’m involved in, we can demonstrate that increasing the capacity of a disk pool by only 5%, in the form of flash, can be enough to raise storage performance by a whole tier. In most scenarios, implementing flash is about targeting the I/O at the hot data and maintaining that focus over time. Early flash implementations in traditional arrays suffered from a lack of granularity, as they were unable to target data at the block level. All-flash arrays are difficult to justify if the case for improvement isn’t proven.
- Practicality – Deploying PCIe cards or SSD with caching software into a host may not be practical. The hardware might not be able to take it; the application may depend on remote and local replication that could be compromised by caching write I/O locally within the server; the solution may be clustered. For many reasons, existing solutions may not work.
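To make the I/O density argument concrete, here is a minimal sketch of the IOPS-per-GB arithmetic for a disk pool before and after adding 5% flash capacity. The capacity and IOPS figures are purely hypothetical, chosen for illustration, and the `Tier` and `io_density` names are not from any product.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_gb: float
    iops: float  # total IOPS the tier can sustain (hypothetical figure)


def io_density(tiers):
    """Return IOPS per GB across a pool made of the given tiers."""
    total_iops = sum(t.iops for t in tiers)
    total_gb = sum(t.capacity_gb for t in tiers)
    return total_iops / total_gb


# Hypothetical pool: 100 TB of disk, then the same pool plus 5% flash capacity.
hdd = Tier("hdd", capacity_gb=100_000, iops=50_000)
flash = Tier("flash", capacity_gb=5_000, iops=500_000)

disk_only = io_density([hdd])
hybrid = io_density([hdd, flash])

print(f"disk only: {disk_only:.2f} IOPS/GB")
print(f"hybrid:    {hybrid:.2f} IOPS/GB")
```

The hybrid figure is a ceiling rather than a guarantee: the pool only delivers it if the hot data actually sits on the flash tier, which is why targeting matters as much as capacity.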
So in many ways the best place to add flash with as little disruption as possible is the external storage array. But there are problems there too. Implementations like FAST on EMC’s platform require data to be collected over time and need human intervention to manage the ratio of flash to HDD in the pool. This manual intervention costs time and money, and it doesn’t scale. In an article I wrote four years ago, I suggested automated tiering didn’t need to move the data around, but could simply target writes at the flash storage and cascade the data down to slower tiers afterwards, based on usage algorithms. This is the way Dell’s Compellent system works. It’s also how Violin’s new Maestro platform accelerates traditional arrays.
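The "target writes at flash, cascade down later" idea can be sketched as a toy two-tier pool. This is an illustration only, assuming a simple least-recently-used demotion rule; it is not how Compellent or Maestro is actually implemented, and the `WriteTargetedPool` class is hypothetical.

```python
class WriteTargetedPool:
    """Toy two-tier pool: writes land on flash, cold blocks cascade to disk."""

    def __init__(self, flash_blocks):
        self.flash_blocks = flash_blocks  # flash capacity, in blocks
        self.flash = {}  # block id -> data; dict order tracks recency
        self.disk = {}   # block id -> data

    def write(self, block, data):
        self.disk.pop(block, None)   # any older copy on disk is now stale
        self.flash.pop(block, None)
        self.flash[block] = data     # re-insert as most recently used
        self._cascade()

    def read(self, block):
        if block in self.flash:
            data = self.flash.pop(block)
            self.flash[block] = data  # refresh recency on a flash hit
            return data
        return self.disk[block]       # cold read served from disk

    def _cascade(self):
        # Demote least recently used blocks until flash fits its capacity.
        while len(self.flash) > self.flash_blocks:
            coldest = next(iter(self.flash))
            self.disk[coldest] = self.flash.pop(coldest)


pool = WriteTargetedPool(flash_blocks=2)
pool.write("a", b"1")
pool.write("b", b"2")
pool.read("a")          # "a" is now hotter than "b"
pool.write("c", b"3")   # flash is full, so "b" cascades down to disk
```

Nothing is promoted on a schedule here: new writes are fast by default, and only data that stops being used pays the cost of a move.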
Maestro is the re-packaging of the hardware and software in GridIron Systems’ OneAppliance TurboCharger platform. The solution is deployed as a hardware memory appliance with supporting software to non-disruptively integrate flash into an existing Fibre Channel I/O path. The appliance can then either monitor I/O or actively cache data in write-through or write-back mode (i.e. as a read cache only or as a write cache too). In a recent conversation, Violin’s VP of Product Management, Narayan Venkat, explained how the Maestro solution is integrated into the existing data path and seen as additional paths to the LUN, requiring only extra Fibre Channel zoning. Exactly how this works in practice I’m not clear on, but it means the appliance acts as a target for I/O from the host, inspecting SCSI packets and deciding what data to cache from the array. It speeds up I/O not just by caching data that’s active, but by learning which other clusters of data in a similar locality may also become active and bringing them into the appliance. This more real-time approach contrasts with systems like FAST, which tier data over periods of hours or days.
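The difference between the two caching modes is easiest to see in a small sketch. The following is a generic illustration of write-through versus write-back caching in front of a backing array, not Maestro’s actual logic; the `CachingAppliance` class is hypothetical and both the cache and the array are plain dictionaries.

```python
class CachingAppliance:
    """Toy in-path cache in front of a backing array (a plain dict)."""

    def __init__(self, array, mode):
        assert mode in ("write-through", "write-back")
        self.array = array
        self.mode = mode
        self.cache = {}
        self.dirty = set()  # blocks whose newest copy exists only in the cache

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.array[block]  # miss: fetch from the array
        return self.cache[block]

    def write(self, block, data):
        self.cache[block] = data
        if self.mode == "write-through":
            self.array[block] = data  # array is updated before the write completes
        else:
            self.dirty.add(block)     # acknowledged from cache, destaged later

    def flush(self):
        # Destage every dirty block to the array.
        for block in sorted(self.dirty):
            self.array[block] = self.cache[block]
        self.dirty.clear()


wt = CachingAppliance({0: b"old"}, "write-through")
wt.write(0, b"new")            # array already holds the new data

wb = CachingAppliance({0: b"old"}, "write-back")
wb.write(0, b"new")            # array still holds b"old" at this point
stale_on_array = wb.array[0]
wb.flush()                     # array catches up only after destaging
```

In write-back mode there is a window in which the array holds stale data, so anything that reads the array directly during that window sees the old copy.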
The memory appliance uses custom FPGAs for speed and can manage up to 1 billion “pages” of data across a single HA pair of devices with 10 microsecond latency. I/O granularity is 4KB, but can scale as low as 512 bytes. How does this translate into performance improvement? Violin are claiming between 10 and 15 times performance improvement using Maestro, but of course that will depend on the data profile.
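As a rough sanity check on scale, the page count can be turned into an approximate amount of cached data. This assumes one “page” corresponds to the 4KB I/O granularity, which is my assumption rather than anything Violin has stated.

```python
PAGES = 1_000_000_000  # "up to 1 billion pages" per HA pair, from the text
PAGE_BYTES = 4 * 1024  # assumption: one page equals the 4KB I/O granularity

total_bytes = PAGES * PAGE_BYTES
print(f"{total_bytes / 1e12:.3f} TB")    # decimal terabytes
print(f"{total_bytes / 2**40:.3f} TiB")  # binary tebibytes
```

Under that assumption an HA pair would track roughly 4TB of data; at 512-byte granularity the same page count would cover proportionally less.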
While this solution seems comprehensive and simple to implement, there are a few caveats. Firstly, in write-back mode, where the appliance actively caches write I/O, data is now stored in multiple locations for a single LUN/host, leading to possible data integrity issues in hardware failure scenarios. The solution becomes only as reliable as its least reliable component, which may be an issue for high availability. In addition, as detailed earlier, if a host is using advanced features like replication, this too could complicate a Maestro solution or make it impractical to implement.
The Flash On-Ramp
What Maestro does do, however, is offer customers a non-disruptive opportunity to show what flash could deliver in application performance acceleration. It can be operated in a “what if” mode, simply observing I/O and providing feedback on how latency could be reduced with flash. For many organisations even getting to this point can be tough if they can’t ascertain what the cause of an application bottleneck actually is.
But the ultimate goal for Violin is to get everyone onto flash storage. Maestro provides the on-ramp for making that move as seamless as possible and for building both the case for and confidence in flash solutions.
The Architect’s View
There are many ways to skin a cat, as the expression goes, and at first glance Maestro could be seen as just another flash acceleration option. However, non-invasive implementation targeting traditional applications is a neat sweet spot that gives Violin a reason to open a conversation with the CIO or IT head. As the company moves towards IPO (expect something to be announced around 26th September), these kinds of solutions are needed to build its portfolio and improve penetration of key accounts, making the GridIron acquisition earlier this year a smart one indeed.
- Enterprise Computing: Automated Tiering – Why Move The Data?
- Violin Force 2510 Memory Appliance (Violin Memory, PDF)
Copyright (c) 2013 – Brookend Ltd, first published on http://architecting.it, do not reproduce without permission.