Compression – Table Stakes for Next Generation Storage Platforms


The storage world of tomorrow will see a divergence of technology into two main camps: storage will be either capacity or performance focused. At the performance end, today’s newest generations of storage platforms are taking advantage of technologies such as flash to deliver high-throughput, low-latency I/O for the latest generation of performance-hungry applications and hypervisors. High performance, however, comes at a cost, as flash is considerably more expensive than the traditional hard drive. This represents a problem for all vendors, as customers continue to base purchasing decisions on the $/GB metric.

The answer to achieving HDD/SSD cost parity has been to implement space efficiency technologies such as thin provisioning and data reduction technologies including data de-duplication and compression. These features see reductions of up to 10:1, resulting in increased “effective capacity” and a significant closing of the HDD/SSD price gap. When applied in a flash array, not only does the “effective capacity” improve markedly, but the performance that flash delivers isn’t lost, so you get both performance and capacity improvements. This also means a direct “Capex” saving in hardware acquisition costs and the ancillary “Opex” benefits of reductions in data centre space, power and cooling.
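As a back-of-the-envelope illustration (the prices and reduction ratios below are hypothetical, not vendor figures), the effective cost per gigabyte falls in direct proportion to the data reduction ratio:

```python
# Hypothetical figures only: how data reduction closes the HDD/SSD gap
# on an "effective" $/GB basis.

raw_tb = 10                 # raw flash capacity purchased (TB)
flash_cost_per_gb = 5.00    # assumed flash $/GB
hdd_cost_per_gb = 0.50      # assumed HDD $/GB

for reduction in (2, 4, 10):    # combined de-dupe + compression ratio (N:1)
    effective_tb = raw_tb * reduction
    effective_cost_per_gb = flash_cost_per_gb / reduction
    print(f"{reduction}:1 -> {effective_tb} TB effective capacity, "
          f"${effective_cost_per_gb:.2f}/GB effective vs ${hdd_cost_per_gb:.2f}/GB raw HDD")
```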

Digging a bit deeper into the numbers, we see that de-duplication savings are highest with large volumes of unstructured data due to the commonality of repeated data chunks or blocks.  The reduction rates can be impressive, with the best savings achieved in virtual desktop, virtual server and file-based data.  However, the mainstay of the business world still depends on structured data, which by its nature doesn’t respond well to de-duplication techniques.  Instead, structured data such as OLTP databases (which run the majority of production workloads in the Fortune 1000) see greater data reduction benefit from data compression, which works within a single data set, identifying and removing redundant strings of data.
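To see why the two techniques suit different data, the short sketch below (purely illustrative, using made-up “database blocks”) shows block-level de-duplication finding nothing to share between two nearly identical 4KB blocks, while compression still shrinks each block by removing the redundant strings inside it:

```python
import hashlib
import zlib

# Two 4KB "database blocks" with identical structure but different key values:
# block-level de-duplication treats them as unique, yet each block is full of
# redundant strings that a compressor can remove.
block_a = (b"id=000001|status=ACTIVE|region=EMEA|" * 200)[:4096]
block_b = (b"id=000002|status=ACTIVE|region=EMEA|" * 200)[:4096]

unique_blocks = {hashlib.sha256(b).digest() for b in (block_a, block_b)}
print("de-dupe keeps", len(unique_blocks), "of 2 blocks")          # 2 of 2: no saving

for name, blk in (("block_a", block_a), ("block_b", block_b)):
    print(name, "compresses from 4096 to", len(zlib.compress(blk)), "bytes")
```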

There will continue to be growth in both data types, with much of the growth in structured data coming from machine-to-machine sources such as sensors. Unstructured data continues to see new growth from binary objects, files and multimedia such as audio and video.

Structured data will continue to be a burden on F1000 businesses; unfortunately, most of today’s compression algorithms are non-trivial, CPU-intensive processes that impact performance. There is a dichotomy between implementing inline compression and delivering low-latency, high-throughput I/O. Given the performance trends with flash technologies, any data compression process needs to be as fast and efficient as possible. The table below compares vendors in today’s marketplace. Many don’t offer both compression and de-dupe inline (especially on legacy architectures), and in some instances these processes are completed as background tasks with a resulting performance impact.

| Vendor                        | Primary Storage Random I/O Use Case | Inline Compression | Inline Deduplication |
|-------------------------------|-------------------------------------|--------------------|----------------------|
| Permabit Albireo              | Yes                                 | Yes                | Yes                  |
| IBM Storwize FlashSystem V840 | Yes                                 | Yes                | No                   |
| EMC XtremIO                   | Yes                                 | No                 | Yes                  |
| SolidFire Storage Platform    | Yes                                 | Yes [1]            | Yes                  |
| Pure Storage FlashArray       | Yes                                 | Yes [2]            | Yes                  |
| NetApp FAS6080                | No                                  | Yes                | No [3]               |
| EMC VNX 7500                  | No                                  | No                 | No                   |


[1] Compression is performed prior to de-duplication.

[2] Inline compression, plus a second, more aggressive pass of compression performed post-process.

[3] De-duplication is performed as a post-process, with potential performance and capacity impacts.
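To get a feel for the CPU cost of inline compression discussed above, the single-threaded Python sketch below times a general-purpose compressor (zlib) over 4KB blocks at different effort levels. It is purely illustrative and bears no relation to any vendor’s optimised, parallel implementation:

```python
import os
import time
import zlib

# Illustrative, single-threaded measurement of the cost of compressing 4KB
# blocks inline. The synthetic blocks are only mildly compressible.
blocks = [os.urandom(2048) * 2 for _ in range(10_000)]

for level in (1, 6, 9):     # fast ... default ... aggressive
    start = time.perf_counter()
    compressed = [zlib.compress(b, level) for b in blocks]
    elapsed = time.perf_counter() - start
    ratio = sum(len(b) for b in blocks) / sum(len(c) for c in compressed)
    print(f"zlib level {level}: {len(blocks) / elapsed:,.0f} blocks/s, "
          f"{ratio:.2f}:1 reduction")
```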

 

Permabit has recently added a new compression capability called HIOPS™ compression to its existing Albireo VDO (Virtual Data Optimizer) data efficiency technology. HIOPS compression works in conjunction with de-duplication in VDO to deliver savings of up to 35x (3-5x from compression alone) while still achieving throughput of up to 650,000 4KB IOPS in mixed 70R/30W OLTP environments.

Integrating compression into the data reduction process through HIOPS compression allows structured data growth to be managed in a highly efficient manner. Initial de-duplication of the data stream into 4KB blocks ensures that only new data is compressed (de-duplication followed by compression is unique to Permabit). Compressed data is consolidated onto flash, resulting in a reduced write footprint, and reading an entire 4KB block from flash allows data to be pre-emptively brought into cache, saving I/O operations.
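The write path described above can be pictured roughly as follows. This is a minimal sketch of the de-duplicate-then-compress ordering, not Permabit’s actual VDO/HIOPS implementation:

```python
import hashlib
import zlib

BLOCK = 4096
store = {}   # fingerprint -> compressed block (standing in for flash)

def write(data: bytes) -> None:
    """De-duplicate the incoming stream in 4KB blocks, then compress only new blocks."""
    for offset in range(0, len(data), BLOCK):
        block = data[offset:offset + BLOCK].ljust(BLOCK, b"\0")
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint in store:
            continue                                  # duplicate: no compression work needed
        store[fingerprint] = zlib.compress(block)     # only unique data reaches the compressor

# Writing the same 4KB pattern twice stores a single compressed block.
write(b"A" * BLOCK)
write(b"A" * BLOCK)
print(len(store), "unique compressed block(s) stored")
```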

Summary

Data reduction is an important feature in primary storage and a required one in flash-based systems, without which these platforms are simply not cost competitive. The optimum savings (and the least impact to hosts) are made when the processes of de-duplication and compression are done inline. This requires highly parallel, optimised algorithms such as HIOPS Compression. The resulting “effective capacity” improvements, combined with the performance of flash technology, deliver a one-two punch.

The next time you’re looking at buying a storage array, be sure to ask how the vendor’s data reduction features are implemented. You may be surprised to learn that not all early implementations of these technologies are efficient enough to deal with the demands of today’s and tomorrow’s increasingly flash-based IT environments.

Disclaimer:  Permabit is a client of the Architecting IT Blog.

About Chris M Evans

Chris M Evans has worked in the technology industry since 1987, starting as a systems programmer on the IBM mainframe platform, while retaining an interest in storage. After working abroad, he co-founded an Internet-based music distribution company during the .com era, returning to consultancy in the new millennium. In 2009 Chris co-founded Langton Blue Ltd (www.langtonblue.com), a boutique consultancy firm focused on delivering business benefit through efficient technology deployments. Chris writes a popular blog at http://blog.architecting.it, attends many conferences and invitation-only events and can be found providing regular industry contributions through Twitter (@chrismevans) and other social media outlets.
  • Jon Smith

    Chris, aren’t you comparing apples to oranges?
    If you want to compare prices you should look at SSD + Compression Vs. HDD + Compression.
    In this case HDD wins on cost as well…

    What am I missing?

    • http://architecting.it Chris M Evans

      Jon

      The point is, HDD arrays weren’t well suited to, or capable of, handling compression and/or de-dupe, and so these features weren’t implemented (of course, it may also be that vendors simply didn’t want to give away cheaper/higher-capacity storage). As a result you can’t compare HDD+features with SSD+features. SSD enables the features more easily, and vendors are using them as a way to achieve parity.

      Chris

      • Jon Smith

        Thanks for your response Chris.
        It’s clear why it is much easier to implement effective dedupe with flash, as you can keep the whole hash table on media as fast as flash and save RAM.
        With compression I would guess the CPU would be the bottleneck, so if you have enough horsepower you should be able to implement compression effectively with HDD as well.

        I’m actually surprised there is still no real high-end solution that effectively implements compression on HDD.

        • Andrew Harrison

          It also rather depends on where you are doing the compression. The article refers to compression and de-dupe being done at the array level, and while de-dupe probably lives as a function in the array, the jury is somewhat out as to whether compression should exist in the array or somewhere else.

          Oracle, for example, has three different DBMS compression solutions which, while they consume host CPU resources, do have some very significant benefits. Compressed objects in Oracle are always compressed, ensuring that memory usage as well as disk usage is reduced. From a performance perspective the best kind of I/O is the one you don’t do at all, and compressing in the DBMS significantly reduces the I/O it generates.

          SQL Server also has compression, which is enabled by default for Microsoft’s reference DW solutions.

          If DBAs have enabled compression at the DBMS level then any inline compression performed at the array level will be somewhere between zero and minimally effective.

          Ideally in this scenario the array should be able to compress selectively, only compressing data coming into LUNs that hasn’t been compressed upstream of the array.

          There are a whole plethora of filesystems that also do inline compression; some also do de-dupe, though this is often done as a post-process.

          • http://blog.plein.org/ Bill Plein

            SQL Server compression does not compress all data structures, so a compressing array can give some additional value. But I do agree that compressing at the application level reduces an array’s ability to compress, at the cost of host CPU that could otherwise be used for application processing.

    • Andrew Harrison

      De-dupe can be problematic for HDD-based arrays because, if successful, it serves to randomise the data held in the array; the more effective the de-dupe, the more random the data becomes. This means that reads which would be sequential without de-dupe become random reads post de-dupe, and the 200 or so I/Os per second that a decent 10,000–15,000 RPM SAS drive can sustain becomes the limiting factor.

      SSD-based arrays don’t suffer from this problem; random reads are the thing that flash arrays should be really, really good at.

      • LouisPTC

        Andrew, you are correct that SSD dedupe is easier.
        There are, however, many techniques for ameliorating the issues associated with long-term dedupe fragmentation effects on hard drives and there’s even a fair amount of published research on the subject. It is a tricky engineering problem, but absolutely solvable in production HDD environments. One example of a product that does a great job of this is the HNAS filer from HDS.

  • http://blog.plein.org/ Bill Plein

    Sorry, but XtremIO doesn’t yet have compression. It is promised in the next GA release. In order to get compression, it seems that they are going to have to give up something, and word on the street is that deduplication ratios will change due to a move to an 8K block size.

    Pure Storage (I’m an employee, full disclosure) does compression inline. We do further, deeper compression post-process; deep compression takes more time and CPU, so we do it on a future touch of the data. You feel that is significant enough to call us out with a footnote, and yet give full credit to EMC for a code release that they have yet to make generally available, a software upgrade that we understand will be offline and destructive to data (i.e. backup and restore).

    Checkbox comparisons are useful, but they don’t tell the whole story. When footnoting, please ensure you do so with equal opportunity.

    • http://architecting.it Chris M Evans

      Bill, thanks, you’ll see I’ve amended the text.

      Chris

  • http://virtualstorageguy.com/ Vaughn Stewart

    == Disclaimer: Pure Storage employee ==

    Chris – as always, nice article.

    I believe I’ve spotted a few errors in your chart that you may want to investigate and revise.

    1) Today XtremIO does not offer data compression, only 4KB data deduplication. Please verify with EMC, but I believe this is accurate information.

    2) Compression with Pure Storage is 100% inline. The asterisk is incorrect and should be changed. During GC, Purity O.E. applies a second form of compression, one that is too aggressive for inline processing, to further reduce the capacity of the data set.

    You can read more about FlashReduce in Purity here:
    http://purestorageguy.com/2014/03/25/pure-storage-flash-bits-adaptive-data-reduction/

    One might question the validity of including the NetApp inline compression in this list. While the feature has existed, not much is written about it, suggesting a correlation with the adoption of the current implementation. C’est la vie.

    Cheers,
    v

    • http://architecting.it Chris M Evans

      Vaughn, thanks I’ve amended the table and the footnote.

      Cheers
      Chris
