HDS Top with Latest SPC-1 All-Flash Benchmark


Following hot on the heels of NetApp and their SPC-1 benchmark figures for the EF560 all-flash array, HDS have announced that the VSP G1000 has delivered the fastest benchmark figures to date, as highlighted in the graph shown here (borrowed from Hu Yoshida’s blog post on the news).  Links to the various SPC-1 results are shown at the end of this article.

At a shade over $2m, and at almost twice the $/SPC-1 IOPS of the NetApp solution, this test (and others like it) may seem out of reach for many customers and unreflective of real-world situations.  In fact I believe the benefits of these kinds of tests are much more subtle than that and in most cases misunderstood, especially by those new to the industry.  Here’s why.
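As a sanity check on those headline numbers, the SPC-1 price-performance metric is simply the total tested configuration price divided by the reported SPC-1 IOPS.  A rough sketch, using rounded figures for illustration only (the full disclosure reports carry the exact values):

```python
def dollars_per_spc1_iops(total_price_usd: float, spc1_iops: float) -> float:
    """SPC-1 price-performance: total tested configuration price
    divided by the reported SPC-1 IOPS figure."""
    return total_price_usd / spc1_iops

# Rounded, illustrative figures: roughly $2m for roughly 2m SPC-1 IOPS.
g1000 = dollars_per_spc1_iops(2_000_000, 2_000_000)
print(f"VSP G1000: ${g1000:.2f} per SPC-1 IOPS")  # ~$1.00 per IOPS
```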

Operational Efficiency

Pretty much all technology is designed to be operated at lower than its maximum rated tolerances.  That extra “gas in the tank” ensures performance or throughput is there when we need it, whereas the extra stress of running at 100% all the time would quickly break many mechanical devices.  There are plenty of cases in point: power supplies and fans in PCs; the daily commute, where many of us (me included) drive a car capable of going much faster than we ever do; or a twin-engine aeroplane, which must be able to fly on a single engine.  So why do we buy and pay for this over-capacity?

Similar logic applies to our storage devices.  If we run an array at 100% of its rated IOPS, what happens when a disk drive fails and data has to be rebuilt?  Quite simply, host-level I/O performance degrades and application performance suffers as a result.  Why are Storage Area Networks (SANs) designed to run at 50% of their bandwidth?  Because the failure of one path leaves throughput unaffected (you could argue for 4 or 8 paths with less need to oversubscribe, but of course you’d be deploying extra hardware anyway).
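The 50% rule generalises: with N paths, each path can safely run at (N-1)/N utilisation and still absorb a single-path failure.  A quick sketch of that arithmetic, for illustration:

```python
def max_safe_utilisation(paths: int) -> float:
    """Highest per-path utilisation at which the surviving paths can
    still carry the full load after a single path failure."""
    if paths < 2:
        raise ValueError("redundancy needs at least two paths")
    return (paths - 1) / paths

for n in (2, 4, 8):
    print(f"{n} paths: run each at no more than {max_safe_utilisation(n):.1%}")
# 2 paths -> 50.0%, 4 paths -> 75.0%, 8 paths -> 87.5%
```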

So that extra capacity, or to put it another way, running at less than 100%, is a design feature that builds reliability into our technology.

Viewing VSP G1000 Results In Context

So where does that lead us in terms of how to interpret the HDS VSP G1000 test results?  From the graph we can see that the G1000 latency scales pretty much linearly as IOPS increase (at least up to 1,000,000).  From there the graph slopes upwards gently, with only a slight up-tick at 2,000,000.  Compare this to the “hockey stick” results of the other tests shown.  In addition, the HDS figures show consistently low latency, a key feature I raised in the previous post.

What the graph indicates is that HDS’s solution is more likely to deliver consistent latency, even if 20% of the IOPS capability had to be dedicated to recovering from a disk (or FMD) failure.  This is an unlikely scenario, but one that has to be built into an architecture design, especially if you’re a bank, credit card provider, online retailer, online casino or real-time trading platform, or run any other application where an increase in latency translates directly into lost business.
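That 20% figure translates directly into a sizing rule: budget steady-state load against the rated IOPS minus the rebuild reserve.  A minimal sketch (the 20% reserve is this article’s example, not an HDS specification):

```python
def plannable_iops(rated_iops: float, rebuild_reserve: float = 0.20) -> float:
    """IOPS you can budget for steady-state load while holding back a
    fraction of the rated capability for rebuild/failure handling."""
    if not 0.0 <= rebuild_reserve < 1.0:
        raise ValueError("reserve must be a fraction in [0, 1)")
    return rated_iops * (1.0 - rebuild_reserve)

print(plannable_iops(2_000_000))  # 1,600,000 IOPS left for normal workload
```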

Two Million VSAN IOPS

Here’s where we get to the point on understanding in the industry.  This month VMware launched vSphere 6.0, which included a demonstration of VMware Virtual SAN achieving a 2 million IOPS benchmark.  You can find details of the configuration and results here.  Reaching the 2m IOPS mark took 32 server nodes in a cluster with a total of 512 Xeon cores, 4TB of DRAM, 246TB of disk capacity and 12.8TB of flash.  The test ran only 32 VMs and achieved the 2m IOPS with 100% read I/O.  No latency figures were quoted for this test.

When the test was repeated with a 70%/30% read/write mixed workload, the results scaled to only 640,000 IOPS.  Latency figures were quoted this time and showed an average of 3ms (far higher than most of the SPC-1 benchmarks).  What neither test shows is a realistic workload on the VSAN cluster (32 VMs is hardly representative), performance figures in failure mode (i.e. disk rebuilds, SSD failures), or whether latency scaled in a similar way to IOPS.  There’s also no mention of the cost of this configuration.
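Dividing the headline figures through by the cluster size helps put them in perspective; simple per-node arithmetic from the numbers above:

```python
nodes = 32
read_only_iops = 2_000_000   # 100% read benchmark
mixed_iops = 640_000         # 70/30 read/write benchmark

print(read_only_iops // nodes)  # 62,500 read IOPS per node
print(mixed_iops // nodes)      # 20,000 mixed IOPS per node
```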

I consistently see comments claiming that external storage is too latency-restrictive for today’s applications and that low latency can only be achieved with converged solutions like VSAN.  The SPC-1 and VMware tests show this is simply not true.  That’s not to take anything away from the VSAN testing; achieving 2m IOPS is commendable, given the caveats in how it was achieved (the same applies to the SPC-1 tests, by the way).

The Architect’s View

The idea of SPC-1 is to provide some consistency in storage performance testing.  Although the total IOPS figure is the headline grabber, the more significant detail is how the hardware scales up to that maximum, especially in failure scenarios.  Storage solutions based on commodity servers can easily reach the magic 1m or 2m IOPS mark, but the question remains whether they can achieve that level of performance with consistently low latency – if that is a requirement of your application – including when rebuilding lost data or rebalancing a cluster.

How systems work in failure mode is vitally important, because at that point your business is directly affected.  Designing for failure isn’t just about adding in resilient nodes; it’s about designing in headroom for failed capacity too.  If you are latency/performance sensitive rather than cost sensitive, then solutions like the VSP will be a better choice for your business.

Related Links

About Chris M Evans

Chris M Evans has worked in the technology industry since 1987, starting as a systems programmer on the IBM mainframe platform, while retaining an interest in storage. After working abroad, he co-founded an Internet-based music distribution company during the .com era, returning to consultancy in the new millennium. In 2009 Chris co-founded Langton Blue Ltd (www.langtonblue.com), a boutique consultancy firm focused on delivering business benefit through efficient technology deployments. Chris writes a popular blog at http://blog.architecting.it, attends many conferences and invitation-only events and can be found providing regular industry contributions through Twitter (@chrismevans) and other social media outlets.
  • klstay

    So many IT organizations focus on efficiency over effectiveness. Does it really matter how much you saved on that car if it is not reliable enough to get you where you need to go? If, like some of the types of companies mentioned, you need this level of reliability and performance then HDS and this system are your new best friend.

    IT does not get paid to save the business money; we get paid to deliver applications, data, and communications to those needing them anywhere, anytime, on any device with a high degree of confidence the user is who they say they are. What kind of fool believes doing that on 4.5% of revenue instead of 5.2% is going to make or break the annual report? Yet, there is constant pressure to shave that 14% of budget by outsourcing or, in this case, going with NetApp for “half the price” of HDS.


  • John

    The G1000/VSP platform also offers a 100% data availability SLA (strangely, the HP OEM version is only 99.99999%, I think). This array falls into the “performance and availability at any cost” segment. Engenio-based systems (NetApp E-Series) I wouldn’t put on a par with the mother of all symmetric-pathed enterprise arrays.

