About 10 years ago I worked on a project for a large bank where we put in place the ability to replicate test data from the production environment to a third array deployed in a lower-cost data centre. The whole design was perhaps overly complex, with scripting required to re-establish replicated LUNs when switching between production sites and to semi-automate the deployment and replication process. However, the benefit of implementing this data copying system was a reduction from days or weeks of copying for the developers to just a few hours each time the development environment was refreshed (there were other fringe benefits too, but not worth discussing here). This huge saving in people time is exactly what Catalogic Software is all about: making the process of data copy management more efficient in order to make developers more “agile”. So how big is this market? IDC figures (quoted by Catalogic) show a potential $44 billion market with geometric growth in data copy requirements.
Copy data management doesn’t come across as one of the most interesting parts of the storage industry; however, CEO Ed Walsh (ex-Avamar, ex-IBM Storage and Storwize) certainly has a passion for the company and the software he is creating (even if he perhaps could do with slowing down a little when presenting!). At the SFD event, Ed presented a great picture of what the product is and why there’s a need to manage the creation of copy data. Part of the problem can be explained by looking at the platform initially supported by the software, NetApp’s Data ONTAP operating system.
Data ONTAP was (from memory) the first platform to introduce flexible read/write snapshots when the software was introduced in the early 1990s. Snaps can be kept locally or archived using SnapVault, best described as a “snapshot aggregator”. The problem with the ONTAP snapshot process is actually the simplicity with which copies can be created. It’s easy to get into “snapshot sprawl”, and that’s where Catalogic Software’s ECX comes in. Once snapshots are under ECX control, they need to be utilised effectively. For Data ONTAP, this means creating read/write accessible volumes that can be used to access the data; in the first release of the software, the consumer of that data is VMware’s vSphere platform. The ECX software follows a four-step process:
- Catalog Data – storage systems and hypervisors are mapped into ECX
- Copy Data – copy jobs are created and scheduled to take snapshots
- Use Data – snapshots are used to create virtual machines to access data
- Analyse Data – system reports for managing data copies
Most of the above steps may seem trivial, but they do take some effort to manage, including ensuring any replicated VMs don’t conflict as they are brought back into the existing infrastructure.
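To make the four steps more concrete, here is a minimal sketch of a catalog → copy → use → analyse workflow. This is purely illustrative: the class and method names are my own invention, not Catalogic’s actual API, and a real implementation would be driving array-side snapshot calls rather than updating in-memory dictionaries.

```python
class CopyDataManager:
    """Toy model of a catalog -> copy -> use -> analyse workflow.

    Hypothetical names throughout; this only illustrates the shape of
    the four steps, not any real product interface.
    """

    def __init__(self):
        self.catalog = {}      # step 1: mapped storage systems / hypervisors
        self.snapshots = []    # step 2: snapshots produced by copy jobs

    def catalog_source(self, name, kind):
        """Step 1 (Catalog Data): register a storage system or hypervisor."""
        self.catalog[name] = {"kind": kind}

    def take_snapshot(self, source, volume):
        """Step 2 (Copy Data): record a snapshot created by a copy job."""
        snap = {"id": len(self.snapshots), "source": source, "volume": volume}
        self.snapshots.append(snap)
        return snap

    def clone_for_use(self, snap_id):
        """Step 3 (Use Data): expose a snapshot as a read/write clone,
        e.g. a volume that vSphere could mount to run test VMs."""
        snap = self.snapshots[snap_id]
        return {"clone_of": snap["volume"], "writable": True}

    def report(self):
        """Step 4 (Analyse Data): count copies per source to spot sprawl."""
        counts = {}
        for snap in self.snapshots:
            counts[snap["source"]] = counts.get(snap["source"], 0) + 1
        return counts
```

In use, a refresh cycle becomes a handful of calls: catalog the filer once, schedule `take_snapshot` per copy job, hand a `clone_for_use` result to the hypervisor, and run `report` to keep sprawl visible.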
Today ECX supports only the Data ONTAP and vSphere platforms; however, the system is designed to be extensible, allowing other vendors’ arrays and hypervisors to be added into what is a “pluggable” architecture. So theoretically, adding other platforms should be a reasonably trivial task. I say “theoretically” as I know from experience (through working with a company called Storage Fusion) that this process isn’t as simple as it seems. When looking across the ways storage vendors have implemented their products, it becomes very difficult to create a single schema to map out the architecture and storage of data in each of those platforms without a lot of upfront work. Even when this work has been done, as new platforms emerge or are upgraded, extracting information or driving snapshot creation on platforms that don’t have nice REST APIs can be a constant headache of upgrades and maintenance. Sadly this problem exists on the majority of legacy platforms, which also currently have the largest market share.
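A “pluggable” architecture of this kind typically means a common interface that each platform driver implements, plus a registry so new platforms can be added without touching the core. The sketch below is my own guess at that shape (the interface, the `OntapPlugin` stand-in and its return values are all hypothetical); the hard part the text describes is hidden inside each plugin, where platform-specific APIs or CLIs must be wrapped.

```python
from abc import ABC, abstractmethod


class PlatformPlugin(ABC):
    """Interface a storage or hypervisor plugin would implement.

    Hypothetical sketch; real drivers would wrap each vendor's
    management API (REST where available, CLI or SOAP where not).
    """

    @abstractmethod
    def discover(self):
        """Return the volumes (or VMs) visible on the platform."""

    @abstractmethod
    def create_snapshot(self, volume):
        """Trigger a platform-native snapshot; return its identifier."""


class OntapPlugin(PlatformPlugin):
    """Stand-in for a Data ONTAP driver with canned responses."""

    def discover(self):
        return ["vol1", "vol2"]

    def create_snapshot(self, volume):
        return f"snap-{volume}"


def registered_plugins():
    # Core code only ever sees the registry and the abstract interface,
    # so supporting a new array is (in theory) just one more entry here.
    return {"ontap": OntapPlugin()}
```

The catch, as noted above, is that each concrete plugin has to reconcile a vendor-specific data model with the common schema, and keep pace with that vendor’s upgrades; the registry pattern makes adding platforms cheap only once that per-platform work is done.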
Of course, copy data management is just one application of the ability to create snapshots and expose them back to the hypervisor. The technology enables other uses, including data protection (e.g. backup/restore), VM orchestration (e.g. copying VMs from master images) and better insight into the data for compliance or regulatory purposes, such as being able to prove a backup of a set of data actually took place.
The Architect’s View
As a first step, creating copies from existing storage platforms is a good way to get into the market. Lots of data is replicated from traditional storage arrays based on the NAS and SAN protocols. However, data growth isn’t in the traditional markets but in unstructured data and, increasingly, machine-created content. The interesting play for Catalogic will be in how they move from supporting traditional environments to making images or views of data available in other, more complex environments based on SQL, NoSQL and object-based platforms. Being able to slice up data lakes into multiple logical views could be a huge future opportunity.
As this was a Tech Field Day event, all of the presentations are online and can be found through the related links.
- Catalogic Presents at Storage Field Day 7 (Tech Field Day Website)
- Storage Field Day 7 – Initial Thoughts
Comments are always welcome; please read our Comments Policy first. If you have any related links of interest, please feel free to add them as a comment for consideration.
Disclaimer: I was personally invited to attend Storage Field Day 7, with the event team covering my travel and accommodation costs. However I was not compensated for my time. I am not required to blog on any content; blog posts are not edited or reviewed by the presenters or Tech Field Day team before publication.
Copyright (c) 2009-2015 – Chris M Evans, first published on http://blog.architecting.it, do not reproduce without permission.