Last week VMware issued this advisory on their knowledge base, recommending that VAAI Thin Provisioning reclaim be disabled in ESXi 5.0. Apparently it causes “poor performance” during certain vSphere actions such as Storage vMotion. The “cause” section contains the following somewhat vague comment:
VAAI Thin Provisioning is enabled by default on devices that adheres to T10 standards. ESXi will identify Thin Provisioned LUNs and issue UNMAP commands to reclaim deleted space on the storage. The implementation and response times for the UNMAP command may vary significantly among storage arrays.
Note the “may vary significantly among storage arrays” comment. There’s no list of whose arrays are suffering performance issues, and having clicked through to the VMware Compatibility Guide, I’m unable to find any arrays that claim to support the T10 plugin. I’d imagine, based on this post from Chad Sakac and the referenced Scott Lowe blog post, that we’re talking about EMC arrays being affected here. I haven’t seen any comments so far from other vendors.
This whole discussion brings me back to this post from a week or so ago. End users need to know what controls have been put into storage arrays to limit the effect of VAAI primitives on the array. It’s a large risk to simply let hosts issue direct commands to the array that have such an impact on I/O. Imagine having Storage DRS implemented as well. It would be incredibly easy to create a scenario where far more work is done to balance environments simply because too many VAAI requests have been thrown at an array.
Now, I’m not anti-VAAI in any way. In fact I think the concept makes total sense. Think back to in-array replication (clones and snapshots) and remote replication. It makes so much sense for the array to handle that kind of heavy lifting, and the same applies to VAAI. But most sites wouldn’t give end users and their hosts the ability to perform unlimited snapshots and replication failovers at will. This function is best managed centrally, or through a controlled proxy that allows the storage administrator to suspend the use of snapshot commands. That is essential if maintenance needs to be carried out on hardware, or if performance or other issues are being investigated.
What I’m saying is that we need both an understanding of how VAAI workload is prioritised against normal host I/O and an ability for the administrator to control or restrict that workload where required. I still believe that neither of these is available in products from the major storage vendors. I’d like to be proved wrong….
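To make the second of those requirements concrete, here is a minimal sketch of the kind of control I have in mind: a token bucket that caps how many offloaded commands are admitted per second, plus a switch the administrator can flip to suspend them entirely. It is purely illustrative. The class name `OffloadThrottle`, its parameters and the idea of counting one token per command are my own assumptions, not a description of how any array or ESXi actually behaves.

```python
class OffloadThrottle:
    """Hypothetical admin-controlled limiter for offloaded (VAAI-style) commands.

    Illustrative only: not modelled on any vendor's implementation.
    """

    def __init__(self, rate_per_sec, burst):
        self.rate_per_sec = rate_per_sec  # tokens replenished per second
        self.burst = burst                # maximum tokens that can accumulate
        self.tokens = burst               # start with a full bucket
        self.last = 0.0                   # time of the last refill, in seconds
        self.suspended = False            # admin switch: reject all offload work

    def admit(self, now, cost=1.0):
        """Return True if an offloaded command arriving at time `now` may run."""
        if self.suspended:
            return False
        # Refill the bucket for the time elapsed since the last refill.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Example: allow a burst of two commands, then one per second.
throttle = OffloadThrottle(rate_per_sec=1.0, burst=2.0)
decisions = [throttle.admit(0.0), throttle.admit(0.0), throttle.admit(0.0)]

# During maintenance the administrator suspends offload commands altogether.
throttle.suspended = True
during_maintenance = throttle.admit(5.0)
```

Time is passed in explicitly so the behaviour is easy to reason about; a real control would also need to decide what happens to rejected commands (queue, fail back to host-based copy, and so on), which is exactly the detail I’d like vendors to explain.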
So far, only Hitachi/HDS has responded to my previous post (see Hu Yoshida’s post here: Weighing in on VAAI). Come on, the rest of you. I *know* you read what I write, and your silence speaks volumes to everyone.