Unsolved

DJ

1884

October 4th, 2013 10:00

Cannot mount RecoverPoint replicated datastores. VAAI issue? VNX7500

We have the VAAI primitive HardwareAcceleratedMove disabled on two of our ESX clusters. This corrected the issue covered below in my February forum post and in the recently released EMC KB (attached as a PDF).

We have been unable to mount datastores replicated by RecoverPoint on the DR cluster. We have tried both SRM and manual methods. The hosts try to mount the volumes but cannot. The behavior resembles the SCSI reserve issues we have seen on older setups: datastores appear and disappear from the host, and the VMs cannot start. Having HardwareAcceleratedMove (HAMove) disabled is the only difference between the ESX clusters where RP/SRM works fine and this cluster, where it does not. We have upgraded from 3.5 to 4.0 and replaced an RPA, still no luck. Does anyone know if RecoverPoint will have issues with this VAAI primitive disabled?

My earlier post on the VNX forum, containing the details on the VAAI issue:

Thanks!

2 Replies Latest reply: Feb 6, 2013 11:35 PM by Gearoid Griffin

SvMotion causes high latency, datastore connectivity loss. VAAI/RP issue?

This question is Not Answered.

spaceman Feb 6, 2013 5:14 PM

Brand new VNX installs, 2 separate datacenters, 12 node ESXi clusters on HP BL460cG8.  ESXi 5.0U1, RP 3.5SP1, VNX 7500 5.32.000.5.011, PP/VE 5.7 P02, Brocade 5300s with HP/Brocade modules, FOS 7.0.2.   Pretty standard stuff...

Getting datastore latency alarms, datastore connectivity alarms, and general all-around badness whenever we svMotion or clone. The svMotion fails, users complain, and RP also loses connection to ports and LUNs. It almost brings the whole datacenter to a halt, very scary. It acts much like SCSI reserve or APD, but we are on mode 4 (ALUA) with VAAI.

I've seen some blog entries on this with older Block OE, but I thought all of this was resolved in the current Block OE and splitter code. Does anyone know where all this stands and what needs to be done to prevent this from occurring, if this is in fact what I assume it to be?

TIA!

1. Re: SvMotion causes high latency, datastore connectivity loss. VAAI/RP issue?

Sushant Gulati Feb 6, 2013 9:50 PM (in response to spaceman)

Can you provide some ESX logs which may help pinpoint the issue?

2. Re: SvMotion causes high latency, datastore connectivity loss. VAAI/RP issue?

Gearoid Griffin Feb 6, 2013 11:35 PM (in response to Sushant Gulati)

Sooo, more than likely you are hitting a newly documented issue, emc313487.

It may not be customer-viewable yet, but it will be shortly (it's just going through final approval).

A Service Request would be required for us to verify this.

But anyway, the issue is more than likely an issue with VAAI and Virtual Provisioning.

Operations which use the VAAI primitive HardwareAcceleratedMove are considerably slower, and extreme spikes in latency can be seen.

Examples include:

Deploying templates with VMware

Storage vMotions

Any ESX operation which is used to copy or migrate data within the same physical array

EMC has determined that there is an issue with the usage of VAAI by Virtual Provisioning within VNX Block OE Release 32. This can occur any time an ESX host using VAAI issues a request for a Storage vMotion, for deploying a VM from a template, or for cloning a virtual machine, all of which utilise the Extended Copy (XCOPY) SCSI command. The area of concern is pools, and it applies to both fully provisioned (thick) and thinly provisioned (thin) pool LUNs. The issue may occur when the request to create a template or clone via the VAAI HardwareAcceleratedMove feature is made to a mapped LUN on a non-owning storage processor. If the issue is encountered, the symptom will be response time spikes in vCenter and/or longer than expected completion times for the templates or clones.
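As a rough illustration only, the conditions described above can be written down as a simple predicate. This is a sketch, not an EMC or VMware tool: the `CopyRequest` class, its field names, and the operation labels are all hypothetical and exist only to restate the paragraph in code form.

```python
from dataclasses import dataclass

# Hypothetical labels for the operations the post lists as using XCOPY.
XCOPY_OPERATIONS = {"storage_vmotion", "clone_vm", "deploy_from_template"}


@dataclass(frozen=True)
class CopyRequest:
    """Hypothetical description of one copy operation issued by an ESX host."""
    operation: str                   # one of XCOPY_OPERATIONS, or anything else
    hardware_accelerated_move: bool  # is the VAAI primitive enabled on the host?
    block_oe_release: int            # VNX Block OE release, e.g. 32
    target_is_pool_lun: bool         # thick or thin pool LUN
    owning_sp: str                   # SP that owns the target LUN, e.g. "A"
    receiving_sp: str                # SP that actually receives the request


def may_hit_issue(req: CopyRequest) -> bool:
    """Return True when every condition named in the description is present."""
    return (
        req.operation in XCOPY_OPERATIONS
        and req.hardware_accelerated_move
        and req.block_oe_release == 32
        and req.target_is_pool_lun
        and req.receiving_sp != req.owning_sp
    )


if __name__ == "__main__":
    risky = CopyRequest("storage_vmotion", True, 32, True, "A", "B")
    safe = CopyRequest("storage_vmotion", False, 32, True, "A", "B")
    print(may_hit_issue(risky), may_hit_issue(safe))
```

The second request differs only in having HardwareAcceleratedMove disabled, which corresponds to the first workaround given further down.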

Cause:

A memory allocation problem can lead to Virtual Provisioning timeouts, which in turn lead to timeouts and high response times on the host.

Workaround

Disable the VAAI Primitive HardwareAcceleratedMove (other VAAI Primitives do not need to be changed)

See VMware KB http://kb.vmware.com/kb/1033665 for instructions on how to disable VAAI primitives within ESX

OR

Avoid sending ExtendedCopy/XCOPY requests for a LUN to the non-owning SP, and avoid cloning or deploying a VM from a template to a LUN on the non-owning SP.
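For the first workaround, a minimal sketch of how the setting could be changed and checked across hosts is shown here. It assumes ESXi 5.x, where the primitive is exposed as the advanced option `/DataMover/HardwareAcceleratedMove` and can be set with `esxcli system settings advanced set`. The helper names are made up for this example, and the sample output text is illustrative only (the layout on a real host may differ), so treat the VMware KB as the authoritative procedure.

```python
import shlex

# Advanced option that controls the HardwareAcceleratedMove primitive on ESXi 5.x.
OPTION = "/DataMover/HardwareAcceleratedMove"


def build_set_command(enabled: bool) -> list[str]:
    """Build the esxcli argument list that enables (1) or disables (0) the primitive."""
    return ["esxcli", "system", "settings", "advanced", "set",
            "--int-value", "1" if enabled else "0",
            "--option", OPTION]


def build_list_command() -> list[str]:
    """Build the esxcli argument list that shows the current value of the option."""
    return ["esxcli", "system", "settings", "advanced", "list", "--option", OPTION]


def parse_int_value(list_output: str) -> int:
    """Extract the current 'Int Value' from the text printed by the list command."""
    for line in list_output.splitlines():
        stripped = line.strip()
        # 'Default Int Value:' does not match because it starts with 'Default'.
        if stripped.startswith("Int Value:"):
            return int(stripped.split(":", 1)[1])
    raise ValueError("no 'Int Value:' line found in output")


# Illustrative sample only, not captured from a real host.
SAMPLE_OUTPUT = """\
   Path: /DataMover/HardwareAcceleratedMove
   Type: integer
   Int Value: 0
   Default Int Value: 1
"""

if __name__ == "__main__":
    print(shlex.join(build_set_command(False)))
    print(shlex.join(build_list_command()))
    print(parse_int_value(SAMPLE_OUTPUT))
```

The printed command lines are meant to be run in the ESXi shell on each host (or over SSH); the parser makes it easy to confirm that every host in a cluster reports the same value.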

FIX:

This issue is due to be fixed in a future release of VNX OE, scheduled Q1 2013

Note as well that you are on Rel 32 P11 which is also vulnerable to ETA EMC308914

1 Attachment

1.1K Posts

October 15th, 2013 06:00

Did you disable and re-enable the CGs after you disabled VAAI and before retesting? If not, please try this.
