Start a Conversation

Unsolved

This post is more than 5 years old

1783

April 7th, 2014 04:00

HELP! Intermittent Issues with Hardware VSS Provider on Hyper-V CSVs


​We have an 8 node Hyper-V cluster on Windows 2008 R2 with a Clariion CX3-10 as storage for Cluster Shared Volumes and Microsoft DPM 2010 as the front-end requestor (don't laugh). Due to stability issues with this conbination since day one, we've ended up trying to mitigate for any possibility of the system having to do anything more difficult than simply create the snapshots. Each Hyper-V host has its own cluster shared volume and the virtual machines are hosted and aligned so that all machines hosted by a particular Hyper-V host is also physically stored on the CSV 'owned' by that host, to avoid ownership changes during snapshots. Each Hyper-V host is in its own storage group and has sight of all 8 CSVs, to allow migrations and fault tolerance and all the wonderful things that VMs are supposed to bring.​

​Our issue is that we are unable to get consistently robust backups via DPM, and the main problem appears to be that , for no obvious reason, the EMC Hardware VSS Provider decides that there are insufficient resources for the snapshot to be created. Some nights we can run snapshots on all 8 CSVs concurrently (we have had variously 8, 16 and 24 LUNs of a good size in the reserved LUN pool) and sometimes we even get what seems to be a 'not enough resources' issue when we are simply trying to re-snapshot one VM on one host.​
​The hwprov.log snippet below is from the latest failed attempt this morning. This is the only snapshot we are trying to create and there are 8 free LUNs in the RLP. I will consider donating internal organs or a case of something alcoholic to the person who can figure out why this fails.​

​Thank you.​

​Richard Davies​

​11:01:04 04/07/2014 CreateFriendlyName = TRUE. Provider will create a friendly name for CLARIION snapview session and lun names. ​

​11:01:04 04/07/2014 EnforceDefaultToClone = FALSE. Provider will create Mirror(default) snapshots for plex. ​

​11:01:04 04/07/2014 EnforceVPSnap not found. Provider will create Timefinder/Snap differential snapshots. ​

​11:01:04 04/07/2014 EnforceSymmMaskingFromList key not found. Provider will use the standard rules for Tigon unmasking operations. ​

​11:01:04 04/07/2014 GetRegistryDefaults: Leaving with SUCCESS. ​

​11:01:04 04/07/2014 OnLoad: Entering. ​

​11:01:04 04/07/2014 OnLoad: EMC VSS Provider (V4.7.1.0-9) is loading. ​

​11:01:04 04/07/2014 OnLoad: Initializing a READ_WRITE local storapi session. ​

​11:01:04 04/07/2014 OnLoad: Initializing a READ_MOD_NOWRITE local storapi session. ​

​11:01:04 04/07/2014 StorUpdate: Entering. ​

​11:01:04 04/07/2014 StorUpdate: Performing a StorDiscover to update the database. ​

​11:01:18 04/07/2014 StorUpdate: Commiting database in memory to file. ​

​11:01:18 04/07/2014 StorUpdate: Leaving with SUCCESS. ​

​11:01:18 04/07/2014 IsLocalHostVM: Entering... ​

​11:01:18 04/07/2014 IsLocalHostVM: Leaving with success. ​

​11:01:18 04/07/2014 OnLoad: EMC VSS Provider (V4.7.1.0-9) has successfully loaded. ​

​11:01:18 04/07/2014 OnLoad: Leaving with SUCCESS. ​

​11:01:18 04/07/2014 AreLunsSupported: Entering. ​

​11:01:18 04/07/2014 AreLunsSupported: lLunCount = 1, lContext = 0x0. ​

​11:01:18 04/07/2014 AreLunsSupported: Iterating through 1 LUNs. ​

​11:01:18 04/07/2014 VDS_LUN_INFORMATION for ​​\\.\PHYSICALDRIVE4​​: ​

​ m_version = 1​

​ m_DeviceType = 0​

​ m_DeviceTypeModifier = 101​

​ m_bCommandQueueing = 1​

​ m_BusType = 6​

​ m_szVendorId = DGC​

​ m_szProductId = RAID 5​

​ m_szProductRevision = 0326​

​ m_szSerialNumber = CK200081900XXX​

​ m_diskSignature = 09b021d7-0000-0000-0000-000000000000​

​ There are 2 device descriptors:​

​[0] m_CodeSet = 1​

​ m_Type = 3​

​ m_cbIdentifier = 16​

​ m_rgbIdentifier = 6006016069022100fa878db74b64e111​

​[1] m_CodeSet = 1​

​ m_Type = 0​

​ m_cbIdentifier = 16​

​ m_rgbIdentifier = 0003000000000000001f000000000000​

​11:01:18 04/07/2014 AreLunsSupported: Calling IsLocalDevSupportable for ​​\\.\PHYSICALDRIVE4​​ ​

​11:01:18 04/07/2014 IsLocalDevSupportable: Entering. ​

​11:01:18 04/07/2014 IsLocalDevSupportable: Device = ​​\\.\PHYSICALDRIVE4​​, lContext = 0x0. ​

​11:01:18 04/07/2014 iPdevShow: Entering with ​​\\.\PHYSICALDRIVE4​​. ​

​11:01:18 04/07/2014 IsLocalHostVM: Entering... ​

​11:01:18 04/07/2014 IsLocalHostVM: Leaving with success. ​

​11:01:18 04/07/2014 IsLocalHostHyperVVM: Entering... ​

​11:01:18 04/07/2014 IsLocalHostHyperVVM: Leaving with success. ​

​11:01:18 04/07/2014 iPdevShow: Calling StorPdevShow for ​​\\.\PHYSICALDRIVE4​​. ​

​11:01:18 04/07/2014 iPdevShow: StorPdevShow successfully completed. ​

​11:01:18 04/07/2014 iPdevShow: Leaving with SUCCESS. ​

​11:01:18 04/07/2014 IsLocalDevSupportable: IsSupported: TRUE. Leaving with SUCCESS. ​

​11:01:18 04/07/2014 AreLunsSupported: Device [\\.\PHYSICALDRIVE4] is supported. ​

​11:01:18 04/07/2014 AreLunsSupported: Leaving with SUCCESS. ​

​11:01:18 04/07/2014 BeginPrepareSnapshot: Entering. ​

​11:01:18 04/07/2014 BeginPrepareSnapshot: SnapshotSet ID = {3FC5E911-182D-4DD9-BCBC-36ACAF8D54E9}, Snapshot ID = {7B8A993D-61EB-462F-8E9D-F0822112B9B8}, Context = 4194304, LunCount = 1. ​

​11:01:18 04/07/2014 BeginPrepareSnapshot: Creating a new SnapshotSet. ​

​11:01:18 04/07/2014 CreateSnapshotSet: Entering. ​

​11:01:18 04/07/2014 CSnapshotSet: Entering constructor ​

​11:01:18 04/07/2014 CSnapshotSet: Leaving constructor with success. ​

​11:01:18 04/07/2014 CreateSnapshotSet: Creating SnapshotSet: 3fc5e911-182d-4dd9-bcbc-36acaf8d54e9. ​

​11:01:18 04/07/2014 CreateSnapshotSet: Leaving with SUCCESS. ​

​11:01:18 04/07/2014 BeginPrepareSnapshot: Calling DoSnapshotResourcesExist for ​​\\.\PHYSICALDRIVE4​​ ​

​11:01:18 04/07/2014 DoSnapshotResourcesExist: Entering. ​

​11:01:18 04/07/2014 DoSnapshotResourcesExist: lContext = 0x400000. ​

​11:01:18 04/07/2014 iPdevShow: Entering with ​​\\.\PHYSICALDRIVE4​​. ​

​11:01:18 04/07/2014 IsLocalHostVM: Entering... ​

​11:01:18 04/07/2014 IsLocalHostVM: Leaving with success. ​

​11:01:18 04/07/2014 IsLocalHostHyperVVM: Entering... ​

​11:01:18 04/07/2014 IsLocalHostHyperVVM: Leaving with success. ​

​11:01:18 04/07/2014 iPdevShow: Calling StorPdevShow for ​​\\.\PHYSICALDRIVE4​​. ​

​11:01:18 04/07/2014 iPdevShow: StorPdevShow successfully completed. ​

​11:01:18 04/07/2014 iPdevShow: Leaving with SUCCESS. ​

​11:01:18 04/07/2014 IsVPSnapSupported: Entering. ​

​11:01:18 04/07/2014 IsVPSnapSupported: Calling StorRepGroupRelCapGet to know the array capabilities. ​

​11:01:18 04/07/2014 IsVPSnapSupported: VP Snaps are not supported for this array. ​

​11:01:18 04/07/2014 IsVPSnapSupported: Leaving. ​

​11:01:18 04/07/2014 DoSnapshotResourcesExist: VP Snaps are not supported on [ ] array. ​

​11:01:18 04/07/2014 DoSnapshotResourcesExist: No context specified, will search for replica relationships and set the context accordingly ​

​11:01:18 04/07/2014 DoSnapshotResourcesExist: Calling StorArraySync(-1) for . ​

​11:01:18 04/07/2014 iArraySync: Entering. ​

​11:01:18 04/07/2014 iArraySync: StorArraySync(-1) for . ​

​11:01:30 04/07/2014 iArraySync: Leaving with SUCCESS. ​

​11:01:30 04/07/2014 DoSnapshotResourcesExist: Replication is Plex and DefaultToClone is [0].device array is [3](2=symm, 3= Clar) ​

​11:01:30 04/07/2014 DoSnapshotResourcesExist: Calling FindNextMirrorReplica... ​

​11:01:30 04/07/2014 FindNextMirrorReplica: Entering. ​

​11:01:30 04/07/2014 IsSnapshotRemote: Entering. ​

​11:01:30 04/07/2014 IsSnapshotRemote: Could not retrieve value, Defaults will be used ​

​11:01:30 04/07/2014 IsSnapshotRemote: Leaving with SUCCESS. ​

​11:01:31 04/07/2014 FindNextMirrorReplica: No local mirror replicas were found for />, so now looking for remote mirror replicas. ​

​11:01:31 04/07/2014 FindNextMirrorReplica: No remote mirror replicas were found for />. ​

​11:01:31 04/07/2014 FindNextMirrorReplica: Leaving with FAILURE. ​

​11:01:31 04/07/2014 DoSnapshotResourcesExist: ERROR = Device is not STOR_DEV_REP_IS_SNAPABLE. ​

​11:01:31 04/07/2014 IsArrayVNXCapable: Array is not capable of VNX Snaps!! ​

​11:01:31 04/07/2014 DoSnapshotResourcesExist: Leaving with SUCCESS. ​

​11:01:31 04/07/2014 BeginPrepareSnapshot: Resources do not exist to support device = ​​\\.\PHYSICALDRIVE4​​. ​

​11:01:31 04/07/2014 BeginPrepareSnapshot: Leaving with FAILURE = VSS_E_VOLUME_NOT_SUPPORTED_BY_PROVIDER. ​

​11:01:31 04/07/2014 BeginPrepareSnapshot: Leaving with FAILURE = VSS_E_VOLUME_NOT_SUPPORTED_BY_PROVIDER. ​

​11:04:39 04/07/2014 OnUnload: Entering. ​

​11:04:39 04/07/2014 CancelSnapshotSet: Entering. ​

​11:04:39 04/07/2014 ~CSnapshotSet: Entering destructor. ​

​11:04:39 04/07/2014 CSnapshotSet::FreeArrayData() entering. ​

​11:04:39 04/07/2014 CSnapshotSet::FreeArrayData() exiting. ​

​11:04:39 04/07/2014 CSnapshotSet::FreeLunData() entering. ​

​11:04:39 04/07/2014 CSnapshotSet::FreeLunData() exiting. ​

​11:04:39 04/07/2014 ~CSnapshotSet: Leaving destructor with success. ​

​11:04:39 04/07/2014 CancelSnapshotSet: Leaving with SUCCESS. ​

​11:04:39 04/07/2014 OnUnLoad: EMC VSS Provider has successfully unloaded. ​

​11:04:39 04/07/2014 OnUnload: Leaving with SUCCESS.​

5 Posts

April 7th, 2014 05:00

Hi Mohammed,

Thanks for your response. Sadly, I'd already seen that article and hoped it was that simple. As stated in my original post, this can happen even when we're only trying to re-run a single snapshot, when the combined size and number of unallocated LUNs in the RLP far exceed what is required It's as though there's some sort of lookup process that's failing, for whatever reason, so that even though the VSS provider SHOULD see that there are enough resources available it is failing to do so. We even snapshot each Hyper-V host serially to ensure there are a maximum of 8 snapshots running at any given time across the whole system, which kind of makes the reason for having a hardware VSS provider a bit redundant.

Any other ideas?

Regards

Richard Davies

213 Posts

April 7th, 2014 05:00

This will need deep investigation Rich. The logs above should be related to RLP issue but from what you are saying you have enough unallocated LUNs in the RLP. I would recommend opening a service request to further investigate the issue as CLARiiON Spcollect will be needed as well to check if naviagent dumps any errors when VSS send the snapshot create command..

Hope it helps.

Mohammed Salem  @yankoora

213 Posts

April 7th, 2014 05:00

Rich, May i ask about the Size of the SRC device and the size of each of your RLP devices?

This error may be related to not enough LUNs in the RLP to take the snap.

Mohammed Salem  @yankoora

213 Posts

April 7th, 2014 05:00

I believe what i guessed above was correct. Kindly find the attached EMC KB78913 written by my colleague Michel for the same symptoms you reported. Like i said, It seems to be RLP LUNs issue. Just add more LUNs to the RLP.

Hope it helps.

Mohammed Salem @yankoora

1 Attachment

4.5K Posts

April 7th, 2014 13:00

For Snaps, the RLP needs a minimum size - something like 0.2 percent of the size of the LUN being snapped. Also you should configure RLP"s to have at least 10-20% of the source LUN. See KB 40472 for general guidelines and KB 75281 for the minimum size requirements.

glen

5 Posts

April 8th, 2014 01:00

Thanks for your response Glen, but as previously detailed this does not appear to be the reason for the failure. The RLP has at least 8 unallocated LUNs which more that meet the size requirements to allow the snapshot to be created.

I suspect the process by which the VSS provider is determining the available resources is in some way failing to recognise that fact and is incorrectly terminating the process. The only difference between a 'successful' and 'failed' attempt in the hwprov.log (set for debug logging) is the line 'DoSnapshotResourcesExist: ERROR = Device is not STOR_DEV_REP_IS_SNAPABLE'. There's no level of logging above 'debug' is there? Some sort of superverbose mode?

Regards

Richard Davies

213 Posts

April 8th, 2014 01:00

Rich, Can you try try the below steps:

  1. Run regedit.exe.   
  2. Go to HKEY_LOCAL_MACHINE\Software\EMC\ShadowCopy.
  3. Change the LogLevel value from ·Error· to ·Debug·.
  4. Close regedit.exe
  5. If running, stop and restart the EMC VssProvider and Volume ShadowCopy Service services. 

Hope it helps

Mohammed Salem @yankoora

5 Posts

April 8th, 2014 07:00

Hi Mohammed,

We already had this log (hwprov.log) set to 'Debug'. It doesn't give any more useful information for troubleshooting than if it's set to 'Error' - a lot more information, but nothing which gives any indication of the problem. There's nothing obvious in the sp_collect files, which isn't surprising as I believe the process is failing at the 'checking' stage and doesn't even try to create a snapshot. Amusingly, if you try to create the snapshot again (sometimes it takes several retries) it eventually works, despite there being no change in the availability of resources on the array.....

Regards

Richard Davies

213 Posts

April 9th, 2014 15:00

Sorry to hear this Rich, Debug logs are always meant to be investigated by EMC Engineering and we usually request them when we need to open a ticket with engineering for further investigation so it is hard to understand them in normal situations.

I still recommend opening a Service Request with EMC Support and they will do their best to help you.

Hope it helps.

Mohammed Salem @yankoora

5 Posts

April 30th, 2014 06:00

Hi Mohammed,

Thanks for your input with this issue. We managed to run for  almost 3 weeks without a single 'STOR_DEV_REP_IS_SNAPPABLE' error, despite making no obvious changes to any part of the system. Sadly, it returned with a vengeance so we're looking at raising this as a service call.

Regards

Richard Davies

No Events found!

Top