PowerPath for Windows 6.1: Microsoft Failover cluster resources go offline during a volume expansion
Summary: Microsoft Failover cluster resources go offline during a volume expansion in PowerPath for Windows 6.1.
Symptoms
LUN expansion performed with XtremIO.
Microsoft Failover Cluster resources fail or go offline when a volume is expanded on the XtremIO array before the expansion has been completed on the host side.
The Microsoft cluster node holds the SCSI reservation on the disk, but the resource goes offline. The XtremIO array receives the SCSI reservation release later and acknowledges it immediately on receipt. Communication errors follow, and the node is removed from the cluster after the expanded LUNs are detected.
Operating System Version:
Windows Server 2012 Standard
Dell EMC PowerPath Version 6.1 (build 295).
All paths are active and alive with no errors.
System event log:
10/16/2017 11:34:48 PM Error XXXXXXX-.activ 7034 Service Control Manager The SQL Server (XXXXX) service terminated unexpectedly. It has done this 1 time(s).
10/16/2017 11:34:22 PM Information XXXXXXX-..activ 7036 Service Control Manager The SQL Server Agent (XXXXX) service entered the stopped state.
10/16/2017 11:34:20 PM Error XXXXXXX-..activ 1069 Microsoft-Windows-FailoverCluste Cluster resource 'XXXXX' of type 'Physical Disk' in clustered role 'SQL Server (XXXXX)' failed. Based on the
10/16/2017 11:34:20 PM Error XXXXXXX-..activ 1038 Microsoft-Windows-FailoverCluste Ownership of cluster disk 'XXXXX' has been unexpectedly lost by this node. Run the Validate a Configuration wizard
10/16/2017 11:34:17 PM Warning XXXXXXX-..activ 140 Microsoft-Windows-Ntfs The system failed to flush data to the transaction log. Corruption may occur in VolumeId: O:, DeviceName: \Device\HarddiskVolu
10/16/2017 11:34:17 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:17 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:17 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:16 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:16 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:15 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:27:30 PM Information XXXXXXX-..activ 7036 Service Control Manager The Windows Modules Installer service entered the stopped state.
System event log:
11/13/2017 10:41:22 PM Error XXXXXXX-..activ 1069 Microsoft-Windows-FailoverCluste Cluster resource 'Cluster Disk 1' of type 'Physical Disk' in clustered role 'StorageTest' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
11/13/2017 10:41:22 PM Error XXXXXXX-..activ 1038 Microsoft-Windows-FailoverCluste Ownership of cluster disk 'Cluster Disk 1' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.
11/13/2017 10:41:22 PM Warning XXXXXXX-..activ 140 Microsoft-Windows-Ntfs The system failed to flush data to the transaction log. Corruption may occur in VolumeId: X:, DeviceName: \Device\HarddiskVolume17. (A device which does not exist was specified.)
11/13/2017 10:41:22 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 17 has changed.
11/13/2017 10:41:22 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 17 has changed.
11/13/2017 10:41:22 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 17 has changed.
Cluster.log:
00001364.000013f0::2017/11/13-22:41:22.599 INFO [RES] Physical Disk: PNPDEBUG: reset notification handle 0x35b73da0
00001364.000013f0::2017/11/13-22:41:22.599 INFO [RES] Physical Disk: PNPDEBUG: UnregisterDeviceNotification handle 0000001D35B73DA0
00001364.000013f0::2017/11/13-22:41:22.599 INFO [RES] Physical Disk: PNP: \\?\STORAGE#Volume#{7b3fcfa4-c894-11e7-93ff-0025b505a19f}#0000000000100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b} volume disappeared
00001364.000013f0::2017/11/13-22:41:22.599 INFO [RES] Physical Disk: PnpRemoveVolume: Removing volume \\?\STORAGE#Volume#{7b3fcfa4-c894-11e7-93ff-0025b505a19f}#0000000000100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
00001364.00002404::2017/11/13-22:41:22.600 INFO [RES] Physical Disk <Cluster Disk 1>: PNP: HardDiskpSetPnpUpdateTimePropertyWorker: status 0
000008ac.000036d0::2017/11/13-22:41:22.600 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message with gid 7190
00001364.00005034::2017/11/13-22:41:22.600 INFO [RES] Physical Disk: HarddiskpIsDiskCsv: IOCTL_DISK_GET_CLUSTER_INFO: device \Device\Harddisk17\Partition0, IsClustered 1 IsCsv 0 InMaintenance 0
00001364.00005034::2017/11/13-22:41:22.600 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk17\ClusterPartition1, status 0xc0000034
00001364.00005034::2017/11/13-22:41:22.600 ERR [RES] Physical Disk: HarddiskpIsPartitionHidden: failed to open device \Device\Harddisk17\ClusterPartition1, status 2
00001364.00005034::2017/11/13-22:41:22.600 ERR [RES] Physical Disk <Cluster Disk 1>: HardDiskpGetVolumeInfo: Cant tell if disk 17 partition 1 is hidden 2
00001364.00005034::2017/11/13-22:41:22.600 ERR [RES] Physical Disk <Cluster Disk 1>: PnpUpdateDiskConfigThread: Failed to get volume info status 2
000008ac.000023e8::2017/11/13-22:41:22.602 INFO [NM] Received request from client address fe80::6d90:4572:db28:58f7.
00001364.000024d8::2017/11/13-22:41:22.946 ERR [RES] Physical Disk <Cluster Disk 1>: IsAlive sanity check failed!, pending IO completed with status 1117.
00001364.000024d8::2017/11/13-22:41:22.946 ERR [RES] Physical Disk <Cluster Disk 1>: IsAlive sanity check failed!, pending IO completed with status 1117.
00001364.000024d8::2017/11/13-22:41:22.946 WARN [RHS] Resource Cluster Disk 1 IsAlive has indicated failure.
Cause
This issue is caused by an intermittent defect in the PowerPath 6.1 code: a change in how Persistent Reservation IN (PRI) commands are load balanced can cause the cluster disk to fail during a LUN expansion.
Resolution
To resolve this issue, upgrade to PowerPath 6.3 or later. PowerPath for Windows can be downloaded from the following web link: https://support.emc.com/products/1781
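As a sketch, you can confirm whether a host is still running an affected release by comparing the version reported by 'powermt version' against 6.3. The needs_upgrade helper below is hypothetical (not part of PowerPath) and only compares "major.minor" version strings:

```shell
# Hypothetical helper: decide whether a PowerPath version string
# (as reported by `powermt version`) is older than the 6.3 fix release.
needs_upgrade() {
    installed="$1"
    minimum="6.3"
    # sort -V orders version strings numerically; if the installed
    # version sorts first and differs from the minimum, it is older.
    oldest=$(printf '%s\n%s\n' "$installed" "$minimum" | sort -V | head -n 1)
    [ "$oldest" = "$installed" ] && [ "$installed" != "$minimum" ]
}

# Example with the affected release from this article:
if needs_upgrade "6.1"; then
    echo "PowerPath older than 6.3 detected: upgrade recommended"
fi
```

On an affected host, substitute the string printed by 'powermt version' for the "6.1" example above.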
Workaround:
If upgrading is not possible, you can reduce the likelihood of the issue by performing the LUN expansion with the active paths reduced to half of the HBAs. Some risk remains.
- Run 'powermt display bus' from the command prompt. This displays the relationship between each HBA number and its targets.
- Run 'powermt set mode=standby hba=<hba#> dev=<dev>'. This sends I/O to the device over only half of the paths, leaving one initiator HBA active and the other in standby mode (assuming two HBAs are in use).
- Expand the LUN as planned.
- Run 'powermt set mode=active hba=<hba#> dev=<dev>' to restore the default state.
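The steps above can be sketched as a dry-run script. The HBA number and device name are placeholders (assumptions); substitute the real values reported by 'powermt display bus' on your host, and only run the printed commands after reviewing them:

```shell
# Dry-run sketch of the workaround sequence. HBA and DEV are
# placeholders (assumptions); take the real values from the
# `powermt display bus` output on your host.
HBA="1"
DEV="harddisk17"

run() { echo "powermt $*"; }   # dry run: print each command instead of executing it

run display bus                             # 1. map HBA numbers to targets
run set mode=standby "hba=$HBA" "dev=$DEV"  # 2. idle half of the paths
echo "# expand the LUN on the array and complete the expansion on the host"  # 3.
run set mode=active "hba=$HBA" "dev=$DEV"   # 4. restore the default state
```

Replacing the echo in run() with the real powermt invocation turns the sketch into the actual workaround; keeping it as a dry run first lets you confirm the HBA and device arguments before touching live paths.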
Additional Information
Note: Two issues were found during the investigation of this problem:
- PowerPath 6.1: Load balancing of cluster Persistent Reservation IN (PRI) commands.
- PowerPath 6.1: Failure of the Report LUNs Data Changed unit attention (UA) to trigger a bus rescan.
For example, a thin provisioning threshold crossing, or a change in inquiry data for any reason (such as an LU nice-name change), could lead to problems with how Windows handles these unit attentions.
Affected Products
PowerPath
Article Properties
Article Number: 000064173
Article Type: Solution
Last Modified: 17 Oct 2025
Version: 4