PowerFlex: ESXi hosts lose access to volumes due to IO_FAULT_RESERVATION_CONFLICT
Summary: ESXi hosts lose access to ScaleIO volumes due to IO_FAULT_RESERVATION_CONFLICT.
Symptoms
This issue can occur when the configuration of one of the hosts sharing the same volumes changes, for example when:
- one of the ESXi hosts is rebooted
- the volume is mapped to a new ESXi host
Error messages similar to, but not limited to, the following may be seen in the VMkernel log:
2017-07-10T17:57:40.059Z cpu2:32967)HBX: 2961: Waiting for timed out [HB state abcdef02 offset 3526656 gen 51 stampUS 6291663389308 uuid 5903bfd0-a9cf888e-56c8-1402ec750038 jrnl <FB 1399400> drv 14.60] on vol '<Datastore Name>'
...
2017-07-10T17:57:44.312Z cpu30:32851)ScsiDeviceIO: 2369: Cmd(0x412e88aa3740) 0x2a, CmdSN 0x58dbf4 from world 32813 to dev "eui.<MDM-ID><VOL-ID>" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x0 0x0.
...
2017-07-10T17:58:19.646Z cpu9:32872)HBX: 612: Reading HB at 3527168 on vol '<Datastore Name>' failed: Timeout
...
2017-07-10T17:59:33.661Z cpu3:34122)Fil3: 15438: Max retries (10) exceeded for caller Fil3_FileIO (status 'IO was aborted by VMFS via a virt-reset on the device')
2017-07-10T17:59:33.661Z cpu3:34122)BC: 2288: Failed to write (uncached) object '.iormstats.sf': Maximum kernel-level retries exceeded
2017-07-10T17:59:33.682Z cpu18:34122)Fil3: 15438: Max retries (10) exceeded for caller Fil3_FileIO (status 'IO was aborted by VMFS via a virt-reset on the device')
2017-07-10T17:59:33.682Z cpu18:34122)BC: 2288: Failed to write (uncached) object '.iormstats.sf': Maximum kernel-level retries exceeded
2017-07-10T17:59:33.751Z cpu18:34122)Fil3: 15438: Max retries (10) exceeded for caller Fil3_FileIO (status 'IO was aborted by VMFS via a virt-reset on the device')
2017-07-10T17:59:33.751Z cpu18:34122)BC: 2288: Failed to write (uncached) object '.iormstats.sf': Maximum kernel-level retries exceeded
With newer versions of the SDC, the scini module might log the reservation conflicts:
2017-07-10T21:42:40.839Z cpu12:33503)scini: mapVolIO_ReportIOErrorIfNeeded:361: ScaleIO R2_0:[211482] IO-ERROR comb: <Comb-ID>. offsetInComb 0. SizeInLB 1. SDS_ID <SDS-ID>. Comb Gen b. Head Gen 2653.
2017-07-10T21:42:40.839Z cpu12:33503)scini: mapVolIO_ReportIOErrorIfNeeded:374: ScaleIO R2_0:Vol ID 0x<VOL-ID>. Last fault Status SUCCESS(65).Last error Status SUCCESS(65) Reason (reservation conflict) Retry count (0) chan (1)
2017-07-10T21:42:40.839Z cpu12:33503)scini: blkScsi_PrintIOInfo:3304: ScaleIO R2_0:hCmd 0x4136803d9d40, OpCode 0x28, rc 53 scsiStat 24, senseCode 5, asc 0, ascq 0
2017-07-10T21:42:41.792Z cpu12:32833)ScsiDeviceIO: 2369: Cmd(0x4136803d9d40) 0x28, CmdSN 0x3bb from world 0 to dev "eui.<MDM-ID VOL-ID>" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x0 0x0.
2017-07-10T21:42:41.793Z cpu34:33507)Partition: 423: Failed read for "eui.<MDM-ID VOL-ID>": I/O error
2017-07-10T21:42:41.793Z cpu34:33507)Partition: 1003: Failed to read protective mbr on "eui.<MDM-ID VOL-ID>" : I/O error
2017-07-10T21:42:41.793Z cpu34:33507)WARNING: Partition: 1112: Partition table read from device eui.<MDM-ID VOL-ID> failed: I/O error
2017-07-10T21:42:41.793Z cpu34:33507)ScsiDevice: 3445: Successfully registered device "eui.<MDM-ID VOL-ID>" from plugin "NMP" of type 0
2017-07-10T21:42:41.793Z cpu34:33507)ScsiEvents: 301: EventSubsystem: Device Events, Event Mask: 180, Parameter: 0x410ae65aa220, Registered!
In the SDS trace (trc) file:
10/07 13:58:30.985746 0x7efcf296ceb0:ioh_NewRequest:05490: Write to comb <Comb ID> - Done rc is IO_FAULT_RESERVATION_CONFLICT (Lba 710960 16), volume <VOL ID> (dit)
Impact
The entire cluster may lose access to the volumes, or experience severe performance degradation.
Cause
One ESXi node holds a SCSI-2 reservation on the volume, preventing all other nodes from accessing it.
Resolution
Do not unmap volumes from SDCs, and do not remove SDCs. This does not clear the SCSI reservation and might make troubleshooting more difficult.
As an immediate workaround, release the reservation on the LUN, or reset the LUN:
- The reservation holder must issue the release command.
The initiator that holds the reservation can be identified by following the VMware KB: Investigating Virtual Machine file locks on ESXi Hosts. From that host, issue this command to release the reservation:
vmkfstools -L release /vmfs/devices/disks/eui.<mdm-id vol-id>
- Otherwise, from any host that has the volume mapped, issue the following command to reset the volume. This drops all in-flight IOs to the volume and should be used with caution. Also see the VMware KB: Resolving SCSI reservation conflicts.
vmkfstools -L lunreset /vmfs/devices/disks/eui.<mdm-id vol-id>
This issue is seen when the hosts are configured differently with regard to datastore locking, or when a previous configuration change did not take effect on all hosts and rebooting one of them made it start behaving differently from the others. To avoid this issue in the future, ensure that all hosts sharing the same volumes and datastores are configured identically, and that any new configuration is loaded and takes effect on all hosts at the same time. Also, prefer ATS over SCSI-2 reservations where possible.
ATS locking can be configured at different levels; see the VMware KB: Disabling hardware accelerated locking (ATS) in ESXi.
- On the datastore level
The datastore locking mode can be checked with this command:
vmkfstools -Ph -v1 /vmfs/volumes/VMFS-volume-name
In the output, "Mode: Public ATS-Only" means ATS can be used on the datastore; "Mode: Public" means that SCSI-2 reservations will be used.
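As a sketch, the locking mode of every mounted VMFS datastore on a host can be listed in one pass. The grep pattern is an assumption about the output format of this ESXi build; adjust it if your build prints the mode line differently.

```shell
# Print the locking mode of each mounted VMFS datastore on this ESXi host.
# Assumes the "Mode:" line format shown above.
for vol in /vmfs/volumes/*/; do
  echo "== ${vol}"
  vmkfstools -Ph -v1 "${vol}" | grep -i "Mode:"
done
```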
- On the host level
This is the "HardwareAcceleratedLocking" parameter in /etc/vmware/esx.conf.
When this parameter is not present in the configuration file, or is set to 1, ATS locking is enabled and the host can use it. If it is set to 0, ATS locking is disabled, and even for "Public ATS-Only" datastores the host will use SCSI-2 reservations.
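Rather than editing /etc/vmware/esx.conf directly, the same setting can be inspected and changed through the corresponding esxcli advanced option:

```shell
# Check the current host-level ATS locking setting (ESXi shell):
esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking

# Enable ATS locking (1 = enabled, 0 = disabled):
esxcli system settings advanced set -i 1 -o /VMFS3/HardwareAcceleratedLocking
```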
- Also on the host level, whether to use ATS for a heartbeat on VMFS5 datastores:
This is the "UseATSForHBOnVMFS5" parameter in /etc/vmware/esx.conf.
When this parameter is not present in the configuration file, or is set to 1, it is enabled and the host will use ATS for the VMFS5 heartbeat. Otherwise, a SCSI-2 reservation is used.
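This parameter is likewise exposed as an esxcli advanced option and can be checked and set the same way:

```shell
# Check whether ATS is used for the VMFS5 heartbeat on this host:
esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5

# Enable ATS for the VMFS5 heartbeat (1 = enabled, 0 = disabled):
esxcli system settings advanced set -i 1 -o /VMFS3/UseATSForHBOnVMFS5
```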
All hosts must be configured in the same way.
After any changes, the VMware KB requires the datastores to be unmounted and then remounted manually:
- When the changes are complete, unmount and remount the datastores on all ESXi hosts one by one.
- If you instead reboot the ESXi hosts to achieve the unmount/remount, the cluster can temporarily end up with a mixture of ATS and SCSI-2 locking until all hosts have been rebooted, which is not recommended. Therefore, you must unmount and remount manually.
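A manual unmount/remount can be performed per host with esxcli, where <Datastore Name> is a placeholder for the datastore label. The unmount requires that no files on the datastore are open (power off or migrate any VMs on it first).

```shell
# Unmount the datastore by its volume label on this ESXi host:
esxcli storage filesystem unmount -l "<Datastore Name>"

# Remount it once the new locking configuration is effective:
esxcli storage filesystem mount -l "<Datastore Name>"
```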
Impacted versions
All PowerFlex versions. The issue described is an ESXi cluster configuration issue.