[ISILON] Gen 6: Wrong drive sled slot number reported in drive replacement alert
Summary: Drive replacement dialhomes (100010051) generated by Gen6 Isilon nodes sometimes indicate the wrong sled slot number for the drive to be replaced.
Symptoms
Gen6 nodes running OneFS version 8.1.2 sometimes generate drive replacement events that identify the wrong sled bay. The alert does show the correct LNUM for the drive that needs to be replaced.
Example:
- Event generated by OneFS for Drive Sled A, Slot 1:
X - Disk Repair Complete: The following drive is ready to be replaced. Chassis Serial Number XXXXXX, Node 1, Sled A, Slot 1, Type HDD, LNUM 4.
- However, Drive Sled A, Slot 1 is healthy and Drive Sled A, Slot 2 is faulty:
Test-Isilon-1# isi devices drive list
Lnn  Location  Device     Lnum  State    Serial  Sled
-----------------------------------------------------
1    Bay 1     /dev/da2   15    HEALTHY  XXXXXX  N/A
1    Bay 2     /dev/da1   16    HEALTHY  XXXXXX  N/A
1    Bay A0    /dev/da3   14    HEALTHY  XXXXXX  A
1    Bay A1    /dev/da4   13    HEALTHY  XXXXXX  A     >>> Healthy
1    Bay A2    /dev/da13  4     REPLACE  XXXXXX  A     >>> Faulty
- After the drive is replaced, the resolved event is also generated for the wrong sled bay:
Resolved: Disk Repair Complete: The following drive is ready to be replaced. Chassis Serial Number XXXXXX, Node 1, Sled A, Slot 1, Type HDD, LNUM 4.
Cause
Resolution
Workaround:
- Use the LNUM indicated in the alert to identify which drive to replace. As shown in the example above, the LNUM can be mapped to a sled and bay using the output of the 'isi devices drive list' command.
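The LNUM lookup described in the workaround can also be scripted. The following is a minimal Python sketch, not a supported tool: the function names lnum_from_alert and find_drive_by_lnum are hypothetical, and the parsing assumes the alert wording and the 'isi devices drive list' column layout (Lnn, Location, Device, Lnum, State, Serial, Sled) shown in the example above. Output on other OneFS versions may differ, so verify the result against the command output before pulling a drive.

```python
import re


def lnum_from_alert(alert_text):
    """Return the LNUM quoted in a drive replacement alert, or None if absent."""
    match = re.search(r"\bLNUM\s+(\d+)", alert_text)
    return int(match.group(1)) if match else None


def find_drive_by_lnum(drive_list_output, lnum):
    """Return the drive row whose Lnum column equals lnum, or None.

    Assumes data rows of the form:
    <Lnn> Bay <bay> <device> <lnum> <state> <serial> <sled>
    """
    for line in drive_list_output.splitlines():
        fields = line.split()
        # Skip the prompt, header, and separator lines.
        if len(fields) < 8 or fields[1] != "Bay":
            continue
        if not (fields[0].isdigit() and fields[4].isdigit()):
            continue
        if int(fields[4]) == lnum:
            return {
                "node": int(fields[0]),
                "bay": fields[2],
                "device": fields[3],
                "lnum": lnum,
                "state": fields[5],
                "sled": fields[7],
            }
    return None


# Sample data copied from the example in this article.
ALERT = (
    "Disk Repair Complete: The following drive is ready to be replaced. "
    "Chassis Serial Number XXXXXX, Node 1, Sled A, Slot 1, Type HDD, LNUM 4."
)
DRIVE_LIST = """\
Lnn Location Device Lnum State Serial Sled
---------------------------------------------------------
1 Bay 1 /dev/da2 15 HEALTHY XXXXXX N/A
1 Bay 2 /dev/da1 16 HEALTHY XXXXXX N/A
1 Bay A0 /dev/da3 14 HEALTHY XXXXXX A
1 Bay A1 /dev/da4 13 HEALTHY XXXXXX A
1 Bay A2 /dev/da13 4 REPLACE XXXXXX A
"""

drive = find_drive_by_lnum(DRIVE_LIST, lnum_from_alert(ALERT))
print(drive)
```

With the sample data, the lookup returns the row for Bay A2 in the REPLACE state, not the Slot 1 named in the alert text. The sketch deliberately ignores the sled and slot fields of the alert, because those are the fields affected by this issue.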
Resolution:
- This issue is fixed in OneFS 8.2.0 and later, so upgrading OneFS prevents recurrence. Isilon engineering is working on a fix for an upcoming rollup patch release for OneFS 8.1.2.