VxRail: Troubleshooting Drive Issues Using The Vxtii Output
Summary: This article covers troubleshooting VxRail disk and drive issues using the vxtii output.
Instructions
Example 1: Capacity or cache drive failed, dedupe disabled
A: Drive is only showing in VDQ
If your output looks like the example below, the failed drive can only be seen in the VDQ output. The LC logs have wrapped, and the drive no longer shows in the TSRs. Compare against a known good node to find which drive slot is missing, as the slot layout should match across nodes. In this case slot 6 is empty, so the drive in slot 6, part number F0VFY, needs to be replaced.
7 of 8 vSAN drives, status=(Bay.Q01:Offline); Capacity used: 24% of total 37558 GB; Deduplication: Disabled
------------------------------------------------------------------------------------------------------------------------
Disk Status Vendor Size(GB) Serial# Firmware Part PDL CMMDS Media Role DGrp Model
------------------------------------------------------------------------------------------------------------------------
00 OK SAMSUNG 1788 S40DNX0M705130 DSF8 F0VFY no Y SSD Cap 4e41 MZILT1T9HAJQ0D3
01 OK SAMSUNG 1788 S40DNX0M705125 DSF8 F0VFY no Y SSD Cap 4e41 MZILT1T9HAJQ0D3
02 OK TOSHIBA 1788 10P0A134TNUF B01C TDNP7 no Y SSD Cap 4e41 KPM5XRUG1T92
05 OK SAMSUNG 1788 S40DNX0M705123 DSF8 F0VFY no Y SSD Cap 31b3 MZILT1T9HAJQ0D3
07 OK TOSHIBA 1788 10P0A136TNUF B01C TDNP7 no Y SSD Cap 31b3 KPM5XRUG1T92
20 OK SAMSUNG 1490 S40ENX0M703701 DWF8 DR0HX no Y SSD Cache 4e41 MZILT1T6HAJQ0D3
21 OK SAMSUNG 1490 S40ENX0M703717 DWF8 DR0HX no Y SSD Cache 31b3 MZILT1T6HAJQ0D3
Q01 Offline - 0 No_core_path - - PDL No! HDD Cap - Only_VDQ_entry_found
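As a rough illustration, the summary line at the top of each vxtii drive table can be read programmatically. The sketch below is not part of vxtii; the function name `parse_summary` and the regular expression are assumptions inferred only from the sample lines in this article, and the real format may differ between versions.

```python
import re

# Pattern inferred from the sample summary lines in this article (assumption);
# it accepts both "vSAN drives" and "vSAN Drives".
SUMMARY_RE = re.compile(
    r"(?P<present>\d+) of (?P<expected>\d+) vSAN [Dd]rives"
    r".*?Deduplication: (?P<dedupe>Enabled|Disabled)"
)

def parse_summary(line):
    """Return drive counts and dedupe state from a summary line, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    present = int(match.group("present"))
    expected = int(match.group("expected"))
    return {
        "present": present,
        "expected": expected,
        "missing": expected - present,
        "dedupe_enabled": match.group("dedupe") == "Enabled",
    }

sample = ("7 of 8 vSAN drives, status=(Bay.Q01:Offline); "
          "Capacity used: 24% of total 37558 GB; Deduplication: Disabled")
print(parse_summary(sample))
```

A non-zero `missing` value suggests a drive has dropped out of the table, but the count alone is not sufficient: a drive can still be counted while flagged in the status field, so the per-drive rows should always be checked as well.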
B: Drive has failed (PDL or CMMDS No!) but still shows in iDRAC
In this case, the slot and the part number are shown directly in the table. Slot 7 is the example here. If either the PDL flag or CMMDS No! is triggered, the drive should be replaced on the first occurrence.
The part number and slot shown in the table can be used for dispatch.
========================================================================================================================
10 of 10 vSAN drives, status=(Bay.7:Unknown); Capacity used: 20% of total 50077 GB; Deduplication: Disabled.
------------------------------------------------------------------------------------------------------------------------
Disk Status Vendor Size(GB) Serial# Firmware Part PDL CMMDS Media Role DGrp Model
------------------------------------------------------------------------------------------------------------------------
00 OK DELL 1788 S37PNX0K701056 GC5B 9W12R no Y SSD Cap 1ed6 MZ7LM1T9HMJP0D3
01 OK DELL 1788 S37PNX0K701040 GC5B 9W12R no Y SSD Cap 1ed6 MZ7LM1T9HMJP0D3
02 OK DELL 1788 S37PNX0K701044 GC5B 9W12R no Y SSD Cap 1ed6 MZ7LM1T9HMJP0D3
03 OK DELL 1788 S37PNX0K700167 GC5B 9W12R no Y SSD Cap 1ed6 MZ7LM1T9HMJP0D3
04 OK DELL 1788 S37PNX0K701053 GC5B 9W12R no Y SSD Cap 5a7f MZ7LM1T9HMJP0D3
05 OK DELL 1788 S37PNX0K701049 GC5B 9W12R no Y SSD Cap 5a7f MZ7LM1T9HMJP0D3
06 OK DELL 1788 S37PNX0K701046 GC5B 9W12R no Y SSD Cap 5a7f MZ7LM1T9HMJP0D3
07 Unknown DELL 1788 S37PNX0K700163 GC5B 9W12R no No! SSD ? - MZ7LM1T9HMJP0D3
08 OK SAMSUNG 745 S39YNX0K803422 1.1.2 - no Y NVMe Cache 1ed6 Dell Express Flash PM1725a 800GB SFF
09 OK SAMSUNG 745 S39YNX0K803397 1.1.2 - no Y NVMe Cache 5a7f Dell Express Flash PM1725a 800GB SFF
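The PDL and CMMDS check described above can be sketched as a small filter over the table rows. This is an illustrative helper, not a Dell tool: the column names, `parse_row`, and `needs_replacement` are hypothetical, and treating any PDL value other than "no" as triggered is an assumption based on the samples in this article. It expects the 13-column layout shown above and does not handle the variant with a Worn column.

```python
# Column order taken from the table header shown in this article.
COLUMNS = ["disk", "status", "vendor", "size_gb", "serial", "firmware",
           "part", "pdl", "cmmds", "media", "role", "dgrp", "model"]

def parse_row(line):
    """Split one table line into named fields; return None for non-data lines."""
    # The Model column may contain spaces, so split at most 12 times.
    fields = line.split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS) or fields[1] in ("Status", "of"):
        return None  # separator, header, or summary line
    return dict(zip(COLUMNS, fields))

def needs_replacement(row):
    """Flag a drive when PDL is set or CMMDS reports No! (assumed encoding)."""
    return row["pdl"].lower() != "no" or row["cmmds"] == "No!"

sample_rows = """\
06 OK DELL 1788 S37PNX0K701046 GC5B 9W12R no Y SSD Cap 5a7f MZ7LM1T9HMJP0D3
07 Unknown DELL 1788 S37PNX0K700163 GC5B 9W12R no No! SSD ? - MZ7LM1T9HMJP0D3
08 OK SAMSUNG 745 S39YNX0K803422 1.1.2 - no Y NVMe Cache 1ed6 Dell Express Flash PM1725a 800GB SFF
"""

rows = [r for r in map(parse_row, sample_rows.splitlines()) if r]
flagged = [(r["disk"], r["part"]) for r in rows if needs_replacement(r)]
print(flagged)
```

For the sample rows this reports slot 07 with part number 9W12R, which matches the slot and part identified in the table for dispatch.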
Example 2: Capacity drive failed, dedupe enabled
In the example below, dedupe is enabled and a capacity drive has failed. Because dedupe is enabled, the whole disk group (DG) is offline.
The entry below tells us that only the vdq command on the node still shows the disk; it has been removed from everywhere else.
The No! signifies that the drive is no longer part of vSAN.
Q01 Offline - 0 No_core_path - - PDL No! HDD Cap - Only_VDQ_entry_found
Looking at the rest of the drives in the DG, we can tell that the missing drive must be a capacity drive, because the cache drive in that DG is still listed.
7 of 8 vSAN drives, status=(Bay.Q01:Offline); Capacity used: 24% of total 37558 GB; Deduplication: Enabled.
------------------------------------------------------------------------------------------------------------------------
Disk Status Vendor Size(GB) Serial# Firmware Part PDL CMMDS Media Role DGrp Model
------------------------------------------------------------------------------------------------------------------------
00 OK SAMSUNG 1788 S40DNX0M705130 DSF8 F0VFY no Y SSD Cap 4e41 MZILT1T9HAJQ0D3
01 OK SAMSUNG 1788 S40DNX0M705125 DSF8 F0VFY no Y SSD Cap 4e41 MZILT1T9HAJQ0D3
02 OK TOSHIBA 1788 10P0A134TNUF B01C TDNP7 no Y SSD Cap 4e41 KPM5XRUG1T92
05 OK SAMSUNG 1788 S40DNX0M705123 DSF8 F0VFY no No! SSD Cap 31b3 MZILT1T9HAJQ0D3
07 OK TOSHIBA 1788 10P0A136TNUF B01C TDNP7 no No! SSD Cap 31b3 KPM5XRUG1T92
20 OK SAMSUNG 1490 S40ENX0M703701 DWF8 DR0HX no Y SSD Cache 4e41 MZILT1T6HAJQ0D3
21 OK SAMSUNG 1490 S40ENX0M703717 DWF8 DR0HX no No! SSD Cache 31b3 MZILT1T6HAJQ0D3
Q01 Offline - 0 No_core_path - - PDL No! HDD Cap - Only_VDQ_entry_found
Comparing against a known good node, we can see that slot 6 is the one with the missing disk.
In this case, order a replacement drive using the part number from another capacity drive: PPID F0VFY.
Once the drive has been ordered, also recreate the DG without the failed drive.
Known good
8 of 8 vSAN drives, status=OK; Capacity used: 24% of total 37558 GB; Deduplication: Enabled.
------------------------------------------------------------------------------------------------------------------------
Disk Status Vendor Size(GB) Serial# Firmware Part PDL CMMDS Media Role DGrp Model
------------------------------------------------------------------------------------------------------------------------
00 OK SAMSUNG 1788 S40DNX0M705122 DSF8 F0VFY no Y SSD Cap 4451 MZILT1T9HAJQ0D3
01 OK SAMSUNG 1788 S40DNX0M705110 DSF8 F0VFY no Y SSD Cap 4451 MZILT1T9HAJQ0D3
02 OK TOSHIBA 1788 10P0A0J9TNUF B01C TDNP7 no Y SSD Cap 4451 KPM5XRUG1T92
05 OK SAMSUNG 1788 S40DNX0M705108 DSF8 F0VFY no Y SSD Cap 28ed MZILT1T9HAJQ0D3
06 OK SAMSUNG 1788 S40DNX0M705117 DSF8 F0VFY no Y SSD Cap 28ed MZILT1T9HAJQ0D3
07 OK TOSHIBA 1788 5070A07KTNUF B01C TDNP7 no Y SSD Cap 28ed KPM5XRUG1T92
20 OK SAMSUNG 1490 S40ENX0M703685 DWF8 DR0HX no Y SSD Cache 4451 MZILT1T6HAJQ0D3
21 OK SAMSUNG 1490 S40ENX0M703681 DWF8 DR0HX no Y SSD Cache 28ed MZILT1T6HAJQ0D3
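The comparison against a known good node can be sketched as a set difference over the slot numbers. The helper names below are hypothetical, and the approach assumes that both nodes share the same drive layout, as this article states they should. Rows such as Q01 that do not start with a numeric bay are ignored.

```python
def is_drive_row(fields):
    """True for data rows that start with a numeric bay such as '06'."""
    # The summary line ("7 of 8 vSAN drives ...") also starts with a number,
    # so it is excluded by checking the second field.
    return len(fields) >= 13 and fields[0].isdigit() and fields[1] != "of"

def slots_and_parts(table_text):
    """Map slot number to part number (7th column) for each drive row."""
    result = {}
    for line in table_text.splitlines():
        fields = line.split()
        if is_drive_row(fields):
            result[fields[0]] = fields[6]
    return result

def missing_slots(failing_text, known_good_text):
    """Slots present on the known good node but absent on the failing node."""
    failing = slots_and_parts(failing_text)
    good = slots_and_parts(known_good_text)
    return {slot: part for slot, part in good.items() if slot not in failing}

failing_node = """\
05 OK SAMSUNG 1788 S40DNX0M705123 DSF8 F0VFY no No! SSD Cap 31b3 MZILT1T9HAJQ0D3
07 OK TOSHIBA 1788 10P0A136TNUF B01C TDNP7 no No! SSD Cap 31b3 KPM5XRUG1T92
21 OK SAMSUNG 1490 S40ENX0M703717 DWF8 DR0HX no No! SSD Cache 31b3 MZILT1T6HAJQ0D3
Q01 Offline - 0 No_core_path - - PDL No! HDD Cap - Only_VDQ_entry_found
"""

known_good_node = """\
05 OK SAMSUNG 1788 S40DNX0M705108 DSF8 F0VFY no Y SSD Cap 28ed MZILT1T9HAJQ0D3
06 OK SAMSUNG 1788 S40DNX0M705117 DSF8 F0VFY no Y SSD Cap 28ed MZILT1T9HAJQ0D3
07 OK TOSHIBA 1788 5070A07KTNUF B01C TDNP7 no Y SSD Cap 28ed KPM5XRUG1T92
21 OK SAMSUNG 1490 S40ENX0M703681 DWF8 DR0HX no Y SSD Cache 28ed MZILT1T6HAJQ0D3
"""

print(missing_slots(failing_node, known_good_node))
```

The part number returned is the one installed in the same slot on the known good node, so it is only a starting point for the order and should be confirmed before dispatch.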
Example 3: Cache drive failed, dedupe enabled
When the cache drive has failed and dedupe is enabled, the DG will be offline. We cannot bring it back online without replacing the cache drive.
Here a drive is seen only in vdq on the node, so we know one is missing from vSAN.
Q01 Offline - 0 No_core_path - - PDL No! HDD Cap - Only_VDQ_entry_found
7 of 8 vSAN drives, status=(Bay.Q01:Offline); Capacity used: 24% of total 37558 GB; Deduplication: Enabled.
------------------------------------------------------------------------------------------------------------------------
Disk Status Vendor Size(GB) Serial# Firmware Part PDL CMMDS Media Role DGrp Model
------------------------------------------------------------------------------------------------------------------------
00 OK SAMSUNG 1788 S40DNX0M705130 DSF8 F0VFY no Y SSD Cap 4e41 MZILT1T9HAJQ0D3
01 OK SAMSUNG 1788 S40DNX0M705125 DSF8 F0VFY no Y SSD Cap 4e41 MZILT1T9HAJQ0D3
02 OK TOSHIBA 1788 10P0A134TNUF B01C TDNP7 no Y SSD Cap 4e41 KPM5XRUG1T92
05 OK SAMSUNG 1788 S40DNX0M705123 DSF8 F0VFY no No! SSD Cap 31b3 MZILT1T9HAJQ0D3
06 OK SAMSUNG 1788 S40DNX0M705117 DSF8 F0VFY no No! SSD Cap 28ed MZILT1T9HAJQ0D3
07 OK TOSHIBA 1788 10P0A136TNUF B01C TDNP7 no No! SSD Cap 31b3 KPM5XRUG1T92
20 OK SAMSUNG 1490 S40ENX0M703701 DWF8 DR0HX no Y SSD Cache 4e41 MZILT1T6HAJQ0D3
Q01 Offline - 0 No_core_path - - PDL No! HDD Cap - Only_VDQ_entry_found
Comparing against a known good node, we can see that slot 21 is the one with the missing disk.
In this case, order a replacement drive using the part number from the other cache drive: PPID DR0HX.
https://quality.dell.com/ can also be used to check the part number.
Known good
8 of 8 vSAN drives, status=OK; Capacity used: 24% of total 37558 GB; Deduplication: Enabled.
------------------------------------------------------------------------------------------------------------------------
Disk Status Vendor Size(GB) Serial# Firmware Part PDL CMMDS Media Role DGrp Model
------------------------------------------------------------------------------------------------------------------------
00 OK SAMSUNG 1788 S40DNX0M705122 DSF8 F0VFY no Y SSD Cap 4451 MZILT1T9HAJQ0D3
01 OK SAMSUNG 1788 S40DNX0M705110 DSF8 F0VFY no Y SSD Cap 4451 MZILT1T9HAJQ0D3
02 OK TOSHIBA 1788 10P0A0J9TNUF B01C TDNP7 no Y SSD Cap 4451 KPM5XRUG1T92
05 OK SAMSUNG 1788 S40DNX0M705108 DSF8 F0VFY no Y SSD Cap 28ed MZILT1T9HAJQ0D3
06 OK SAMSUNG 1788 S40DNX0M705117 DSF8 F0VFY no Y SSD Cap 28ed MZILT1T9HAJQ0D3
07 OK TOSHIBA 1788 5070A07KTNUF B01C TDNP7 no Y SSD Cap 28ed KPM5XRUG1T92
20 OK SAMSUNG 1490 S40ENX0M703685 DWF8 DR0HX no Y SSD Cache 4451 MZILT1T6HAJQ0D3
21 OK SAMSUNG 1490 S40ENX0M703681 DWF8 DR0HX no Y SSD Cache 28ed MZILT1T6HAJQ0D3
In this example, the drive is online but has been removed from vSAN. In this case we can dispatch a drive as normal on the first occurrence.
The failed drive and its part number are easy to identify. There are 10 vSAN drives but only 9 are in use in iDRAC; the unused drive is the one removed from vSAN.
========================================================================================================================
10 of 10 vSAN Drives, status=(Bay.5-1:Unused); Capacity used: 1% of total 629547 GB; Deduplication: Disabled.
------------------------------------------------------------------------------------------------------------------------
Bay-Bus Status Vendor Size(GB) Serial# Firmware Part PDL CMMDS Media Role DGrp Model
------------------------------------------------------------------------------------------------------------------------
00-1 OK Toshiba 7153 X0L0A04KTNWF B020 GT7GT No Y SSD Cap 6670 KPM5XRUG7T68
01-1 OK Toshiba 7153 X0L0A0G0TNWF B020 GT7GT No Y SSD Cap 6670 KPM5XRUG7T68
02-1 OK Toshiba 7153 X0L0A049TNWF B020 GT7GT No Y SSD Cap 6670 KPM5XRUG7T68
03-1 OK Toshiba 7153 X0L0A04DTNWF B020 GT7GT No Y SSD Cap 6670 KPM5XRUG7T68
04-1 OK Toshiba 7153 X0L0A04GTNWF B020 GT7GT No Y SSD Cap 565b KPM5XRUG7T68
05-1 Unused Toshiba 7153 X0L0A04FTNWF B020 GT7GT No No! SSD ? - KPM5XRUG7T68
06-1 OK Toshiba 7153 X0L0A04JTNWF B020 GT7GT No Y SSD Cap 565b KPM5XRUG7T68
07-1 OK Toshiba 7153 X0L0A04TTNWF B020 GT7GT No Y SSD Cap 565b KPM5XRUG7T68
08-1 OK Toshiba 745 90D0A0U1TPAF B020 DHRVV No Y SSD Cache 6670 KPM5XMUG800G
09-1 OK Toshiba 745 90D0A0Y0TPAF B020 DHRVV No Y SSD Cache 565b KPM5XMUG800G
This example looks similar to the one above, where the drive is online but has been removed from vSAN, either due to a performance issue or an offline/online event.
In this case, however, the whole disk group is offline due to a failed cache drive, and we would deal with it the same way as the dedupe and compression example above.
========================================================================================================================
4 of 8 vSAN Drives in use, Bay.8-1:Offline Bay.0-1:Degraded Bay.1-1:Degraded Bay.2-1:Degraded; ; Deduplication: Disabled.
------------------------------------------------------------------------------------------------------------------------
Bay-Dsk Status Vendor Size(GB) Serial# Firmware Part PDL CMMDS Media Role DGrp Worn Model
------------------------------------------------------------------------------------------------------------------------
00-1 Degraded Samsung 1788 S3BENX0J503318 DSLA 086DD No No! SSD Cap 5016 0% MZILS1T9HEJH0D3
01-1 Degraded Samsung 1788 S3BENX0J502478 DSLA 086DD No No! SSD Cap 5016 0% MZILS1T9HEJH0D3
02-1 Degraded Samsung 1788 S3BENX0K703212 DSLA 086DD No No! SSD Cap 5016 0% MZILS1T9HEJH0D3
04-1 OK Samsung 1788 S3BENX0J502576 DSLA 086DD No Y SSD Cap 5e92 0% MZILS1T9HEJH0D3
05-1 OK Samsung 1788 S3BENX0J502816 DSLA 086DD No Y SSD Cap 5e92 0% MZILS1T9HEJH0D3
06-1 OK Samsung 1788 S3BENX0K701844 DSLA 086DD No Y SSD Cap 5e92 0% MZILS1T9HEJH0D3
08-1 Offline Intel 1490 PHLN9522024D1P6AGN VDV1DP25 58V30 PDL No! NVMe Cache 5016 1% Dell Express Flash NVMe P4610 1.6TB SFF
09-1 OK Intel 1490 PHLN9522018C1P6AGN VDV1DP25 58V30 No Y NVMe Cache 5e92 2% Dell Express Flash NVMe P4610 1.6TB SFF
========================================================================================================================
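To decide whether a whole disk group is offline and whether the cache drive is the cause, the rows can be grouped by the DGrp column. The sketch below is illustrative only: `disk_group_report` is a hypothetical helper, the column positions are taken from the tables in this article, and the rule that a group is offline when every member shows CMMDS No! is an assumption drawn from these samples.

```python
import re
from collections import defaultdict

BAY_RE = re.compile(r"\d{2}(-\d+)?")  # matches "08" or "08-1"

def disk_group_report(rows_text):
    """Group drive rows by DGrp and summarise each group's state."""
    groups = defaultdict(list)
    for line in rows_text.splitlines():
        f = line.split()
        # Skip separators, headers, summary lines, and drives with no group.
        if len(f) < 12 or not BAY_RE.fullmatch(f[0]) or f[1] == "of" or f[11] == "-":
            continue
        groups[f[11]].append(
            {"bay": f[0], "status": f[1], "pdl": f[7], "cmmds": f[8], "role": f[10]}
        )
    report = {}
    for dgrp, drives in groups.items():
        report[dgrp] = {
            # Assumed rule: the group is offline when no member is in CMMDS.
            "offline": all(d["cmmds"] == "No!" for d in drives),
            "failed_cache": [
                d["bay"] for d in drives
                if d["role"] == "Cache" and (d["pdl"] == "PDL" or d["status"] == "Offline")
            ],
        }
    return report

sample_rows = """\
00-1 Degraded Samsung 1788 S3BENX0J503318 DSLA 086DD No No! SSD Cap 5016 0% MZILS1T9HEJH0D3
04-1 OK Samsung 1788 S3BENX0J502576 DSLA 086DD No Y SSD Cap 5e92 0% MZILS1T9HEJH0D3
08-1 Offline Intel 1490 PHLN9522024D1P6AGN VDV1DP25 58V30 PDL No! NVMe Cache 5016 1% Dell Express Flash NVMe P4610 1.6TB SFF
09-1 OK Intel 1490 PHLN9522018C1P6AGN VDV1DP25 58V30 No Y NVMe Cache 5e92 2% Dell Express Flash NVMe P4610 1.6TB SFF
"""

for dgrp, state in disk_group_report(sample_rows).items():
    print(dgrp, state)
```

For the sample rows, disk group 5016 is reported as offline with the cache drive in bay 08-1 marked as failed, while disk group 5e92 is healthy. Only the first twelve columns are read, so the same function works with or without the Worn column.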