VPLEX: Extent and Distributed Device component went to critical failure state after creating a Distributed device
Summary: This article explains why a virtual volume and distributed device (DD) can report a minor failure while the underlying extent and DD component are in a critical-failure state after creating a distributed device, and what to do about it.
Symptoms
After creating a distributed device, the extent and distributed device component of the attached device went to "critical-failure" state.
Sample Output:
VPlexcli:/> show-use-hierarchy clusters/cluster-*/virtual-volumes/S_OSL_V1V2-PROD_12
storage-view: OSL-V1V2-TEST_Y3 (cluster-2-Y3)
storage-view: OSL-V1V2-TEST_D1 (cluster-1-D1)
  consistency-group: cg_D1 (synchronous)
    virtual-volume: S_OSL_V1V2-PROD_12 (2T, minor-failure, distributed @ cluster-1-D1, running)
      distributed-device: dd_VNX_0914_206_1_vol_1 (2T, raid-1, minor-failure)
        distributed-device-component: device_VNX_0914_206_1_vol_12019Jun06_095143 (2T, raid-0, cluster-1-D1)
          extent: extent_VNX_0914_206_1_vol_1 (2T)
            storage-volume: VNX_0914_206_1_vol (2T)
              logical-unit: VPD83T3:600601601e203900ec02e3ae3f88e911
              storage-array: EMC-CLARiiON-CKM00142500914
        distributed-device-component: device_VNX_1278_111_1_vol_1 (2T, raid-0, critical-failure, cluster-2-Y3) <<<<<
          extent: extent_VNX_1278_111_1_vol_1 (2T, critical-failure) <<<<<
            storage-volume: VNX_1278_111_1_vol (2T)
              logical-unit: VPD83T3:6006016013c03900a73fc8313f88e911
              storage-array: EMC-CLARiiON-CKM00143801278
Cause
After attaching a mirror leg to an existing device, the extent of the attached device went to critical failure. This is expected when a rebuild has been initiated and the new leg is still synchronizing.
This can be checked in firmware logs:
128.221.253.37/cpu0/log:5988:W/"0060165465f564526-2":48947:<6>2019/06/06 09:51:46.55: amf/7 Added mirror to amf "device_VNX_0914_206_1_vol_1": added amf "device_VNX_1278_111_1_vol_1" into slot 1
128.221.253.36/cpu0/log:5988:W/"0060165468b17011-2":46412:<6>2019/06/06 09:51:46.56: amf/7 Added mirror to amf "device_VNX_0914_206_1_vol_1": added amf "device_VNX_1278_111_1_vol_1" into slot 1
128.221.252.37/cpu0/log:5988:W/"0060165465f564526-2":48961:<5>2019/06/06 09:59:17.53: amf/21 raid 1 rebuild: device_VNX_0914_206_1_vol_1: child node 1 (device_VNX_1278_111_1_vol_1) rebuild started (full rebuild, rebuild line 2475072 blocks)
Check the status of the extent. The extent can be marked out-of-date while the rebuild is in progress:
VPlexcli:/> ll /clusters/cluster-2-Y3/storage-elements/extents/extent_VNX_1278_111_1_vol_1
/clusters/cluster-2-Y3/storage-elements/extents/extent_VNX_1278_111_1_vol_1:
Name                           Value
-----------------------------  ------------------------------------------------
application-consistent         false
block-count                    536870912
block-offset                   0
block-size                     4K
capacity                       2T
description                    -
health-indications             [out of date]    <<<<<
health-state                   critical-failure <<<<<
io-status                      alive
itls                           0x50001442906ca510/0x500601610860538d/9,
                               0x50001442906ca511/0x500601600860538d/9,
                               0x50001442906ca510/0x500601680860538d/9,
                               0x50001442906ca511/0x500601690860538d/9,
                               0x50001442806c8d11/0x500601600860538d/9,
                               0x50001442806c8d10/0x500601610860538d/9,
                               0x50001442806c8d10/0x500601680860538d/9,
                               0x50001442806c8d11/0x500601690860538d/9,
                               0x50001442906c8d11/0x500601690860538d/9,
                               0x50001442906c8d10/0x500601610860538d/9,
                               ... (16 total)
locality                       -
operational-status             error
storage-volume                 VNX_1278_111_1_vol
storage-volumetype             normal
system-id                      SLICE:206c8db5c53ed089
thin-capable                   false
underlying-storage-block-size  512
use                            used
used-by                        [device_VNX_0914_206_1_vol_1]
vendor-specific-name           DGC
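If you script health checks against captured CLI output, the two-column Name/Value listing above can be parsed into a dictionary. This is a minimal sketch, not a VPLEX API: `parse_ll_attributes` is a hypothetical helper name, and the assumption that attribute names are lowercase and separated from values by two or more spaces is based only on the sample listing above.

```python
import re

def parse_ll_attributes(output: str) -> dict:
    """Parse a two-column Name/Value listing into a dict.

    Hypothetical helper: assumes attribute names start with a lowercase
    letter and are separated from values by at least two spaces, as in
    the 'll' output above. Header and '----' separator lines are skipped.
    """
    attrs = {}
    for line in output.splitlines():
        m = re.match(r"([a-z][\w-]*)\s{2,}(.*\S)", line)
        if m:
            attrs[m.group(1)] = m.group(2)
    return attrs

# Example using values from the extent listing above
sample = """\
Name                 Value
-------------------  ----------------
health-indications   [out of date]
health-state         critical-failure
io-status            alive
operational-status   error
"""
attrs = parse_ll_attributes(sample)
if attrs.get("health-state") == "critical-failure":
    print("extent unhealthy:", attrs.get("health-indications"))
```

A check like this can flag the out-of-date extent automatically instead of reading the listing by eye.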
Resolution
1. Check the rebuild status to verify whether the rebuild is still running on the device:
VPlexcli:/> rebuild status
[1] storage_volumes marked for rebuild
Global rebuilds:
 device                       rebuild type  rebuilder director  rebuilt/total  percent finished  throughput  ETA
 ---------------------------  ------------  ------------------  -------------  ----------------  ----------  -------
 device_VNX_0914_206_1_vol_1  full          s1_6985_spa         1.44T/2T       72.13%            171M/s      57.1min
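To track progress programmatically, the table row above can be parsed for the percent finished and ETA. A minimal sketch under stated assumptions: `parse_rebuild_status` is a hypothetical helper, and the column order (device, type, director, rebuilt/total, percent, throughput, ETA) is taken from the sample output above.

```python
import re

def parse_rebuild_status(output: str) -> list:
    """Extract rebuild rows from captured 'rebuild status' output.

    Hypothetical helper: assumes the column order shown in the sample
    table above; header and separator lines simply fail the match.
    """
    row = re.compile(
        r"\s*(\S+)\s+(full|incremental)\s+(\S+)\s+"
        r"(\S+)/(\S+)\s+([\d.]+)%\s+(\S+)\s+(\S+)"
    )
    rebuilds = []
    for line in output.splitlines():
        m = row.match(line)
        if m:
            rebuilds.append({
                "device": m.group(1),
                "rebuild_type": m.group(2),
                "director": m.group(3),
                "rebuilt": m.group(4),
                "total": m.group(5),
                "percent": float(m.group(6)),
                "throughput": m.group(7),
                "eta": m.group(8),
            })
    return rebuilds

# Row taken from the sample output above
sample = ("device_VNX_0914_206_1_vol_1 full s1_6985_spa "
          "1.44T/2T 72.13% 171M/s 57.1min")
rebuilds = parse_rebuild_status(sample)
print(rebuilds[0]["percent"], rebuilds[0]["eta"])
```

The ETA field gives a reasonable polling interval before rechecking in step 2.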
2. Wait for the rebuild to complete. After the ETA shown in step 1 has elapsed, run the command again to confirm that the rebuild has finished:
VPlexcli:/> rebuild status
Global rebuilds:
No active global rebuilds.
Local rebuilds:
No active local rebuilds.
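If you poll the CLI from a script, completion can be detected by checking for the "No active ... rebuilds." lines shown above. A minimal sketch; the function name and the exact match strings are assumptions based only on the sample output.

```python
def rebuild_complete(status_output: str) -> bool:
    """Return True when captured 'rebuild status' output reports no
    active rebuilds.

    Assumption: completion is indicated by the two 'No active ...'
    lines seen in the sample output above.
    """
    return ("No active global rebuilds." in status_output
            and "No active local rebuilds." in status_output)

done = rebuild_complete(
    "Global rebuilds:\nNo active global rebuilds.\n"
    "Local rebuilds:\nNo active local rebuilds."
)
print("rebuild complete" if done else "rebuild still running")
```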
3. Once the rebuild has completed, run the show-use-hierarchy command again to confirm that the critical-failure state has cleared:
VPlexcli:/> show-use-hierarchy clusters/cluster-*/virtual-volumes/S_OSL_V1V2-PROD_12
storage-view: OSL-V1V2-TEST_Y3 (cluster-2-Y3)
storage-view: OSL-V1V2-TEST_D1 (cluster-1-D1)
  consistency-group: cg_D1 (synchronous)
    virtual-volume: S_OSL_V1V2-PROD_12 (2T, distributed @ cluster-2-Y3, running)
      distributed-device: dd_VNX_0914_206_1_vol_1 (2T, raid-1)
        distributed-device-component: device_VNX_0914_206_1_vol_12019Jun06_095143 (2T, raid-0, cluster-1-D1)
          extent: extent_VNX_0914_206_1_vol_1 (2T)
            storage-volume: VNX_0914_206_1_vol (2T)
              logical-unit: VPD83T3:600601601e203900ec02e3ae3f88e911
              storage-array: EMC-CLARiiON-CKM00142500914
        distributed-device-component: device_VNX_1278_111_1_vol_1 (2T, raid-0, cluster-2-Y3) <<<<
          extent: extent_VNX_1278_111_1_vol_1 (2T) <<<<
            storage-volume: VNX_1278_111_1_vol (2T)
              logical-unit: VPD83T3:6006016013c03900a73fc8313f88e911
              storage-array: EMC-CLARiiON-CKM00143801278
4. If "rebuild status" shows that the rebuilds have completed but "show-use-hierarchy" still reports the distributed-device component and device in a critical-failure state, verify the health of the storage volume. If the storage volume is in critical-failure, refer to the Knowledge Base articles:
Open a Live Chat with Dell Technologies Customer Service for additional assistance and refer to this article.