Article Number: 532192

Dell EMC Unity: How to safely replace a disk in a Dynamic Pool (User Correctable)

Summary: This KB article outlines how to identify when a disk that belongs to a Dynamic Pool is ready to be replaced.

Primary Product: Dell EMC Unity All Flash

Product: Dell EMC Unity Family

Last Published: 13 Mar 2020

Article Type: How To

Published Status: Online

Version: 5

Article Content

Instructions

In the GUI, when a disk faults, the Dynamic Pool status changes to "Degraded" (as shown in the figure below) with the following description: "The pool performance is degraded. Check the storage system for hardware faults. Contact your service provider., A pool is rebuilding because it lost a drive. System performance may be affected during the rebuilding."

If unbound drives are available, the array claims an unbound drive and rebuilds the faulted drive to it. If no unbound drives are available, the array rebuilds to the spare extents within the Dynamic Pool itself.

NOTE: Review KB 540961 before proceeding: https://support.emc.com/kb/540961

NOTE: The GUI can take up to 10 minutes to reflect the Pool in the degraded state due to GUI polling and browser caching.

User-added image



With Free Drives in the array to rebuild to: 

During the rebuild process, the initial drive that faulted displays a yellow "warning" error under My Dynamic Pool Properties.

Once the array has identified an unbound drive of the proper type, it starts a permanent copy operation to the new drive.

During this process, the pool disk properties are updated to show the new drive that is replacing the faulted drive.

NOTE: This does not indicate that the copy has completed. It only indicates that the operation has started.

During the rebuild operation, the pool indicates a degraded status. The pool updates its status back to "OK" once the rebuild has completed.

User-added image

To verify that the rebuild has completed, establish an SSH session to the array and run the following command grepping for the pool name:  

Command >> 15:57:32 service@spb spb:~/user> cat /EMC/backend/log_shared/EMCSystemLogFile.log | grep -i "<pool name>"

Review the output of that command to follow the logging of the pool's rebuild and its subsequent statuses.
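The raw log lines are long, so it can help to reduce each one to its timestamp, severity, and message. The following is a minimal sketch, not a Dell EMC tool: the function name pool_events is hypothetical, and it assumes each log entry sits on a single line with the quoted-field layout shown in the examples in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of the array software): print
# "timestamp severity message" for every log entry that mentions the pool.
# Assumes one entry per line, with the quoted fields in the order shown
# in this article's example output.
pool_events() {
    log_file=$1
    pool_name=$2
    # -i ignores case; -F treats the pool name as a fixed string, not a regex.
    grep -iF -- "$pool_name" "$log_file" |
        # Splitting on double quotes puts the timestamp in field 2,
        # the severity in field 12, and the message text in field 16.
        awk -F'"' '{ print $2, $12, $16 }'
}

# Example call on the array (path taken from the command above):
#   pool_events /EMC/backend/log_shared/EMCSystemLogFile.log "My Dynamic Pool"
```

Passing the file name to grep directly also avoids the extra cat process used in the original command.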

Example of the output:  
 
service@spb spb:~/user> cat /EMC/backend/log_shared/EMCSystemLogFile.log | grep -i "My Dynamic Pool"


"2019-04-10T15:16:12.574Z" "spb@APMx" "Kittyhawk_safe" "25230" "unix/spb/root" "WARN" "1:12d4602" :: "RAID protection for Storage Pool My Dynamic Pool is degraded. Please resolve any hardware problems. Internal information only: Pool OID 0x300000005." :: Category=System Component=mlu TimeZone=UTC

"2019-04-10T15:17:13.265Z" "spb@APMx" "Neo_CEM" "23864" "N/A" "WARN" "14:6032d" :: "Storage pool My Dynamic Pool is degraded." :: Category=User Component=Health TimeZone=UTC

"2019-04-10T15:21:21.551Z" "spb@APMx" "Neo_CEM" "23864" "N/A" "WARN" "14:60343" :: "Storage pool My Dynamic Pool is rebuilding due to the loss of a drive." :: Category=User Component=Health TimeZone=UTC

"2019-04-10T15:41:48.007Z" "spb@APMx" "Kittyhawk_safe" "25230" "unix/spb/root" "INFO" "1:12d0508" :: "RAID protection has been upgraded for Storage Pool My Dynamic Pool. Internal information only. Pool OID 0x300000005." :: Category=System Component=mlu TimeZone=UTC

"2019-04-10T15:42:16.938Z" "spb@APMx" "Neo_CEM" "23864" "N/A" "INFO" "14:60344" :: "Storage pool My Dynamic Pool has finished rebuilding." :: Category=User Component=Health TimeZone=UTC

"2019-04-10T15:42:16.963Z" "spb@APM" "Neo_CEM" "23864" "N/A" "INFO" "14:60326" :: "Storage pool My Dynamic Pool is operating normally" :: Category=User Component=Health TimeZone=UTC

 

The pool's RAID protection is reported as degraded when the drive first faults, and the pool's health status is then updated to degraded as well.

Once the log shows the pool entry "has finished rebuilding" followed by "is operating normally", the faulted drive can be replaced.
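The check described above can be sketched as a small script that reports the most recent rebuild-related state logged for the pool. This is an illustration under stated assumptions rather than a supported procedure: the function name rebuild_state and its output keywords are hypothetical, the message wording is taken from the example output in this article, and the log file is assumed to be in chronological order with one entry per line.

```shell
#!/bin/sh
# Hypothetical helper: print the latest rebuild-related state logged for a pool.
# Output keywords (degraded, rebuilding, finished, normal, no-events) are
# invented for this sketch; the matched phrases come from the article's examples.
rebuild_state() {
    log_file=$1
    pool_name=$2
    grep -iF -- "$pool_name" "$log_file" |
        awk '
            # Later lines overwrite earlier ones, so the last event wins.
            /is degraded/                   { state = "degraded" }
            /is rebuilding due to the loss/ { state = "rebuilding" }
            /has finished rebuilding/       { state = "finished" }
            /is operating normally/         { state = "normal" }
            END { print (state == "" ? "no-events" : state) }
        '
}

# Example call on the array:
#   rebuild_state /EMC/backend/log_shared/EMCSystemLogFile.log "My Dynamic Pool"
```

A result of normal corresponds to the sequence described above ("finished rebuilding" followed by "operating normally"). Any other result suggests waiting and confirming the pool status in the GUI before touching the drive.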


Without Free Drives in the array to rebuild to: 

Once the Dynamic Sparing process is completed, the GUI references a pool Status of "OK The component is operating normally. No action is required" (shown in the figure below). The pool currently has a reduced amount of spare space. If the pool is currently rebuilding due to a lost drive, the rebuild completes, but there may not be enough space for subsequent failures. Replace the faulted drive or add a drive of the same type and size or larger to the system.

User-added image

The Dynamic Pool rebuild can be verified with the same command as before, but the output is different. This time, an entry stating "Storage pool <Pool Name> does not have enough spare space." appears. This indicates that no free drives are available and that the array rebuilds the faulted drive to free extents in the Dynamic Pool, as designed.
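To tell the two scenarios apart without reading the whole output, the spare-space entry can be searched for directly. The sketch below is a hedged example: the function name rebuilt_to_spare_extents is hypothetical, and it matches any occurrence of the message in the file, including entries from older faults.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the log contains a
# "does not have enough spare space" entry for the given pool.
# The phrase is copied from the example output in this article.
rebuilt_to_spare_extents() {
    log_file=$1
    pool_name=$2
    grep -iF -- "$pool_name" "$log_file" |
        grep -qF "does not have enough spare space"
}

# Example call on the array:
#   if rebuilt_to_spare_extents /EMC/backend/log_shared/EMCSystemLogFile.log "My Dynamic Pool"; then
#       echo "Pool rebuilt to spare extents; no free drive was available."
#   fi
```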

Example of the output: 

Command >> 15:57:32 service@spb spb:~/user> cat /EMC/backend/log_shared/EMCSystemLogFile.log | grep -i "<pool name>"
 
15:57:32 service@spb spb:~/user> cat /EMC/backend/log_shared/EMCSystemLogFile.log | grep -i "My Dynamic Pool"


"2019-04-13T18:39:06.846Z" "spb@APMx" "Kittyhawk_safe" "25230" "unix/spb/root" "WARN" "1:12d4602" :: "RAID protection for Storage Pool My Dynamic Pool is degraded. Please resolve any hardware problems. Internal information only: Pool OID 0x300000005." :: Category=System Component=mlu TimeZone=UTC

"2019-04-13T18:40:02.213Z" "spb@APMx" "Neo_CEM" "23864" "N/A" "WARN" "14:6032d" :: "Storage pool My Dynamic Pool is degraded." :: Category=User Component=Health TimeZone=UTC

"2019-04-13T18:44:10.475Z" "spb@APMx" "Neo_CEM" "23864" "N/A" "WARN" "14:60345" :: "Storage pool My Dynamic Pool does not have enough spare space." :: Category=User Component=Health TimeZone=UTC

"2019-04-13T18:45:12.766Z" "spb@APMx" "Neo_CEM" "23864" "N/A" "WARN" "14:60343" :: "Storage pool My Dynamic Pool is rebuilding due to the loss of a drive." :: Category=User Component=Health TimeZone=UTC

"2019-04-13T19:04:22.047Z" "spb@APMx" "Kittyhawk_safe" "25230" "unix/spb/root" "INFO" "1:12d0508" :: "RAID protection has been upgraded for Storage Pool My Dynamic Pool. Internal information only. Pool OID 0x300000005." :: Category=System Component=mlu TimeZone=UTC

"2019-04-13T19:05:15.701Z" "spb@APMx" "Neo_CEM" "23864" "N/A" "INFO" "14:60344" :: "Storage pool My Dynamic Pool has finished rebuilding." :: Category=User Component=Health TimeZone=UTC

At this point, the faulted Storage Pool Drive still shows a "warning" error. Once the pool rebuild has been confirmed and the pool status reads "OK The component is operating normally. No action is required.", the faulted drive can safely be replaced. After the disk has been replaced, allow the pool to rebuild back to the replacement drive and wait until the pool status shows "OK" only.
Article Properties

First Published

Wed Apr 10 2019 21:08:58 GMT
