Dell EMC Unity: Drive errors causing drive failures or performance issues (Dell EMC Correctable)

Summary: Severe performance issues after some flash drives start logging errors but are not automatically failed by the system.

Symptoms



Overview:
Two separate issues have been observed for the following drive part numbers:
005052377, 005052378, 005052379, 005052380, 005051739, 005051740, 005052154, 005052155

Issue # 1.
Drives fail (are set to end of life, EOL) after reporting many Soft SCSI bus errors accompanied by INVALID SENSE BUFFER messages.

This issue affects drives with the part numbers listed above that are running firmware PC09, PC0B, PC0D, PC10, PC42, or PC47.

Example SP log entries (/EMC/backend/log_shared/EMCSystemLogFile.log) look similar to the following:
01/29/18 05:41:00.121 Bus0 Enc0 Dsk19   11c4003 [WARN] System: Disk Soft SCSI bus error. DrvErrExtStat:0x1 SRT 349ms ST 0xcd51723a571 ET 0xcd517285579 . [REQUEST SENSE command failed]
01/29/18 05:41:00.131 Bus0 Enc0 Dsk19   11c0006 [INFO] System: Disk INVALID SENSE BUFFER OP 0x28, LBA 0x37d8e000, SZ 0x800
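
A log file can be checked for this pattern with a short script. The following is a minimal sketch in Python, assuming the line layout shown in the example entries above. The regular expression and the function name find_issue1_disks are illustrative assumptions and are not part of any Dell EMC tool.

```python
import re
from collections import defaultdict

# Assumed layout, inferred only from the example entries in this article:
# MM/DD/YY HH:MM:SS.mmm BusX EncY DskZ <code> [LEVEL] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"Bus(?P<bus>\d+) Enc(?P<enc>\d+) Dsk(?P<dsk>\d+)\s+"
    r"(?P<code>[0-9a-fA-F]+)\s+\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)


def find_issue1_disks(lines):
    """Return {disk: (bus_error_count, invalid_sense_count)} for disks
    that logged both message types."""
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        # Build a bus_enclosure_disk name, for example 0_0_19
        disk = f"{int(m['bus'])}_{int(m['enc'])}_{int(m['dsk'])}"
        if "Soft SCSI bus error" in m["msg"]:
            counts[disk][0] += 1
        elif "INVALID SENSE BUFFER" in m["msg"]:
            counts[disk][1] += 1
    return {d: tuple(c) for d, c in counts.items() if c[0] and c[1]}
```

The function accepts any iterable of lines, so an open file object for a copy of EMCSystemLogFile.log can be passed directly. A disk appears in the result only when both messages were logged for it.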



Issue # 2.
The system administrator observes severe performance issues on the system after a drive starts logging Soft Media Errors and 01|18|ff errors.

This issue affects drives with the part numbers listed above that are running firmware PC13 or PC4A. The system may report the errors as "Soft media error".
The drive may report that it has reached end of life (EOL), but it remains part of the pool, degrading performance for all LUNs/FSs in the affected pool.
The problem has been observed mostly on all-flash dynamic pools, but it could also affect traditional pools containing flash drives with the affected part numbers.

Example SP log entries (/EMC/backend/log_shared/EMCSystemLogFile.log) look similar to the following:
04/27/18 21:52:52.909 Bus1 Enc0 Dsk02   11c4004 [WARN] System: Disk 1_0_2 Soft media error. DrvErrExtStat:0x22 SRT 69ms ST 0x5eae3254c6d ET 0x5eae32659fa . [Recovered error (on-drive ECC)]
04/27/18 21:52:52.921 Bus1 Enc0 Dsk02   11c0006 [INFO] System: Disk 1_0_2 01|18|ff BLBA 0x127fedd0 OP 0x2f, LBA
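
To see which drives are logging soft media errors and for how long, the entries can be summarized per drive. The following is a minimal sketch in Python, assuming the timestamp format (MM/DD/YY) and the "Disk B_E_D Soft media error" wording shown in the example above. The function name soft_media_summary is a hypothetical helper, not a Dell EMC utility.

```python
import re
from datetime import datetime

# Assumed pattern, inferred only from the example entry in this article
SOFT_MEDIA_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+.*?"
    r"Disk (?P<disk>\d+_\d+_\d+) Soft media error"
)


def soft_media_summary(lines):
    """Return {disk: {"count", "first", "last"}} for Soft media error entries."""
    summary = {}
    for line in lines:
        m = SOFT_MEDIA_RE.search(line.strip())
        if not m:
            continue  # not a Soft media error entry
        ts = datetime.strptime(m["ts"], "%m/%d/%y %H:%M:%S.%f")
        entry = summary.setdefault(
            m["disk"], {"count": 0, "first": ts, "last": ts}
        )
        entry["count"] += 1
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
    return summary
```

A drive with a high count and a long interval between its first and last entries is a candidate for the condition described in this issue. The sketch only summarizes log entries and does not decide whether a drive should be removed.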

Cause

Issue # 1: Drive reports Soft SCSI bus errors accompanied by INVALID SENSE BUFFER 
The errors are caused by an incorrect sense data length returned by the drive. Although the Dell EMC requirement specifies a maximum size of 48 bytes for descriptor format sense data, the drive firmware generates descriptor format sense data larger than 48 bytes.
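
The length condition can be illustrated with a short check. The following is a minimal sketch in Python, assuming the descriptor format layout from the SCSI Primary Commands standard (response code 72h or 73h in byte 0, additional sense length in byte 7, total length equal to that value plus 8). The 48-byte limit comes from this article. The function names are hypothetical, and this article does not describe how the Unity code itself validates sense data.

```python
MAX_SENSE_LEN = 48  # maximum descriptor format sense data size stated in this article


def descriptor_sense_length(sense: bytes):
    """Return the total length declared by descriptor format sense data,
    or None if the buffer is too short or not in descriptor format."""
    if len(sense) < 8:
        return None  # header incomplete
    response_code = sense[0] & 0x7F  # bit 7 is not part of the response code
    if response_code not in (0x72, 0x73):
        return None  # fixed format or vendor specific
    # Byte 7 holds the number of bytes that follow the 8-byte header
    return 8 + sense[7]


def exceeds_limit(sense: bytes) -> bool:
    """True if descriptor format sense data declares more than MAX_SENSE_LEN bytes."""
    total = descriptor_sense_length(sense)
    return total is not None and total > MAX_SENSE_LEN
```

For example, a header with an additional sense length of 40 declares exactly 48 bytes and passes the check, while a value of 44 declares 52 bytes and is flagged.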

Issue # 2: Drive reports Soft Media Errors and 01|18|ff
The soft media errors are caused by Unity code incorrectly acknowledging errors returned by the drive. In both cases, the Unity code does not correctly acknowledge the errors returned by the drive while the ratio of errors to good I/Os remains low. As a result, the drive continues to operate in a non-optimal state for a long time, which affects the performance of all LUNs/FSs in the pool.

Resolution

Issue # 1: Drive reports Soft SCSI bus errors accompanied by INVALID SENSE BUFFER
Dell EMC Unity OE 4.2.1.9535982 and later addresses this issue. It is recommended that the array software be updated to the most recent release.
Additionally, drive firmware updates are strongly recommended to avoid future issues. See the Recommendations below.


Issue # 2: Drive reports Soft Media Errors and 01|18|ff
For immediate relief of the performance problem, the offending drive should be taken out of the pool. Once the drive is out of the pool, performance should improve immediately. 

  • If physical access to the system is available, remove the problematic drive from the slot and contact Dell EMC Technical Support to request a drive replacement.
  • If physical access to the system is not immediately available, or to discuss other possible workarounds, contact Dell EMC Technical Support or your Authorized Service Representative and quote this Knowledgebase article ID.

Recommendations:

To address issues #1 and #2, it is recommended to update the drive firmware to the versions listed below or later. Unity Drive Firmware bundle V9, released on February 27, 2019, contains firmware for the following part numbers and corresponding firmware versions:

005052377 - QC4E
005052378 - QC4E
005052379 - QC4E
005052380 - QC4E

005051739 - PC16
005051740 - PC16
005052154 - PC16
005052155 - PC16
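
A drive inventory can be compared against the part numbers and firmware versions named in this article. The following is a minimal sketch in Python. The tables are copied from this article, and the function name assess_drive is a hypothetical helper. The sketch deliberately does not compare firmware strings numerically, because this article does not define how versions are ordered.

```python
# Recommended firmware per part number, as listed in this article
RECOMMENDED = {
    "005052377": "QC4E",
    "005052378": "QC4E",
    "005052379": "QC4E",
    "005052380": "QC4E",
    "005051739": "PC16",
    "005051740": "PC16",
    "005052154": "PC16",
    "005052155": "PC16",
}

# Firmware versions named in the Symptoms section
AFFECTED_ISSUE_1 = {"PC09", "PC0B", "PC0D", "PC10", "PC42", "PC47"}
AFFECTED_ISSUE_2 = {"PC13", "PC4A"}


def assess_drive(part_number: str, firmware: str):
    """Return None for part numbers not listed in this article; otherwise
    return the recommended firmware and the issues the current firmware is
    known to be exposed to."""
    recommended = RECOMMENDED.get(part_number.strip())
    if recommended is None:
        return None
    fw = firmware.strip().upper()
    issues = []
    if fw in AFFECTED_ISSUE_1:
        issues.append(1)
    if fw in AFFECTED_ISSUE_2:
        issues.append(2)
    return {"recommended": recommended, "known_issues": issues}
```

An empty known_issues list means only that the firmware is not one of the versions named in this article. It does not confirm that the drive is at or above the recommended version.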

 

Refer to KB 490700 https://support.emc.com/kb/490700 for instructions on updating drive firmware.

The latest Unity drive firmware bundle is available for download from the support page. To find it, search for "Unity_Drive_Firmware_Package".

Also refer to DTA 528178: Dell EMC Unity: Drive soft media errors may result in performance issues and data unavailability (User Correctable), released for drives 005052377, 005052378, 005052379, and 005052380.

Additional Information

This content is available in other languages:
https://downloads.dell.com/TranslatedPDF/ES_KB521649.pdf
https://downloads.dell.com/TranslatedPDF/DE_KB521649.pdf
https://downloads.dell.com/TranslatedPDF/FR_KB521649.pdf
https://downloads.dell.com/TranslatedPDF/IT_KB521649.pdf
https://downloads.dell.com/TranslatedPDF/JA_KB521649.pdf
https://downloads.dell.com/TranslatedPDF/KO_KB521649.pdf

Affected Products

Dell EMC Unity Family

Products

Dell Unity 300, Dell EMC Unity 300F, Dell EMC Unity 350F, Dell EMC Unity XT 380, Dell EMC Unity XT 380F, Dell EMC Unity 400, Dell EMC Unity 400F, Dell EMC Unity 450F, Dell EMC Unity XT 480, Dell EMC Unity XT 480F, Dell EMC Unity 500, Dell EMC Unity 500F, Dell EMC Unity 550F, Dell EMC Unity 600, Dell EMC Unity 600F, Dell EMC Unity 650F, Dell EMC Unity XT 680, Dell EMC Unity XT 680F, Dell EMC Unity XT 880, Dell EMC Unity XT 880F, Dell EMC Unity Family, Dell EMC Unity All Flash, Dell EMC Unity Hybrid ...

Article Properties
Article Number: 000050358
Article Type: Solution
Last Modified: 30 Jun 2021
Version:  4