VNX / Unity: Understanding Uncorrectable sectors and parity errors (User Correctable)
Summary: This article explains uncorrectable sectors and parity errors.
Symptoms
Understanding Uncorrectable sectors and parity errors on a CLARiiON, VNX, or Unity array.
Event log messages such as the following may appear, and may also be sent as Dial Homes:
VNX1
Error code: 0x953 Uncorrectable Parity Sector
Error code: 0x957 Uncorrectable Data Sector
Error code: 0x68A Uncorrectable Parity Sector
Error code: 0x695 Uncorrectable Data Sector
Error code: 0x840 Data Sector Invalidated
b26 Cache has issued CORRUPT_CRC. LUN= 309 ca_sync.c 0 309 2
VNX2
71688003 Uncorrectable Sector RAID Group: %2 Position: %3 LBA: %4 Blocks: %5 Error info: %6 Extra info: %7
71688008 Uncorrectable Sector RAID Group: 10 Position: 1 LBA: d180 Blocks: 8 Error info: 0 Extra info: e [r5_rb FLU 8224 r5_rb]
71688008 Uncorrectable Sector RAID Group: 10 Position: 1 LBA: d170 Blocks: 8 Error info: 0 Extra info: e [r5_rb FLU 8224 r5_rb]
71688001 Data Sector Invalidated RAID Group: 10 Position: 1 LBA: d121 Blocks: 7 Error info: 0 Extra info: e [r5_rb FLU 8224 r5_rb]
For additional event codes, see article 382528, VNX2: Array reports events like 0x71688001, 0x71688002, 0x71688003, 0x71688007 or 0x71688008 (User Correctable).
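The VNX2 sample messages above share a fixed field layout, so they can be split into fields when reviewing a large log. The following is a minimal sketch only, not a Dell EMC tool: the function name `parse_event` and the returned dictionary keys are hypothetical, and the sketch assumes that the LBA and Blocks values are hexadecimal (the sample LBAs such as d180 suggest this, but the article does not state it).

```python
import re

# Field layout taken from the sample VNX2 messages in this article.
EVENT_RE = re.compile(
    r"^(?P<code>[0-9A-Fa-f]{8})\s+(?P<text>.+?)\s+"
    r"RAID Group:\s*(?P<rg>\S+)\s+"
    r"Position:\s*(?P<pos>\S+)\s+"
    r"LBA:\s*(?P<lba>[0-9A-Fa-f]+)\s+"
    r"Blocks:\s*(?P<blocks>[0-9A-Fa-f]+)\s+"
    r"Error info:\s*(?P<err>\S+)\s+"
    r"Extra info:\s*(?P<extra>.*)$"
)


def parse_event(line):
    """Split one VNX2-style event line into fields.

    Returns a dict, or None if the line does not match the layout
    (for example, the message template that still contains %2, %3, ...).
    """
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    return {
        "code": match.group("code"),
        "text": match.group("text"),
        "raid_group": match.group("rg"),
        "position": match.group("pos"),
        # Assumption: LBA and Blocks are hexadecimal in the event text.
        "lba": int(match.group("lba"), 16),
        "blocks": int(match.group("blocks"), 16),
        "error_info": match.group("err"),
        "extra_info": match.group("extra"),
    }
```

Calling `parse_event` on each line of an exported log and grouping the results by RAID group and position gives a quick view of which disk positions are reporting uncorrectable or invalidated sectors.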
Cause
Uncorrectable errors occur when two different disks in the same RAID group have media errors at the same sector location.
For example, if a disk with media errors is copying to a hot spare and another disk in the same RAID group has media errors at the same sector location, the result is an uncorrectable sector.
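The reason a second media error is fatal can be illustrated with a toy model of single-parity protection. The sketch below assumes a RAID 5 style layout in which parity is the XOR of the data blocks in a stripe; it is an illustration only, not the array's implementation, and the names `xor_blocks`, `read_block`, and `UncorrectableError` are hypothetical. Other RAID types (for example RAID 6 or mirrored groups) tolerate a different number of failures.

```python
from functools import reduce


class UncorrectableError(Exception):
    """Raised when a block can be neither read nor rebuilt from parity."""


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_block(data_blocks, parity, index, media_errors):
    """Return data block `index` of one stripe, rebuilding it if needed.

    media_errors is the set of data-block positions that cannot be read.
    The parity block itself is assumed readable in this toy model.
    """
    if index not in media_errors:
        return data_blocks[index]
    if len(media_errors) > 1:
        # Two unreadable blocks in one stripe: XOR parity cannot
        # recover both, so the sector is uncorrectable.
        raise UncorrectableError("more than one media error in the stripe")
    # Exactly one unreadable block: XOR of the survivors and parity
    # yields the missing data.
    survivors = [b for i, b in enumerate(data_blocks) if i != index]
    return xor_blocks(survivors + [parity])
```

With one position in `media_errors`, `read_block` returns the original data; with two positions in the same stripe, it raises `UncorrectableError`, which corresponds to the condition described above.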
The event codes described above are logged when the system is unable to read data sectors from a disk and subsequent attempts to reconstruct the data from the other disks in the RAID group fail. The "Uncorrectable" messages indicate the disk(s) from which the system was unable to read the sectors, and the "Invalidated" messages indicate the disk(s) on which sectors were marked as void of valid information at a specific location. This marking ensures that no invalid data is returned to a host system. Attempts to read from an invalidated location result in a hard error being returned to the host.
Attempts to write to an invalidated location complete successfully and generally "fill" (overwrite) the void location, effectively fixing the uncorrectable sector. This is why past uncorrectable errors sometimes disappear after a host has overwritten these sectors with new, good data.
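The read and write behavior of an invalidated location can be summarized with a small model. This is a sketch for illustration only: the `ToyLun` class, its 512-byte sector size, and the `InvalidatedSectorError` name are assumptions, not part of any array software.

```python
SECTOR_SIZE = 512  # assumed sector size for this toy model


class InvalidatedSectorError(OSError):
    """Stands in for the hard error returned to the host on read."""


class ToyLun:
    """A LUN modeled as a list of sectors plus a set of invalidated LBAs."""

    def __init__(self, sector_count):
        self.sectors = [bytes(SECTOR_SIZE) for _ in range(sector_count)]
        self.invalidated = set()

    def invalidate(self, lba):
        """Mark a sector as void of valid information."""
        self.invalidated.add(lba)

    def read(self, lba):
        """Return sector data, or fail hard if the sector is invalidated."""
        if lba in self.invalidated:
            raise InvalidatedSectorError(f"LBA {lba:#x} is invalidated")
        return self.sectors[lba]

    def write(self, lba, data):
        """Store new data; a full-sector write clears the invalidated mark."""
        if len(data) != SECTOR_SIZE:
            raise ValueError("this model only accepts full-sector writes")
        self.sectors[lba] = data
        self.invalidated.discard(lba)
```

In this model, a read of an invalidated LBA raises an error until the host writes new data to that LBA, after which reads succeed again. This mirrors why an old uncorrectable can vanish once the host overwrites the sector.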
Resolution
For VNX:
Once all the hardware issues are resolved, Dell EMC Technical Support will need to execute a manual Read Only Background Verify (ROBV) on the affected internal LUN(s) in the affected pool. ROBV reads and checks the data for uncorrectables on the entire internal LUN, including unused space, to determine how many uncorrectable sectors may still exist.
Once ROBV has completed, if uncorrectables are still occurring, your Dell EMC Technical Support Engineer will need to execute additional steps, including collecting and analyzing Storage Allocation Table (SAT) information to identify the specific user LUN(s) affected (the internal LUNs where the uncorrectables were found are mapped to the user LUNs).
For a complete explanation of ROBV and the prerequisites for executing it, see article 466638, VNX: Explanation of Read Only Background Verify (ROBV) (User Correctable).
When an uncorrectable sector is found in a user LUN, the user data must be verified by the host application to determine whether the user data is corrupt or the error resides in unused space. Any process that reads all of the data, such as a backup, is suitable for identifying or flagging possible corruption.
If there is corruption, the data can be restored from a good backup, with either a full restore, or a partial restore of only the affected file(s).
If there is not a good backup, another means from the host application should be used to restore or recreate the data.
Should the uncorrectable error not be found in user data, the background processes may still discover the error in the future, if host I/O does not overwrite the sector. This can lead to an incorrect assessment that this is a new error and cause delays in analysis and remediation for an old error that was not completely resolved.
In this case, it is highly recommended to move the good data to another LUN and delete the original affected LUN.
For Unity, other methods may exist to try to help resolve this issue. Please check for more Unity specific articles.
Additional Information
Frequently Asked Questions:
Does Engineering have another way to recover lost customer data if a customer host application does not overwrite the data, and if a restore from backup does not work?
There is no other way to recover the data other than a restore operation, or recreating the data from the application.
Since the Uncorrectable data is actually missing data, there is no way of knowing what that data should be in order to write it back. This is why the sector is 'invalidated' and a hard error gets returned to the host. It is better to return a hard error than incorrect data.
Is it possible for an invalidated sector to change locations on a disk?
For a standard LUN, the invalid data sector will always remain the same.
For a pool LUN with auto-tiering enabled, it can move if that slice is relocated.
Is there a way of finding out the actual location of an invalidated sector?
It is very difficult to locate the position of an invalidated sector, due to how LUNs are mapped within RAID groups or Pools, and what information is available in the event logs.
Contact Dell EMC Support for further assistance in identifying the blocks containing the invalidated sector. The support team will first need to go through the uncorrectable recovery process and then escalate the issue to the recovery team.
If the invalidated sector does not appear to impact the customer data area, is there a way to get rid of it without unbinding the LUN?
Some success has been reported when writing temporary data to fill the LUN and then deleting the temporary data. If the invalidated area is written to with temporary data, the voided location(s) are filled, thus restoring the invalid sector with valid data.
Can a customer run just a CHKDSK or FSCK to check the integrity of the data in the filesystem if Uncorrectable errors are reported by Read Only Background Verify?
When Uncorrectable sectors are reported, the customer data should be checked to see if any file corruption exists. To do this, run an application or program that reads all of the used sectors in the LUN space. The most common method is a full backup of the data. It is not advisable to simply run FSCK (UNIX) or CHKDSK (Windows), because these utilities only check the metadata area of the files. If the Uncorrectable sectors are not in metadata space, the customer will be left with the impression that the data is OK when in fact it may not be.
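Where a full backup is not practical, the same idea (read every used sector through the host) can be approximated by reading every file on the affected file system and recording any read errors. The sketch below is illustrative only and is not a Dell EMC procedure; the function name `read_all_files` is hypothetical. It assumes the LUN is mounted as a file system at a known path, and it does not bypass the host page cache, so recently accessed data may be served from memory rather than from the array.

```python
import os


def read_all_files(root, chunk_size=1024 * 1024):
    """Read every regular file under `root` from start to end.

    Returns (files_read, failures), where failures is a list of
    (path, errno) pairs for files that could not be read completely.
    """
    files_read = 0
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Read in chunks until end of file; the data is discarded.
                    while handle.read(chunk_size):
                        pass
                files_read += 1
            except OSError as exc:
                # A hard error from an invalidated sector would typically
                # surface here as an I/O error on the affected file.
                failures.append((path, exc.errno))
    return files_read, failures
```

Any path listed in `failures` is a candidate for restore from backup. Files that read cleanly still need to be validated by the application if their contents are in doubt.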
Other FAQ:
Why is it necessary to disable Data Compression?
Data compression is a feature that analyzes the data on a disk and applies algorithms that reduce the size of repetitive sequences of bits that are inherent in some types of files. During the compression operation for a RAID group LUN, the software migrates and compresses the LUN data to a thin LUN in a pool. The LUN becomes a compressed thin LUN. Compression operations for pool LUNs (thick and thin) take place within the pool in which the LUN being compressed resides. Whenever data is compressed, data moves within the pool, which makes it harder to identify the MLU that is affected by Uncorrectables or Unexpected Coherency. For this reason, the feature has to be paused.
Why is it necessary to disable Auto-Tiering?
The auto-tiering feature migrates data between storage tiers or different storage media (EFD, FC, and SATA). The purpose of tiered storage is to retain the most frequently accessed or important data on fast, high-performance (more expensive) drives, and to move the less frequently accessed and less important data to low-performance (less expensive) drives. As with Data Compression, Auto-Tiering involves data movement, which makes it harder to identify the sector of the MLU that is affected by Uncorrectables or Unexpected Coherency. For this reason, the relocation needs to be stopped and the schedule has to be disabled.
Why is it necessary to disable FAST Cache?
FAST Cache only needs to be disabled if the Uncorrectable Sector Error is reported in FAST Cache.
Why is it necessary to run ROBV on the whole RAID Group (RG) and not on the particular LUN?
You need to run ROBV on the entire RG to make certain other customer LUNs in the same RG were not affected.
Why is it necessary to run ROBV on the Pool and not just the RAID Group?
You need to run ROBV on the entire Pool if an Auto-Tiering schedule has run between the time an Uncorrectable was reported and the time the ROBV is scheduled to begin. This is necessary because the Uncorrectable can move to another location if the data slice containing it is relocated to another tier.
Why is it necessary to gather SAT - Storage Allocation Table information?
The SAT information, when run through the tools used by Dell EMC support, identifies the customer LUN/MLU in which the Uncorrectable sector lies. It also indicates whether the issue is in the data space or in the metadata space of the customer LUN.
Affected Products
VNX1 Series
Products
CLARiiON, CLARiiON CX4 Series, Dell EMC Unity Family, Dell EMC Unity All Flash, Dell EMC Unity Hybrid, VNX2 Series
Article Properties
Article Number: 000046044
Article Type: Solution
Last Modified: 06 Nov 2025
Version: 6