XtremIO: SSD failure performance impact on XtremIO Array

Summary: SSD failure performance impact on XtremIO Array

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

Single or multiple SSD failures in an XtremIO Data Protection Group (DPG) may cause a performance impact on the XtremIO array. To understand what causes this situation, we must explain the DPG operations and DPG states:

Main DPG Operations:

DPG Rebuild:

  • When: When an SSD fails
  • Why: Restore double parity protection

DPG Integration:

  • When: When a technician replaces a failed SSD with a brand-new drive
  • Why: Adds a new SSD into the DPG

DPG States:

  • Healthy: Double parity protection
  • Single Degraded: Single parity protection
  • Double Degraded: No parity protection
  • Failed: Data Loss
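
For illustration only, the operations and states above can be summarized in a small model. This is a minimal sketch for reasoning about the behavior described in this article; the names and mapping are illustrative and are not XtremIO internals.

```python
from enum import Enum

class DPGState(Enum):
    """Illustrative DPG protection states (not XtremIO internals)."""
    HEALTHY = "double parity protection"
    SINGLE_DEGRADED = "single parity protection"
    DOUBLE_DEGRADED = "no parity protection"
    FAILED = "data loss"

def dpg_state(failed_ssds: int) -> DPGState:
    """Map the number of concurrently failed SSDs to a DPG state."""
    if failed_ssds == 0:
        return DPGState.HEALTHY           # no rebuild needed
    if failed_ssds == 1:
        return DPGState.SINGLE_DEGRADED   # rebuild restores double parity
    if failed_ssds == 2:
        return DPGState.DOUBLE_DEGRADED   # rebuild becomes top priority
    return DPGState.FAILED

print(dpg_state(1).value)  # single parity protection
```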

XtremIO is Content Aware Storage (CAS). Therefore, all I/O operations to the DPG are statistically random; this allows the cluster to achieve the same performance regardless of whether the user's workload is random or sequential. Another benefit is that if an SSD fails, the cluster is not required to return a page to its original location.

Arrays that are not content aware must keep both the logical data and the physical data sequential; if data is not returned to its original location, sequential I/O performance is lost.
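
As a rough illustration of the content-aware placement described above, the sketch below hashes page contents and uses the fingerprint to choose a location. The SHA-1 hash, the modulo placement, and the 25-SSD DPG width are illustrative assumptions, not the actual XtremIO metadata scheme.

```python
import hashlib
from collections import Counter

NUM_SSDS = 25  # assumed DPG width for illustration, not an XtremIO constant

def placement(page: bytes) -> int:
    """Derive a placement from the page's content fingerprint.

    Because the fingerprint is effectively random, placement is
    statistically uniform whether the logical workload is sequential
    or random, and a recovered page can be written anywhere.
    """
    digest = hashlib.sha1(page).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SSDS

# Even a purely sequential logical workload spreads evenly across SSDs:
counts = Counter(placement(f"page-{i:08d}".encode()) for i in range(100_000))
print(min(counts.values()), max(counts.values()))  # close to 100000 / 25 each
```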

Note: The same SSD space used for user writes is also used for data recovery. XtremIO's hot-spare is horizontal: spare capacity is distributed across all SSDs in the DPG rather than held on a dedicated spare drive.
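
A back-of-the-envelope comparison shows why a horizontal (distributed) spare helps during recovery: rebuild writes are spread across all surviving SSDs instead of funneling into a single dedicated spare drive. The drive count and bandwidth figure below are assumptions used only for illustration.

```python
# Illustrative only: rebuild write fan-out for a dedicated hot-spare
# versus a horizontal (distributed) spare. Figures are assumptions.
ssds_in_dpg = 25           # assumed DPG width
per_ssd_write_mbps = 400   # assumed per-SSD write bandwidth

dedicated_spare_bw = 1 * per_ssd_write_mbps                    # all rebuild writes hit one drive
horizontal_spare_bw = (ssds_in_dpg - 1) * per_ssd_write_mbps   # writes spread across survivors

print(f"dedicated spare : {dedicated_spare_bw} MB/s rebuild write ceiling")
print(f"horizontal spare: {horizontal_spare_bw} MB/s rebuild write ceiling")
```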

DPG Rebuild explained:
When an SSD is removed or fails, SYM issues an automatic DPG rebuild. The rebuild requires two operations to take place:

Phase 1: Recover all the lost data and write it elsewhere:
  • The lost pages (data + parity) are recovered to the DPG (new write flow).
  • The PLBM/HMD tables are updated.

Phase 2: Update the parity information of all stripes:
  • Moving data/parity pages requires updating all parities (across all stripes).

Both phases require updating all stripes, so to save time and reduce writes, both are performed in a single iteration.
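
The sketch below mimics that single-pass structure: for each stripe, the lost page is recovered from the survivors and written elsewhere, the stripe's bookkeeping is updated, and the parity is refreshed in the same iteration. It is a toy model with single XOR parity and placeholder structures, not XtremIO code (XtremIO itself uses double parity).

```python
from functools import reduce

def xor(values):
    return reduce(lambda a, b: a ^ b, values, 0)

def rebuild_single_pass(stripes, failed_idx):
    """Toy single-pass DPG rebuild.

    Each stripe is a dict {"data": [pages...], "parity": int} using
    single XOR parity to keep the sketch short. Both rebuild phases
    run in the same iteration over the stripes.
    """
    for stripe in stripes:
        # Phase 1: recover the page lost on the failed SSD from the
        # surviving data pages + parity, then write it elsewhere
        # (modeled here as appending it to the surviving data set,
        # which stands in for the new-write flow and table updates).
        survivors = [p for i, p in enumerate(stripe["data"]) if i != failed_idx]
        recovered = xor(survivors + [stripe["parity"]])
        stripe["data"] = survivors + [recovered]

        # Phase 2: refresh the stripe's parity over the new layout in
        # the same pass, so every stripe is walked only once.
        stripe["parity"] = xor(stripe["data"])

# Tiny demo: one stripe of three data pages; the SSD holding page 1 fails.
stripe = {"data": [0b1010, 0b0110, 0b1111],
          "parity": 0b1010 ^ 0b0110 ^ 0b1111}
rebuild_single_pass([stripe], failed_idx=1)
print(stripe)  # the recovered page 0b0110 is back in the stripe, parity consistent
```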

Note: The DPG rebuild flow requires cannibalizing user space; however, the XtremIO implementation keeps one SSD's worth of space aside for recovering from the first failure. This space is reserved per X-Brick.
Note: When the DPG loses the first SSD, its usable capacity does not drop, because space is pre-allocated for this scenario.
Note: When the DPG loses a second SSD, its usable capacity drops by the capacity of one SSD.
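
As a concrete arithmetic example of the notes above, assume a DPG of 25 SSDs of 400 GB each; these sizes are hypothetical, and parity/metadata overheads are ignored for simplicity.

```python
# Illustrative capacity accounting; sizes are assumptions and parity or
# metadata overheads are deliberately ignored.
ssd_capacity_gb = 400
ssds_per_dpg = 25

raw_gb = ssds_per_dpg * ssd_capacity_gb
reserved_spare_gb = 1 * ssd_capacity_gb   # one SSD's worth kept aside per X-Brick

# Healthy, and after the first SSD failure: usable capacity is unchanged,
# because the first rebuild consumes the pre-allocated spare space.
usable_healthy = raw_gb - reserved_spare_gb
usable_after_first_failure = usable_healthy

# After a second SSD failure the spare space is already consumed, so the
# next rebuild cannibalizes user space: usable capacity drops by one SSD.
usable_after_second_failure = usable_after_first_failure - ssd_capacity_gb

print(usable_healthy, usable_after_first_failure, usable_after_second_failure)
# 9600 9600 9200
```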

DPG Integration explained: Adding a new SSD to the DPG:

  • It requires manual intervention by a technician (placing a new SSD in the DAE slot).
  • There is little or no urgency (as opposed to a rebuild).

Once requested, the DPG integration process rebalances the parity blocks: only parity blocks are recovered to the original SSD slot (to achieve an even parity distribution). This is done by assigning, adding, and integrating the new SSD.
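
The sketch below illustrates the rebalancing idea: after the new SSD is integrated, only parity ownership is migrated onto it until parity is spread evenly again, while user data pages stay where the rebuild placed them. Function and field names are placeholders, not XtremIO internals.

```python
def rebalance_parity(parity_owner_by_stripe, existing_ssds, new_ssd):
    """Illustrative parity rebalance after DPG integration.

    `parity_owner_by_stripe` maps stripe id -> SSD currently holding
    that stripe's parity block. Parity ownership is re-assigned until
    the newly integrated SSD holds an even share; only parity blocks
    move, data pages are untouched.
    """
    all_ssds = list(existing_ssds) + [new_ssd]
    target_share = len(parity_owner_by_stripe) // len(all_ssds)

    moved = 0
    for stripe_id, owner in parity_owner_by_stripe.items():
        if moved >= target_share:
            break
        if owner != new_ssd:
            parity_owner_by_stripe[stripe_id] = new_ssd  # migrate one parity block
            moved += 1
    return parity_owner_by_stripe

# Demo: 12 stripes whose parity is currently spread over SSDs 0-2;
# SSD 3 is the newly integrated drive and ends up owning 3 parities.
owners = {s: s % 3 for s in range(12)}
print(rebalance_parity(owners, existing_ssds=[0, 1, 2], new_ssd=3))
```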

 

Cause

Single or Multiple SSD failures in XtremIO DPG

 

Resolution

Based on the above information, during a DPG rebuild or integration there is some increase in cluster resource utilization, though usually there should not be a noticeable performance or latency increase. However, during a double DPG rebuild the cluster focuses nearly all of its resources on rebuilding the failed SSDs as soon as possible, in order to ensure data integrity and avoid data loss. This is expected by design, and performance should return to normal after all operations complete.
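
As a minimal sketch of that prioritization, assume rebuild work and host I/O share one resource budget: in the double-degraded state nearly all of the budget goes to the rebuild, which is why latency can rise until it completes. The percentages below are assumptions chosen to show the shape of the behavior, not XtremIO scheduling parameters.

```python
# Illustrative resource split between host I/O and rebuild work; the
# shares are assumptions, not actual XtremIO tunables.
REBUILD_SHARE = {
    "healthy":         0.00,  # no rebuild running
    "single_degraded": 0.15,  # background rebuild, little host impact
    "double_degraded": 0.90,  # rebuild prioritized to avoid data loss
}

def host_io_share(dpg_state: str) -> float:
    """Fraction of cluster resources left for host I/O in each state."""
    return 1.0 - REBUILD_SHARE[dpg_state]

for state in REBUILD_SHARE:
    print(f"{state:16s} host I/O share: {host_io_share(state):.0%}")
```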

 

Affected Products

XtremIO Family

Products

XtremIO Family
Article Properties
Article Number: 000071340
Article Type: Solution
Last Modified: 09 Jan 2026
Version:  5