PowerFlex: Client I/O Errors When Replication Is Being Used

Summary: Client servers experience I/O errors against PowerFlex-backed devices while the overall backend (MDMs and SDSs) appears to be healthy. PowerFlex replication is in use, and one or more of the RCGs report RPO errors. ...

Symptoms

The PowerFlex backend appears healthy while replication is in use:

  • No degraded or failed capacity
  • No SDSs are decoupled, and no SDS devices report errors
  • No MDMs are disconnected
  • The replication feature is in use

One or more alerts in the UI report the following errors:

Minor - Remote Consistency Group RPO Exceeded
Major - The RCG consistent image is too large to be consumed by the destination in one piece.

MDM event logs may report the following:

2024-06-11 15:55:56.592000:0001566:RPL_PD_CAP_UTILIZATION_MINOR     WARNING  Protection Domain ID <pd_id> Replication journal capacity is at MINOR utilization level 
...
2024-06-11 16:20:12.848000:0001567:RPL_PD_CAP_UTILIZATION_MAJOR     ERROR    Protection Domain ID <pd_id> Replication journal capacity is at MAJOR utilization level 
...
2024-06-11 17:19:57.272000:0001584:RPL_PD_CAP_UTILIZATION_CRITICAL  CRITICAL Protection Domain ID <pd_id> Replication journal capacity is at VERY_HIGH utilization level 
...
2024-06-11 17:52:26.352000:0001585:RPL_PD_CAP_UTILIZATION_CRITICAL  CRITICAL Protection Domain ID <pd_id> Replication journal capacity is at CRITICAL utilization level
...
2024-06-11 16:25:14.381000:0001576:RPL_CG_MOVED_TO_SLIM_MODE        INFO     Replication Consistency Group ID <rcg_id> entered slim mode
...
2024-06-11 18:27:29.738000:0001586:SDR_CRITICAL_CAP_CHANGE          ERROR    SDR ID <sdr_id>) handling user data changed discarded old user data and stopped to accumulate new user data due critical capacity
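
When the event log is long, it can help to reduce it to the replication journal events shown above. The following is a minimal sketch, assuming the event log has already been exported as text in the same layout as the sample lines. The function names filter_journal_events and last_journal_level are hypothetical helpers, not part of PowerFlex, and how the log text is obtained is left to the reader.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of PowerFlex). Both read event-log text from
# stdin, in the same layout as the sample lines in this article.

# Keep only the replication journal events listed in this article.
filter_journal_events() {
  grep -E ':(RPL_PD_CAP_UTILIZATION_[A-Z_]+|RPL_CG_MOVED_TO_SLIM_MODE|SDR_CRITICAL_CAP_CHANGE)[[:space:]]'
}

# Print the journal utilization level from the last matching line of the
# input (for example MINOR, MAJOR, VERY_HIGH, or CRITICAL). Prints nothing
# if no utilization event is present.
last_journal_level() {
  sed -n 's/.*Replication journal capacity is at \([A-Z_]*\) utilization level.*/\1/p' | tail -n 1
}
```

For example, piping a saved log file through `filter_journal_events` lists only the relevant events, and piping it through `last_journal_level` shows the most recently logged utilization level in input order.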

Impact

Clients are unable to access volumes that are being replicated.

Cause

A rare software defect can cause the MDM and the SDR component to disagree on the internal counters that track journal capacity.

Because of this discrepancy, the MDM may fail to unallocate (release) additional journal capacity when the SDR's capacity is full, which can lead to I/O errors on client servers that use PowerFlex-backed devices.

Resolution

To resolve the issue, perform a rolling restart of all SDR components and switch the MDM ownership on the Source system.

Note: It is recommended that all RCGs on the Source system be terminated until the issue is resolved.

Restart SDR components on the Target site

1. Identify all the SDRs and validate that they are in a healthy state before continuing to step 2:

scli --query_all_sdr

2. Enter maintenance mode on the SDR:

scli --enter_sdr_maintenance_mode --sdr_name <name>

3. Validate that the SDR is in maintenance mode by running the command in step 1.

4. Restart the SDR component:

pkill sdr

5. Repeat steps 1-4 for all SDRs on the Source site.

6. Exit SDRs from maintenance mode:

scli --exit_sdr_maintenance_mode --sdr_name <name>

7. Once all SDRs are restarted and in a healthy state, switch the MDM ownership:

Note: The ownership can be transferred back to the original MDM server if desired.

# 3.x
scli --switch_mdm_ownership --new_master_mdm_name <name>

# 4.x
scli --switch_mdm_ownership --new_primary_mdm_name <name>

8. Validate that I/O errors are no longer reported on the client servers. If a client filesystem has become read-only, the client server may require a reboot.
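
The steps above can be turned into a printed checklist before anything is run. The following is a minimal sketch that only prints commands and executes nothing. It uses only the scli commands shown in the steps, in the order the steps give them. The function names print_sdr_checklist and mdm_switch_command, and the SDR and MDM names passed to them, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of PowerFlex). They print commands for
# review and do not execute anything.

# Print the per-SDR commands from steps 2 to 4 for each SDR name given,
# followed by the step 6 command for each SDR.
print_sdr_checklist() {
  for sdr in "$@"; do
    echo "scli --enter_sdr_maintenance_mode --sdr_name $sdr"
    echo "scli --query_all_sdr"
    # pkill acts on local processes, so this line applies to the SDR's host.
    echo "pkill sdr   # on the host running $sdr"
  done
  for sdr in "$@"; do
    echo "scli --exit_sdr_maintenance_mode --sdr_name $sdr"
  done
}

# Print the step 7 command for a given PowerFlex version and MDM name.
# 3.x uses --new_master_mdm_name and 4.x uses --new_primary_mdm_name.
mdm_switch_command() {
  version="$1"
  mdm_name="$2"
  case "$version" in
    3.*) echo "scli --switch_mdm_ownership --new_master_mdm_name $mdm_name" ;;
    4.*) echo "scli --switch_mdm_ownership --new_primary_mdm_name $mdm_name" ;;
    *)   echo "unsupported version: $version" >&2; return 1 ;;
  esac
}
```

For example, `print_sdr_checklist sdr1 sdr2` followed by `mdm_switch_command 4.5 mdm2` prints the full command sequence for two SDRs on a 4.x system. Printing first gives a chance to confirm SDR names against the output of scli --query_all_sdr before making changes.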

Impacted Versions

PowerFlex 3.x

PFMP 4.x

Fixed In Version

PowerFlex 4.5.3
PowerFlex 4.5.4 - upgrade to 4.5.4 HF1 
PowerFlex 4.5.5 - no fix available
PowerFlex 4.5.6 and higher  

Affected Products

PowerFlex rack, PowerFlex Appliance, PowerFlex Software

Article Properties

Article Number: 000227849
Article Type: Solution
Last Modified: 02 Mar 2026
Version: 10