PowerScale: SyncIQ Replication Issues When Jumbo Frames Are Enabled on PowerScale Clusters

Summary: SyncIQ replication jobs may intermittently fail due to SyncIQ worker restarts and network-related errors. These issues are often observed in environments where PowerScale subnets are configured to use jumbo frames. This Knowledge Base (KB) article outlines procedures to validate whether the end-to-end network infrastructure supports jumbo frames when IP packets are transmitted with the "Do Not Fragment" (DF) flag set in the IP header. When the DF bit is set, intermediate devices cannot fragment oversized packets. If any segment of the network path does not support the configured MTU size (typically 9000 bytes for jumbo frames), these packets may be dropped, potentially resulting in SyncIQ worker process failures and replication job instability. ...

Symptoms

SyncIQ replication may fail with the following error: "SyncIQ policy failed. A work item has been restarted too many times." 

 

  • SyncIQ jobs replicating small datasets typically complete successfully.
  • SyncIQ jobs involving larger datasets may fail during execution.
  • SyncIQ replication jobs without encryption succeed, while those using encryption fail immediately.

Cause

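This issue may occur intermittently or appear random in environments where dynamic routing is enabled. In such cases, SyncIQ traffic may occasionally be routed through a network path that does not support the configured jumbo frame MTU. Because packets sent with the "Do Not Fragment" (DF) bit set cannot be fragmented in transit, oversized packets on that path are dropped, leading to failures.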



Troubleshooting:
 

  1. Use the ping command to verify whether the network infrastructure supports jumbo frames by testing end-to-end MTU compatibility. 
     
First, ping from the source cluster’s replication interface to the target cluster’s replication interface with a payload size of 8972 bytes and the "Do Not Fragment" (DF) flag unset:
   
        isi_for_array -n<lnn> 'ping -S <source-ip> -s 8972 <target-ip>'
source-1# isi_for_array -n1 'ping -c 4 -S xxx.xxx.xxx.xxx -s 8972 yyy.yyy.yyy.yyy'     
source-1: PING yyy.yyy.yyy.yyy (10.0.1.231) from xxx.xxx.xxx.xxx: 8972 data bytes
source-1: 1528 bytes from yyy.yyy.yyy.yyy: icmp_seq=0 ttl=64 time=0.944 ms
source-1: 1528 bytes from yyy.yyy.yyy.yyy: icmp_seq=1 ttl=64 time=0.797 ms
source-1: 1528 bytes from yyy.yyy.yyy.yyy: icmp_seq=2 ttl=64 time=0.912 ms

            The output shows that the network successfully passes packets when the "Do Not Fragment" (DF) flag is not set, suggesting that packets may be fragmented in transit. 
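
The 8972-byte payload used in these tests follows from the 9000-byte MTU: an IPv4 header without options takes 20 bytes and the ICMP echo header takes 8 bytes, so 9000 - 20 - 8 = 8972. The following is a minimal sketch of that arithmetic, assuming IPv4 with no IP options. The function name icmp_payload_for_mtu is illustrative and is not a OneFS command.

```shell
#!/bin/sh
# Illustrative helper (not a OneFS command): largest ICMP echo payload
# that fits in a single IPv4 packet of the given MTU.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 9000   # prints 8972
icmp_payload_for_mtu 1500   # prints 1472
```

The same calculation gives 1472 bytes as the matching payload size for a subnet configured with an MTU of 1500 bytes.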
   

 Next, verify jumbo frame support by sending the same ping from the source cluster's replication interface to the target cluster's replication interface with the "Do Not Fragment" (DF) flag set (-D):
   
          isi_for_array -n<lnn> 'ping -S <source-ip> -D -s 8972 <target-ip>'  

source-1# isi_for_array -n1 'ping -c 4 -S xxx.xxx.xxx.xxx -D -s 8972 yyy.yyy.yyy.yyy'                                                 
source-1: ping: sendto: Message too long
source-1: ping: sendto: Message too long
source-1: ping: sendto: Message too long
source-1: ping: sendto: Message too long
source-1: ping: sendto: Message too long

            The output shows that packet transmission fails when the "Do Not Fragment" (DF) bit is set, suggesting possible MTU constraints or issues with path MTU discovery. 

 

NOTE: The ping test should be conducted across all network paths and on all source and target cluster interfaces involved in SyncIQ replication.
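
When several replication interfaces are involved, the set of tests can be listed up front so that no path is missed. The following is a dry-run sketch that only prints one DF ping command per source and target pair, using the same flags as the manual test above. The function name build_df_ping_commands and the addresses (documentation ranges from RFC 5737) are placeholders, not values from a real cluster.

```shell
#!/bin/sh
# Dry-run sketch: print one DF ping command per source/target pair.
# The addresses are documentation placeholders (RFC 5737); replace them
# with the replication interface IPs of the source and target clusters.
SOURCE_IPS="192.0.2.11 192.0.2.12"
TARGET_IPS="198.51.100.21 198.51.100.22"

# Usage: build_df_ping_commands "<source IPs>" "<target IPs>"
build_df_ping_commands() {
    for src in $1; do
        for tgt in $2; do
            # Same flags as the manual test: -D sets DF, -s sets the payload size.
            echo "ping -c 4 -S $src -D -s 8972 $tgt"
        done
    done
}

build_df_ping_commands "$SOURCE_IPS" "$TARGET_IPS"
```

Each printed command must be run on the node that owns the given source IP, for example by wrapping it in isi_for_array -n<lnn> as shown in the examples above.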

 

  2. Use traceroute with MTU testing to identify intermediate network hops that may not support jumbo frames. 

    Test with a packet size of 8972 bytes and the "Do Not Fragment" (DF) flag unset:

              isi_for_array -n<lnn> 'traceroute -s <source-ip> -p 5667 <target-ip> 8972'
source-1# isi_for_array -n1 'traceroute -s xxx.xxx.xxx.xxx -p 5667 yyy.yyy.yyy.yyy 8972'                                              
traceroute to yyy.yyy.yyy.yyy (yyy.yyy.yyy.yyy) from xxx.xxx.xxx.xxx, 64 hops max, 8972 byte packets
 1  example.name.internal (aaa.aaa.aaa.aaa)  0.577 ms  0.470 ms  0.472 ms
 2  bbb.bbb.bbb.bbb (bbb.bbb.bbb.bbb)  24.810 ms
    ccc.ccc.ccc.ccc (ccc.ccc.ccc.ccc)  23.418 ms  23.366 ms
 3  yyy.yyy.yyy.yyy (yyy.yyy.yyy.yyy)  23.639 ms  23.596 ms  23.608 ms

            The output shows that the traceroute test completed successfully when the 'Do Not Fragment' (DF) flag was not set.

source-1# isi_for_array -n1 'traceroute -s xxx.xxx.xxx.xxx -p 5667 yyy.yyy.yyy.yyy 8972'
traceroute to yyy.yyy.yyy.yyy (yyy.yyy.yyy.yyy) from xxx.xxx.xxx.xxx, 64 hops max, 8972 byte packets
 1  * * *
 2  * * *
 3  yyy.yyy.yyy.yyy (yyy.yyy.yyy.yyy)  23.661 ms  23.618 ms  23.743 ms

            The output shows that the traceroute reached the target when the 'Do Not Fragment' (DF) flag was not set, but the intermediate hops did not respond (* * *), a possible indicator of fragmentation along the network path. 
 

Test with a packet size of 8972 bytes and the "Do Not Fragment" (DF) flag set (-F):

     isi_for_array -n<lnn> 'traceroute -F -s <source-ip> -p 5667 <target-ip> 8972'

source-1# isi_for_array -n1 'traceroute -F -s xxx.xxx.xxx.xxx -p 5667 yyy.yyy.yyy.yyy 8972'
traceroute to yyy.yyy.yyy.yyy (yyy.yyy.yyy.yyy) from xxx.xxx.xxx.xxx, 64 hops max, 8972 byte packets
traceroute: sendto: Message too long
 1 traceroute: wrote yyy.yyy.yyy.yyy 8972 chars, ret=-1
 *traceroute: sendto: Message too long
traceroute: wrote yyy.yyy.yyy.yyy 8972 chars, ret=-1
 *traceroute: sendto: Message too long
traceroute: wrote yyy.yyy.yyy.yyy 8972 chars, ret=-1
 *
traceroute: sendto: Message too long
 2 traceroute: wrote yyy.yyy.yyy.yyy 8972 chars, ret=-1
 *traceroute: sendto: Message too long
traceroute: wrote yyy.yyy.yyy.yyy 8972 chars, ret=-1
 *traceroute: sendto: Message too long
traceroute: wrote yyy.yyy.yyy.yyy 8972 chars, ret=-1
 * 

           The output indicates that the traceroute to the target failed, suggesting potential MTU limitations or fragmentation issues along the network path.                     
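
When the DF tests fail at 8972 bytes, it can help to know the largest payload that does pass, since that value plus 28 bytes of IPv4 and ICMP headers approximates the usable path MTU. The following is a sketch of a binary search over payload sizes. It assumes that if a given size passes, every smaller size passes too. The names probe_df, find_max_payload, SRC_IP, and DST_IP are hypothetical, and the ping flags are the ones used in this article on OneFS.

```shell
#!/bin/sh
# Sketch: binary search for the largest payload that passes with DF set.
# probe_df is a hypothetical wrapper that returns 0 when a single DF ping
# with the given payload size succeeds. SRC_IP and DST_IP are placeholders
# that must be set to the replication interface addresses before use.
probe_df() {
    ping -c 1 -S "$SRC_IP" -D -s "$1" "$DST_IP" >/dev/null 2>&1
}

# Usage: find_max_payload <low> <high> [probe-function]
# Prints the largest size in [low, high] accepted by the probe, or 0 if none.
find_max_payload() {
    lo=$1
    hi=$2
    probe=${3:-probe_df}
    best=0
    while [ "$lo" -le "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$probe" "$mid"; then
            best=$mid            # mid passes; look for a larger size
            lo=$((mid + 1))
        else
            hi=$((mid - 1))      # mid fails; look for a smaller size
        fi
    done
    echo "$best"
}
```

With SRC_IP and DST_IP set, running find_max_payload 1472 8972 on the node that owns the source IP searches between the standard and jumbo payload sizes. A result of 0 means that no size in the range passed.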

Resolution

Workaround:

  1. If the PowerScale subnet designated for SyncIQ traffic is configured with an MTU of 9000 bytes, it is critical to ensure that the entire network path between the participating PowerScale clusters fully supports jumbo frames.
  2. If the network path between participating PowerScale clusters does not support jumbo frames, ensure that the PowerScale subnet dedicated to SyncIQ traffic is configured with an MTU of 1500 bytes on both the source and target systems.

 

NOTE: Adjusting the MTU setting can interrupt ongoing data flows and may impact services relying on consistent packet delivery, such as SyncIQ replication or NFS operations. It is recommended to perform such changes during a maintenance window. Ensure proper coordination and validation across all network segments before applying changes.

Affected Products

Isilon

Products

Isilon, Isilon SyncIQ
Article Properties
Article Number: 000056217
Article Type: Solution
Last Modified: 27 Nov 2025
Version:  6