This article provides an overview of why SRDF/A sessions drop and shares three cases.
I/O Operation of SRDF/A:
The following figure shows the I/O operation of SRDF/A.
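The cycle mechanism behind SRDF/A I/O can also be approximated in code. The sketch below is a simplified model that assumes the commonly documented four-cycle design. On R1, a capture cycle (N) collects host writes while a transmit cycle (N-1) is sent over the link. On R2, a receive cycle (N-1) collects the arriving data while an apply cycle (N-2) is written to disk. The class and method names are hypothetical, the minimum cycle time is not modeled, and applying a cycle is treated as instantaneous. The model shows two points that matter later in this article: repeated writes to the same track fold into one entry, and a cycle switch cannot happen until the transmit cycle has fully reached R2.

```python
from dataclasses import dataclass, field


@dataclass
class SrdfaModel:
    """Toy model of SRDF/A delta-set cycles; simplified and hypothetical."""
    cycle_number: int = 1
    capture: dict = field(default_factory=dict)   # R1 capture cycle (N): track -> data
    transmit: dict = field(default_factory=dict)  # R1 transmit cycle (N-1)
    receive: dict = field(default_factory=dict)   # R2 receive cycle (N-1)
    r2_disk: dict = field(default_factory=dict)   # data already applied on R2

    def host_write(self, track, data):
        # Write folding: a rewrite of the same track in one capture cycle
        # replaces the earlier write, so the track is sent only once
        self.capture[track] = data

    def send(self, max_tracks):
        # Link transfer: move up to max_tracks tracks from R1 transmit to R2 receive
        for track in list(self.transmit)[:max_tracks]:
            self.receive[track] = self.transmit.pop(track)

    def try_cycle_switch(self):
        # No switch while part of the transmit cycle has not yet reached R2
        if self.transmit:
            return False
        # The complete receive cycle becomes the apply cycle (applied instantly here)
        self.r2_disk.update(self.receive)
        self.receive = {}
        # The capture cycle becomes the new transmit cycle; a new capture cycle opens
        self.transmit, self.capture = self.capture, {}
        self.cycle_number += 1
        return True


m = SrdfaModel()
m.host_write("t1", "a")
m.host_write("t1", "b")   # folds into the earlier write to t1
m.host_write("t2", "c")
m.try_cycle_switch()      # capture {t1, t2} becomes the transmit cycle
m.send(2)                 # both tracks reach R2
m.try_cycle_switch()      # received data is applied on R2
print(m.cycle_number, m.r2_disk)   # prints: 3 {'t1': 'b', 't2': 'c'}
```

If `send` is never called, `try_cycle_switch` keeps returning False, so `cycle_number` stops increasing while `capture` keeps growing. This is the same pattern described in Case 3.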
Common SRDF/A Connectivity
IP WAN Connectivity
SRDF/A link recovery capability
SRDF/A cache utilization limit
Error messages and events
Enginuity error message:
Symevent error message:
Common SRDF/A drop reasons and solutions
Besides the link problems mentioned above, a common cause of an SRDF/A drop is data that exceeds the link bandwidth: the data that cannot be sent accumulates in cache until the cache usage limit is reached, which triggers the drop. Performance problems affecting the whole array can also leave SRDF/A short of memory and cause it to drop.
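The relationship between bandwidth and the cache limit can be expressed as simple arithmetic. While host writes arrive faster than the link can send them, the surplus accumulates in cache, so the time left before the limit is roughly the remaining cache headroom divided by that surplus. The function below is a back-of-the-envelope sketch. Its name, parameters, and the example figures are hypothetical, and it ignores write folding, compression, and cache used by other workloads.

```python
def seconds_until_cache_limit(write_mb_s, link_mb_s, cache_headroom_mb):
    """Rough time (s) until un-sent SRDF/A data fills the given cache headroom (hypothetical helper)."""
    surplus_mb_s = write_mb_s - link_mb_s   # data arriving faster than the link sends it
    if surplus_mb_s <= 0:
        return float("inf")                 # the link keeps up; no backlog builds
    return cache_headroom_mb / surplus_mb_s


print(seconds_until_cache_limit(90, 70, 12000))   # 600.0
print(seconds_until_cache_limit(50, 0, 12000))    # 240.0: link delivering nothing
print(seconds_until_cache_limit(60, 70, 12000))   # inf
```

With a 20 MB/s surplus and 12,000 MB of made-up headroom, the limit is reached in ten minutes. Setting the link rate to 0 approximates a link that has stopped delivering data, which can exhaust cache even under modest write loads.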
Case 1 - Bandwidth issue
Configuration: SRDF/A from SiteB to SiteC, with 622 Mb/s of WAN bandwidth in total shared among the three arrays.
Issue: One of the RDF groups drops at 17:30 every day.
Cause: The performance trace shows the SRDF/A data transfer rate reaching around 90 MB/s (over 720 Mb/s), which exceeds the total bandwidth. The insufficient bandwidth at peak time causes the SRDF/A group to drop.
Time              SRDF/A transfer rate (MB/s)
11/12/2013 17:20  87.94
11/12/2013 17:25  94.59
11/12/2013 17:30  71.96
11/12/2013 17:35  73.58
11/12/2013 17:40  79.58
11/12/2013 17:45  79.58
11/12/2013 17:50  78.46
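Converting the trace to link units makes the comparison explicit. The sketch below multiplies each sample by 8 (1 byte = 8 bits) and checks it against the 622 Mb/s WAN total. It ignores protocol overhead and compression, and the variable and function names are illustrative.

```python
LINK_TOTAL_MBIT_S = 622  # total WAN bandwidth in Case 1, shared by three arrays

# Case 1 trace samples: SRDF/A transfer rate in MB/s
trace = {
    "17:20": 87.94,
    "17:25": 94.59,
    "17:30": 71.96,
    "17:35": 73.58,
    "17:40": 79.58,
    "17:45": 79.58,
    "17:50": 78.46,
}


def mb_to_mbit(mb_per_s):
    # 1 byte = 8 bits; protocol overhead and compression are ignored
    return mb_per_s * 8


over_link = [t for t, rate in trace.items() if mb_to_mbit(rate) > LINK_TOTAL_MBIT_S]
for t, rate in trace.items():
    print(f"{t}  {mb_to_mbit(rate):7.2f} Mb/s")
print("above 622 Mb/s:", over_link)
```

Five of the seven samples exceed 622 Mb/s. That figure is the total for all three arrays, so the bandwidth actually available to this group is lower still.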
Solution: Increase bandwidth.
Case 2 - Short-term insufficient bandwidth
Issue: The performance log shows that SRDF/A traffic suddenly increased, after which the entire group went offline.
Log analysis: The inline log contains a CACA.10 error, which means that array memory usage exceeded the limit and the SRDF/A group was dropped.
Solution: DSE (Delta Set Extension) is recommended. After DSE was enabled, when a short-term burst of data from SRDF/S group 206 had to be transferred through SRDF/A group 207 to the remote disaster recovery array, DSE buffered a large amount of the data. SRDF/A group 207 ran close to its transmission limit for a while, but it did not drop offline.
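The effect of DSE can be illustrated with a toy simulation. It assumes, as a simplification, that untransmitted cycle data stays in cache up to a fixed limit, that anything beyond the limit is spilled to a DSE pool, and that SRDF/A drops only when both are full. The function, its parameters, and all numbers are hypothetical and are not taken from the case above.

```python
def simulate_burst(write_rates, link_mb_s, cache_limit_mb, dse_pool_mb=0.0):
    """Toy SRDF/A backlog model, one step per second (all figures hypothetical).

    Returns (drop_second, peak_dse_mb); drop_second is None if the group survives.
    """
    backlog_mb = 0.0   # cycle data written by hosts but not yet sent to R2
    peak_dse_mb = 0.0
    for second, write_mb_s in enumerate(write_rates):
        backlog_mb = max(backlog_mb + write_mb_s - link_mb_s, 0.0)
        spilled_mb = max(backlog_mb - cache_limit_mb, 0.0)  # part that no longer fits in cache
        if spilled_mb > dse_pool_mb:
            return second, peak_dse_mb   # cache limit hit and no DSE room left: group drops
        peak_dse_mb = max(peak_dse_mb, spilled_mb)
    return None, peak_dse_mb


# 60 s burst at 100 MB/s over a 50 MB/s link, then 20 MB/s for four minutes
burst = [100.0] * 60 + [20.0] * 240
print(simulate_burst(burst, 50.0, 1000.0))                      # (20, 0.0): drops without DSE
print(simulate_burst(burst, 50.0, 1000.0, dse_pool_mb=5000.0))  # (None, 2000.0): survives with DSE
```

With a burst at twice the link rate, the run without DSE drops at second 20. The run with a 5,000 MB pool spills at most 2,000 MB and then drains once the burst ends, which mirrors the behavior described for group 207.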
Case 3 - A link problem caused an out-of-memory condition
Log analysis: An inline error at 16:30 shows that the cycle number had not switched for an hour.
Another inline error shows that a memory overrun interrupted SRDF/A. The performance data shows Write Pending reaching 79% at 23:44.
Performance log analysis: From 15:30, the cycle number stopped increasing, but SRDF/A remained active. The active cycle size then kept growing, and memory usage grew with it. SRDF/A became inactive at 23:45.
Cause: Poor link quality prevented R2 from receiving the cycle, so the cycle could not switch and its size kept growing until it exceeded the cache usage limit.
Solution: Repair the link quality problems.
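The symptom in this case, a cycle number that stops advancing while SRDF/A still reports active, can be checked for in exported performance samples. The sketch below assumes a hypothetical list of (timestamp, cycle number) pairs rather than any real Solutions Enabler or Unisphere output format. It uses the one-hour gap from the inline error as its default threshold, and the sample values are made up to follow the Case 3 timeline.

```python
from datetime import datetime, timedelta


def find_cycle_stall(samples, max_gap=timedelta(hours=1)):
    """Look for an SRDF/A cycle number that has stopped advancing (hypothetical helper).

    samples: (timestamp, cycle_number) pairs in time order.
    Returns (stalled_since, detected_at) for the first stall lasting at least
    max_gap, or None if the cycle number keeps advancing.
    """
    if not samples:
        return None
    changed_at, last_cycle = samples[0]
    for ts, cycle in samples[1:]:
        if cycle != last_cycle:
            # Cycle switched: restart the stall timer
            changed_at, last_cycle = ts, cycle
        elif ts - changed_at >= max_gap:
            return changed_at, ts
    return None


def at(hh, mm):
    # Arbitrary date; only the time of day matters in this example
    return datetime(2000, 1, 1, hh, mm)


# Hypothetical 15-minute samples: cycle switching stops after 15:30
case3 = [(at(15, 0), 100), (at(15, 15), 130), (at(15, 30), 160),
         (at(15, 45), 160), (at(16, 0), 160), (at(16, 15), 160), (at(16, 30), 160)]
print(find_cycle_stall(case3))
```

On these samples the function reports a stall starting at 15:30 and detected at 16:30. In practice the check could be paired with the active cycle size or Write Pending level so that a stall is raised well before the cache limit is reached.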
Author: Fenglin Li