RecoverPoint for Virtual Machines: Consistency Groups in loops Paused by system/Error/Init/Active with many intermittent splitter/volume connection errors

Summary: RP4VM with IP splitter in a large scale environment - Consistency Groups in loops Paused by system/Error/Init/Active with many intermittent splitter/volume connection errors, even though no network issues identified ...

This article applies to This article does not apply to This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

No specific change - but usually occurs in large scale environments (Many ESXi Splitters)

RP4VM with IP splitter in a large scale environment - Consistency Groups in loops Paused by system/Error/Init/Active with many intermittent splitter/volume connection errors, even though no network issues identified

The following can be seen in splitter logs:
2018-04-18T12:49:07.856Z cpu20:6831323)esx_splitter: KL_INFO:865: #2 - PathRpaEsx_v_check_status_impl: Rpa Port 43151e692cf0 (state OK) lease expired (120 < 262615747 - 262615622)
2018-04-18T12:49:07.856Z cpu20:6831323)esx_splitter: KL_INFO:865: #2 - PathRpaEsx_disconnect: starting to flush splits. path 10.xx.xx.xx:5042


Or similar to:
krnl:[21:05:19.493] 0/0 #1 - PathRpa_isKeepAliveOk: 0x430944449088 (10.xx.xxx.xx:5042) expired. 124 seconds since last keep-alive (120 allowed).


under NGS logs (extracted*/home/kos/ngs/result.log.gz) run the below grep:
zgrep Received result.log.gz | grep VSCSI | grep <ESX_IP>

e.g.
zgrep Received result.log.gz | grep VSCSI | grep '10.yy.yy.yyy'

The below print should be printed every ~20 seconds for every ESXi Splitter:

2018/05/16 11:17:40.388 - #2 - 16249/16238 - xTE: 2199311247: processSystemMessage: Received connect for channel: 31681873 module: MPIControlTransporter machine: VSCSI_SPLITTER&10.yy.yy.yyy UIDName: 4b0f9fb95da8xxxx version: 0         isExternal:  srcInternalSiteName:  localMPIVersion: 1 remoteMPIVersion: 1
2018/05/16 11:27:40.390 - #2 - 16249/16238 - xTE: 2284932560: processSystemMessage: Received connect for channel: 31684868 module: MPIControlTransporter machine: VSCSI_SPLITTER&10.yy.yy.yyy UIDName: 2836bd09480exxxx version: 0         isExternal:  srcInternalSiteName:  localMPIVersion: 1 remoteMPIVersion: 1


In this case 10 minutes (more than a minute) between 2 calls is a good indication that we're suffering from this issue.

Other symptoms in replication logs (extracted*/files/home/kos/replication/result.log) which are also symptoms of network/communication issues:
2018/04/30 09:18:13.630 - #1 - 4997/4967 - SplitCommIO::~SplitCommIO: IPDataCommIOInterface: not all responses where processed...
2018/04/18 10:16:50.383 - #0 - 7383/7323 - SocketIOProvider::openDevice: errno=35 socket is not connected for device ( m_lr=(0x0,0x108db1cd4dxxxxxx,e_IPDataComm) m_handle=31 m_openCount=34 m_status=e_NoConnection  m_splitterId = 108db1cd4dxxxxxx m_localIPAddress = 10.xx.xx.xxx) :  a_openParams.m_openMode = 2
2018/04/18 10:56:11.509 - #0 - 7373/7323 - SplitCommSocketIOInterface_AO_IMPL::processOneResponse_sys: errno=62 IPDataCommIOInterface: processOneResponse readStream failed
2018/04/18 10:56:11.513 - #0 - 7375/7323 - SplitCommSocketIOInterface_AO_IMPL::readStream: errno=62 IPDataCommIOInterface: readSocket failed ioStatus = general error a_bResetOnError = 0

Cause

Many sync calls (AmIAttach) from the NGS process to Control process leads to a bottleneck from the RPA to the Splitter which leads the Spitter to close the connection between the RPA to the Splitter due to inactivity for X (120) seconds

Resolution

Workaround:

No direct workaround exists

Resolution:


This issue is resolved in RecoverPoint for VMs 5.1.1.4 and later.
Upgrade to RecoverPoint for VMs 5.1.1.5 is recommended to resolve.

Products

RecoverPoint for Virtual Machines
Article Properties
Article Number: 000170712
Article Type: Solution
Last Modified: 11 Jun 2025
Version:  4
Find answers to your questions from other Dell users
Support Services
Check if your device is covered by Support Services.