RecoverPoint for VMs: I volumi del journal bloccati su host ESXi specifici possono causare la non disponibilità della replica dei dati e la regolamentazione del riavvio.

Summary: I volumi del journal bloccati su host ESXi specifici possono causare la non disponibilità della replica dei dati e la regolamentazione del riavvio.

This article applies to This article does not apply to This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

RecoverPoint for Virtual Machines utilizza volumi journal per tenere traccia e gestire le immagini point-in-time.
All'interno di un'ambiente RP4VMs, può verificarsi un problema che blocca un volume di journal su un singolo host ESXi.
Se la coppia vRPA assegnata per replicare le VM viene spostata tramite vMotion su un host ESXi diverso, può verificarsi il problema Data Replication Unavailable (DRU).

Ciò può influire su tutti i consistency group (CG) assegnati a uno specifico RPA, che determineranno lo stato DISABLE o ERROR.

 

È possibile che si verifichino arresti anomali del processo di replica:

replication Sun May 19 20:12:01 UTC 2024

replication Sun May 19 20:37:55 UTC 2024

replication Sun May 19 20:38:21 UTC 2024

replication Sun May 19 21:03:10 UTC 2024

replication Sun May 19 21:32:18 UTC 2024

replication Sun May 19 22:03:39 UTC 2024

replication Sun May 19 22:32:26 UTC 2024

replication Sun May 19 23:03:56 UTC 2024

replication Sun May 19 23:28:46 UTC 2024

Determina la disconnessione di RecoverPoint Appliance (RPA) dal cluster.

Log di replica vRPA del controllo sito che mostrano un deadlock:

2024/05/20 01:27:54.039 - #1 - 21594/21577 - DLManager: deadlock suspected at N6Kashya26ReplicationControl_AO_IMPLE, pid=21659/21577, config=(alertThreshold=10,alertLevel=1,killThreshold=0,coreThreshold=0,full=1), deltaTime=501, cmd=N6Kashya33SerializableMethodRequestRefCountE

2024/05/20 01:27:54.039 - #1 - 21594/21577 - DLManager: deadlock suspected at N6Kashya26ReplicationControl_AO_IMPLE, pid=21658/21577, config=(alertThreshold=10,alertLevel=1,killThreshold=0,coreThreshold=0,full=1), deltaTime=288, cmd=N6Kashya33SerializableMethodRequestRefCountE

2024/05/20 01:27:54.039 - #1 - 21594/21577 - DLManager: deadlock suspected at N6Kashya26ReplicationControl_AO_IMPLE, pid=21657/21577, config=(alertThreshold=10,alertLevel=1,killThreshold=0,coreThreshold=0,full=1), deltaTime=493, cmd=N6Kashya33SerializableMethodRequestRefCountE

2024/05/20 01:27:54.039 - #1 - 21594/21577 - DLManager: deadlock suspected at N6Kashya26ReplicationControl_AO_IMPLE, pid=21656/21577, config=(alertThreshold=10,alertLevel=1,killThreshold=0,coreThreshold=0,full=1), deltaTime=503, cmd=N6Kashya33SerializableMethodRequestRefCountE

2024/05/20 01:27:54.039 - #1 - 21594/21577 - DLManager: deadlock suspected at N6Kashya26ReplicationControl_AO_IMPLE, pid=21655/21577, config=(alertThreshold=10,alertLevel=1,killThreshold=0,coreThreshold=0,full=1), deltaTime=775, cmd=N6Kashya33SerializableMethodRequestRefCountE

2024/05/20 01:27:54.039 - #1 - 21594/21577 - DLManager: deadlock suspected at N6Kashya26ReplicationControl_AO_IMPLE, pid=21654/21577, config=(alertThreshold=10,alertLevel=1,killThreshold=0,coreThreshold=0,full=1), deltaTime=501, cmd=N6Kashya33SerializableMethodRequestRefCountE

2024/05/20 01:27:54.039 - #1 - 21594/21577 - DLManager: deadlock suspected at N6Kashya26ReplicationControl_AO_IMPLE, pid=21653/21577, config=(alertThreshold=10,alertLevel=1,killThreshold=0,coreThreshold=0,full=1), deltaTime=799, cmd=N6Kashya33SerializableMethodRequestRefCountE

Registri di controllo RPA che mostrano l'interruzione della replica:

RecoverPoint-0x5d0aa2fff4d45cf4-RPA6:/home/kos/control# grep killRep result.log.latest

2024/05/20 02:54:41.348 - #2 - 23464/23290 - RepHandler: killRep called. isKill=1 reason=Replication process didn't handle last command in a timely manner. cmd=Vector(busyType=BUSY_WITH_RCS(busyWithWhat= RC (RC_preOpenTask),busyWithWhichRCs= VectorMap([KBox6 => 1]))) isCoreDump=1

2024/05/20 02:54:41.936 - #2 - 23464/23290 - RepHandler: killRep called. isKill=0 reason=Rep disconnected. Verifying it is down isCoreDump=0

2024/05/20 02:54:48.974 - #2 - 23464/23290 - RepHandler: killRep called. isKill=1 reason=Replication process didn't handle last command in a timely manner. cmd=Vector(busyType=BUSY_WITH_RCS(busyWithWhat= RC (RC_preOpenTask),busyWithWhichRCs= VectorMap([KBox6 => 1]))) isCoreDump=1

2024/05/20 02:54:49.172 - #2 - 23464/23290 - RepHandler: killRep called. isKill=0 reason=Rep disconnected. Verifying it is down isCoreDump=0

2024/05/20 03:18:43.263 - #2 - 23464/23290 - RepHandler: killRep called. isKill=1 reason=Replication process didn't handle last command in a timely manner. cmd=Vector(busyType=BUSY_WITH_RCS(busyWithWhat= RC (RC_preOpenTask),busyWithWhichRCs= VectorMap([KBox6 => 1]))) isCoreDump=1

2024/05/20 03:18:43.265 - #2 - 23464/23290 - RepHandler: killRep called. isKill=1 reason=Replication process didn't handle last command in a timely manner. cmd=Vector(busyType=BUSY_WITH_RCS(busyWithWhat= RC (RC_preOpenTask),busyWithWhichRCs= VectorMap([KBox6 => 1]))) isCoreDump=1

2024/05/20 03:18:43.266 - #2 - 23464/23290 - RepHandler: killRep called. isKill=1 reason=Replication process didn't handle last command in a timely manner. cmd=Vector(busyType=BUSY_WITH_RCS(busyWithWhat= RC (RC_preOpenTask),busyWithWhichRCs= VectorMap([KBox6 => 1]))) isCoreDump=1

2024/05/20 03:18:43.268 - #2 - 23464/23290 - RepHandler: killRep called. isKill=1 reason=Replication process didn't handle last command in a timely manner. cmd=Vector(busyType=BUSY_WITH_RCS(busyWithWhat= RC (RC_preOpenTask),busyWithWhichRCs= VectorMap([KBox6 => 1]))) isCoreDump=1

2024/05/20 03:18:43.269 - #2 - 23464/23290 - RepHandler: killRep called. isKill=1 reason=Replication process didn't handle last command in a timely manner. cmd=Vector(busyType=BUSY_WITH_RCS(busyWithWhat= RC (RC_preOpenTask),busyWithWhichRCs= VectorMap([KBox6 => 1]))) isCoreDump=1

2024/05/20 03:18:43.807 - #2 - 23464/23290 - RepHandler: killRep called. isKill=0 reason=Rep disconnected. Verifying it is down isCoreDump=0

2024/05/20 03:59:26.445 - #2 - 23464/23290 - RepHandler: killRep called. isKill=1 reason=Replication process didn't handle last command in a timely manner. cmd=Vector(busyType=BUSY_WITH_RCS(busyWithWhat= RC (RC_preOpenTask),busyWithWhichRCs= VectorMap([KBox6 => 1]))) isCoreDump=1

 

L'host ESXi xxxxxxx mostra iofilterd-emcjiraf.log i blocchi degli oggetti VSAN:

2024-05-23T17:15:33.517Z iofilterd-emcjiraf[2104572]: FiltLib: Opening virtual disk ("/vmfs/volumes/vsan:529b87d1fe9acb29-b0cbb19ee7de1e57/RPvStorage/5d0aa2fff4d45cf4_IOFilter_JVOL_00668.vmdk") failed: "Failed to lock the file" (16392).

2024-05-23T17:15:33.517Z iofilterd-emcjiraf[2104572]: RPVS_iof_LunInfo_OpenLun: Failed to open vmdk file of lun 668 with status Busy path /vmfs/volumes/vsan:529b87d1fe9acb29-b0cbb19ee7de1e57/RPvStorage/5d0aa2fff4d45cf4_IOFilter_JVOL_00668.vmdk

2024-05-23T17:15:33.517Z iofilterd-emcjiraf[2104572]: execute_open_cmd: Cluster id: 0x5d0aa2fff4d45cf4 lun 668 open failed. open_rc: Busy con 0xc6db9c9340

2024-05-23T17:15:33.518Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 668 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:33.519Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 48 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:33.519Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 507 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:33.519Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 558 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:33.520Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 359 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:34.228Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Object 'b05fe864-d249-c108-0391-6cfe544f26e4' locked, retrying.

2024-05-23T17:15:36.581Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Object 'e6ebf164-9c47-0fd7-6c17-6cfe544f26e4' locked, retrying.

2024-05-23T17:15:37.526Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Object 'b2e10165-ee40-1674-3ead-6cfe544f26e4' locked, retrying.

2024-05-23T17:15:37.530Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Object '681efa64-16f0-8bb4-ab56-6cfe544f26e4' locked, retrying.

2024-05-23T17:15:38.537Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Could not lock object 'b05fe864-d249-c108-0391-6cfe544f26e4': Failed to lock the file (4).

2024-05-23T17:15:38.537Z iofilterd-emcjiraf[2104572]: DISKLIB-VMFS  : "vsan://529b87d1fe9acb29-b0cbb19ee7de1e57/b05fe864-d249-c108-0391-6cfe544f26e4" : failed to open (Failed to lock the file): ObjLib_Open failed. Type 3

2024-05-23T17:15:38.537Z iofilterd-emcjiraf[2104572]: DISKLIB-LINK  : "/vmfs/volumes/vsan:529b87d1fe9acb29-b0cbb19ee7de1e57/f9f9b364-aadc-f567-41a4-6cfe544ef870/5d0aa2fff4d45cf4_IOFilter_JVOL_00116.vmdk" : failed to open (Failed to lock the file).

2024-05-23T17:15:38.537Z iofilterd-emcjiraf[2104572]: DISKLIB-CHAIN : "/vmfs/volumes/vsan:529b87d1fe9acb29-b0cbb19ee7de1e57/f9f9b364-aadc-f567-41a4-6cfe544ef870/5d0aa2fff4d45cf4_IOFilter_JVOL_00116.vmdk" : failed to open (Failed to lock the file).

2024-05-23T17:15:38.545Z iofilterd-emcjiraf[2104572]: DISKLIB-VMFS  : "vsan://529b87d1fe9acb29-b0cbb19ee7de1e57/b05fe864-d249-c108-0391-6cfe544f26e4" : open successful (524293) size = 25769803776, hd = 0. Type 3

2024-05-23T17:15:38.546Z iofilterd-emcjiraf[2104572]: PluginLdr_Load: Loaded plugin 'libvmiof-disk-emcjiraf.so' from '/usr/lib64/vmware/plugin/libvmiof-disk-emcjiraf.so'

2024-05-23T17:15:38.546Z iofilterd-emcjiraf[2104572]: DISKLIB-VMFS  : "vsan://529b87d1fe9acb29-b0cbb19ee7de1e57/b05fe864-d249-c108-0391-6cfe544f26e4" : closed.

2024-05-23T17:15:38.546Z iofilterd-emcjiraf[2104572]: DISKLIB-LIB   : Failed to open '/vmfs/volumes/vsan:529b87d1fe9acb29-b0cbb19ee7de1e57/RPvStorage/5d0aa2fff4d45cf4_IOFilter_JVOL_00116.vmdk' with flags 0xa Failed to lock the file (16392).

2024-05-23T17:15:38.546Z iofilterd-emcjiraf[2104572]: FiltLib: Opening virtual disk ("/vmfs/volumes/vsan:529b87d1fe9acb29-b0cbb19ee7de1e57/RPvStorage/5d0aa2fff4d45cf4_IOFilter_JVOL_00116.vmdk") failed: "Failed to lock the file" (16392).

2024-05-23T17:15:38.546Z iofilterd-emcjiraf[2104572]: RPVS_iof_LunInfo_OpenLun: Failed to open vmdk file of lun 116 with status Busy path /vmfs/volumes/vsan:529b87d1fe9acb29-b0cbb19ee7de1e57/RPvStorage/5d0aa2fff4d45cf4_IOFilter_JVOL_00116.vmdk

2024-05-23T17:15:38.546Z iofilterd-emcjiraf[2104572]: execute_open_cmd: Cluster id: 0x5d0aa2fff4d45cf4 lun 116 open failed. open_rc: Busy con 0xc6dc2deaa0

2024-05-23T17:15:38.547Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 457 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:38.548Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 250 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:38.548Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 623 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:38.548Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 436 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:38.549Z iofilterd-emcjiraf[2104572]: execute_close_cmd: cluster id: 0x5d0aa2fff4d45cf4 lun 62 is not opened (openState: 0). Ignore close request

2024-05-23T17:15:40.886Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Object 'e6ebf164-9c47-0fd7-6c17-6cfe544f26e4' locked, retrying.

2024-05-23T17:15:41.837Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Object 'b2e10165-ee40-1674-3ead-6cfe544f26e4' locked, retrying.

2024-05-23T17:15:41.842Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Object '681efa64-16f0-8bb4-ab56-6cfe544f26e4' locked, retrying.

2024-05-23T17:15:42.561Z iofilterd-emcjiraf[2104572]: OBJLIB-VSANOBJ: VsanObjLock: Object 'bcdce564-9ed7-ff34-42df-6cfe544f26e4' locked, retrying.

Nell'host in cui si trova l'RPA interessato nel vmkernel.log, vengono stampate le seguenti voci:

2024-10-21T13:08:13.487Z cpu31:2101016)WARNING: NFSLock: 4084: File is being locked by a consumer on host HOSTNAME0503.corp.qbe.com with exclusive lock.
2024-10-21T13:08:13.487Z cpu31:2101016)NFSLock: 5032: failed to get lock on file 38233f05329208c8_IOFilter_JVOL_00244-flat.vmdk(.lck-d84c0c0000000000) 0x430b567b2f60 on DATASTORE xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx): Busy
2024-10-21T13:08:23.795Z cpu31:2101016)WARNING: NFSLock: 4084: File is being locked by a consumer on host HOSTNAME0503.corp.qbe.com with exclusive lock.
2024-10-21T13:08:23.795Z cpu31:2101016)NFSLock: 5032: failed to get lock on file 38233f05329208c8_IOFilter_JVOL_00244-flat.vmdk(.lck-d84c0c0000000000) 0x430b567b7330 on DATASTORE xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx): Busy
2024-10-21T13:08:33.854Z cpu7:2099418)WARNING: NFSLock: 4084: File is being locked by a consumer on host HOSTNAME0503.corp.qbe.com with exclusive lock.
2024-10-21T13:08:33.854Z cpu7:2099418)NFSLock: 5032: failed to get lock on file 38233f05329208c8_IOFilter_JVOL_00069-flat.vmdk(.lck-8d7b030000000000) 0x430b567db0b0 on DATASTORE xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx): Busy
2024-10-21T13:08:44.163Z cpu2:2099418)WARNING: NFSLock: 4084: File is being locked by a consumer on host HOSTNAME0503.corp.qbe.com with exclusive lock.

Cause

La causa di questa situazione è attualmente oggetto di indagine.
Al momento non è disponibile alcuna risoluzione permanente; tuttavia, si tratta di una situazione specifica di RP4VMs.

La situazione può verificarsi in vari scenari:

1. Quando vRPA responsabile della protezione di una VM viene trasferito tramite vMotion su un altro host ESXi come host ESXi originale, blocca il journal sull host ESXi specifico.

2. Quando un consistency group sposta la proprietà da una coppia vRPA a un'altra, richiamando così la situazione descritta sopra.

Resolution

Risoluzione:
Esiste una soluzione per questo problema, ma è necessario l'intervento del personale del supporto tecnico Dell Technologies.  È necessario raccogliere i registri RP4VM, ESXi e vCenter appropriati per valutare questa situazione e sottoporli a escalation in base alle necessità. Contattare il centro di assistenza clienti Dell Technologies o il responsabile dell'assistenza per ricevere supporto tecnico e fare riferimento a questo ID soluzione della Knowledge Base di Dell Technologies.

Affected Products

RecoverPoint for Virtual Machines

Products

RecoverPoint for Virtual Machines
Article Properties
Article Number: 000053722
Article Type: Solution
Last Modified: 31 Oct 2024
Version:  4
Find answers to your questions from other Dell users
Support Services
Check if your device is covered by Support Services.