RecoverPoint for VMs: Group in error state, RPA appears to use 100% CPU, and ESXi host shows many queued tasks for VMs

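Summary: RecoverPoint for Virtual Machines replication is down, the Consistency Group (CG) is in an error state, the RecoverPoint Appliance (RPA) is at 100% CPU, and the ESXi hosts and virtual machines (VMs) appear to be stalled.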


Symptoms

The RecoverPoint for Virtual Machines Consistency Group (CG) is in an error state:
  

Group01:
    Enabled: YES
    Transfer source:N/A
    Copy:
      Prod:
        Enabled: YES
        Active primary RPA: RPA 5
        Unavailable RPAs:
          Items: RPA 5
        Storage access: DIRECT ACCESS (marking data)
      Replica_at_target:
        Enabled: YES
        Active primary RPA: RPA 5
        Unavailable RPAs:
          Items: RPA 5
        Journal: DISTRIBUTING PRE REPLICATION IMAGE
        Storage access: NO ACCESS
        Max journal size: 1.09 TB
    Link:
      Prod->Replica_at_target:
        Data Transfer: ERROR
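
When many groups are involved, the state output can be checked programmatically. The following is a minimal Python sketch, assuming the group state has been saved as text in the layout shown above. The function name links_in_error is hypothetical and is not part of any RecoverPoint tool.

```python
def links_in_error(state_text):
    """Return the names of links whose 'Data Transfer' field reads ERROR.

    Assumes the indented layout shown in the excerpt above: a link name
    line such as 'Prod->Replica_at_target:' followed by 'Data Transfer: ...'.
    """
    errors = []
    current_link = None
    for raw in state_text.splitlines():
        line = raw.strip()
        if line.endswith(":") and "->" in line:
            # A link header, for example 'Prod->Replica_at_target:'
            current_link = line[:-1]
        elif line.startswith("Data Transfer:") and current_link is not None:
            status = line.split(":", 1)[1].strip()
            if status == "ERROR":
                errors.append(current_link)
            current_link = None
    return errors
```

Passing the excerpt above to links_in_error would report the Prod->Replica_at_target link.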



The management log (extracted*/files/home/kos/control/management.log.gz) shows that the VMs cannot be powered off:

genericWarningID=WarnID_VMS_REGULATED,problemSeverity=ProblemSeverity_ERROR,genericWarningCategory=WarnCategory_GROUP,siteUID=Option(SiteUID(0x2xxxxxxxxxxxxxxx)),boxRole=[],hostID=NoOption,globalLinks=[],groupCopies=Vector(),groups=Vector(),links=Vector(),devices=Vector(),hostVolumes=Vector(),vms=Vector((vcUuid=(uuid=xxxxxxxx-e844-4456-b9b5-xxxxxxxxxxxx),uuid=xxxxxxxx-33f1-9036-4975-xxxxxxxxxxxx),(vcUuid=(uuid=xxxxxxxx-e844-4456-b9b5-xxxxxxxxxxxx),uuid=xxxxxxxx-fde7-63ee-b1a0-xxxxxxxxxxxx),(vcUuid=(uuid=xxxxxxxx-e844-4456-b9b5-xxxxxxxxxxxx),uuid=xxxxxxxx-fc05-2aa8-6783-xxxxxxxxxxxx),(vcUuid=(uuid=xxxxxxxx-e844-4456-b9b5-xxxxxxxxxxxx),uuid=xxxxxxxx-96d7-3d4b-3972-xxxxxxxxxxxx)),arrays=Vector(),contents=RecoverPoint failed to power off VMs. VMs: VM1.copy, VM2.copy, VM6.copy, VM5.copy
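
To list the affected VMs without reading the full warning line, the names can be extracted from the message text. This is a sketch only, assuming the log line format matches the excerpt above. The helper names vms_failed_power_off and scan_management_log are hypothetical.

```python
import gzip
import re

# Matches the message text at the end of the warning line shown above.
POWER_OFF_FAILURE = re.compile(r"RecoverPoint failed to power off VMs\. VMs: (.+)$")


def vms_failed_power_off(line):
    """Return the VM names listed in a WarnID_VMS_REGULATED line, or []."""
    if "WarnID_VMS_REGULATED" not in line:
        return []
    match = POWER_OFF_FAILURE.search(line.strip())
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def scan_management_log(path):
    """Collect unique VM names from a gzip-compressed management log."""
    names = set()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            names.update(vms_failed_power_off(line))
    return sorted(names)
```

The path passed to scan_management_log is the management.log.gz file from the extracted log collection.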



Multiple tasks fail in the connector logs:

2021-09-23 16:33:09,288 [pool-7-thread-8] (ActionInfo.java:45) DEBUG - actionInfo for entity: 'rp.VM6.copy.shadow', action description: VirtualMachine.powerOff
2021-09-23 16:33:09,288 [pool-7-thread-8] (ActionInfo.java:46) DEBUG - state: Error task: Task:task-328099 @ https://10.xx.xx.xx:443/sdk
2021-09-23 16:33:09,288 [pool-7-thread-8] (ActionInfo.java:61) ERROR - Action failed: Task:task-328099 @ https://10.xx.xx.xx:443/sdk state: 'Error', error Message: Another task is already in progress.
2021-09-23 16:33:09,289 [pool-7-thread-8] (BaseCommand.java:69) ERROR - Command failed: SetVMPowerStateCommand(ID:5339931996651185259) took: 1537
com.emc.recoverpoint.connectors.vi.exceptions.VCConnectorException: Another task is already in progress.
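
The failed actions can be pulled out of a connector log with a short script. The sketch below assumes the "Action failed" line format shown in the excerpt above. The function name failed_actions is hypothetical.

```python
import re

# Matches lines such as:
# ... ERROR - Action failed: Task:task-328099 @ https://host:443/sdk
#     state: 'Error', error Message: Another task is already in progress.
ACTION_FAILED = re.compile(
    r"ERROR - Action failed: Task:(?P<task>\S+) @ \S+ "
    r"state: '(?P<state>[^']*)', error Message: (?P<message>.*)$"
)


def failed_actions(lines):
    """Return (task_id, error_message) tuples for each failed action line."""
    results = []
    for line in lines:
        match = ACTION_FAILED.search(line.rstrip("\n"))
        if match:
            results.append((match.group("task"), match.group("message")))
    return results
```

A large number of entries with the message "Another task is already in progress." is consistent with the queued tasks described in this article.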


After the VMs are powered on, the ESXi host VMkernel log may show IO, VSCSI, and splitter errors:

2021-09-21T19:49:56.291Z cpu59:xxxxxxx)WARNING: VSCSI: 3577: handle 8194(vscsi0:0):WaitForCIF: Issuing reset;  number of CIF:1
2021-09-21T19:49:56.291Z cpu59:xxxxxxx)VSCSI: 2685: handle 8194(vscsi0:0):Reset request on FSS handle 81507336 (1 outstanding commands) from (vmm0:rp.VM1.copy.shadow)
2021-09-21T19:49:56.291Z cpu30:xxxxxxx)VSCSI: 2970: handle 8194(vscsi0:0):Reset [Retries: 0/0] from (vmm0:rp.VM1.copy.shadow)
2021-09-21T19:49:56.291Z cpu30:2097789)esx_splitter: KL_INFO:862: #2 - VSCSISplitter_ResetTarget: got a RESET TARGET request (not an ABORT!) handle ptr=0x430c4244ac00, filterInfo ptr=0x430c4244af60, cmd cdb[0]=0
2021-09-21T19:49:56.291Z cpu30:xxxxxxx)esx_splitter: KL_INFO:862: #2 - EsxSplitterVolume_HandleResetTarget: Called for volume guid 0xa7ce510709e9xxxx. resetTargetHandlingPolicy = 3, state = 1, bDomino = 0
2021-09-21T19:49:56.291Z cpu30:xxxxxxx)esx_splitter: KL_INFO:862: #2 - EsxSplitterVolume_HandleResetTarget: Policy is RESET_TARGET_POLICY_DO_NOTHING.
2021-09-21T19:49:56.291Z cpu30:xxxxxxx)VSCSI: 2753: handle 8194(vscsi0:0):Completing reset (0 outstanding commands)

2021-09-22T15:23:17.036Z cpu45:xxxxxxx)esx_splitter: KL_ERROR:937: #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x4330e150e0e0 Failed. Host_Status = 0x5, Device_Status = 0x0, dataLength = 512
2021-09-22T15:23:17.037Z cpu45:xxxxxxx)esx_splitter: KL_ERROR:937: #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 64359253, startTC = 0
2021-09-22T15:23:17.037Z cpu45:xxxxxxx)esx_splitter: KL_ERROR:937: #0 - CommandIoFromRpaRead_v_storageReadEndIo_i: Failed to read data from guid 0x468127a1xxxxxxxx. Io status 0. msgId = 1155. Sending Nack to Kbox.
2021-09-22T15:23:17.037Z cpu45:xxxxxxx)esx_splitter: KL_ERROR:937: #0 - HandlesManager_v_acquire_handle_i: tried to acquire invalid handle 60


Events in the ESXi file /scratch/log/iofilterd-cvtblrt.log or /var/run/log/iofilterd-cvtblrt.log:

2021-09-22T15:23:17Z iofilterd-cvtblrt[2103769]: CVBLRTD: cvdrv_msg_recv_bytes:502: EOF while reading next message header from sd=14
2021-09-22T15:23:17Z iofilterd-cvtblrt[2103769]: CVBLRTD: cvdrv_msg_socket_destroy:133: Destroying socket sd=14
2021-09-22T15:23:17Z iofilterd-cvtblrt[2103769]: CVBLRTD: cvdrv_msg_recv_bytes:502: EOF while reading next message header from sd=15
2021-09-22T15:23:17Z iofilterd-cvtblrt[2103769]: CVBLRTD: cvdrv_msg_socket_destroy:133: Destroying socket sd=15
2021-09-22T15:53:43Z iofilterd-cvtblrt[2103769]: CVBLRTD: cvblrtd_uxmsg_accept_cb:51: Accepted new LI connection sd=14
2021-09-22T15:53:43Z iofilterd-cvtblrt[2103769]: CVBLRTD: cvdrv_msg_socket_create:107: Initialized new AF_UNIX socket connection for sd=14
2021-09-22T15:53:43Z iofilterd-cvtblrt[2103769]: CVBLRTD: cvblrtd_uxmsg_accept_cb:51: Accepted new LI connection sd=15
2021-09-22T15:53:43Z iofilterd-cvtblrt[2103769]: CVBLRTD: cvdrv_msg_socket_create:107: Initialized new AF_UNIX socket connection for sd=15
2021-09-22T15:53:43Z iofilterd-cvtblrt[2103769]: CVBLRTD: <= CVBLRT_UXMSG_HELLO#14: vm_uuid=21454cf30ba22688xxxxxxxxxxxxxxxxxxxx, vmdk_uuid=6000c29e6b90342xxxxxxxxxxxxxxxxxxxx, vmdk_path=/vmfs/volumes/vsan:52d3a286xxxxxxxxxxxx-330164f7xxxxxxxx/xxxxxxxxxxx-9622-b3a6-af52-000axxxxxxxxxxxx/VM3.copy_1.vmdk, vmdk_type=BOOT
2021-09-22T15:53:43Z iofilterd-cvtblrt[2103769]: CVBLRTD: <= CVBLRT_UXMSG_HELLO#15: vm_uuid=214xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, vmdk_uuid=6000xxxxxxxxxxxxxxxxxxxxxxxxxxxx, vmdk_path=/vmfs/volumes/vsan:52d3a286xxxxxxxx-330164f7xxxxxxxx/xxxxxxxx-9622-b3a6-af52-xxxxxxxxxxxx/VM1.copy.vmdk, vmdk_type=BOOT
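
The CVBLRT_UXMSG_HELLO entries name the VMDKs that are attached to the filter, which helps identify the VMs whose storage policy must change. The following sketch assumes the field order shown in the excerpt above. The function name vmdks_using_cvblrt is hypothetical.

```python
import re

# Field order follows the HELLO lines in the excerpt above.
HELLO = re.compile(
    r"CVBLRT_UXMSG_HELLO#\d+: vm_uuid=(?P<vm_uuid>[^,]+), "
    r"vmdk_uuid=(?P<vmdk_uuid>[^,]+), "
    r"vmdk_path=(?P<vmdk_path>.+?), vmdk_type=(?P<vmdk_type>\S+)"
)


def vmdks_using_cvblrt(lines):
    """Return the sorted, unique VMDK paths seen in HELLO messages."""
    paths = set()
    for line in lines:
        match = HELLO.search(line)
        if match:
            paths.add(match.group("vmdk_path"))
    return sorted(paths)
```

Any RecoverPoint copy VMDK that appears in the result (for example, a path ending in VM1.copy.vmdk) is being handled by the third-party filter.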

 

Cause

The RecoverPoint protected or target VMs have a VM storage policy whose rules include the following:
Common rules
Caching > Custom
Provider                                            cvtblrt
   freeBoot                                            0

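This causes the VMs to use a third-party IO filter named Commvault Block-Level Replication IO Filter (CVBLRT), which results in IO errors on the underlying VMs.
RecoverPoint cannot read from or write to the VMs, which leads to VM errors and queued tasks.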

Resolution

Change the storage policy to the default policy, or to any policy that does not use the Commvault Block-Level Replication IO Filter (CVBLRT).

Affected Products

RecoverPoint for Virtual Machines
Article Properties
Article Number: 000192169
Article Type: Solution
Last Modified: 12 May 2026
Version: 9