RecoverPoint: Replication process crashes when phase 1 cache memory is insufficient

Summary: Replication crashes with an assertion that phase 1 cache memory is not sufficient, which leads to reboot regulation.


Symptoms



The consistency group (CG) remains in the Initializing state, but distribution never appears to start and the CG does not transition to Active.

When phase 1 cache memory is insufficient and the target-side RecoverPoint Appliance cannot write to the target journal, the replication process crashes and logs an assertion.

Symptoms found in the /home/kos/replication logs:

Assertion:

XXXX/XX/XX 18:59:25.693 - #2 - 17936/16776 - AssertLogSender: Sending log: topic = DistributorGroupHandler, msg = Assertion failed: bIsPhase1CacheMemorySufficient Line 1825 File DistributorGroupHandlerPhase1.cc PID: 16776 Info: regular phase1 cache memory not enough m_GroupGridCopyRID = (groupCopyRID=(kVolSlot=XXXXXXXXXX,globalCopyID=GlobalCopy(SiteUID(0xXXXXXXXXXXXXXX) 0) ),gridCopyID=0)
XXXX/XX/XX 18:59:25.694 - #2 - 16911/16776 - RemoteLogSender: got event (uniqueId = 0, eventTime = 1584471565693987), EventID_KBOX_ASSERTION_FAILED(3031), SiteUID(0xxxxxxxxxxxxxxxxx), seDetails = Sender = replication, Topic=DistributorGroupHandler, msg=Assertion failed: bIsPhase1CacheMemorySufficient Line 1825 File DistributorGroupHandlerPhase1.cc PID:16776 Info: regular phase1 cache memory not enough m_GroupGridCopyRID = (groupCopyRID=(kVolSlot=XXXXXXXXXX,globalCopyID=GlobalCopy(SiteUID(0xXXXXXXXXXXX) 0) ),gridCopyID=0)

Statistics showing high data flow:

XXXX/XX/XX 18:52:41.520 - #2 - 7676/7665 - AccumulatorFormatManager::printStatistics: Group statistics Option( kVolSlot = XXXXXXXXXX groupUID = GroupCopy(1346840554 SiteUID(0xXXXXXXXXXXX) 0) gridID = 0):{
STATISTICS: name=InitNCOnePhaseSpeed kVolSlot = 1346840554 groupUID = GroupCopy(1346840554 SiteUID(0xXXXXXXXXXXXXX) 0) gridID = 0 description: init nc one phase speed .
STATISTICS: name=InitNCOnePhaseSpeed kVolSlot = 1346840554 groupUID = GroupCopy(1346840554 SiteUID(0xXXXXXXXXXXXXX) 0) gridID = 0 8 sec window: average: 1.14e+03 MB/sec
STATISTICS: name=InitNCOnePhaseSpeed kVolSlot = 1346840554 groupUID = GroupCopy(1346840554 SiteUID(0xXXXXXXXXXXXXX) 0) gridID = 0 77 sec window: average: 1.06e+03 MB/sec

Consistency group in the Initialization state:

2020/03/17 18:56:05.070 - #2 - 7954/7665 - InitNCState::DistributeOnePhase: distribute one phase m_groupID = ( groupCopyRID=( kVolSlot=XXXXXXXXXX,globalCopyID=GlobalCopy(SiteUID (0xXXXXXXXXXXXX) 0) ),gridCopyID=0)

The Phase1 consumer for this consistency group shows high consumption at the time of the assert:

XXXX/XX/XX 18:56:05.241 - #2 - 7954/7665 - MemoryManager: viscus on assert
+ countdown = 2413/390
+ min memory requirements = 433429 (fixed 329537 flexible 103892)
+ flexible space used = 37977/3864963
+ pool space used = 37985/4194500 (max 143544)
>> 1160635626647715840 :phase1#22
>> (groupTaskID=(sessionID=1817723153,replicationLinkID=(kVolSlot=XXXXXXXXX,srcCopyID=GlobalCopy(SiteUID(0xXXXXXXXXXXXX)
>> 0) ,destCopyID=GlobalCopy (SiteUID

A replication StackTrace is also seen:

2020/03/17 18:56:05.278 - #0 - 7954/7665 - StackTrace: provide = 0 3: /home/kos/kashya/archive/lib/libreplication_libsrelease.so(_ZNK6Kashya23DistributorGroupHandler21waitForMemoryIfNeededEv+0x5b2) [0xxxxxxxxxxxxx]
2020/03/17 18:56:05.278 - #0 - 7954/7665 - StackTrace: provide = 0 4: /home/kos/kashya/archive/lib/libreplication_libsrelease.so(_ZN6Kashya23DistributorGroupHandler25addSequencesToPhase1CacheENS_9SequencesERNS_15ReplicationModeE+0x939)
2020/03/17 18:56:05.278 - #0 - 7954/7665 - StackTrace: provide = 0 5: /home/kos/kashya/archive/lib/libreplication_libsrelease.so(_ZN6Kashya23DistributorGroupHandler23handleSplittedSequencesENS_9SequencesERKNS_15ReplicationModeERKb+0x20a)
2020/03/17 18:56:05.278 - #0 - 7954/7665 - StackTrace: provide = 0 6: /home/kos/kashya/archive/lib/libreplication_libsrelease.so(_ZN6Kashya23DistributorGroupHandler15handleSequencesENS_9SequencesERKNS_15ReplicationModeERKb+0x577)
2020/03/17 18:56:05.278 - #0 - 7954/7665 - StackTrace: provide = 0 7: /home/kos/kashya/archive/lib/libreplication_libsrelease.so(_ZN6Kashya19Distributor_AO_IMPL23continueHandleSequencesENS_9SequencesENS_15ReplicationModeEbRKNS_10GridCopyIDE+0xf7)
2020/03/17 18:56:05.278 - #0 - 7954/7665 - StackTrace: provide = 0 8: /home/kos/kashya/archive/lib/libreplication_libsrelease.so(_ZN6Kashya16SequencesRequest21continueHandleRequestERNS_28JournalRegulationRequestBase14RequestHandlerE+0x30b)
2020/03/17 18:56:05.278 - #0 - 7954/7665 - StackTrace: provide = 0 9: /home/kos/kashya/archive/lib/libreplication_libsrelease.so(_ZN6Kashya31JournalRegulationThread_AO_IMPL9process_iERKNS_16GroupGridCopyRIDE+0x36f)
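
To check whether a collected replication log contains this assertion, a small script along the following lines may help. It is a minimal sketch, not a Dell tool: the function name find_phase1_asserts, the timestamp pattern, and the shortened sample lines are assumptions made for illustration. Only the marker string bIsPhase1CacheMemorySufficient comes from the log excerpts above.

```python
import re
from typing import Iterable, List

# Marker taken from the assertion message quoted in the symptoms.
ASSERT_MARKER = "bIsPhase1CacheMemorySufficient"

# Assumed timestamp layout, e.g. "2020/03/17 18:59:25.693".
# \S is used for the date so that masked dates (XXXX/XX/XX) also match.
TIMESTAMP_RE = re.compile(r"^(\S{4}/\S{2}/\S{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def find_phase1_asserts(lines: Iterable[str]) -> List[str]:
    """Return the timestamp of every line containing the assertion marker.

    If a matching line has no parsable timestamp, the stripped line itself
    is returned so that no hit is silently dropped.
    """
    hits = []
    for line in lines:
        if ASSERT_MARKER not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        hits.append(match.group(1) if match else line.strip())
    return hits


# Shortened, illustrative sample lines (not verbatim log output).
sample = [
    "2020/03/17 18:59:25.693 - #2 - AssertLogSender: "
    "Assertion failed: bIsPhase1CacheMemorySufficient",
    "2020/03/17 18:56:05.070 - #2 - InitNCState: distribute one phase",
]
print(find_phase1_asserts(sample))
```

In practice the function can be fed an open file handle for a log under /home/kos/replication, since it accepts any iterable of text lines.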

Cause

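The memory manager fails to extend the memory allocation for the phase 1 cache. This leads to a temporary situation in which the phase 1 cache has no space left for incoming sequences, and replication therefore asserts.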

Resolution

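Workaround:

Change the value of the tweak t_phase1CacheMemoryThreadSleepTime to 5000. This increases the wait time from 10 microseconds to 5 milliseconds and ensures that replication does not assert until the thread has waited 5 milliseconds for memory.

If the issue persists:

1. Collect the production site logs as well. They show how much data was being sent from production when the issue occurred.
2. Change the value of the tweak t_maxNoOfTriesToWaitForPhase1CacheMemory to 10.

Reminder: these tweaks apply only to version 5.1.3 and later. If the code version is earlier than 5.1.3, RecoverPoint must be upgraded to the latest code before the tweaks can be used.

Resolution:

Dell EMC Engineering is currently investigating this issue, and a permanent fix is in development. For assistance, contact the Dell EMC Customer Support Center or your service representative and reference this solution ID.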
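
The effect of the two tweaks named above (t_phase1CacheMemoryThreadSleepTime and t_maxNoOfTriesToWaitForPhase1CacheMemory) can be pictured with a simplified model. The sketch below is not RecoverPoint's implementation. It assumes the sleep tweak is expressed in microseconds (consistent with a value of 5000 meaning 5 milliseconds), that the thread sleeps once per retry, and that the assert fires only after all retries are used up. The function names and the default of a single try are hypothetical.

```python
def microseconds_to_milliseconds(value_us: int) -> float:
    """Convert a tweak value in microseconds to milliseconds."""
    return value_us / 1000.0


def wait_for_memory(memory_free_after_us: int,
                    sleep_time_us: int,
                    max_tries: int = 1) -> bool:
    """Simulate waiting for phase 1 cache memory (assumed model).

    memory_free_after_us: simulated time at which memory becomes available.
    sleep_time_us:        assumed meaning of the sleep-time tweak.
    max_tries:            assumed meaning of the max-tries tweak.

    Returns True if memory became available within the wait budget,
    False if the model would hit the assertion.
    """
    waited_us = 0
    for _ in range(max_tries):
        if waited_us >= memory_free_after_us:
            return True
        waited_us += sleep_time_us  # simulated sleep, no real delay
    return waited_us >= memory_free_after_us


print(microseconds_to_milliseconds(5000))                # 5.0
print(wait_for_memory(3000, sleep_time_us=10))           # False
print(wait_for_memory(3000, sleep_time_us=5000))         # True
print(wait_for_memory(40000, sleep_time_us=5000, max_tries=10))  # True
```

Under these assumptions, a 10-microsecond wait gives the memory manager almost no time to free space, a 5000-microsecond wait tolerates a shortage of a few milliseconds, and raising the number of tries extends the budget further for longer shortages.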

Affected Products

RecoverPoint

Products

RecoverPoint, RecoverPoint EX
Article Properties
Article Number: 000174142
Article Type: Solution
Last Modified: 10 Jul 2025
Version:  5