Avamar: Backups fail after upgrading to IDPA 2.7.7 if Cloud Tier is enabled or configured
Summary: Upgrading IDPA to version 2.7.7 or Griffin (Avamar and Data Domain) causes random backup failures due to connections rejected by Data Domain as a result of the Object Existence Check (OEC). ...
Symptoms
The log of the failed backup shows the following error messages:
avtar Warning <18125>: Calling DDR_OPEN returned result code:5040 message:calling system(), returns nonzero
avtar Error <10542>: Data Domain server "ddmgmt.lab.com" open failed DDR result code: 5040, desc: calling system(), returns nonzero
avtar Error <10509>: Problem logging into the DDR server:'', only GSAN communication was enabled.
avtar FATAL <17964>: Backup is incomplete because file "/ddr_files.xml" is missing
avtar Info <10642>: DDR errors caused the backup to not be posted, errors=0, fatals=0
avtar Info <12530>: Backup was not committed to the DDR.
avtar FATAL <8941>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
avtar Info <19155>: - Establishing a connection via token to the Data Domain system with certificate authentication (Connection mode: A:2 E:2).
avtar Warning <18133>: Calling DDR_MOPEN returned result code:(5040) calling system(), returns nonzero message:DDRInstance::Connect: Unable to connect to DDR: ddmgmt.lab.com
[41581] [139628191549184] Wed Oct 9 18:17:26 2024
ddp_connect_with_config() failed, Hostname: ddmgmt.lab.com, Err: 5040-RPC procedure=SYSTEM_INFO failed, Can't connect to NFS server retval=4
[41581] [139628191549184] Wed Oct 9 18:17:26 2024
ddp_connect_with_config_internal() failed, Hostname: ddmgmt.lab.com, Err: 5040-RPC procedure=SYSTEM_INFO failed, Can't connect to NFS server retval=4
[41581] [139628237514496] Wed Oct 9 18:16:23 2024
ddp_access() failed, Path avamar-1234567890/STAGING/10f19ca3331644f885c61dae1eb936cb7624eb03/BACKUP-30C108396751178970C7E117A05FE89E5C34A8D3, mode 0 Err: 5004-nfs lookup failed (nfs: No such file or directory)
avtar FATAL <5889>: Fatal signal 11 in pid 41611
[SessionMgr] FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
avtar Warning <18133>: Calling DDR_WRITE returned result code:(5040) calling system(), returns nonzero message:DDRIO_Write::WriteToDDR: ddp_write failed
[18529] [139991135528704] Thu Nov 21 09:07:15 2024
ddp_write() failed Offset 0, BytesToWrite 3805, BytesWritten 0 Err: 5040-DDBoost OST_QUERY_SECURE RPC failure 4
[18529] [139991135528704] Thu Nov 21 09:04:40 2024
ddp_stat() failed, Path avamar-1234567890//STAGING/93f26264b84f4e30018f8f9755144866b48fec42/BACKUP-3262F4E4E3FA660B5975057EC08CD98140049755/DBF03EC0AAA6783DADBE469DCDD94913E4EC2BDA, Err: 5004-nfs lookup failed (nfs: No such file or directory)
[18529] [139991153891072] Thu Nov 21 09:04:40 2024
ddp_access() failed, Path avamar-1634225547/STAGING/93f26264b84f4e30018f8f9755144866b48fec42/BACKUP-3262F4E4E3FA660B5975057EC08CD98140049755, mode 0 Err: 5004-nfs lookup failed (nfs: No such file or directory)
avtar Info <10690>: - Processed file on Data Domain: "VMConfiguration/avamar vm configuration.xml" (3,805 bytes)
avtar Error <16709>: DDRInstance::Invoke - ddrmgr write failure result code: 5040
avtar FATAL <0000>: <10565>Failed to write data to stream, stream index: 7, DDR stream handle: 1003, DDR result code: 5040 desc: calling system(), returns nonzero.
avtar FATAL <40009>: DDR encountered errors.
avtar Info <9772>: Starting graceful (staged) termination, DDR_ERROR event received (fatal severity) (wrap-up stage)
avtar Info <0000>: Entering the 'final' phase of termination, DDR_ERROR need to exit)
avtar FATAL <5155>: Backup aborted due to earlier errors. No backup created on the server.
Data Domain reports multiple rejected connections:
Recent Alerts and Log Messages
------------------------------
Nov 20 22:03:46 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 63 connection attempts to ddr in the last 1384 minutes, already has 36 connection to port 264.
Nov 20 22:34:21 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 53 connection attempts to ddr in the last 30 minutes, already has 44 connection to port 264.
Nov 20 23:08:11 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 115 connection attempts to ddr in the last 33 minutes, already has 41 connection to port 264.
Nov 20 23:33:50 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 8 connection attempts to ddr in the last 25 minutes, already has 43 connection to port 264.
Nov 20 23:50:00 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 8 connection attempts to ddr in the last 16 minutes, already has 42 connection to port 264.
Nov 21 02:04:07 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 33 connection attempts to ddr in the last 134 minutes, already has 42 connection to port 264.
Nov 21 02:36:45 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 108 connection attempts to ddr in the last 32 minutes, already has 50 connection to port 264.
Nov 21 03:06:47 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 47 connection attempts to ddr in the last 30 minutes, already has 53 connection to port 264.
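To gauge how severe the rejections are, the MSG-RPC-00002 warnings can be tallied offline. The following is a minimal sketch, assuming the alert lines have been saved as text in exactly the format shown above; the function name `parse_rejections` and the dictionary keys are illustrative and not part of any Dell tooling.

```python
import re

# Matches the MSG-RPC-00002 warning format shown in the alerts above.
REJECT_RE = re.compile(
    r"MSG-RPC-00002: Rejected (\d+) connection attempts to \S+ "
    r"in the last (\d+) minutes, already has (\d+) connection to port (\d+)"
)


def parse_rejections(lines):
    """Return one dict per MSG-RPC-00002 line; all other lines are skipped."""
    rows = []
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            rejected, minutes, active, port = map(int, match.groups())
            rows.append(
                {"rejected": rejected, "minutes": minutes,
                 "active": active, "port": port}
            )
    return rows


# Two lines copied from the alert output above.
sample = [
    "Nov 20 22:03:46 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 63 "
    "connection attempts to ddr in the last 1384 minutes, already has 36 "
    "connection to port 264.",
    "Nov 20 22:34:21 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 53 "
    "connection attempts to ddr in the last 30 minutes, already has 44 "
    "connection to port 264.",
]

rows = parse_rejections(sample)
print("total rejected:", sum(r["rejected"] for r in rows))
print("peak active connections:", max(r["active"] for r in rows))
```

A rising count of active connections alongside repeated rejections is consistent with the connection build-up described in this article.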
There are multiple CLOSE_WAIT sessions pointing to the process ID of the Data Domain file system (ddfs):
!!!! ddmgmt YOUR DATA IS IN DANGER !!!! # while true; do echo -n "CLOSE_WAIT Connections ===>"; netstat -tanp | grep CLOSE_WAIT | grep ddfs | wc -l; sleep 60; done
CLOSE_WAIT connections ===>265
CLOSE_WAIT connections ===>314
CLOSE_WAIT connections ===>360
CLOSE_WAIT connections ===>411
CLOSE_WAIT connections ===>459
CLOSE_WAIT connections ===>484
CLOSE_WAIT connections ===>503
CLOSE_WAIT connections ===>503
...
!!!! ddmgmt YOUR DATA IS IN DANGER !!!! #
Cause
The online Object Existence Check (OEC) opens connections, but they remain open.
Data Domain Engineering is still investigating this symptom.
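The CLOSE_WAIT samples listed under Symptoms show the pattern expected from connections that are opened and never released: the count climbs and does not fall. The sketch below checks captured output for that pattern. It assumes the loop output has been saved as text; `parse_counts` and `looks_like_leak` are hypothetical helper names, and the "never decreases" rule is a simple heuristic, not a Dell-defined threshold.

```python
MARKER = "CLOSE_WAIT connections ===>"


def parse_counts(lines, marker=MARKER):
    """Extract the integer that follows the marker on each sample line."""
    counts = []
    for line in lines:
        line = line.strip()
        if line.startswith(marker):
            counts.append(int(line[len(marker):]))
    return counts


def looks_like_leak(counts):
    """True if the samples never decrease and end higher than they began."""
    if len(counts) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
    return never_drops and counts[-1] > counts[0]


# Samples copied from the output shown under Symptoms (one per 60 seconds).
samples = [
    "CLOSE_WAIT connections ===>265",
    "CLOSE_WAIT connections ===>314",
    "CLOSE_WAIT connections ===>360",
    "CLOSE_WAIT connections ===>411",
    "CLOSE_WAIT connections ===>459",
    "CLOSE_WAIT connections ===>484",
    "CLOSE_WAIT connections ===>503",
    "CLOSE_WAIT connections ===>503",
]

counts = parse_counts(samples)
print("growth over the sampled window:", counts[-1] - counts[0])
print("leak pattern:", looks_like_leak(counts))
```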
Resolution
The temporary workaround is to disable CM_OEC_ENABLED. Objects are written to the cloud as part of data movement. OEC periodically lists these objects in the cloud to verify that all expected objects are still present. This is an important part of the Data Invulnerability Architecture (DIA). If any object is missing, an alert is generated. This check is not performed while OEC is disabled.
Contact Dell Data Domain Technical Support to perform this task.
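To make the trade-off of the workaround concrete, the existence check described above can be pictured as a set comparison between the objects expected in the cloud and the objects a listing returns. The sketch below is purely conceptual and is not Data Domain's implementation; the object names and the `missing_objects` helper are hypothetical.

```python
def missing_objects(expected, listed):
    """Return the expected object names that are absent from the listing."""
    return sorted(set(expected) - set(listed))


# Hypothetical object names, for illustration only.
expected = ["obj-001", "obj-002", "obj-003"]
listed = ["obj-001", "obj-003"]

missing = missing_objects(expected, listed)
if missing:
    # With OEC enabled, a missing object results in an alert.
    print("ALERT: objects missing from the cloud:", ", ".join(missing))
else:
    print("All expected objects are present.")
```

With OEC disabled, no such comparison runs, so a missing object would not raise an alert until the check is re-enabled.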