Avamar: Backups fail after upgrading to IDPA 2.7.7 when Cloud Tier is enabled or configured
Summary: Upgrading IDPA to version 2.7.7 or Griffin (Avamar and Data Domain) causes random backup failures because Data Domain rejects connections as a result of the Object Existence Check (OEC). ...
Symptoms
The following error messages appear in the logs of the failed backups:
avtar Warning <18125>: Calling DDR_OPEN returned result code:5040 message:calling system(), returns nonzero
avtar Error <10542>: Data Domain server "ddmgmt.lab.com" open failed DDR result code: 5040, desc: calling system(), returns nonzero
avtar Error <10509>: Problem logging into the DDR server:'', only GSAN communication was enabled.
avtar FATAL <17964>: Backup is incomplete because file "/ddr_files.xml" is missing
avtar Info <10642>: DDR errors caused the backup to not be posted, errors=0, fatals=0
avtar Info <12530>: Backup was not committed to the DDR.
avtar FATAL <8941>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
avtar Info <19155>: - Establishing a connection via token to the Data Domain system with certificate authentication (Connection mode: A:2 E:2).
avtar Warning <18133>: Calling DDR_MOPEN returned result code:(5040) calling system(), returns nonzero message:DDRInstance::Connect: Unable to connect to DDR: ddmgmt.lab.com
[41581] [139628191549184] Wed Oct 9 18:17:26 2024
ddp_connect_with_config() failed, Hostname: ddmgmt.lab.com, Err: 5040-RPC procedure=SYSTEM_INFO failed, Can't connect to NFS server retval=4
[41581] [139628191549184] Wed Oct 9 18:17:26 2024
ddp_connect_with_config_internal() failed, Hostname: ddmgmt.lab.com, Err: 5040-RPC procedure=SYSTEM_INFO failed, Can't connect to NFS server retval=4
[41581] [139628237514496] Wed Oct 9 18:16:23 2024
ddp_access() failed, Path avamar-1234567890/STAGING/10f19ca3331644f885c61dae1eb936cb7624eb03/BACKUP-30C108396751178970C7E117A05FE89E5C34A8D3, mode 0 Err: 5004-nfs lookup failed (nfs: No such file or directory)
avtar FATAL <5889>: Fatal signal 11 in pid 41611
[SessionMgr] FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
avtar Warning <18133>: Calling DDR_WRITE returned result code:(5040) calling system(), returns nonzero message:DDRIO_Write::WriteToDDR: ddp_write failed
[18529] [139991135528704] Thu Nov 21 09:07:15 2024
ddp_write() failed Offset 0, BytesToWrite 3805, BytesWritten 0 Err: 5040-DDBoost OST_QUERY_SECURE RPC failure 4
[18529] [139991135528704] Thu Nov 21 09:04:40 2024
ddp_stat() failed, Path avamar-1234567890//STAGING/93f26264b84f4e30018f8f9755144866b48fec42/BACKUP-3262F4E4E3FA660B5975057EC08CD98140049755/DBF03EC0AAA6783DADBE469DCDD94913E4EC2BDA, Err: 5004-nfs lookup failed (nfs: No such file or directory)
[18529] [139991153891072] Thu Nov 21 09:04:40 2024
ddp_access() failed, Path avamar-1634225547/STAGING/93f26264b84f4e30018f8f9755144866b48fec42/BACKUP-3262F4E4E3FA660B5975057EC08CD98140049755, mode 0 Err: 5004-nfs lookup failed (nfs: No such file or directory)
avtar Info <10690>: - Processed file on Data Domain: "VMConfiguration/avamar vm configuration.xml" (3,805 bytes)
avtar Error <16709>: DDRInstance::Invoke - ddrmgr write failure result code: 5040
avtar FATAL <0000>: <10565>Failed to write data to stream, stream index: 7, DDR stream handle: 1003, DDR result code: 5040 desc: calling system(), returns nonzero.
avtar FATAL <40009>: DDR encountered errors.
avtar Info <9772>: Starting graceful (staged) termination, DDR_ERROR event received (fatal severity) (wrap-up stage)
avtar Info <0000>: Entering the 'final' phase of termination, DDR_ERROR need to exit)
avtar FATAL <5155>: Backup aborted due to earlier errors. No backup created on the server.
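When a failed backup log is long, it can help to tally which avtar message codes recur before reading it line by line. The following is a minimal sketch, not a Dell-provided tool: `tally_avtar_events` is a hypothetical helper name, and it assumes the log lines start with `avtar <severity> <code>:` exactly as in the excerpt above.

```shell
# Hypothetical helper (not part of Avamar): count Warning/Error/FATAL
# avtar messages per event code. Reads log text from standard input.
tally_avtar_events() {
  awk '/^avtar (Warning|Error|FATAL) </ {
         key = $2 " " $3          # e.g. "Warning <18133>:"
         sub(/:$/, "", key)       # drop the trailing colon
         count[key]++
       }
       END { for (k in count) print count[k], k }' | sort -k2,3
}
```

For example, `tally_avtar_events < backup.log` prints one line per severity and code with its count, where `backup.log` is a placeholder for the saved log of the failed backup.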
The Data Domain reports multiple rejected connections:
Recent Alerts and Log Messages
------------------------------
Nov 20 22:03:46 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 63 connection attempts to ddr in the last 1384 minutes, already has 36 connection to port 264.
Nov 20 22:34:21 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 53 connection attempts to ddr in the last 30 minutes, already has 44 connection to port 264.
Nov 20 23:08:11 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 115 connection attempts to ddr in the last 33 minutes, already has 41 connection to port 264.
Nov 20 23:33:50 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 8 connection attempts to ddr in the last 25 minutes, already has 43 connection to port 264.
Nov 20 23:50:00 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 8 connection attempts to ddr in the last 16 minutes, already has 42 connection to port 264.
Nov 21 02:04:07 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 33 connection attempts to ddr in the last 134 minutes, already has 42 connection to port 264.
Nov 21 02:36:45 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 108 connection attempts to ddr in the last 32 minutes, already has 50 connection to port 264.
Nov 21 03:06:47 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 47 connection attempts to ddr in the last 30 minutes, already has 53 connection to port 264.
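The warnings above can be totaled to see how many connection attempts were rejected and how high the connection count on port 264 climbed. This is an illustrative sketch only: `summarize_rejections` is a hypothetical helper name, and it assumes the MSG-RPC-00002 wording shown above ("Rejected N ..." and "already has M ...").

```shell
# Hypothetical helper (not a Data Domain command): summarize
# MSG-RPC-00002 warnings read from standard input.
summarize_rejections() {
  awk '/MSG-RPC-00002/ {
         for (i = 1; i < NF; i++) {
           # Number following "Rejected" = rejected attempts in that window
           if ($i == "Rejected") rejected += $(i + 1)
           # Number following "has" = connections already open on the port
           if ($i == "has" && $(i + 1) + 0 > peak) peak = $(i + 1) + 0
         }
         n++
       }
       END {
         printf "warnings=%d rejected=%d peak_connections=%d\n", n, rejected, peak
       }'
}
```

The function expects the alert text on standard input, for example pasted from the output above into a file and redirected in.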
Multiple sessions in the CLOSE_WAIT state are associated with the ddfs (Data Domain File System) process ID:
!!!! ddmgmt YOUR DATA IS IN DANGER !!!! # while true; do echo -n "CLOSE_WAIT Connections ===>"; netstat -tanp | grep CLOSE_WAIT | grep ddfs | wc -l; sleep 60; done
CLOSE_WAIT connections ===>265
CLOSE_WAIT connections ===>314
CLOSE_WAIT connections ===>360
CLOSE_WAIT connections ===>411
CLOSE_WAIT connections ===>459
CLOSE_WAIT connections ===>484
CLOSE_WAIT connections ===>503
CLOSE_WAIT connections ===>503
...
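The `grep` chain in the loop above matches the strings anywhere on the line. A slightly stricter variant, shown here as a sketch under assumptions, matches on the columns instead. `count_close_wait` is a hypothetical helper name, and it assumes the usual Linux `netstat -tanp` layout in which the sixth column is the state and the seventh is `PID/Program name`.

```shell
# Hypothetical helper: count CLOSE_WAIT sockets owned by a given process.
# Reads `netstat -tanp` style output from standard input.
# $1: process name to look for in the PID/Program column (e.g. ddfs)
count_close_wait() {
  awk -v proc="$1" '
    $6 == "CLOSE_WAIT" && index($7, proc) > 0 { n++ }
    END { print n + 0 }   # n + 0 prints 0 when nothing matched
  '
}
```

Used the same way as the loop above: `netstat -tanp | count_close_wait ddfs`.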
!!!! ddmgmt YOUR DATA IS IN DANGER !!!! #
Cause
The online Object Existence Check (OEC) opens connections, but these connections remain open.
Data Domain Engineering is still investigating this symptom.
Resolution
A temporary workaround is to disable CM_OEC_ENABLED. Objects are written to the cloud as part of data movement. OEC periodically lists these objects in the cloud to verify that what it expects is still present there. This is an important part of the Data Invulnerability Architecture (DIA). If an object is missing, an alert is generated. This check is not performed while OEC is disabled.
Contact Dell Data Domain Technical Support to perform this task.