Avamar: Backups fail after upgrading to IDPA 2.7.7 when Cloud Tier is enabled or configured
Summary: Upgrading IDPA to version 2.7.7 or Griffin (Avamar and Data Domain) causes random backup failures because Data Domain rejects connections that originate from the Object Existence Check (OEC). ...
Symptoms
The following error messages appear in the logs of the failed backups:
avtar Warning <18125>: Calling DDR_OPEN returned result code:5040 message:calling system(), returns nonzero
avtar Error <10542>: Data Domain server "ddmgmt.lab.com" open failed DDR result code: 5040, desc: calling system(), returns nonzero
avtar Error <10509>: Problem logging into the DDR server:'', only GSAN communication was enabled.
avtar FATAL <17964>: Backup is incomplete because file "/ddr_files.xml" is missing
avtar Info <10642>: DDR errors caused the backup to not be posted, errors=0, fatals=0
avtar Info <12530>: Backup was not committed to the DDR.
avtar FATAL <8941>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
avtar Info <19155>: - Establishing a connection via token to the Data Domain system with certificate authentication (Connection mode: A:2 E:2).
avtar Warning <18133>: Calling DDR_MOPEN returned result code:(5040) calling system(), returns nonzero message:DDRInstance::Connect: Unable to connect to DDR: ddmgmt.lab.com
[41581] [139628191549184] Wed Oct 9 18:17:26 2024
ddp_connect_with_config() failed, Hostname: ddmgmt.lab.com, Err: 5040-RPC procedure=SYSTEM_INFO failed, Can't connect to NFS server retval=4
[41581] [139628191549184] Wed Oct 9 18:17:26 2024
ddp_connect_with_config_internal() failed, Hostname: ddmgmt.lab.com, Err: 5040-RPC procedure=SYSTEM_INFO failed, Can't connect to NFS server retval=4
[41581] [139628237514496] Wed Oct 9 18:16:23 2024
ddp_access() failed, Path avamar-1234567890/STAGING/10f19ca3331644f885c61dae1eb936cb7624eb03/BACKUP-30C108396751178970C7E117A05FE89E5C34A8D3, mode 0 Err: 5004-nfs lookup failed (nfs: No such file or directory)
avtar FATAL <5889>: Fatal signal 11 in pid 41611
[SessionMgr] FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
avtar Warning <18133>: Calling DDR_WRITE returned result code:(5040) calling system(), returns nonzero message:DDRIO_Write::WriteToDDR: ddp_write failed
[18529] [139991135528704] Thu Nov 21 09:07:15 2024
ddp_write() failed Offset 0, BytesToWrite 3805, BytesWritten 0 Err: 5040-DDBoost OST_QUERY_SECURE RPC failure 4
[18529] [139991135528704] Thu Nov 21 09:04:40 2024
ddp_stat() failed, Path avamar-1234567890//STAGING/93f26264b84f4e30018f8f9755144866b48fec42/BACKUP-3262F4E4E3FA660B5975057EC08CD98140049755/DBF03EC0AAA6783DADBE469DCDD94913E4EC2BDA, Err: 5004-nfs lookup failed (nfs: No such file or directory)
[18529] [139991153891072] Thu Nov 21 09:04:40 2024
ddp_access() failed, Path avamar-1634225547/STAGING/93f26264b84f4e30018f8f9755144866b48fec42/BACKUP-3262F4E4E3FA660B5975057EC08CD98140049755, mode 0 Err: 5004-nfs lookup failed (nfs: No such file or directory)
avtar Info <10690>: - Processed file on Data Domain: "VMConfiguration/avamar vm configuration.xml" (3,805 bytes)
avtar Error <16709>: DDRInstance::Invoke - ddrmgr write failure result code: 5040
avtar FATAL <0000>: <10565>Failed to write data to stream, stream index: 7, DDR stream handle: 1003, DDR result code: 5040 desc: calling system(), returns nonzero.
avtar FATAL <40009>: DDR encountered errors.
avtar Info <9772>: Starting graceful (staged) termination, DDR_ERROR event received (fatal severity) (wrap-up stage)
avtar Info <0000>: Entering the 'final' phase of termination, DDR_ERROR need to exit)
avtar FATAL <5155>: Backup aborted due to earlier errors. No backup created on the server.
Data Domain reports multiple rejected connections:
Recent Alerts and Log Messages
------------------------------
Nov 20 22:03:46 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 63 connection attempts to ddr in the last 1384 minutes, already has 36 connection to port 264.
Nov 20 22:34:21 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 53 connection attempts to ddr in the last 30 minutes, already has 44 connection to port 264.
Nov 20 23:08:11 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 115 connection attempts to ddr in the last 33 minutes, already has 41 connection to port 264.
Nov 20 23:33:50 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 8 connection attempts to ddr in the last 25 minutes, already has 43 connection to port 264.
Nov 20 23:50:00 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 8 connection attempts to ddr in the last 16 minutes, already has 42 connection to port 264.
Nov 21 02:04:07 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 33 connection attempts to ddr in the last 134 minutes, already has 42 connection to port 264.
Nov 21 02:36:45 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 108 connection attempts to ddr in the last 32 minutes, already has 50 connection to port 264.
Nov 21 03:06:47 ddmgmt ddfs[22835]: WARNING: MSG-RPC-00002: Rejected 47 connection attempts to ddr in the last 30 minutes, already has 53 connection to port 264.
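To gauge how quickly the rejections accumulate, the MSG-RPC-00002 warnings can be totaled from a saved copy of the messages. The following is a minimal sketch rather than a Dell-provided tool: the function name summarize_rejections and the file name ddfs_alerts.txt are hypothetical, and the parsing assumes the message layout matches the lines shown above.

```shell
# Summarize MSG-RPC-00002 warnings read from standard input.
# Prints the number of warnings, the total of rejected attempts,
# and the highest "already has N connection" value seen.
# Assumption: each warning follows the layout shown in this article.
summarize_rejections() {
    awk '
        /MSG-RPC-00002: Rejected/ {
            for (i = 1; i < NF; i++) {
                # The number right after "Rejected" is the rejected count.
                if ($i == "Rejected") rejected += $(i + 1)
                # The number right after "has" is the open connection count.
                if ($i == "has" && $(i + 1) + 0 > peak) peak = $(i + 1) + 0
            }
            warnings++
        }
        END {
            printf "warnings=%d rejected=%d peak_connections=%d\n", warnings, rejected, peak
        }
    '
}
```

With the eight warnings above saved to ddfs_alerts.txt, running summarize_rejections < ddfs_alerts.txt would print warnings=8 rejected=435 peak_connections=53.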
Multiple CLOSE_WAIT sessions point to the Data Domain File System (ddfs) process ID:
!!!! ddmgmt YOUR DATA IS IN DANGER !!!! # while true; do echo -n "CLOSE_WAIT connections ===>"; netstat -tanp | grep CLOSE_WAIT | grep ddfs | wc -l; sleep 60; done
CLOSE_WAIT connections ===>265
CLOSE_WAIT connections ===>314
CLOSE_WAIT connections ===>360
CLOSE_WAIT connections ===>411
CLOSE_WAIT connections ===>459
CLOSE_WAIT connections ===>484
CLOSE_WAIT connections ===>503
CLOSE_WAIT connections ===>503
...
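The loop above runs until it is interrupted. A bounded variant can be sketched as follows, assuming the usual netstat -tanp column layout (state in the sixth column, PID/program in the seventh). The function names count_close_wait and sample_close_wait are hypothetical and are not part of Data Domain or Avamar.

```shell
# Count CLOSE_WAIT sockets owned by one program in "netstat -tanp"
# output read from standard input.
# $1: program name to match (default: ddfs)
count_close_wait() {
    awk -v prog="${1:-ddfs}" '
        # Column 6 is the TCP state; column 7 is PID/Program.
        $6 == "CLOSE_WAIT" && split($7, owner, "/") == 2 && owner[2] == prog { n++ }
        END { print n + 0 }
    '
}

# Take a fixed number of samples instead of looping forever.
# $1: number of samples, $2: seconds to wait between samples
sample_close_wait() {
    samples=$1
    interval=$2
    taken=0
    while [ "$taken" -lt "$samples" ]; do
        printf 'CLOSE_WAIT connections ===>%s\n' \
            "$(netstat -tanp 2>/dev/null | count_close_wait ddfs)"
        taken=$((taken + 1))
        if [ "$taken" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, sample_close_wait 5 60 takes five samples one minute apart. CLOSE_WAIT means the remote peer has closed its side while the local process has not yet closed the socket, so a count that keeps climbing between samples, as in the output above, suggests that ddfs is not releasing these connections.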
Cause
The online Object Existence Check (OEC) opens connections that then remain open instead of being closed.
The Data Domain engineering team is still investigating this symptom.
Resolution
A temporary workaround is to disable CM_OEC_ENABLED. Objects are written to the cloud as part of data movement. OEC periodically lists these objects in the cloud to ensure that everything it expects is still present there. This is an important part of the Data Invulnerability Architecture (DIA). If an object is missing, an alert is raised. This check is not performed while OEC is disabled.
Contact Dell Data Domain technical support to perform this task.
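To illustrate what is lost while OEC is disabled, the check described above can be pictured as comparing a list of expected objects against a listing of what is actually in the cloud. This is a conceptual sketch only, not Data Domain's implementation: the function name missing_objects and both input files are hypothetical.

```shell
# Print every expected object name that is absent from the cloud listing.
# $1: file with the expected object names, one per line
# $2: file with the object names actually listed in the cloud, one per line
missing_objects() {
    awk -v listing="$2" '
        BEGIN {
            # Load the cloud listing into a lookup table.
            while ((getline name < listing) > 0) present[name] = 1
        }
        # Print expected names that were not found in the listing.
        !($0 in present)
    ' "$1"
}
```

Any output from missing_objects expected.txt listed.txt corresponds to the situation in which OEC would raise an alert. No output means every expected object was found.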