Article Number: 532144


Avamar - Data Domain: Failed to create a checkpoint backup due to incorrect deployment of the AVE

Primary Product: Avamar

Product: Avamar Server

Last Published: March 13, 2020

Article Type: Break Fix

Publication Status: Online

Version: 4


Article Content

Issue


- Failed to create a checkpoint backup
- All maintenance jobs completed successfully
- Connectivity between Avamar and Data Domain confirmed
 
mccli event show | grep -i "failed to create"
71546 2019-02-21 09:09:31 CET ERROR       31034 SYSTEM      PROCESS  /                          Failed to create a checkpoint backup.
71229 2019-02-20 09:09:22 CET ERROR       31034 SYSTEM      PROCESS  /                          Failed to create a checkpoint backup.
70926 2019-02-19 09:26:29 CET ERROR       31034 SYSTEM      PROCESS  /                          Failed to create a checkpoint backup.
70340 2019-02-18 09:10:41 CET ERROR       31034 SYSTEM      PROCESS  /                          Failed to create a checkpoint backup.
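
When the event listing is long, a small script can summarize it. The following is a minimal Python sketch, not an Avamar tool: it assumes the column layout shown above (ID, date, time, time zone, type, code, and so on), counts code 31034 events per day, and picks the newest event ID to inspect with mccli event show --id. The sample text is copied from the output above, and parse_events is a hypothetical helper.

```python
import re
from collections import Counter

# Sample copied from the "mccli event show | grep" output above.
EVENTS = """\
71546 2019-02-21 09:09:31 CET ERROR       31034 SYSTEM      PROCESS  /                          Failed to create a checkpoint backup.
71229 2019-02-20 09:09:22 CET ERROR       31034 SYSTEM      PROCESS  /                          Failed to create a checkpoint backup.
70926 2019-02-19 09:26:29 CET ERROR       31034 SYSTEM      PROCESS  /                          Failed to create a checkpoint backup.
70340 2019-02-18 09:10:41 CET ERROR       31034 SYSTEM      PROCESS  /                          Failed to create a checkpoint backup.
"""

# Assumed columns: ID, date, time, time zone, type, code, ...
LINE_RE = re.compile(r"^(\d+)\s+(\d{4}-\d{2}-\d{2})\s+\S+\s+\S+\s+(\w+)\s+(\d+)\s")


def parse_events(text):
    """Hypothetical helper: return (event_id, date, type, code) for each event line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2), m.group(3), int(m.group(4))))
    return rows


events = parse_events(EVENTS)
# Count "Failed to create a checkpoint backup" events (code 31034) per day.
per_day = Counter(date for _, date, _, code in events if code == 31034)
newest_id = max(event_id for event_id, *_ in events)

print(sorted(per_day.items()))
print(f"mccli event show --id={newest_id}")
```

In this case the failure repeats once a day, and the newest ID is the one examined further below.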

- The cpbackup log for checkpoint cp.20190221080716 shows the backups of data partitions data07 and data06 returning exit code 158:
 
admin@*****:/usr/local/avamar/var/client/>: less cpbackup-cp.20190221080716-28111.log
.
.
[Thu Feb 21 09:09:17 2019] Backup data07 finished in 00:00:01.
[Thu Feb 21 09:09:17 2019] Cleanup backup for data07
[Thu Feb 21 09:09:17 2019] Backup data07 returned with exit code 158
[Thu Feb 21 09:09:17 2019] Execute: ps -o pid,ppid,cmd --no-headers --ppid 28226 || true
> [Thu Feb 21 09:09:17 2019] Killing all child processes: 28226
> [Thu Feb 21 09:09:17 2019] Killing PIDs (28226) with signal 15.
[Thu Feb 21 09:09:17 2019] Sleeping for 10 seconds after killing processes...
[Thu Feb 21 09:09:27 2019] Execute: ps -o pid,ppid,cmd --noheaders --pid 28226 || true
[Thu Feb 21 09:09:27 2019] Execute output:
28226 28111 [sh] <defunct>
.
.
[Thu Feb 21 09:09:28 2019] Backup data06 finished in 00:00:12.
[Thu Feb 21 09:09:28 2019] Cleanup backup for data06
[Thu Feb 21 09:09:28 2019] Backup data06 returned with exit code 158
[Thu Feb 21 09:09:28 2019] Finished backing up files in 00:00:12.
[Thu Feb 21 09:09:28 2019] Execute: /usr/local/avamar/bin/mccli event publish --code=31034 --attribute="checkpoint" --value="cp.20190221080716" --attribute="logfile" --value="/space/avamar/var/client/cpbackup-cp.20190221080716-28111.log" --attribute="cache" --value="OK" --attribute="data06 elapsed" --value="00:00:12" --attribute="data06 fail" --value="Exit code: 158. Signal: 0." --attribute="data07 elapsed" --value="00:00:01" --attribute="data07 fail" --value="Exit code: 158. Signal: 0." --attribute="max ddr streams" --value="6" --attribute="max parallel avtars" --value="2" --attribute="parallel running avtars" --value="2" --attribute="pass thru flags" --value="--id=root --ap=******** --hfsaddr=***** --hfsport=27000" --attribute="total elapsed time" --value="00:00:12" --attribute="volumes" --value="data01 data02 data03 data04 data05 data06 data07"
 
[Thu Mar 22 09:10:03 2018] Execute: /usr/local/avamar/bin/mccli event publish --code=31034 --attribute="checkpoint" --value="cp.20180322150523" --attribute="logfile" --value="/data01/avamar/var/cpbackup-cp.20180322150523-34518.log" --attribute="abort reason" --value="Flag --max-ddr-streams must be greater than zero" --attribute="cache" --value="OK" --attribute="max ddr streams" --value="0" --attribute="max parallel avtars" --value="2" --attribute="pass thru flags" --value="--id=root --ap=******** --hfsaddr=avamar" --attribute="volumes" --value="data01 data02 data03"
 abort reason        Flag --max-ddr-streams must be greater than zero
[Thu Feb 21 09:09:31 2019] Execute output:
0,23000,CLI command completed successfully.
 Attribute               Value
 ----------------------- -------------------------------------------------------------
 checkpoint              cp.20190221080716
 cache                   OK
 parallel running avtars 2
 logfile                 /space/avamar/var/client/cpbackup-cp.20190221080716-28111.log
 max ddr streams         6
 pass thru flags         --id=root --ap=******** --hfsaddr=***** --hfsport=27000
 volumes                 data01 data02 data03 data04 data05 data06 data07
 data06 fail             Exit code: 158. Signal: 0.
 total elapsed time      00:00:12
 data07 elapsed          00:00:01
 max parallel avtars     2
 data06 elapsed          00:00:12
 data07 fail             Exit code: 158. Signal: 0.
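
To list which partitions failed without reading the whole log, the "Backup <volume> returned with exit code <n>" lines can be extracted. This is a minimal Python sketch that assumes that line format (taken from the excerpt above) and treats any non-zero exit code as a failure; the article does not document what exit code 158 itself means. failed_partitions is a hypothetical helper.

```python
import re

# Lines from cpbackup-cp.20190221080716-28111.log shown above (others omitted).
LOG = """\
[Thu Feb 21 09:09:17 2019] Backup data07 finished in 00:00:01.
[Thu Feb 21 09:09:17 2019] Cleanup backup for data07
[Thu Feb 21 09:09:17 2019] Backup data07 returned with exit code 158
[Thu Feb 21 09:09:28 2019] Backup data06 finished in 00:00:12.
[Thu Feb 21 09:09:28 2019] Cleanup backup for data06
[Thu Feb 21 09:09:28 2019] Backup data06 returned with exit code 158
"""

RESULT_RE = re.compile(r"Backup (data\d+) returned with exit code (\d+)")


def failed_partitions(log_text):
    """Return {partition: exit code} for every partition whose exit code is non-zero."""
    results = {vol: int(code) for vol, code in RESULT_RE.findall(log_text)}
    return {vol: code for vol, code in results.items() if code != 0}


print(failed_partitions(LOG))
```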



- Details of the event reported for the failed cpbackup:
 
mccli event show --id=71546
0,23000,CLI command completed successfully.
Attribute   Value
----------- ------------------------------------------------------------
ID          71546
Date        2019-02-21 09:09:31 CET
Type        ERROR
Code        31034
Category    SYSTEM
Severity    PROCESS
Domain      /
Summary     Failed to create a checkpoint backup.
SW Source   MCS:BS
For Whom    All Users
HW Source   *****
Description Failed to create a checkpoint backup.
Remedy      No action required.
Notes       N/A
Data        <data><entry key="checkpoint" type="text" value="cp.20190221080716" version="1"/><entry key="cache" type="text" value="OK" version="1"/><entry key="parallel running avtars" type="text" value="2" version="1"/><entry key="logfile" type="text" value="/space/avamar/var/client/cpbackup-cp.20190221080716-28111.log" version="1"/><entry key="max ddr streams" type="text" value="6" version="1"/><entry key="pass thru flags" type="text" value="--id=root --ap=******** --hfsaddr=***** --hfsport=27000" version="1"/><entry key="volumes" type="text" value="data01 data02 data03 data04 data05 data06 data07" version="1"/><entry key="requestor" type="xml" value="&amp;lt;requestor domain=&amp;quot;/&amp;quot; product=&amp;quot;NONE&amp;quot; role=&amp;quot;Administrator&amp;quot; user=&amp;quot;MCUser&amp;quot;/&amp;gt;" version=""/><entry key="data06 fail" type="text" value="Exit code: 158. Signal: 0." version="1"/><entry key="total elapsed time" type="text" value="00:00:12" version="1"/><entry key="data07 elapsed" type="text" value="00:00:01" version="1"/><entry key="max parallel avtars" type="text" value="2" version="1"/><entry key="data06 elapsed" type="text" value="00:00:12" version="1"/><entry key="data07 fail" type="text" value="Exit code: 158. Signal: 0." version="1"/></data>
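
The Data field is XML, so the failing partitions can also be read from the event itself. A minimal sketch using Python's standard xml.etree.ElementTree module, assuming only the entry key/value structure shown above; the sample is trimmed to a few entries, and event_attributes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The Data field of event 71546, trimmed to a few of its entries.
DATA = (
    "<data>"
    '<entry key="checkpoint" type="text" value="cp.20190221080716" version="1"/>'
    '<entry key="max ddr streams" type="text" value="6" version="1"/>'
    '<entry key="volumes" type="text" value="data01 data02 data03 data04 data05 data06 data07" version="1"/>'
    '<entry key="data06 fail" type="text" value="Exit code: 158. Signal: 0." version="1"/>'
    '<entry key="data06 elapsed" type="text" value="00:00:12" version="1"/>'
    '<entry key="data07 fail" type="text" value="Exit code: 158. Signal: 0." version="1"/>'
    "</data>"
)


def event_attributes(xml_text):
    """Hypothetical helper: map each <entry> key to its value."""
    root = ET.fromstring(xml_text)
    return {entry.get("key"): entry.get("value") for entry in root.iter("entry")}


attrs = event_attributes(DATA)
# Keys such as "data06 fail" name the partitions whose backup failed.
failed = sorted(key.split()[0] for key in attrs if key.endswith(" fail"))

print(attrs["checkpoint"], failed)
```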

- The per-partition cpbackup logs for data06 and data07 report that the server is in read-only mode due to diskfull. Excerpt from the data06 log:
 
admin@*****:/usr/local/avamar/var/client/>: less cpbackup-cp.20190221080716-data06.log
.
.

2019-02-21 09:09:16 avtar Info <5554>: Connecting to one node in each datacenter
2019-02-21 09:09:16 avtar Info <5993>: - Connect: Connected to 172.27.7.3:29000, Priv=0, SSL Cipher=AES256-SHA
2019-02-21 09:09:16 avtar Info <5993>: - Datacenter 0 has 1 nodes: Connected to 172.27.7.3:29000, Priv=0, SSL Cipher=AES256-SHA
2019-02-21 09:09:16 avtar Info <42862>: - Server is in read-only mode due to diskfull
2019-02-21 09:09:16 avtar Info <17972>: - Server is in Read-only mode.
2019-02-21 09:09:16 avtar FATAL <8604>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2019-02-21 09:09:16 avtar FATAL <8941>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2019-02-21 09:09:16 avtar Info <6149>: Error summary: 2 errors: 8604, 8941
2019-02-21 09:09:16 avtar Info <6645>: Not sending wrapup anywhere.
2019-02-21 09:09:16 avtar Info <5314>: Command failed (2 errors, exit code 10008: cannot establish connection with server (possible network or DNS failure))
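
Note that the final avtar message blames a "possible network or DNS failure", but the earlier lines in this excerpt show that avtar did connect to the server and then found it in read-only mode due to diskfull. A minimal Python sketch that scans for this pattern, assuming the "avtar <level> <<code>>: <text>" line format shown above; summarize is a hypothetical helper, and the sample keeps only some of the lines.

```python
import re

# Lines from cpbackup-cp.20190221080716-data06.log shown above.
AVTAR_LOG = """\
2019-02-21 09:09:16 avtar Info <5993>: - Connect: Connected to 172.27.7.3:29000, Priv=0, SSL Cipher=AES256-SHA
2019-02-21 09:09:16 avtar Info <42862>: - Server is in read-only mode due to diskfull
2019-02-21 09:09:16 avtar Info <17972>: - Server is in Read-only mode.
2019-02-21 09:09:16 avtar FATAL <8604>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2019-02-21 09:09:16 avtar FATAL <8941>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2019-02-21 09:09:16 avtar Info <5314>: Command failed (2 errors, exit code 10008: cannot establish connection with server (possible network or DNS failure))
"""

MSG_RE = re.compile(r"avtar (\w+) <(\d+)>: (.*)")


def summarize(log_text):
    """Hypothetical helper: report connection, read-only state, FATAL codes and exit code."""
    connected = read_only = False
    fatal, exit_code = [], None
    for line in log_text.splitlines():
        m = MSG_RE.search(line)
        if not m:
            continue
        level, code, text = m.groups()
        if "Connected to" in text:
            connected = True
        if "read-only" in text.lower():
            read_only = True
        if level == "FATAL":
            fatal.append(int(code))
        ec = re.search(r"exit code (\d+)", text)
        if ec:
            exit_code = int(ec.group(1))
    return {"connected": connected, "read_only": read_only, "fatal": fatal, "exit_code": exit_code}


print(summarize(AVTAR_LOG))
```

A result with connected and read_only both true points at server capacity rather than at the network, despite the wording of exit code 10008.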



- The fs-percent-full values reported by avmaint nodelist show a large difference between data01 and the other partitions:
 
avmaint nodelist |grep -i "fs-perc"
 fs-percent-full="37.0"
 fs-percent-full="8.8"
 fs-percent-full="8.7"
 fs-percent-full="8.7"
 fs-percent-full="8.8"
 fs-percent-full="8.7"
 fs-percent-full="8.7"
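
A minimal Python sketch that extracts the fs-percent-full values and flags outliers. The "more than twice the median" threshold is an arbitrary illustration, not an Avamar rule, and reading the first value as data01 follows this article's interpretation of the output.

```python
import re
from statistics import median

# Output of: avmaint nodelist | grep -i "fs-perc"
NODELIST = """\
 fs-percent-full="37.0"
 fs-percent-full="8.8"
 fs-percent-full="8.7"
 fs-percent-full="8.7"
 fs-percent-full="8.8"
 fs-percent-full="8.7"
 fs-percent-full="8.7"
"""

values = [float(v) for v in re.findall(r'fs-percent-full="([\d.]+)"', NODELIST)]
typical = median(values)
# Illustrative threshold only: flag anything more than twice the median.
outliers = [i for i, v in enumerate(values) if v > 2 * typical]

print(f"median={typical}%, max/median={max(values) / typical:.1f}x, outlier indexes={outliers}")
```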
 
- df -h shows that data01 is smaller than the other data partitions (250G versus 1.0T) and is 37% full, while the others are 9% full:


df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        16G  5.0G   10G  34% /
udev             18G  184K   18G   1% /dev
tmpfs            18G     0   18G   0% /dev/shm
/dev/sda1      1011M   91M  869M  10% /boot
/dev/sda6       7.6G  287M  6.9G   4% /var
/dev/sda8        62G   14G   45G  25% /space
/dev/sdb1       250G   93G  158G  37% /data01
/dev/sdc1       1.0T   90G  934G   9% /data02
/dev/sdd1       1.0T   89G  935G   9% /data03
/dev/sde1       1.0T   90G  935G   9% /data04
/dev/sdf1       1.0T   91G  934G   9% /data05
/dev/sdg1       1.0T   90G  935G   9% /data06
/dev/sdh1       1.0T   90G  935G   9% /data07
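
A minimal Python sketch that parses the df -h output above and reports data partitions whose size differs from the most common size. It assumes the six-column df layout shown here and only the K, M, G and T suffixes; df -h reports in powers of 1024, so 1.0T is treated as 1024 GiB. data_partition_sizes and size_gib are hypothetical helpers.

```python
from collections import Counter

# The df -h output shown above.
DF = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        16G  5.0G   10G  34% /
udev             18G  184K   18G   1% /dev
tmpfs            18G     0   18G   0% /dev/shm
/dev/sda1      1011M   91M  869M  10% /boot
/dev/sda6       7.6G  287M  6.9G   4% /var
/dev/sda8        62G   14G   45G  25% /space
/dev/sdb1       250G   93G  158G  37% /data01
/dev/sdc1       1.0T   90G  934G   9% /data02
/dev/sdd1       1.0T   89G  935G   9% /data03
/dev/sde1       1.0T   90G  935G   9% /data04
/dev/sdf1       1.0T   91G  934G   9% /data05
/dev/sdg1       1.0T   90G  935G   9% /data06
/dev/sdh1       1.0T   90G  935G   9% /data07
"""

# df -h uses powers of 1024; factors convert each suffix to GiB.
TO_GIB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1, "T": 1024}


def size_gib(field):
    """Convert a df -h size such as '250G' or '1.0T' to GiB."""
    return float(field[:-1]) * TO_GIB[field[-1]]


def data_partition_sizes(df_text):
    """Return {mount point: size in GiB} for the /dataNN file systems."""
    sizes = {}
    for line in df_text.splitlines()[1:]:  # skip the header
        fields = line.split()
        if len(fields) == 6 and fields[5].startswith("/data"):
            sizes[fields[5]] = size_gib(fields[1])
    return sizes


sizes = data_partition_sizes(DF)
common_size, _ = Counter(sizes.values()).most_common(1)[0]
mismatched = {mount: s for mount, s in sizes.items() if s != common_size}

print(f"{len(sizes)} data partitions, most are {common_size:g} GiB; mismatched: {mismatched}")
```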

 
Cause
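Partition /data01 is 250 GB while the other data partitions are 1 TB. Every partition holds roughly the same amount of data (89 to 93 GB in the df -h output), so the smaller partition shows a much higher utilization than the others. Data partitions of different sizes in the same AVE are not supported.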
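
Because the Used column is about the same on every data partition, percent full scales inversely with partition size. A quick check in Python using the rounded df -h figures (so the results only approximate the fs-percent-full values); percent_full is a hypothetical helper.

```python
def percent_full(used_gib, size_gib):
    """Approximate utilization in percent from df -h Used and Size values."""
    return 100 * used_gib / size_gib


# (Used, Size) in GiB, read from the df -h output; df rounds these figures.
partitions = {"/data01": (93, 250), "/data02": (90, 1024), "/data03": (89, 1024)}

for mount, (used, size) in partitions.items():
    print(f"{mount}: {percent_full(used, size):.1f}%")

# The same amount of data on a partition about 4.1x smaller gives about 4.1x the percentage.
print(f"size ratio: {1024 / 250:.1f}")
```

The results (37.2%, 8.8% and 8.7%) closely match the fs-percent-full values reported by avmaint nodelist.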

Because this is a 4 TB AVE, it should have six data partitions of 1 TB each (source: https://support.emc.com/docu85221_Avamar-Virtual-Edition-7.5-for-Vmware-System-Installation-Guide.pdf?language=en_US, "AVE virtual disk requirements", page 17).

The same page contains the following note:

"Because the AVE .ova installation creates three 250 GB storage partitions along with the OS disk, approximately 900 GB of free disk space is required at installation. However, the AVE .ovf installation does not create storage partitions during installation, and therefore only enough disk space for the OS disk is required at installation, and subsequent storage partitions can be created on other datastores"

This AVE has seven data partitions (one of 250 GB and six of 1 TB) instead of six 1 TB partitions, which means it was deployed incorrectly.
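
The layout rule above can be written as a simple check. This is a hypothetical sketch, not an Avamar utility: EXPECTED_4TB encodes only the 4 TB AVE requirement cited from the installation guide (six 1 TB data partitions, taken as 1024 GiB to match df -h units); other AVE sizes have different requirements.

```python
from collections import Counter

# Hypothetical encoding of the 4 TB AVE requirement cited above:
# six data partitions of 1 TB each (1024 GiB in df -h units).
EXPECTED_4TB = {"count": 6, "size_gib": 1024}


def check_layout(sizes_gib, expected=EXPECTED_4TB):
    """Return a list of deviations from the expected data partition layout."""
    problems = []
    if len(sizes_gib) != expected["count"]:
        problems.append(f"expected {expected['count']} data partitions, found {len(sizes_gib)}")
    for size, n in sorted(Counter(sizes_gib).items()):
        if size != expected["size_gib"]:
            problems.append(f"{n} partition(s) of {size:g} GiB instead of {expected['size_gib']} GiB")
    return problems


# /data01 to /data07 as reported by df -h in this case.
observed = [250, 1024, 1024, 1024, 1024, 1024, 1024]
for problem in check_layout(observed):
    print(problem)
```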
Resolution
In this case, the customer needs to deploy a new 4 TB AVE (with six 1 TB data partitions and no 250 GB partitions) and then replicate all their data to the new system.
Notes

The new deployment is the responsibility of the local team and professional services.

Article Properties

First Published

Wed Apr 24 2019 09:46:49 GMT
