Incorrect AVE deployment
mccli event show | grep -i "failed to create"
71546 2019-02-21 09:09:31 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
71229 2019-02-20 09:09:22 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
70926 2019-02-19 09:26:29 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
70340 2019-02-18 09:10:41 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
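To see how often the checkpoint backup has failed, the `mccli event show` output can be filtered for event code 31034. Below is a minimal Python sketch that assumes the output has been saved as text in the column layout shown above (ID, date, time, timezone, type, code, ...). The function name `failed_checkpoint_events` is hypothetical and not part of any Avamar tool.

```python
import re
from typing import List, Tuple

# Matches event lines in the layout shown above, for example:
# 71546 2019-02-21 09:09:31 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
# The column layout is an assumption taken from this sample output.
EVENT_RE = re.compile(
    r"^(?P<id>\d+)\s+(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"(?P<type>\w+)\s+(?P<code>\d+)\s+.*Failed to create a checkpoint backup"
)


def failed_checkpoint_events(text: str, code: str = "31034") -> List[Tuple[int, str]]:
    """Return (event ID, date) pairs for failed checkpoint backup events, in listed order."""
    results = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match and match.group("code") == code:
            results.append((int(match.group("id")), match.group("date")))
    return results


sample = """\
71546 2019-02-21 09:09:31 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
71229 2019-02-20 09:09:22 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
70926 2019-02-19 09:26:29 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
70340 2019-02-18 09:10:41 CET ERROR 31034 SYSTEM PROCESS / Failed to create a checkpoint backup.
"""

events = failed_checkpoint_events(sample)
print(f"{len(events)} failed checkpoint backups, first listed event ID {events[0][0]}")
```

The first listed event ID is the one to pass to `mccli event show --id=...` for details.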
- The cpbackup log for checkpoint cp.20190221080716 shows the per-partition backups for data06 and data07 returning exit code 158:
admin@*****:/usr/local/avamar/var/client/>: less cpbackup-cp.20190221080716-28111.log
.
.
[Thu Feb 21 09:09:17 2019] Backup data07 finished in 00:00:01.
[Thu Feb 21 09:09:17 2019] Cleanup backup for data07
[Thu Feb 21 09:09:17 2019] Backup data07 returned with exit code 158
[Thu Feb 21 09:09:17 2019] Execute: ps -o pid,ppid,cmd --no-headers --ppid 28226 || true
[Thu Feb 21 09:09:17 2019] Killing all child processes: 28226
[Thu Feb 21 09:09:17 2019] Killing PIDs (28226) with signal 15.
[Thu Feb 21 09:09:17 2019] Sleeping for 10 seconds after killing processes...
[Thu Feb 21 09:09:27 2019] Execute: ps -o pid,ppid,cmd --noheaders --pid 28226 || true
[Thu Feb 21 09:09:27 2019] Execute output:
28226 28111 [sh] <defunct>
.
.
[Thu Feb 21 09:09:28 2019] Backup data06 finished in 00:00:12.
[Thu Feb 21 09:09:28 2019] Cleanup backup for data06
[Thu Feb 21 09:09:28 2019] Backup data06 returned with exit code 158
[Thu Feb 21 09:09:28 2019] Finished backing up files in 00:00:12.
[Thu Feb 21 09:09:28 2019] Execute: /usr/local/avamar/bin/mccli event publish --code=31034 --attribute="checkpoint" --value="cp.20190221080716" --attribute="logfile" --value="/space/avamar/var/client/cpbackup-cp.20190221080716-28111.log" --attribute="cache" --value="OK" --attribute="data06 elapsed" --value="00:00:12" --attribute="data06 fail" --value="Exit code: 158. Signal: 0." --attribute="data07 elapsed" --value="00:00:01" --attribute="data07 fail" --value="Exit code: 158. Signal: 0." --attribute="max ddr streams" --value="6" --attribute="max parallel avtars" --value="2" --attribute="parallel running avtars" --value="2" --attribute="pass thru flags" --value="--id=root --ap=******** --hfsaddr=***** --hfsport=27000" --attribute="total elapsed time" --value="00:00:12" --attribute="volumes" --value="data01 data02 data03 data04 data05 data06 data07"
[Thu Feb 21 09:09:31 2019] Execute output:
0,23000,CLI command completed successfully.
Attribute               Value
----------------------- -------------------------------------------------------------
checkpoint              cp.20190221080716
cache                   OK
parallel running avtars 2
logfile                 /space/avamar/var/client/cpbackup-cp.20190221080716-28111.log
max ddr streams         6
pass thru flags         --id=root --ap=******** --hfsaddr=***** --hfsport=27000
volumes                 data01 data02 data03 data04 data05 data06 data07
data06 fail             Exit code: 158. Signal: 0.
total elapsed time      00:00:12
data07 elapsed          00:00:01
max parallel avtars     2
data06 elapsed          00:00:12
data07 fail             Exit code: 158. Signal: 0.
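When a checkpoint covers many partitions, pulling the per-partition result out of the cpbackup log is quicker than paging through it with `less`. A minimal sketch, assuming the log lines follow the `Backup <partition> returned with exit code <n>` pattern shown above; the helper names are hypothetical, and treating every non-zero exit code as a failure is an assumption.

```python
import re
from typing import Dict, List

# Matches cpbackup log lines such as:
# [Thu Feb 21 09:09:17 2019] Backup data07 returned with exit code 158
EXIT_RE = re.compile(r"Backup (?P<vol>data\d+) returned with exit code (?P<code>\d+)")


def volume_exit_codes(log_text: str) -> Dict[str, int]:
    """Map each data partition to the exit code its per-partition backup returned."""
    codes: Dict[str, int] = {}
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            codes[match.group("vol")] = int(match.group("code"))
    return codes


def failed_volumes(codes: Dict[str, int]) -> List[str]:
    # Assumption: any non-zero exit code means the partition backup failed.
    return sorted(vol for vol, code in codes.items() if code != 0)


log = """\
[Thu Feb 21 09:09:17 2019] Backup data07 finished in 00:00:01.
[Thu Feb 21 09:09:17 2019] Backup data07 returned with exit code 158
[Thu Feb 21 09:09:28 2019] Backup data06 finished in 00:00:12.
[Thu Feb 21 09:09:28 2019] Backup data06 returned with exit code 158
"""

codes = volume_exit_codes(log)
print(failed_volumes(codes))  # ['data06', 'data07']
```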
- Inspect the event reported for the failed cpbackup:
mccli event show --id=71546
0,23000,CLI command completed successfully.
Attribute   Value
----------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ID          71546
Date        2019-02-21 09:09:31 CET
Type        ERROR
Code        31034
Category    SYSTEM
Severity    PROCESS
Domain      /
Summary     Failed to create a checkpoint backup.
SW Source   MCS:BS
For Whom    All Users
HW Source   *****
Description Failed to create a checkpoint backup.
Remedy      No action required.
Notes       N/A
Data        <data><entry key="checkpoint" type="text" value="cp.20190221080716" version="1"/><entry key="cache" type="text" value="OK" version="1"/><entry key="parallel running avtars" type="text" value="2" version="1"/><entry key="logfile" type="text" value="/space/avamar/var/client/cpbackup-cp.20190221080716-28111.log" version="1"/><entry key="max ddr streams" type="text" value="6" version="1"/><entry key="pass thru flags" type="text" value="--id=root --ap=******** --hfsaddr=***** --hfsport=27000" version="1"/><entry key="volumes" type="text" value="data01 data02 data03 data04 data05 data06 data07" version="1"/><entry key="requestor" type="xml" value="&lt;requestor domain=&quot;/&quot; product=&quot;NONE&quot; role=&quot;Administrator&quot; user=&quot;MCUser&quot;/&gt;" version=""/><entry key="data06 fail" type="text" value="Exit code: 158. Signal: 0." version="1"/><entry key="total elapsed time" type="text" value="00:00:12" version="1"/><entry key="data07 elapsed" type="text" value="00:00:01" version="1"/><entry key="max parallel avtars" type="text" value="2" version="1"/><entry key="data06 elapsed" type="text" value="00:00:12" version="1"/><entry key="data07 fail" type="text" value="Exit code: 158. Signal: 0." version="1"/></data>
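The Data field of the event carries the same attributes as an XML payload, so the failed partitions can be read from it directly. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the Data value has been copied out as a string; `event_data` is a hypothetical helper name, and the sample below keeps only a few of the entries shown above.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def event_data(xml_text: str) -> Dict[str, str]:
    """Turn a <data><entry key=... value=.../></data> payload into a key/value dict."""
    root = ET.fromstring(xml_text)
    return {entry.attrib["key"]: entry.attrib["value"] for entry in root.iter("entry")}


sample = (
    "<data>"
    '<entry key="checkpoint" type="text" value="cp.20190221080716" version="1"/>'
    '<entry key="volumes" type="text" value="data01 data02 data03 data04 data05 data06 data07" version="1"/>'
    '<entry key="data06 fail" type="text" value="Exit code: 158. Signal: 0." version="1"/>'
    '<entry key="data07 fail" type="text" value="Exit code: 158. Signal: 0." version="1"/>'
    "</data>"
)

data = event_data(sample)
# Keys ending in " fail" name the partitions whose checkpoint backup failed.
failed = sorted(key[: -len(" fail")] for key in data if key.endswith(" fail"))
print(data["checkpoint"], failed)
```

ElementTree also unescapes attribute values such as the `&lt;requestor ...&gt;` entry, so that field comes back as readable XML text.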
- The per-partition cpbackup logs for the affected data partitions (data06 and data07) both report that the server is in read-only mode due to diskfull. From the data06 log:
admin@*****:/usr/local/avamar/var/client/>: less cpbackup-cp.20190221080716-data06.log
.
.
2019-02-21 09:09:16 avtar Info <5554>: Connecting to one node in each datacenter
2019-02-21 09:09:16 avtar Info <5993>: - Connect: Connected to 172.27.7.3:29000, Priv=0, SSL Cipher=AES256-SHA
2019-02-21 09:09:16 avtar Info <5993>: - Datacenter 0 has 1 nodes: Connected to 172.27.7.3:29000, Priv=0, SSL Cipher=AES256-SHA
2019-02-21 09:09:16 avtar Info <42862>: - Server is in read-only mode due to diskfull
2019-02-21 09:09:16 avtar Info <17972>: - Server is in Read-only mode.
2019-02-21 09:09:16 avtar FATAL <8604>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2019-02-21 09:09:16 avtar FATAL <8941>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2019-02-21 09:09:16 avtar Info <6149>: Error summary: 2 errors: 8604, 8941
2019-02-21 09:09:16 avtar Info <6645>: Not sending wrapup anywhere.
2019-02-21 09:09:16 avtar Info <5314>: Command failed (2 errors, exit code 10008: cannot establish connection with server (possible network or DNS failure))
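The FATAL lines ask to verify the server address and credentials, but the messages just before them show the real reason: the server refused the connection because it is read-only due to diskfull. A minimal sketch that scans an avtar log for those read-only messages and the final exit code; the message codes 42862 and 17972 are taken from the excerpt above, while the helper names are hypothetical.

```python
import re
from typing import List, Optional

# Message codes seen in the avtar log excerpt above:
#   <42862> "Server is in read-only mode due to diskfull"
#   <17972> "Server is in Read-only mode."
READ_ONLY_CODES = {"42862", "17972"}
LINE_RE = re.compile(r"avtar (?P<level>\w+) <(?P<code>\d+)>: (?P<msg>.*)")


def read_only_hits(log_text: str) -> List[str]:
    """Return the messages showing the server is refusing writes."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group("code") in READ_ONLY_CODES:
            hits.append(match.group("msg").strip(" -"))
    return hits


def avtar_exit_code(log_text: str) -> Optional[int]:
    """Return the exit code from the 'Command failed (... exit code N ...)' line, if any."""
    match = re.search(r"exit code (\d+)", log_text)
    return int(match.group(1)) if match else None


log = """\
2019-02-21 09:09:16 avtar Info <42862>: - Server is in read-only mode due to diskfull
2019-02-21 09:09:16 avtar Info <17972>: - Server is in Read-only mode.
2019-02-21 09:09:16 avtar FATAL <8604>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2019-02-21 09:09:16 avtar Info <5314>: Command failed (2 errors, exit code 10008: cannot establish connection with server (possible network or DNS failure))
"""

print(read_only_hits(log), avtar_exit_code(log))
```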
- The fs-percent-full values reported by avmaint nodelist show a large imbalance: data01 is 37.0% full, while the other partitions are at 8.7 to 8.8%:
avmaint nodelist |grep -i "fs-perc"
fs-percent-full="37.0"
fs-percent-full="8.8"
fs-percent-full="8.7"
fs-percent-full="8.7"
fs-percent-full="8.8"
fs-percent-full="8.7"
fs-percent-full="8.7"
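The imbalance can also be flagged programmatically. A minimal sketch, assuming the `avmaint nodelist` output is saved as text and that the fs-percent-full values appear in partition order (so index 0 is data01, as in this case). The 10-point threshold over the median is an arbitrary illustrative value, not an Avamar setting, and the helper names are hypothetical.

```python
import re
from statistics import median
from typing import List


def fs_percent_full(nodelist_text: str) -> List[float]:
    """Extract fs-percent-full values in the order avmaint nodelist reports them."""
    return [float(v) for v in re.findall(r'fs-percent-full="([\d.]+)"', nodelist_text)]


def outliers(values: List[float], max_gap: float = 10.0) -> List[int]:
    """Return indexes of partitions more than max_gap points above the median.

    max_gap is an illustrative threshold chosen for this sketch only.
    """
    mid = median(values)
    return [i for i, v in enumerate(values) if v - mid > max_gap]


sample = """\
fs-percent-full="37.0"
fs-percent-full="8.8"
fs-percent-full="8.7"
fs-percent-full="8.7"
fs-percent-full="8.8"
fs-percent-full="8.7"
fs-percent-full="8.7"
"""

values = fs_percent_full(sample)
print(outliers(values))  # [0]
```

The median is used instead of the mean so that a single very full partition does not pull the baseline up.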
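- df -h shows that the data partitions are not the same size: /data01 is 250G, while /data02 through /data07 are 1.0T each:
df -h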
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 16G 5.0G 10G 34% /
udev 18G 184K 18G 1% /dev
tmpfs 18G 0 18G 0% /dev/shm
/dev/sda1 1011M 91M 869M 10% /boot
/dev/sda6 7.6G 287M 6.9G 4% /var
/dev/sda8 62G 14G 45G 25% /space
/dev/sdb1 250G 93G 158G 37% /data01
/dev/sdc1 1.0T 90G 934G 9% /data02
/dev/sdd1 1.0T 89G 935G 9% /data03
/dev/sde1 1.0T 90G 935G 9% /data04
/dev/sdf1 1.0T 91G 934G 9% /data05
/dev/sdg1 1.0T 90G 935G 9% /data06
/dev/sdh1 1.0T 90G 935G 9% /data07
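A size mismatch like this is easy to miss in a long df listing. A minimal sketch that parses `df -h` output and lists any /dataNN mount smaller than the largest one, assuming (as this case suggests) that every data partition should have the same capacity. It also assumes the Size column carries a K, M, G or T suffix, which df -h prints in powers of 1024; the helper names are hypothetical.

```python
from typing import Dict

# df -h uses powers of 1024; sizes are normalised to GiB.
UNITS = {"K": 1 / 1024 ** 2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def to_gib(size: str) -> float:
    """Convert a df -h size such as '250G' or '1.0T' to GiB (suffix assumed present)."""
    return float(size[:-1]) * UNITS[size[-1]]


def data_partition_sizes(df_text: str) -> Dict[str, float]:
    """Map each /dataNN mount point to its size in GiB."""
    sizes: Dict[str, float] = {}
    for line in df_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) == 6 and fields[5].startswith("/data"):
            sizes[fields[5]] = to_gib(fields[1])
    return sizes


sample = """\
Filesystem Size Used Avail Use% Mounted on
/dev/sda8 62G 14G 45G 25% /space
/dev/sdb1 250G 93G 158G 37% /data01
/dev/sdc1 1.0T 90G 934G 9% /data02
/dev/sdd1 1.0T 89G 935G 9% /data03
"""

sizes = data_partition_sizes(sample)
largest = max(sizes.values())
undersized = sorted(mount for mount, size in sizes.items() if size < largest)
print(undersized)  # ['/data01']
```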