I have a customer that is running Avamar version 6.1.1-87. I'm seeing failed replication errors in the activity monitor but the replication.log file doesn't have any errors. When I check the target grid for the clients that are failing, I see all of the images. I also noticed that there may be up to 8 replication jobs active in the activity monitor but only 1 stream is configured. The server session only shows 1 active job. The extra running jobs will eventually fail after 15 minutes.
Multiple streams sounds like there may be a custom replication setup. If the replication setup follows the naming standard (repl_cron.cfg, repl2_cron.cfg, ...), then there should be a corresponding replicate log for each one in the /usr/local/avamar/var/cron/ directory: replicate.log, replicate2.log, and so on, one for each replication job, assuming more than one is configured.
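To see how the config names line up with their logs, the convention can be sketched like this (a minimal sketch; the repl3_cron name and the exact path are assumptions based on the naming convention above, not read from your grid):

```shell
# Map each repl cron config name to the replicate log it should write,
# per the naming convention described above (paths are assumptions).
for cfg in repl_cron repl2_cron repl3_cron; do
    # Extract the numeric suffix (empty for the default repl_cron job).
    n=$(echo "$cfg" | sed 's/^repl\([0-9]*\)_cron$/\1/')
    echo "$cfg -> /usr/local/avamar/var/cron/replicate${n}.log"
done
```

The default repl_cron job has no number, so its log comes out as plain replicate.log.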
Yeah, it looks like multiple replication jobs are configured, as stated by JBalentine. You may want to check your crontab via the command:
crontab -l -u dpn
There you can see which jobs are configured. Then check the corresponding replication logs in the var/cron directory, where you should see the job details.
The multiple replication jobs are configured by making changes to /usr/local/avamar/bin/dpncron.pm and creating clones of repl_cron inside the same directory.
To get a better look at what might be going on:
Log into the Avamar command line and run:
crontab -l -u dpn
[this will show you if you have more than one cron job scheduled]
The one at the bottom is the standard one you see in the GUI; any above that are custom.
15 17 * * * /usr/local/avamar/bin/cron_env_wrapper /usr/local/avamar/lib/mcs_ssh_add repl13_cron
30 17 * * * /usr/local/avamar/bin/cron_env_wrapper /usr/local/avamar/lib/mcs_ssh_add repl14_cron
# <<< BEGIN AVAMAR ADMINISTRATOR MANAGED ENTRIES -- DO NOT MANUALLY MODIFY >>>
0 4 * * * /usr/local/avamar/bin/cron_env_wrapper /usr/local/avamar/lib/mcs_ssh_add repl_cron
# <<< END AVAMAR ADMINISTRATOR MANAGED ENTRIES >>>
Note that none of them should start at the same time.
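A quick way to check that is to pull just the start-time fields out of the crontab and look for duplicates (a sketch using the sample entries above; on a live grid you would pipe in `crontab -l -u dpn` instead, filtering out the comment lines):

```shell
# Print any hour:minute start times shared by more than one cron entry.
# The sample below is the crontab pasted in this thread; any line
# printed would indicate two jobs starting at the same time.
crontab_sample='15 17 * * * /usr/local/avamar/bin/cron_env_wrapper /usr/local/avamar/lib/mcs_ssh_add repl13_cron
30 17 * * * /usr/local/avamar/bin/cron_env_wrapper /usr/local/avamar/lib/mcs_ssh_add repl14_cron
0 4 * * * /usr/local/avamar/bin/cron_env_wrapper /usr/local/avamar/lib/mcs_ssh_add repl_cron'

echo "$crontab_sample" | awk '{print $2 ":" $1}' | sort | uniq -d
```

No output means no two jobs share a start time, which is what you want.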
Once you see your numbers, you can do the following.
This command will show you the log for a given repl job:
replrpt.sh --log=/usr/local/avamar/var/cron/replicate#.log
[replace # with the number you see in cron for your repl#_cron entry]
Run it for each entry to see the log for each job.
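To pull just the error lines out of a log, a plain grep works too (the log content below is invented for this sketch, not real Avamar output; on the grid you would loop over /usr/local/avamar/var/cron/replicate*.log instead):

```shell
# Grep a replicate log for error lines. sample_log stands in for a
# real log file and its text is made up for illustration only.
sample_log='2014-05-01 12:00:01 replication started
2014-05-01 12:15:02 Error: connection to destination timed out
2014-05-01 12:15:03 replication job ended'

echo "$sample_log" | grep -i 'error'
```

That gets you straight to the timestamps of the failures, which matters for the next step.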
Each job is given an amount of time to run, or it will be killed when the Blackout Window starts,
so find out when your Blackout Window starts before you look at the logs.
Now, looking at the logs, find the time of the errors and what kind of error it is.
What you are looking for is this: is it an error that happens at different times within the same log, or do they all error out at the start of the Blackout Window?
If the error is at the start of the Blackout Window, then there is simply not enough time between the start of the replication job and the BW for it to finish.
If it is at random times, something else must be wrong; talk to support.
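The decision above boils down to a simple time comparison (a sketch; bw_start and err_time are placeholder values for illustration, not read from your grid):

```shell
# Compare the error timestamp against the Blackout Window start time.
# Both values below are placeholders for this sketch.
bw_start="17:00"
err_time="17:00"
if [ "$err_time" = "$bw_start" ]; then
    echo "job killed by Blackout Window: not enough runtime before BW"
else
    echo "failure at a random time: something else is wrong, engage support"
fi
```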
To see if your replication of backups is keeping up, run the following script - it will tell you how far behind any server is in getting its images replicated.
If your replication jobs are not keeping up, and since it seems you are not the one who set them up, work with support to get them adjusted.
Maybe I didn't explain this correctly. We only have 1 replication cron job running. What I'm seeing is errors in the activity monitor for replication jobs failing on certain clients. It is not the same client each day. When I look in the replication.log for the failed client, there are no errors for that client, only in the activity monitor. I even checked the target grid, and all images are being replicated. I almost wonder if the MCS is not refreshing fast enough.
Replication starts at noon, 1 hour after the blackout window ends. There are no errors in the replication.log file, only when you look in the activity monitor. It just fails after 15 minutes with 0 bytes transferred.
OK, you keep saying it ERRORS - that means to me it is giving you an error number or something else that shows it did not work.
What exactly are you seeing that says it is erroring?
And when you logged into the Avamar utility node and ran the command,
it showed no errors?
Then the only thing I can think of is that your destination is full or down.