Alerting on when jobs succeeded
Hi,
Has anyone come up with a good way to alert on the last time a job completed successfully?
Lots of our problems seem to come from jobs not getting enough run time, e.g. SmartPools.
We run Nagios alerts for pool utilization, but I need a good way to get the job information out.
E.g. alert me when SmartPools hasn't completed successfully for 5 days.
Something like:
ISI_SUCCEED_DATE=`isi job history --job=SmartPools -l1000 -H | grep Succeeded | head -1 | awk '{print $1}'`
# e.g. ISI_SUCCEED_DATE=01/01
SUCCEED_DATE=`date --date="$ISI_SUCCEED_DATE" +"%y%m%d"`
# Turn it into a useful (sortable) format. Note this reads ISI_SUCCEED_DATE, set above.
TRIGGER_DATE=`date -d "-3 days" +"%y%m%d"`
# Trigger is today - 3 days.
if [ "$TRIGGER_DATE" -ge "$SUCCEED_DATE" ];
then
# Double quotes, so the apostrophe in "Hasn't" does not end the string.
echo "Hasn't completed successfully in 3 days, do something about it.";
fi
This seems a long way round a simple problem. Is there a different way I'm missing?
Thanks
Pete
cadiletta
January 6th, 2015 07:00
Pete,
This is a really great idea for keeping track of critical job status. By design, of course, alerting is configured to focus only on failed actions; alerting on the lack of a successful run within a given time period is something I've not seen before. Indeed, at the moment you'd have to run something like your script from cron. I'm interested to hear whether you get this working, or whether others have suggestions. In the meantime, this could be a great suggestion for a monitoring enhancement, and I'll see if I can get it in front of the right people.
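For anyone wanting to hang this off Nagios rather than a bare echo, here is a rough sketch of the same idea wrapped as a check that returns the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). It is not tested against a cluster. The helper names days_since and check_last_success are made up for illustration, and it assumes GNU date (the `date -d` syntax already used in the script above) on the host that runs the check.

```shell
#!/bin/sh
# Sketch of a Nagios-style "last success" freshness check.
# Assumption: GNU date is available (date -d "<string>").
# days_since and check_last_success are hypothetical helper names.

# Print whole days elapsed between a date string and a reference time.
#   $1: date of last success, in any form GNU date can parse
#   $2: reference time in epoch seconds (optional, defaults to now)
days_since() {
    last_epoch=$(date -d "$1" +%s) || return 1
    now_epoch=${2:-$(date +%s)}
    echo $(( ($now_epoch - $last_epoch) / 86400 ))
}

# Print a status line and return a Nagios plugin exit code.
#   $1: job name (used as a label only)
#   $2: date of last success (empty if none was found)
#   $3: maximum acceptable age in days
#   $4: reference time in epoch seconds (optional, mainly for testing)
check_last_success() {
    if [ -z "$2" ]; then
        echo "UNKNOWN: no successful $1 run found"
        return 3
    fi
    age=$(days_since "$2" "$4") || {
        echo "UNKNOWN: cannot parse date '$2'"
        return 3
    }
    if [ "$age" -ge "$3" ]; then
        echo "CRITICAL: $1 last succeeded $age days ago (limit $3)"
        return 2
    fi
    echo "OK: $1 last succeeded $age days ago"
    return 0
}
```

You would feed it the date from the isi pipeline in the original post, e.g. check_last_success SmartPools "$ISI_SUCCEED_DATE" 5, and exit with its return code. One caveat: if the history output really is year-less (01/01), GNU date should assume the current year, so a success from late December checked in early January would look like a future date and report OK. If the job history can print a full date, that would be the safer input.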