prdprd

Alerting on when jobs succeeded

Hi,

Has anyone come up with a good way to alert on the last time a job completed successfully?

Many of our problems seem to come from jobs not getting enough run time, e.g. SmartPools.

We run Nagios alerts for pool utilization, but I need a good way to get this information out of the cluster.

For example: alert me when SmartPools hasn't completed successfully in 5 days.

Something like

ISI_SUCCEED_DATE=`isi job history --job=SmartPools -l1000 -H | grep Succeeded | head -1 | awk '{print $1}'`

# e.g. ISI_SUCCEED_DATE=01/01

if [ -z "$ISI_SUCCEED_DATE" ];
then
echo "No successful SmartPools run found in the job history, do something about it."
exit 1
fi

SUCCEED_DATE=`date --date="$ISI_SUCCEED_DATE" +"%y%m%d"`

# turn it into a sortable YYMMDD number (GNU date syntax)

TRIGGER_DATE=`date -d "-3 days" +"%y%m%d"`

# trigger is today - 3 days

if [ "$TRIGGER_DATE" -ge "$SUCCEED_DATE" ];
then
echo "Hasn't completed successfully in 3 days, do something about it."
fi
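Building on the script above, here is a rough sketch of the same check written as a Nagios-style plugin, since Nagios is already in place. It returns the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), computes the age in whole days from epoch seconds instead of comparing YYMMDD numbers, and treats an MM/DD date that would land in the future as last year's, so the check doesn't misfire across New Year. The function names `days_since` and `check_last_success` and the warning/critical thresholds are hypothetical choices of mine; like the original, it assumes GNU `date` and an MM/DD date from the `isi` output.

```shell
#!/bin/sh
# Sketch only: a Nagios-style freshness check for the last successful job run.
# Assumptions (mine, not from any Isilon documentation):
#   - the date passed in is MM/DD with no year, as in the isi output above
#   - GNU date is available (date -d), as in the original script

# Print how many whole days have passed since an MM/DD date.
days_since() {
    now=$(date +%s)
    then_s=$(date -d "$1" +%s 2>/dev/null) || return 1
    # MM/DD carries no year: if it lands in the future it must be last year's
    if [ "$then_s" -gt "$now" ]; then
        then_s=$(date -d "$1/$(( $(date +%Y) - 1 ))" +%s 2>/dev/null) || return 1
    fi
    echo $(( (now - then_s) / 86400 ))
}

# Print a one-line status and return the Nagios plugin exit code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# Usage: check_last_success MM/DD WARN_DAYS CRIT_DAYS
check_last_success() {
    last="$1"; warn="$2"; crit="$3"
    if [ -z "$last" ]; then
        echo "UNKNOWN - no successful run found in the job history"
        return 3
    fi
    age=$(days_since "$last") || {
        echo "UNKNOWN - could not parse date '$last'"
        return 3
    }
    if [ "$age" -ge "$crit" ]; then
        echo "CRITICAL - last success $age days ago ($last)"
        return 2
    elif [ "$age" -ge "$warn" ]; then
        echo "WARNING - last success $age days ago ($last)"
        return 1
    fi
    echo "OK - last success $age days ago ($last)"
    return 0
}
```

With the 3-day trigger from the script and the 5-day limit from the question, the end of the original script could become `check_last_success "$ISI_SUCCEED_DATE" 3 5; exit $?`, so Nagios sees the exit code directly (for example through check_by_ssh or NRPE, whichever you already use). An empty date is reported as UNKNOWN here because it can also mean the `isi` command itself failed; you may prefer CRITICAL.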




This seems like a long way around a simple problem. Is there a different way I'm missing?


Thanks

Pete

cadiletta

Re: Alerting on when jobs succeeded

Pete,

This is a really great idea for keeping track of critical job status. By design, the existing alerting only focuses on failed actions; I haven't seen alerting on the absence of successful actions within a given time period before. For now you'd have to run something like your script from cron, but I'm interested to hear whether you get it working, or whether others have suggestions. In the meantime, this would make a great monitoring enhancement request, and I'll see if I can get it in front of the right people.
