I have a special request to write an alert that will trigger when the utilization (or CPU) averages over 90% for a 5-minute period.
As things are configured today, the system spikes over 90% and generates an e-mail to the team, but the e-mails are really meaningless because the spikes quickly drop off. If the system remains over 90% for a period of time, however, we could have a real issue.
Has anyone else written an alert like this one? (Any help or pointers on the correct documentation to review would be appreciated!)
Isaka, Thank you for the quick response!
So, I just want to be sure that I understand -
The flow you've documented says that
1. Take the average over the last 15 minutes.
2. Get the sustained average over the last 5 minutes.
3. Compare the sustained average to a benchmark (in this case 90%)
4. Log an alert if over 90 (Send e-mail)
alt 4. Clear log alert
(I'm trying to use this to monitor CPU on VPLEX, which is a little different as well... especially when gathering the initial metric.)
If I would like this alert to send an e-mail I would just substitute the "Log Alert Set" with an e-mail and I could leave the "Log alert clear" as the alternative.
BTW - is there a manual that specifically documents all of the parameters for the components in the "flowchart" method to write these custom alerts?
You are correct, except on #2 it takes the sustained average of the last 15 minutes.
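To make the logic of the flow concrete, here is a minimal Python sketch of a sustained-average check with a set and a clear action. It is only an illustration, not how the alerting tool implements it: the class name, the sample-count window, and the "set"/"clear" return values are all assumptions made for the example. It also assumes the metric is polled at a fixed interval, so the window is expressed as a number of samples.

```python
from collections import deque


class SustainedAverageAlert:
    """Hypothetical helper: alert when the windowed average exceeds a threshold."""

    def __init__(self, threshold=90.0, window_size=5):
        self.threshold = threshold
        # Keep only the most recent `window_size` samples.
        self.window = deque(maxlen=window_size)
        self.active = False

    def observe(self, value):
        """Add one sample; return "set", "clear", or None."""
        self.window.append(value)
        # Do not evaluate until the window is full, so a single
        # early spike cannot trigger the alert on its own.
        if len(self.window) < self.window.maxlen:
            return None
        average = sum(self.window) / len(self.window)
        if average > self.threshold and not self.active:
            self.active = True
            return "set"      # e.g. send the e-mail here
        if average <= self.threshold and self.active:
            self.active = False
            return "clear"    # only fires after a previous "set"
        return None


alert = SustainedAverageAlert(threshold=90.0, window_size=5)
samples = [50, 95, 50, 50, 50, 95, 95, 95, 95, 95, 95, 50]
events = [alert.observe(v) for v in samples]
print(events)
```

With one sample per minute (an assumption), a window_size of 5 would cover the 5-minute period from the original question and 15 would cover the 15-minute window. The single spike near the start of the sample list does not raise the alert; only the sustained run of high values does.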
For VPLEX CPU, you can refine the filter in the reporting UI by using the advanced search to narrow down your results, or by looking at the filters used in a report in browse mode.
Something like source=='VPLEX-Collector' & name=='CurrentUtilization' & parttype=='Processor' should be a good start.
Indeed, you can replace the log with an email action. Note that the clear will only trigger if the alert has been triggered and the value goes below the comparator.
Also, you should pay attention to the stateless setting in the comparator, as it affects how often the alert is triggered. If stateless is enabled, the alert will only trigger when the value crosses the threshold; if it is not stateless, every value above the threshold will trigger, which can make for a lot of notifications in some cases.
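The difference in notification volume can be illustrated with a small Python sketch. This follows the behavior exactly as described above and is not taken from the product; the function name and the only_on_crossing flag are hypothetical, so check the manual for the real semantics of the setting.

```python
def notifications(values, threshold=90.0, only_on_crossing=True):
    """Return the indexes of the samples that would trigger a notification.

    only_on_crossing=True  -> trigger only when the value crosses the threshold
    only_on_crossing=False -> trigger on every value above the threshold
    """
    fired = []
    was_above = False
    for index, value in enumerate(values):
        above = value > threshold
        if above and (not only_on_crossing or not was_above):
            fired.append(index)
        was_above = above
    return fired


cpu = [80, 95, 96, 97, 85, 92]
print(notifications(cpu, only_on_crossing=True))   # one per crossing
print(notifications(cpu, only_on_crossing=False))  # one per high sample
```

For the sample list above, the crossing-only mode fires twice while the other mode fires four times, which is why the second mode can flood a mailbox during a long period of high utilization.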
Yes, there is a manual available in the Documentation.
Thank you again Isaka!
I am trying to create a custom grouping, but for some reason I don't have an icon to add another grouping, and when I try to modify one of the examples and then save the new grouping, I receive an error. Here are a few screenshots:
(I've attempted the add as both myself and as the admin user to ensure that this isn't a permissions issue.)
No add button:
My alert design:
Setting up to save the report:
Any thoughts? Maybe I am doing something wrong in my report? I can't figure it out.
Thank you again for your help!