I'm looking for a good way to monitor an Isilon cluster and send notifications when needed.
Specifically, I want to monitor latency in real time and send a report (email) when a slowness event occurs.
I know the Splunk for Isilon module is available, but I couldn't find any indication that it monitors latency.
If anyone has custom-built such an option or has another solution to this challenge, I would be happy to get some details.
One common approach has been to use Graphite for the monitoring. I know of one customer who then uses Seyren for the alerting.
A search for "graphite isilon" will likely point you to Jason Davis' Python and shell scripts that feed data into Graphite. I use a variant of those here. I have not yet implemented Seyren, although it's on my to-do list.
The most important part is to start collecting the data into Graphite. Once you have the data, you can add the alerting later and can always manually check it (say if a user says "my overnight job was slow").
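As a rough illustration of what "collecting the data into Graphite" involves, here is a minimal sketch that formats a sample for Graphite's plaintext protocol (one `path value timestamp` line per sample, sent to the Carbon listener, which defaults to TCP port 2003). The metric path, host name, and function names are made up for the example; how you pull the latency value off the cluster is not shown and depends on the scripts you use.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: '<path> <value> <timestamp>\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="graphite.example.com", port=2003, timeout=5.0):
    """Send pre-formatted lines to a Carbon plaintext listener over TCP.

    The host is a placeholder; 2003 is Carbon's default plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A collector loop would call something like `send_metrics([format_metric("isilon.cluster1.node1.latency_ms", 4.2)])` once per polling interval. The naming scheme under `isilon.` is just one possible layout.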
I don't think there is much in the way of notification, but the performance graphs in InsightIQ are pretty good.
There is NO notification in IIQ. The performance graphs are very limited in what they offer and anybody wanting serious monitoring will have to supplement or replace IIQ with another solution anyway.
I am really tempted to throw out IIQ completely in favor of a homegrown solution, as at least one other company has done. IIQ causes more grief than it's worth.
IIQ disqualifies itself by only allowing (or recommending) a maximum of 8 clusters or 150 nodes per instance.
If I need separate instances, my monitoring is no longer unified, which makes it possible for my monitoring team to have the wrong console open.
Threshold alerting is something I have been thinking of adding to the stats connector component of the Isilon SDK.
The stats connector in the SDK grabs stats off a cluster and feeds them to InfluxDB/Grafana today. We will be enhancing it so it can feed stats into more back ends (e.g. ELK, Graphite, etc.); suggestions are welcome. Or if someone wants to contribute by doing that themselves, that would be even better.
Given that the stats connector in the SDK is already gathering stats and feeding them to something else, it would not be hard to add logic to do some simple threshold checking on a subset of them. Adding coalescing and hysteresis would be more work, but a simple email alert would be a start.
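To make the idea concrete, here is a minimal sketch of threshold checking with basic hysteresis plus an email alert, using only the Python standard library. It is not code from the stats connector; the class and function names, the thresholds, and the SMTP host are all hypothetical. The alarm raises above one threshold and clears only below a lower one, so a value hovering around a single limit does not generate a stream of emails.

```python
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class LatencyAlarm:
    raise_above_ms: float  # enter the alarm state when latency exceeds this
    clear_below_ms: float  # leave the alarm state only when latency drops below this
    active: bool = False

    def update(self, latency_ms):
        """Return 'raised' or 'cleared' on a state change, otherwise None."""
        if not self.active and latency_ms > self.raise_above_ms:
            self.active = True
            return "raised"
        if self.active and latency_ms < self.clear_below_ms:
            self.active = False
            return "cleared"
        return None


def build_alert(event, cluster, latency_ms, sender, recipient):
    """Build the notification email for a state change."""
    msg = EmailMessage()
    msg["Subject"] = f"[{cluster}] latency alarm {event}: {latency_ms:.1f} ms"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Latency alarm {event} on {cluster} at {latency_ms:.1f} ms.")
    return msg


def send_alert(msg, smtp_host="localhost", smtp_port=25):
    """Hand the message to an SMTP relay (host and port are placeholders)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)
```

In a polling loop you would call `alarm.update(sample)` for each new latency value and send an email only when it returns something other than `None`. Because mail goes out only on state changes, repeated high samples produce one alert rather than one per sample; real coalescing across nodes or metrics would need more than this.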
Again, if someone wanted to contribute such functionality that would be great.
More details on the Isilon SDK can be found at: