Roya1
1 Copper

Monitoring Isilon latency using Splunk

Hi,

I'm looking for a good way to monitor an Isilon cluster and send notifications when needed.

Specifically, I want to monitor latency in real time and report (by email) when a slowness event occurs.

I know the Splunk for Isilon module is available, but I couldn't find any indication that it monitors latency.

If anyone has custom-built such an option or has another solution to this challenge, I'd be happy to get some details.

Many Thanks,

Roy

7 Replies
ed_wilts
2 Iron

Re: Monitoring Isilon latency using Splunk

One common approach has been to use Graphite to do the monitoring. I know one customer then uses Seyren to do the alerting.

A search for "graphite isilon" will likely point you to Jason Davis' Python and shell scripts that feed data into Graphite. I use a variant of those here. I have not yet implemented Seyren, although it's on my todo list.

The most important part is to start collecting the data into Graphite.  Once you have the data, you can add the alerting later and can always manually check it (say if a user says "my overnight job was slow").
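To make the collection step concrete, here is a minimal sketch of feeding a latency sample into Graphite over its plaintext protocol. The metric path, cluster name, stat key, and Graphite host are all assumptions for illustration; the actual scripts referenced above may structure things differently.

```python
import socket
import time

# Assumed environment details -- adjust for your setup.
GRAPHITE_HOST = "graphite.example.com"   # hypothetical carbon-cache host
GRAPHITE_PORT = 2003                     # default Graphite plaintext protocol port

def graphite_line(metric, value, ts=None):
    """Format one sample in Graphite's plaintext protocol: 'path value timestamp\n'."""
    ts = int(ts if ts is not None else time.time())
    return "%s %s %d\n" % (metric, value, ts)

def send_to_graphite(lines, host=GRAPHITE_HOST, port=GRAPHITE_PORT):
    """Push a batch of plaintext-protocol lines to the carbon listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))

if __name__ == "__main__":
    # Example: record a 12.5 ms NFSv3 latency sample for a cluster named 'isi01'.
    # In practice the value would be pulled from the cluster's statistics API
    # by a collection script before being forwarded here.
    line = graphite_line("isilon.isi01.nfs3.latency_ms", 12.5)
    send_to_graphite([line])
```

Once samples land in Graphite on a schedule (e.g. via cron), dashboards and alerting tools can be layered on top without touching the collector again.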

dynamox
6 Thallium

Re: Monitoring Isilon latency using Splunk

I don't think there is much in the way of notification, but the performance graphs in InsightIQ are pretty good.

ed_wilts
2 Iron

Re: Monitoring Isilon latency using Splunk

I don't think there is much in the way of notification, but the performance graphs in InsightIQ are pretty good.

There is NO notification in IIQ. The performance graphs are very limited in what they offer, and anybody who wants serious monitoring will have to supplement or replace IIQ with another solution anyway.

I am really tempted to throw out IIQ completely in favor of a home-grown solution, as at least one other company has done. IIQ causes more grief than it's worth.

dynamox
6 Thallium

Re: Monitoring Isilon latency using Splunk

Different strokes for different folks; it works great here. It was very flaky back in the 1.x days, but it's been rock solid for the past two years.

sluetze
3 Silver

Re: Monitoring Isilon latency using Splunk

IIQ disqualifies itself by only allowing (or recommending) a maximum of 8 clusters or 150 nodes per instance.

If I need separate instances, my monitoring is no longer unified, which makes it possible for my monitoring team to have the wrong consoles open.

dynamox
6 Thallium

Re: Monitoring Isilon latency using Splunk

As I said, different strokes for different folks. What doesn't work for you works just fine in my environment.

cstacey
1 Nickel

Re: Monitoring Isilon latency using Splunk

Threshold alerting is something I have been thinking of adding to the stats connector component of the Isilon SDK.

The stats connector in the SDK currently grabs stats off a cluster and feeds them to InfluxDB/Grafana. We will be enhancing it to feed stats into more backends (e.g. ELK, Graphite, etc.); suggestions are welcome. Or if someone wants to contribute by doing that themselves, that would be even better.

Given that the stats connector in the SDK is already gathering stats and feeding them somewhere, it would not be hard to add logic that does simple threshold checking on a subset of them. Adding coalescing and hysteresis would be more work, but a simple email alert would be a start.

Again, if someone wanted to contribute such functionality that would be great.
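As a rough sketch of what that simple threshold check plus email alert could look like (the threshold value, SMTP host, and addresses below are placeholders, and the sample list stands in for whatever the stats connector would actually feed in):

```python
import smtplib
from email.message import EmailMessage

# Assumed alerting parameters -- placeholders, not part of the SDK.
LATENCY_THRESHOLD_MS = 20.0                   # hypothetical alert threshold
SMTP_HOST = "smtp.example.com"
ALERT_FROM = "isilon-alerts@example.com"
ALERT_TO = "storage-team@example.com"

def breaches(samples, threshold=LATENCY_THRESHOLD_MS):
    """Return the latency samples that exceed the threshold; empty means all clear."""
    return [s for s in samples if s > threshold]

def build_alert(cluster, bad_samples):
    """Compose a plain-text alert email for a latency breach."""
    msg = EmailMessage()
    msg["Subject"] = "Isilon latency alert: %s" % cluster
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(
        "Cluster %s exceeded %.1f ms latency in %d sample(s): %s"
        % (cluster, LATENCY_THRESHOLD_MS, len(bad_samples), bad_samples)
    )
    return msg

if __name__ == "__main__":
    samples = [8.2, 31.7, 25.0]   # in practice, fed by the stats connector
    bad = breaches(samples)
    if bad:
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(build_alert("isi01", bad))
```

This fires an email on every polling interval that breaches the threshold; the coalescing and hysteresis mentioned above would be needed to keep a sustained slowness event from generating a flood of identical alerts.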

More details on the Isilon SDK can be found at:

https://community.emc.com/docs/DOC-48273

Cheers,
Chris
