Unsolved

5 Practitioner

 • 

274.2K Posts

2896

April 26th, 2016 20:00

Monitoring Isilon latency using Splunk

Hi,

I'm looking for a good way to monitor an Isilon cluster and send notifications when needed.

Specifically, I want to monitor latency in real time and send an email report when a slowness event occurs.

I know the Splunk for Isilon module is available, but I couldn't find any indication that it monitors latency.

If anyone has custom-built such an option, or has another solution to this challenge, I would be happy to get some details.

Many Thanks,

Roy

April 28th, 2016 11:00

One common approach has been to use Graphite to do the monitoring. I know one customer who then uses Seyren to do the alerting.

A search for "graphite isilon" will likely point you to Jason Davis' Python and shell scripts that feed data into Graphite. I use a variant of those here. I have not yet implemented Seyren, although it's on my todo list.

The most important part is to start collecting the data into Graphite.  Once you have the data, you can add the alerting later and can always manually check it (say if a user says "my overnight job was slow").
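To illustrate the "get the data into Graphite" step, here is a minimal sketch in Python. It is not taken from the scripts mentioned above. It only shows Graphite's plaintext protocol (one `path value timestamp` line per sample, normally sent to the carbon listener on TCP port 2003). The metric path `isilon.cluster1.nfs3.latency_ms` and the host `graphite.example.com` are made-up placeholders, and how you actually read latency off the cluster is left out.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="graphite.example.com", port=2003, timeout=5.0):
    """Send already-formatted lines to a carbon plaintext listener.

    The host is a placeholder; 2003 is carbon's usual plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Example: format one latency sample (hypothetical metric name and value).
sample = format_graphite_line("isilon.cluster1.nfs3.latency_ms", 4.2, 1461700000)
# To actually ship it, call send_to_graphite([sample]) from a cron job or loop.
```

Run something like this once a minute per cluster and you will have history to look back on, even before any alerting exists.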

2 Intern

 • 

20.4K Posts

April 28th, 2016 14:00

Don't think there is much notification, but the performance graphs are pretty good in InsightIQ.

April 29th, 2016 06:00

"Don't think there is much notification, but the performance graphs are pretty good in InsightIQ."

There is NO notification in IIQ. The performance graphs are very limited in what they offer, and anybody wanting serious monitoring will have to supplement or replace IIQ with another solution anyway.

I am really tempted to throw out IIQ completely in favor of a home-grown solution, as at least one other company has done. IIQ causes more grief than it's worth.

2 Intern

 • 

20.4K Posts

May 9th, 2016 08:00

Different strokes for different folks; it works great here. It was very flaky back in the 1.x days, but it's been rock solid for the past 2 years.

300 Posts

May 9th, 2016 08:00

IIQ disqualifies itself by only allowing (or recommending) 8 clusters or 150 nodes max per instance.

If I need separate instances, my monitoring is no longer unified, which makes it possible for my monitoring team to have the wrong consoles open.

31 Posts

May 9th, 2016 09:00

Threshold alerting is something I have been thinking of adding to the stats connector component of the Isilon SDK.

The stats connector in the SDK grabs stats off a cluster and feeds them to InfluxDB/Grafana today. We will be enhancing it to feed stats into more things (e.g. ELK, Graphite, etc.); suggestions are welcome. Or, if someone wants to contribute by doing that themselves, that would be even better.

Given that the stats connector in the SDK is already gathering stats and feeding them to something else, it would not be hard to add logic to do some simple threshold checking on a subset of them. Adding coalescing and hysteresis would be more work, but a simple email alert would be a start.

Again, if someone wanted to contribute such functionality that would be great.
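For anyone wondering what threshold checking with hysteresis might look like, here is a rough sketch in Python. It is not part of the Isilon SDK or the stats connector; the class name `LatencyAlert` and the millisecond thresholds are invented for illustration. The idea is to raise an alert when latency goes above one threshold and clear it only when latency drops below a second, lower threshold, so a value hovering around a single limit does not generate a flood of emails.

```python
class LatencyAlert:
    """Threshold check with hysteresis (hypothetical helper, not an SDK class)."""

    def __init__(self, raise_above_ms, clear_below_ms):
        if clear_below_ms > raise_above_ms:
            raise ValueError("clear_below_ms must not exceed raise_above_ms")
        self.raise_above_ms = raise_above_ms
        self.clear_below_ms = clear_below_ms
        self.active = False

    def update(self, latency_ms):
        """Feed one sample; return 'raise', 'clear', or None if nothing changed."""
        if not self.active and latency_ms > self.raise_above_ms:
            self.active = True
            return "raise"
        if self.active and latency_ms < self.clear_below_ms:
            self.active = False
            return "clear"
        return None


# Example with made-up samples: raise above 20 ms, clear below 10 ms.
alert = LatencyAlert(raise_above_ms=20, clear_below_ms=10)
events = [alert.update(v) for v in [5, 25, 30, 15, 22, 8, 5]]
```

In this run only the second sample triggers a "raise" and only the sixth triggers a "clear"; the samples in between stay quiet even though they cross 20 ms again. An email would be sent only on those two transitions.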

More details on the Isilon SDK can be found at:

https://community.emc.com/docs/DOC-48273

Cheers,
Chris

2 Intern

 • 

20.4K Posts

May 9th, 2016 09:00

As I said, different strokes for different folks. What does not work for you works just fine in my environment.
