October 21st, 2015 05:00

How to stop "cached" Isilon EMAIL alerts during and after upgrade?

So the past couple of times I have upgraded Isilon clusters, I have shut off the RMR relay (just removed it from the SNMP alerts) which stopped Isilon from flooding our mailboxes (and more importantly, flooding the mailboxes of our management team).  That works great, until you add the relay host back in.  Even when I kill all the alerts, it appears Isilon backlogs all the emails so when I add the relay back, BOOM -- a flood of emails.

The emails would be necessary if this were a PRODUCTION issue, but isn't there a maintenance mode or something when upgrading Isilon that says "Oh wait, someone is upgrading my software -- I shouldn't create any alerts!"?


And before you ask, yes, I checked the community and couldn't find anything in writing that says how we do that. I have another upgrade this Saturday and next Saturday and I really want to know how I can kill these alerts and NOT have any emails generated during and AFTER the upgrades.

Cheers!

October 21st, 2015 07:00

We did that; however, when we turned it back on, their mailboxes got flooded with Isilon alerts from the upgrade.  Do we need to cancel the alerts before we turn snmpd back on?

I'm not sure if it's the native alerts but in this case, both snmpd and rmr relay were removed during the upgrade.  I canceled all the alerts and connected back to the RMR relay and BLAM!  Ton of alerts flooding mailboxes.


Gotta get this fixed before the next upgrade.

1.2K Posts

October 21st, 2015 07:00

isi services snmpd disable/enable

Or are OneFS "native" alerts (non SNMP) also an issue?

-- Peter
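Spelled out, the shorthand above is two commands: `isi services snmpd disable` before the work and `isi services snmpd enable` afterwards. A minimal sketch of wrapping a maintenance step that way follows. The `run` and `with_snmpd_disabled` helpers are hypothetical names, not part of OneFS, and the script defaults to a dry run that only prints the commands so it can be tried off-cluster.

```shell
#!/bin/bash
# Sketch only. DRY_RUN=1 (the default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: disable snmpd, run the given command, then
# re-enable snmpd even if the command failed. Returns the command's status.
with_snmpd_disabled() {
    run isi services snmpd disable
    status=0
    "$@" || status=$?
    run isi services snmpd enable
    return "$status"
}
```

Example: `with_snmpd_disabled echo "upgrade steps go here"`. As the rest of this thread shows, re-enabling snmpd alone may not prevent the backlog from being sent, so events may still need to be cancelled before the enable step.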

1.2K Posts

October 21st, 2015 09:00

"Native" means notifications directly sent by OneFS via SMTP (Simple Mail Transport Protocol).

One can mess around with these, too...

Can you confirm that you are using SNMP alerts aka "traps" (SNMP = Simple Network Management Protocol)?

October 21st, 2015 09:00

Yes, SNMP traps. That is why I mentioned the RMR relay address.  When I remove the RMR address, no alerts.  When I add the RMR address back, even though I have suppressed all the alerts, they just seem to fire away from cache.


What we need -- and maybe this is a Service Enhancement request -- is to be able to put Isilon in "maintenance mode" so no alerts are generated during an upgrade.  Even the GUI shows a user that the cluster is being upgraded, so why on earth would it send alerts related to reboots?  SILLY!

1.2K Posts

October 21st, 2015 10:00

OK, try it out with virtual nodes: quiet & cancel all events, then re-enable snmpd, then re-add the RMR.

BTW, any chance that your RMR tool can be told to suppress traps from one device for a while...?

Finally, yes, talk to your account team about an enhancement request... you might find out something interesting.

-- Peter

October 21st, 2015 11:00

I don't have a virtual infrastructure available to test on just yet.  I tried removing the RMR relay address which worked.  Then I cancelled all alerts, then added the RMR relay address back in and still got flooded with alerts.  I can try to disable snmpd this time as well (I don't recall doing so last time).  Definitely an EMC support thing and enhancement request!

1.2K Posts

October 22nd, 2015 01:00

The free VMware Player will do it, even on a laptop.

And keep in mind, cancelling and quieting OneFS events are two distinct actions, to be taken in that order.

-- Peter

October 22nd, 2015 04:00

I'm fairly certain I hit "cancel all events" but there's always a chance I just quieted them.  I'll double check that this weekend since I have ANOTHER upgrade......(and another one the week after).

205 Posts

October 22nd, 2015 04:00

Here's a script to clear the celog and stop notifications. It'll reaaaaaally clear the celog, though!

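#!/bin/bash
# WARNING: this deletes the CELOG event databases on every node.

# Disable the three CELOG services cluster-wide, then give them time to stop.
isi services -a celog_coalescer disable
isi services -a celog_monitor disable
isi services -a celog_notification disable
sleep 120

# On every node, stop isi_mcp (the process that manages OneFS services)
# and any remaining isi_celog_* processes.
isi_for_array killall isi_mcp
isi_for_array pkill isi_celog_
sleep 60

# Remove the per-node event databases and the shared one under /ifs.
isi_for_array rm -rf /var/db/celog/*
isi_for_array rm -rf /var/db/celog_master/*
rm -rf /ifs/.ifsvar/db/celog/*

# Restart isi_mcp on every node.
isi_for_array isi_mcp
sleep 30

# Re-enable the CELOG services one at a time, pausing between each.
isi services -a celog_coalescer enable
sleep 30
isi services -a celog_monitor enable
sleep 30
isi services -a celog_notification enable
sleep 30

# Enable all three a second time, as in the original list of commands.
isi services -a celog_coalescer enable
isi services -a celog_monitor enable
isi services -a celog_notification enable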

This was compiled from a list of commands given to me by support when I was complaining about the same thing.

On this last upgrade to 7.2.1, I had to run it about 5 times before the alerts stopped. Yay.

205 Posts

October 22nd, 2015 08:00

Cancel all events never works for me to stop the torrent of emails, especially after an upgrade.

October 26th, 2015 05:00

Actually, cancelling all events worked this weekend.  Of course, that clears out your local history BUT -- on the plus side, we didn't get bombarded with email alerts.

5 Practitioner

274.2K Posts

October 27th, 2015 11:00

OneFS doesn't have a maintenance mode today, but a future OneFS release will, for exactly this purpose. The way it will work is that you will be able to specify a time period for the maintenance mode. During that time OneFS will store all events that occur. After the maintenance mode is over, if there are any unresolved events, it will alert (email) on those unresolved events only. We are doing this so that if, for instance, a disk fails during maintenance mode and isn't fixed by the time the maintenance mode is over, the right alert still goes out.

You can use this maintenance period for upgrades, planned smartfails, hardware moves, etc.; basically, whenever you know the system will be experiencing "issues" and you don't need EMC support to try to help you with them.


Stay tuned!

1 Message

April 13th, 2021 11:00

It looks like this arrived with OneFS 8
https://www.dell.com/support/kbdoc/en-us/000022784/onefs-8-0-how-to-place-the-isilon-cluster-into-maintenance-mode?lang=en

Example -
isi event settings modify --maintenance-start 2017-02-23T22:00:00 --maintenance-duration 2H
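For a recurring upgrade window it can help to build that command from variables rather than retyping it. The sketch below only prints the command and does not run it. `maintenance_cmd` is a hypothetical helper name, the timestamp check simply mirrors the format of the example above, and how OneFS interprets the time zone or duration units other than `H` is not covered here and should be checked against the KB article.

```shell
#!/bin/bash
# Sketch only: prints the maintenance-mode command for review; it does not run it.

# Hypothetical helper. $1 = start (YYYY-MM-DDTHH:MM:SS), $2 = duration (e.g. 2H).
maintenance_cmd() {
    start="$1"
    duration="$2"
    # Accept only the timestamp shape used in the example above.
    case "$start" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) ;;
        *) echo "start must look like YYYY-MM-DDTHH:MM:SS" >&2; return 1 ;;
    esac
    echo "isi event settings modify --maintenance-start $start --maintenance-duration $duration"
}
```

For a window starting now, something like `maintenance_cmd "$(date +%Y-%m-%dT%H:%M:%S)" 2H` prints the command to paste on the cluster.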
