November 19th, 2015 07:00

control timings of tape alert generation

After switching to a new tape library that is somewhat slower in loading tapes, we get several "tape mount request" alerts.

It is the notification "Tape mount request 3" that's being triggered.

My support agent says this cannot be customized.

In this forum I found the hint to look at Save Mount Timeout of the devices, but this is set to a default of 30 minutes, and our library certainly does not require 30 minutes to load a specific tape.

Thanks in advance, Thomas

14.3K Posts

November 20th, 2015 05:00

Save Mount Timeout is something else.  You should check the load times defined in the Timers tab of the jukebox properties.

44 Posts

November 22nd, 2015 23:00

already raised those timers ...

Timers / Timers

   Load sleep 5 -> 60

   Unload sleep 5 -> 60

   Eject sleep 5 -> 60

   Idle Device Timeout 10 -> 0

Looking at the logs, I get the impression that the alert is generated in the middle of nowhere and is not directly related to load and unload operations of the jukebox. Could it be that the time limits related to alert generation are not well documented?

2.4K Posts

November 23rd, 2015 03:00

Certainly, a jukebox should be able to load a tape within half an hour.

To me, this looks more like a (partial) misconfiguration, which can happen, for example, if you do not hook up the right cable to the right drive(s).

In this case the jukebox will load drive A (or at least what it thinks is drive A), but NW will then wait for drive B to become ready. However, as B has not been loaded, it can never become ready. The risk of this issue increases if you share the jukebox among multiple storage nodes.

How to test that?

  - Load (without mount) a tape in your first drive.

  - Run "nsrmm" and check in the output which tape drive has really been loaded.

       Ideally, also look at the jukebox itself to verify the drive.

  - Unload the tape.

  - Repeat the sequence for each other tape drive.

  - If there is a mismatch, swap the cables on the drives in question.

In the worst case you need to reconfigure your jukebox in the end.
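
The check above boils down to comparing, for each drive, the device you asked the jukebox to load with the device that actually reports a tape. A minimal Python sketch of that bookkeeping follows. The device paths and the `find_mismatches` helper are hypothetical, and the observations are assumed to be noted by hand from the "nsrmm" output or from looking at the library; nothing here talks to NetWorker itself.

```python
from typing import Dict, List, Tuple


def find_mismatches(observations: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (requested, observed) pairs where the tape ended up in another drive.

    `observations` maps the device that was asked to load a tape to the
    device that actually showed the tape as loaded afterwards.
    """
    return [
        (requested, observed)
        for requested, observed in sorted(observations.items())
        if requested != observed
    ]


# Hypothetical device names; replace with the ones from your own setup.
observed = {
    "/dev/nst0": "/dev/nst1",  # asked for nst0, tape showed up in nst1
    "/dev/nst1": "/dev/nst0",  # asked for nst1, tape showed up in nst0
    "/dev/nst2": "/dev/nst2",  # correct
}

for requested, actual in find_mismatches(observed):
    print(f"{requested}: tape appeared in {actual}, check cabling or jukebox config")
```

An empty result means every load landed in the drive that was requested, which would rule out the cabling theory.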

44 Posts

November 23rd, 2015 04:00

I think this probably does not apply to our environment: we do have several LTO drives, but currently, only one is not enabled,

I guess Networker would not raise alerts for disabled tape drives.

14.3K Posts

November 23rd, 2015 05:00

One question - when you get the tape mount request alert - is a tape drive available?  Normally this turns from warning to critical, but that might be valid if there are more requests to accommodate streams than available volumes/drives.  Since you see it with the new library, it might be that your device settings were different before, and this could be the simplest explanation.  It would also explain an alert in the middle of nowhere (there would be a pending message, but no real impact, since the stream would be queued).

2.4K Posts

November 23rd, 2015 06:00

My issue is not pointing to disabled devices at all - it applies to enabled ones.

44 Posts

November 23rd, 2015 08:00

Certainly, I understood, but here only a single tape drive is enabled at all, there should be no confusion about several ones.

44 Posts

November 24th, 2015 00:00

Hrvoje,

the situation with the new library is similar to the situation before: there is always one drive available.

We did some modifications to the device parallelism, but an increased parallelism should not lead to more tape alerts, I guess.

The logs say nothing about the cause of the tape alert. Is it possible to adjust the logging to learn more about that?

Thanks, Thomas

2.4K Posts

November 24th, 2015 03:00

Ah - your new statement " ... but here only a single tape drive is enabled at all, ..." clarifies the situation.

In your yesterday's note you mentioned " ... but currently, only one is not enabled,"

The tape drive parallelism only controls the number of data streams.

Unfortunately, there is no specific log file for the 2 library control daemons.

However, you can increase the debug level of a running process (nsrjb or nsrd).

The appropriate command would be dbgcommand:

  - find the appropriate process ID (PID)

  - select another debug level using       dbgcommand -p PID Debug=#

You will find the information in the daemon.raw file.

Do not forget to switch to debug level 0 later - the log file might become very big very fast.
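
Because forgetting the reset is the easy mistake here, it can help to prepare both commands together before touching anything. The following Python sketch only builds the two command lines; it assumes the invocation has the form `dbgcommand -p PID Debug=LEVEL`, and the PID 1234, the level 3, and the helper names are made up for illustration. It deliberately does not execute anything.

```python
from typing import List, Tuple


def dbgcommand_args(pid: int, level: int) -> List[str]:
    """Build the argument list for one dbgcommand call (assumed syntax)."""
    if pid <= 0:
        raise ValueError("pid must be a positive process ID")
    if level < 0:
        raise ValueError("debug level must not be negative")
    return ["dbgcommand", "-p", str(pid), f"Debug={level}"]


def debug_window(pid: int, level: int) -> Tuple[List[str], List[str]]:
    """Return the command that raises the level and the one that resets it to 0."""
    return dbgcommand_args(pid, level), dbgcommand_args(pid, 0)


# Hypothetical values: PID 1234 for nsrd, debug level 3.
raise_cmd, reset_cmd = debug_window(1234, 3)
print("raise:", " ".join(raise_cmd))
print("reset:", " ".join(reset_cmd))
```

Running the printed "raise" command, reproducing the alert, and then running the "reset" command keeps the window in which daemon.raw grows quickly as short as possible.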

14.3K Posts

November 24th, 2015 03:00

Actually it can lead to this... For example, if you have a single device or multiple devices and this/these device(s) are not assigned to the pool (which is perfectly fine), then if the target sessions (ts) value is exceeded, NW will fire off a request for an additional volume for that pool.  If this request is still valid and pending after a certain amount of time, it becomes a critical alert, and as it escalates, so does the notification that is used.  If you look at what they do you will see:

[Image: ecn.JPG.jpg]

So, your request for a volume went to the alert phase.  To understand the exact workflow, one needs to check your logs and resources, and from there it is easy.  You can open a ticket with support and have them run this through the analysis engine; from there it is easy to see why it happens and what to change to get rid of it.

44 Posts

November 25th, 2015 00:00

Thanks for the details outlining the mechanisms.

Our issue is the following: the Media alerts go out to an IT team inbox, where the admin on duty needs to take action ... when there is need for action.

Thus, tape alerts that get resolved over night are nuisance alerts: the admin on duty will start NMC and have a look, only to see that everything is OK.

If I understand the explanations correctly, raising the value of target sessions could alleviate the issue, but this will probably have other implications, as more sessions are multiplexed to tape, potentially increasing recovery duration.

I consider switching off the mail alert, and polling the NMC for open alerts with a monitoring tool like Nagios.

Would this be possible?

Regards, Thomas

2.4K Posts

November 26th, 2015 01:00

There is no such switch available in the NW Admin GUI.

Of course this is potentially possible via nsradmin. However, if you try, you will notice that this is a read-only attribute.

Maybe you should contact EMC support to check whether there is another method which could help.

14.3K Posts

November 27th, 2015 01:00

I don't know your setup, but you can also try adding the device to the pool.  NW requests volumes per pool, and when it knows it has a device for that pool, alert management is kicked off differently.  Your standby folks can use nsrwatch instead and check pending messages and device status more quickly.
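
For the polling idea raised earlier in the thread, the decision logic of a Nagios-style check could look roughly like the Python sketch below. It ignores pending media requests until they have been open for a grace period, so requests that resolve themselves overnight never page anyone. How the pending messages and their ages are obtained from NetWorker is left open; the input format, the thresholds, and the `evaluate` function are assumptions for illustration only. The exit codes 0, 1 and 2 are the standard Nagios plugin codes for OK, WARNING and CRITICAL.

```python
from typing import Iterable, Tuple

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def evaluate(
    pending: Iterable[Tuple[int, str]],
    warn_after: int = 30,
    crit_after: int = 120,
) -> Tuple[int, str]:
    """Map pending requests, given as (age in minutes, message), to a Nagios state."""
    ages = [age for age, _message in pending]
    oldest = max(ages, default=0)
    if oldest >= crit_after:
        return CRITICAL, f"CRITICAL - oldest pending media request is {oldest} min old"
    if oldest >= warn_after:
        return WARNING, f"WARNING - oldest pending media request is {oldest} min old"
    return OK, f"OK - {len(ages)} pending request(s), none older than {warn_after} min"


# Hypothetical sample: one request that has been waiting for 45 minutes.
code, text = evaluate([(45, "media waiting: tape mount request")])
print(text)
print("exit code:", code)  # a real plugin would end with sys.exit(code)
```

The two thresholds are the knobs the built-in notification does not offer: a longer `warn_after` simply trades detection speed for fewer nuisance alerts.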
