snmdla-tm
2 Iron

control timings of tape alert generation

After switching to a new tape library that is somewhat slower in loading tapes, we get several "tape mount request" alerts.

It is the notification "Tape mount request 3" that's being triggered.

My support agent says this cannot be customized.

In this forum I found the hint to look at Save Mount Timeout of the devices, but this is set to a default of 30 minutes, and our library certainly does not require 30 minutes to load a specific tape.

Thanks in advance, Thomas

13 Replies
ble1
6 Indium

Re: control timings of tape alert generation

Save Mount Timeout is something else. You should check the load times defined in the Timers tab of the jukebox properties.

snmdla-tm
2 Iron

Re: control timings of tape alert generation

I have already raised those timers:

Timers / Timers

   Load sleep 5 -> 60

   Unload sleep 5 -> 60

   Eject sleep 5 -> 60

   Idle Device Timeout 10 -> 0

Looking at the logs, I get the impression that the alerts are generated in the middle of nowhere and are not directly related to the jukebox's load and unload operations. Could it be that the time limits governing alert generation are not well documented?

bingo.1
4 Germanium

Re: control timings of tape alert generation

Certainly, a jukebox should be able to load a tape within half an hour.

To me, this looks more like a (partial) misconfiguration which for example can happen if you do not hook up the right cable to the right drive(s).

In this case the jukebox will load drive A (or at least what it thinks is drive A), but NW will then monitor drive B to become ready. However, as B has not been loaded, it can never become ready. The risk of this issue rises if you share the jukebox among multiple storage nodes.

How to test that?

  - Load (without mount) a tape in your first drive.

  - Run "nsrmm" and check in the output which tape drive has really been loaded.

       Ideally, also look at the jukebox itself to verify which drive was loaded.

  - Unload the tape.

  - Repeat the sequence for each other tape drive.

  - If there is a mismatch, swap the cables on the drives in question.

In the worst case, you will need to reconfigure your jukebox in the end.
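The comparison in the test above can be written down as a small bookkeeping sketch. This is only an illustration: the drive names and the `find_drive_mismatches` helper are hypothetical, and the observations are assumed to be noted by hand after each load test rather than read from NetWorker.

```python
def find_drive_mismatches(observations):
    """Return the entries where the requested and the observed drive differ.

    observations maps the drive the jukebox was asked to load to the drive
    that actually reported a loaded tape (None if no drive reported one).
    """
    return {
        requested: observed
        for requested, observed in observations.items()
        if requested != observed
    }


# Hypothetical results, written down by hand after each load test.
observations = {
    "drive_A": "drive_B",
    "drive_B": "drive_A",
    "drive_C": "drive_C",
}

mismatches = find_drive_mismatches(observations)
for requested, observed in mismatches.items():
    print(f"asked for {requested}, but {observed} was loaded")
```

Any pair that shows up here is a candidate for swapped cables or a wrong device mapping.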

snmdla-tm
2 Iron

Re: control timings of tape alert generation

I think this probably does not apply to our environment: we do have several LTO drives, but currently only one is enabled.

I guess NetWorker would not raise alerts for disabled tape drives.

ble1
6 Indium

Re: control timings of tape alert generation

One question: when you get the tape mount request alert, is a tape drive available? Normally the request escalates from warning to critical, but it can be legitimate if more streams need to be accommodated than there are volumes/drives available. Since you see it with the new library, your device settings may have been different before, and that would be the simplest explanation. It would also explain an alert in the middle of nowhere (there would be a pending message, but no real impact, since the stream would simply be queued).
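To illustrate the "more streams than drives" case, here is a deliberately simplified sketch. It assumes each available drive accepts a fixed number of sessions and everything beyond that waits for another volume or drive. The function name and the numbers are hypothetical, and the real NetWorker scheduling logic takes more settings into account.

```python
def pending_volume_requests(active_streams, available_drives, sessions_per_drive):
    """Estimate how many streams are left waiting for another volume/drive.

    Simplified model (an assumption, not NetWorker's actual algorithm):
    every available drive takes up to sessions_per_drive streams, and any
    stream beyond that total stays queued.
    """
    capacity = available_drives * sessions_per_drive
    return max(0, active_streams - capacity)


# Hypothetical example: one enabled drive, 4 sessions per drive, 6 streams.
print(pending_volume_requests(6, 1, 4))
```

In this model a non-zero result means a mount request stays pending even though the single drive is working normally.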

bingo.1
4 Germanium

Re: control timings of tape alert generation

The issue I described does not concern disabled devices at all; it applies to enabled ones.

snmdla-tm
2 Iron

Re: control timings of tape alert generation

Certainly, I understood, but here only a single tape drive is enabled at all, so there should be no confusion between several drives.

snmdla-tm
2 Iron

Re: Re: control timings of tape alert generation

Hrvoje,

the situation with the new library is similar to the situation before: there is always one drive available.

We did some modifications to the device parallelism, but an increased parallelism should not lead to more tape alerts, I guess.

The logs say nothing about the cause of the tape alert. Is it possible to adjust the logging to learn more about that?

Thanks, Thomas

ble1
6 Indium

Re: Re: control timings of tape alert generation

Actually, it can lead to this. For example, if you have one or more devices that are not part of the pool (which is perfectly fine), then once the target sessions (ts) value is exceeded, NW will issue a request for an additional volume for that pool. If this request is still valid and pending after a certain amount of time, it becomes a critical alert, and as it escalates, so does the notification that is used. If you look at what the notifications do, you will see:

[Attached screenshot: ecn.JPG.jpg]

So, your request for a volume reached the alert phase. To understand the exact workflow, one needs to check your logs and resources; from there it is easy. You can open a ticket with support and have them run this through the analysis engine, which makes it easy to see why it happens and what to change to get rid of it.
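The escalation described above can be pictured with a minimal sketch. Only "Tape mount request 3" is named in this thread; the other two names are assumed by analogy, and the minute thresholds are invented placeholders, not NetWorker defaults. The real values are whatever your notification resources define.

```python
# Hypothetical thresholds in minutes. They are placeholders and are NOT
# taken from NetWorker or from this thread.
ESCALATION_STEPS = [
    (0, "Tape mount request 1"),
    (15, "Tape mount request 2"),
    (30, "Tape mount request 3"),
]


def notification_for(pending_minutes, steps=ESCALATION_STEPS):
    """Return the highest notification whose threshold has been reached."""
    current = None
    for threshold, name in steps:
        if pending_minutes >= threshold:
            current = name
    return current


for minutes in (5, 20, 45):
    print(minutes, notification_for(minutes))
```

The point of the sketch is that the notification level depends on how long the request has been pending, not on how long the library takes to load a tape.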