Device error

Question

Tape error. I found the following information in my daemon log when I try to clone:

04/11/06 10:25:24 nsrd: media critical: TapeAlert flag 'Media' has been posted for /dev/nst0
04/11/06 10:25:24 nsrd: media critical: TapeAlert flag 'Clean now' has been posted for /dev/nst0
04/11/06 10:25:24 nsrd: media warning: TapeAlert flag 'Hard error' has been posted for /dev/nst0
04/11/06 10:25:24 nsrd: media warning: TapeAlert flag 'Diagnostics required' has been posted for /dev/nst0
04/11/06 10:25:24 nsrd: media warning: /dev/nst0 writing: Input/output error, at file 5 record 2
04/11/06 10:25:24 nsrd: media notice: LTO Ultrium-2 tape 000066L2 on /dev/nst0 is full
04/11/06 10:25:24 nsrd: media notice: LTO Ultrium-2 tape 000066L2 used 54 MB of 190 GB capacity
04/11/06 10:25:29 nsrd: media critical: TapeAlert flag 'Media' has been cleared for /dev/nst0
04/11/06 10:25:29 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:29 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:29 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:29 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:29 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:29 nsrd: media warning: /dev/nst0 reading: Success
pools supported: CT1Month;
04/11/06 10:25:35 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:35 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:35 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:35 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:35 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:35 nsrd: media warning: /dev/nst0 reading: Success
04/11/06 10:25:35 nsrd: media notice: Volume '000066L2' on device '/dev/nst0': Cannot decode block. Verify the device configuration. Tape positioning by record is disabled.
04/11/06 10:25:35 nsrd: media warning: verification of volume '000066L2', volid 4013644058 failed, can not read record 0 of file 5 on LTO Ultrium-2 tape 000066L2
04/11/06 10:25:35 nsrd: media notice: verification of volume '000066L2', volid 4013644058 failed, volume is being marked as full.
04/11/06 10:25:35 nsrd: write completion notice: Writing to volume 000066L2 complete
04/11/06 10:25:35 nsrd: rpblegato1srv.rompetrol.org:cloning session done saving to pool 'CT1Month' (000066L2)
04/11/06 10:25:35 nsrd: media notice: Cloning of save sets to volume 000066L2 on /dev/nst0 is being terminated because: Media verification failed
04/11/06 10:25:41 nsrd: media emergency: RPC send operation failed. A network connection could not be established with the host.
04/11/06 10:25:57 nsrd: media emergency: RPC send operation failed. A network connection could not be established with the host.

I also found the following in dmesg:

st0: Error with sense data: Info fld=0x10000, Current st09:00: sense key Medium Error
Additional sense indicates Track following error
st0: Error with sense data: Info fld=0x1, Current st09:00: sense key Medium Error
Additional sense indicates Track following error
st0: Error on write filemark.
st1: Failed to read 65536 byte block with 32768 byte read.

I suspect that volume 000066L2 (the volume is new) is broken, but I'm not 100% sure. I also suspect a SCSI bus reset.
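When reading a daemon log like this, it can help to see which TapeAlert flags are still posted at the end, since only 'Media' is reported as cleared. The following is a minimal Python sketch, assuming the log lines keep the exact wording shown in this post; the function name and the regular expression are illustrative and not part of NetWorker.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 04/11/06 10:25:24 nsrd: media critical: TapeAlert flag 'Media' has been posted for /dev/nst0
TAPEALERT = re.compile(
    r"nsrd: media (?P<severity>\w+): TapeAlert flag '(?P<flag>[^']+)' "
    r"has been (?P<action>posted|cleared) for (?P<device>\S+)"
)

def active_tapealerts(log_lines):
    """Return {device: set of flags that were posted and not cleared afterwards}."""
    active = defaultdict(set)
    for line in log_lines:
        m = TAPEALERT.search(line)
        if m is None:
            continue  # not a TapeAlert line
        if m["action"] == "posted":
            active[m["device"]].add(m["flag"])
        else:
            active[m["device"]].discard(m["flag"])
    return dict(active)

# The TapeAlert lines from the log in this post.
sample = [
    "04/11/06 10:25:24 nsrd: media critical: TapeAlert flag 'Media' has been posted for /dev/nst0",
    "04/11/06 10:25:24 nsrd: media critical: TapeAlert flag 'Clean now' has been posted for /dev/nst0",
    "04/11/06 10:25:24 nsrd: media warning: TapeAlert flag 'Hard error' has been posted for /dev/nst0",
    "04/11/06 10:25:24 nsrd: media warning: TapeAlert flag 'Diagnostics required' has been posted for /dev/nst0",
    "04/11/06 10:25:29 nsrd: media critical: TapeAlert flag 'Media' has been cleared for /dev/nst0",
]

for device, flags in active_tapealerts(sample).items():
    print(device, sorted(flags))
```

For this log the sketch leaves 'Clean now', 'Hard error' and 'Diagnostics required' posted on /dev/nst0, which by itself does not tell the media apart from the drive or the bus.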

razvan2 · Answer

I think it is the media, because I tried again with another cartridge and it is working.

ble1 · Answer

Both could be correct: bad media and a SCSI reset (the output above looks just like a typical SCSI reset issue). It could also be a dirty drive, but this is less likely.

ble1 · Answer

It could still be a SCSI reset, and perhaps the drive was cleaned automatically in the meantime. Unless some important and valid data is on the tape, I would suggest relabeling the tape used during the error and testing it in another drive (run some backup against that tape).

tech88kur · Answer

Hi,

I have an ongoing issue where tapes are being marked full prematurely. This is recurring and not restricted to any set of save sets or pools. The daemon logs generally show it as 'tape marked full' and nothing else; most of the time I have not found any reason in the logs.

I am using NetWorker server 8.0.2.4 on Windows 2008 R2, and the library is attached via SCSI to the NetWorker server. The drives are clean, and no errors or notifications were received in the tape library GUI console. I had some of the tapes checked by the vendor, who reported no issue with them; all the tested tapes were found in good health.

What else remains to be checked? Both the tape library and the tapes have been checked by the respective vendors and nothing wrong was found. What could cause this premature marking of tapes as full?

The last time it happened, I saw a couple of lines in the logs but could not find anything conclusive online.

In some cases I recycle the tapes again, and then sometimes the clones finish; other times the tapes become full prematurely.

Regards

tech88kur

bingo.1 · Answer

There could be a lot of issues but where to start ...

- Especially look at the cabling and termination

- Make sure that you have the latest drivers (SCSI, tape drive) installed

- Verify that the correct media type has been selected

Try to isolate the problem:

- Try to find out whether this issue only occurs when the server is really busy.

- Switch the drive's CDI characteristics (default: SCSI commands) to 'not used'.

- Remove the drive from the jukebox configuration and check that the drive by itself runs fine.

Either use NetWorker routines like tapeexer.exe

If you are in doubt use another backup SW to test.

- If the issue can be repeated, add verbosity to the command or let it run in debug mode.

tech88kur · Answer

thanks bingo for writing

There could be a lot of issues but where to start ...

- Especially look at the cabling and termination

- Make sure that you have the latest drivers (SCSI, tape drive) installed

- Verify that the correct media type has been selected

The cabling was all right.
The drivers and firmware are the latest versions.
The correct media type was selected.

Try to isolate the problem:

- Try to find out whether this issue only occurs when the server is really busy.

- Switch the drive's CDI characteristics (default: SCSI commands) to 'not used'.

- Remove the drive from the jukebox configuration and check that the drive by itself runs fine.

Either use NetWorker routines like tapeexer.exe

If you are in doubt use another backup SW to test.

- If the issue can be repeated, add verbosity to the command or let it run in debug mode.

The issue happens at any time; we found no association with how busy the server is.
We did not check the 2nd, 3rd and 4th points. Instead, as a matter of trial and error, we tried deleting the recyclable volumes before reusing them, and that has really helped. We found that only recyclable tapes were becoming full prematurely; if we deleted their catalog entries and relabeled them, they worked absolutely fine. In the last seven days we deleted and relabeled 7 LTO5 tapes, and all of them accepted data up to their maximum size. We also used one recyclable tape without deleting it, and it became full prematurely.

This is unexpected but I have taken that as a workaround.

Regards

tech88kur

bingo.1 · Answer

Thanks for sharing the information.

This sounds absolutely strange. We also use LTO5s and have never experienced such problems. Unfortunately, we never used NW 8.0.2.

The weird thing is that it sounds as if NW somehow reads the data block before overwriting it. But this is technically impossible. The only situation where NW reads the tape during a write is when it positions to the logical end-of-tape (LEOL) to append new data.

I do not deny what you have experienced; I just cannot explain how this could happen.

Also, I doubt that you use SCSI, as I think there is no LTO5 drive available with SCSI at all. I think yours has a SAS interface (which might need other drivers). But of course I do not know all brands.

The other issue could be that you have not set the correct block size and/or that something limits the block size. This is especially true for drives with SCSI/SAS interfaces. Use scanner and verify that as follows:

C:\>scanner -m \\.\Tape0
8909:scanner: using '\\.\Tape0' as the device name
93507:scanner: Cannot set \\.\Tape0 block size to 262144 bytes. Maximum size configurable is 65536 bytes.
Please modify your SCSI configuration to allow transfers of at least 262144 bytes(search for the MaximumSGList registry parameter)
93504:scanner: Cannot set \\.\Tape0 block size: 262144 bytes is outside the range 32768-65536
8936:scanner: scanning LTO Ultrium-3 tape Test.001 on \\.\Tape0
32350:scanner: adding LTO Ultrium-3 tape Test.001 to pool Default
93504:scanner: Cannot set \\.\Tape0 block size: 262144 bytes is outside the range 32768-65536
8770:scanner: fn 2 rn 0 read error The handle is invalid.

93504:scanner: Cannot set \\.\Tape0 block size: 262144 bytes is outside the range 32768-65536
8761:scanner: done with LTO Ultrium-3 tape Test.001

84255:scanner: tape_rewind rewind failed: The handle is invalid.
(6)

C:\>

This can occur if you upgraded your hardware and only installed new tape drives along with their drivers.

On the other hand this should not be a general problem - using smaller blocks you just cannot achieve the max. capacity.

However, in case of a hardware upgrade I would never use the old tapes for new backups - even if LTO5 drives support writing to LTO4 tapes.
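To make the block size check easier to repeat, here is a minimal Python sketch that pulls the requested size and the allowed range out of the scanner message shown above. The helper `sg_list_entries` is an assumption, not something from the scanner output: it uses the commonly cited rule that the maximum transfer is (MaximumSGList - 1) x 4096 bytes, which you should verify against your HBA vendor's documentation before touching the registry.

```python
import re

# Matches the scanner message:
# "... block size: 262144 bytes is outside the range 32768-65536"
RANGE_MSG = re.compile(
    r"block size: (?P<want>\d+) bytes is outside the range (?P<lo>\d+)-(?P<hi>\d+)"
)

def block_size_limit(scanner_line):
    """Return (requested, lowest, highest) in bytes, or None if the line does not match."""
    m = RANGE_MSG.search(scanner_line)
    if m is None:
        return None
    return int(m["want"]), int(m["lo"]), int(m["hi"])

def sg_list_entries(transfer_bytes, page_bytes=4096):
    """ASSUMPTION: max transfer = (MaximumSGList - 1) * page size.

    Returns the smallest MaximumSGList value that would allow transfer_bytes.
    """
    pages = -(-transfer_bytes // page_bytes)  # ceiling division
    return pages + 1

line = r"93504:scanner: Cannot set \\.\Tape0 block size: 262144 bytes is outside the range 32768-65536"
wanted, lowest, highest = block_size_limit(line)
print(f"requested {wanted} bytes, driver allows {lowest}-{highest} bytes")
print("MaximumSGList needed (under the stated assumption):", sg_list_entries(wanted))
```

Under that assumption a 65536-byte limit corresponds to a value of 17 and a 262144-byte block needs 65, which is consistent with the limit scanner reports here.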

tech88kur · Answer

bingo,

Just when I thought deleting the catalog entries was a workaround, I got a tape which became full prematurely even in this case. I noticed that the clone session was in a long waiting state before I got the below. The logs are not very conclusive.

1) I disabled CDI and changed it to none.
2) I tried scanner -m; please find the output below.

I am not good at using scanner. I still have to check whether the options can be changed to get a result similar to what you showed. I am also unfamiliar with NetWorker routines like tapeexer.exe; I have yet to read about them. Meanwhile I thought of putting this here in case you spot something.

Please remember that 1) the issue is only with recyclable tapes and 2) the drives and firmware have proper updates.

Edit: I also noticed that when a tape becomes full prematurely, the corresponding save set shows as aborted and does not complete even if we insert a new volume for the pool. Other save sets do clone. This also happens for save sets which are greater than 50 GB.

Regards

tech88kur

bingo.1 · Answer

The information I showed you has been created with a current NW Version (8.1.1.4). To make it appear in 8.0.2.4 you could try to add verbosity (scanner -mv ... or scanner -mvv ...). AFAIR the details have been added around NW 7.5.x.

The other behavior you see is correct:

If you abort a save set, it has to be restarted completely. If you back up to tape, a new SSID will be assigned, so the new backup can use the same media.

However, if this happens for a clone, NW will use another tape, as the same SSID is already used on the previous tape (although it is incomplete). One of NW's core rules is that there can only be one entry for a save set on any media. And the clone will be restarted from the beginning of the save set.

Before you recycle a tape, go ahead and delete the volume from the media db (nsrmm -d volume_name). I remember another thread this week where the user stated that this helped. Although I still cannot explain why, you may want to try that.

ble1 · Answer

Did you check error counts on switch ports?

tech88kur · Answer

Yes, I am deleting the tape before reusing it (though I am not using nsrmm but the GUI delete option, where I delete both the media db and file index entries, if any). It seemed to work well, but not any more. The biggest problem here is keeping track of the aborted save sets, or of the save sets which were not cloned. This has to be done either by pulling mminfo output into a CSV file or by using the GUI to check for aborted save sets on tape, and it is not feasible to do this daily. Together with the excess number of tapes we are using, this is making things very difficult.

My problem here is that I do not know what is at fault: tapes, library, or NW. I have had the tapes analyzed (no error was reported in multiple reads and writes on the tapes, and the vendor came back saying we should check our backup software), there are no errors in the library notification bar or history, and the NW logs are not conclusive.

Regards

tech88kur

bingo.1 · Answer

You had better get used to running mminfo from the command line, which gives you many more possibilities: you can use a more precise query and get a better report. Due to the huge number of options it seems complicated, but once you have learned the general usage, mminfo from the CLI is really helpful:

- You know exactly what you do (see below)

- If you run a manipulation in the next step you can easily verify the result (rerun the command as it is still stored in the buffer).

- You do not need to switch windows.

- You can use the command for scripting, if necessary.

For example, you could check for aborted or incomplete save sets, or for the number of copies/validcopies.
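As a rough illustration of that kind of check, here is a minimal Python sketch that filters an mminfo report exported to CSV for aborted save sets. The column names (volume, ssid, name, sumflags), the sample rows, and the reading of an 'a' in sumflags as "aborted" are assumptions made for illustration; verify them against the mminfo man page for your release.

```python
import csv
import io

def aborted_save_sets(csv_text):
    """Return (volume, ssid, name) for every row whose sumflags contain 'a'.

    ASSUMPTION: the CSV has a header row with the columns
    volume, ssid, name, sumflags, and 'a' in sumflags marks an aborted save set.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["volume"], row["ssid"], row["name"])
        for row in reader
        if "a" in row["sumflags"]
    ]

# Made-up sample rows, only to show the shape of the input.
sample_csv = """volume,ssid,name,sumflags
A00001L5,1111111111,/data,cb
A00002L5,2222222222,/home,ca
A00002L5,3333333333,/var,cr
"""

for volume, ssid, name in aborted_save_sets(sample_csv):
    print(f"aborted: ssid {ssid} ({name}) on volume {volume}")
```

A script like this could be scheduled daily, so that the list of save sets that still need to be cloned does not have to be collected by hand in the GUI.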

-------------

Looking more closely at your scanner output, I am really confused. Look at the SSIDs; it says (more or less):

scanning ssid 3514822901

No media in drive

scanning ssid 3565154534

So why should NW switch save sets in the middle of a process? This almost looks to me as if the tape was unloaded during the process and replaced by another one.

Is this possible? At least the unload is possible at any time, for example due to a SCSI bus reset.

However, this is also possible due to NetWorker! - It will occur if ...

- The jukebox' 'Idle device timeout' is not set to 0 (the default value)

- AND the tape has been mounted before scanning. You must load but not mount the tape.

You can achieve that from either the jukebox GUI ...

or from the command line: "nsrjb -ln ..."

-------------------

For the next steps:

- Run scanner only if no other jobs will use the jukebox. Use a standalone drive, if possible.

- Use "scanner -n -mvv \\.\Tape0". This will not update the info in the media index, and it will provide more info and hopefully tell you a bit more when the problem reoccurs.

- If it does, verify to which volumes the SSIDs belong using 'mminfo -q "ssid=<ssid>" -r "ssid,volume"'. If they are different, the tape must have been swapped.

- If so, on the next try, you can hopefully look inside the jukebox (most likely on large ones) and verify that this is really the case.
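The SSID comparison in the steps above can be sketched as follows. This is a minimal Python example, assuming the report is plain text with one "ssid volume" pair per line after a header; the volume names in the sample are hypothetical, and only the two SSIDs come from the scanner output quoted earlier.

```python
def parse_ssid_volume(report_text):
    """Parse lines of the form '<ssid> <volume>' into a dict, skipping the header.

    ASSUMPTION: two whitespace-separated columns, ssid first, as requested
    with a report spec of "ssid,volume".
    """
    mapping = {}
    for line in report_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            mapping[parts[0]] = parts[1]
    return mapping

def tape_was_swapped(ssid_to_volume):
    """True if the SSIDs seen during one scanner run belong to more than one volume."""
    return len(set(ssid_to_volume.values())) > 1

# Hypothetical report: the volume names are invented for illustration.
report = """ssid        volume
3514822901  A00001L5
3565154534  A00002L5
"""

volumes = parse_ssid_volume(report)
print(volumes)
print("tape swapped during scan:", tape_was_swapped(volumes))
```

If both SSIDs map to the same volume, the tape was not swapped, and the "No media in drive" message has to be explained some other way.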

