Unsolved

December 5th, 2017 09:00

TL-2000 multiple drive failures

I have a TL-2000 with dual LTO5 iSCSI drives. In June we began getting CRC errors on jobs written to Drive 2. Cleaning the drive did not help. We disabled the drive, forcing all jobs to use Drive 1. Recently, we decided to replace Drive 2, since we have a lot of jobs we want to run at the end of the year. We purchased a used LTO5 drive and still got CRC errors. We assumed the replacement drive was bad. Last week we started getting CRC errors on Drive 1 as well. I find it very hard to believe that all three drives are bad.

I've run diagnostics and found no problems, but I cannot seem to run advanced diagnostics on the library using the administrator login. We have a dedicated cleaning slot (23), and the cleaning tape has only seen 27 cycles. The drives do not report needing to be cleaned.

Is there something else that could cause this behavior other than bad drives?

December 6th, 2017 00:00

Hi

More than likely the media you're writing to is creating the errors, unless you have already replaced the tapes.

Moderator • 6.9K Posts

December 6th, 2017 07:00

Hello Clark_W_Griswold,

How old is your media? Normally, CRC errors are due to media. If an error occurs in the data during a backup, that error can get written to the media. Also, if the media is older than 3 years or has had more than 200 full backups written to it, I would try new media and see if the issue persists.
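For anyone who wants to apply that rule of thumb across a whole set of cartridges, here is a minimal Python sketch. The function name and parameters are made up for illustration; only the two limits (3 years, 200 full backups) come from the post above, and the age and backup counts would have to come from your own records.

```python
def media_is_suspect(age_years: float, full_backups: int,
                     max_age_years: float = 3.0,
                     max_full_backups: int = 200) -> bool:
    """Return True if a cartridge exceeds either rule-of-thumb limit.

    The defaults mirror the thresholds quoted in the thread; the function
    itself is a hypothetical helper, not part of any vendor tool.
    """
    return age_years > max_age_years or full_backups > max_full_backups


# Hypothetical cartridges: (label, age in years, number of full backups).
cartridges = [("TAPE01", 4.0, 10), ("TAPE02", 1.0, 65), ("TAPE03", 0.5, 201)]
suspects = [label for label, age, backups in cartridges
            if media_is_suspect(age, backups)]
print(suspects)
```

A cartridge that passes this check can still be bad; the limits are only a first filter for deciding which tapes to swap out first.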

Please let us know if you have any other questions.

December 6th, 2017 08:00

I meant to put those details in my original post. I've had at least 6 tapes report as bad between the two drives. My media is from three different batches, two different brands (Dell and HP). None of the tapes are very old, and most have only a few loads. The "worst" tape of the bunch only had 65 media loads written. The "best" got CRC errors on the very first load.

I think it's safe to rule out bad tapes as the problem.

Moderator • 6.9K Posts

December 11th, 2017 08:00

Hello Clark_W_Griswold,

Can I get you to pull a drive dump log from your drives so that I can review it and see what is going on? You can do this in the web utility by going to Service Library Menu and then Save Service Dump. Then you will need to select the drive. I will send you an email that you can reply to with the logs.

Please let us know if you have any other questions.

December 11th, 2017 08:00

The only two things left would be the cabling and the iSCSI bridge.

December 11th, 2017 08:00

I'm happy to send along logs. I pulled them before I even posted this thread.

I tried another replacement drive over the weekend and it's giving CRC errors too. There's just no way that four different drives are all having the same problem.

December 11th, 2017 08:00

I think it's fairly safe to rule out cabling, since the problem started suddenly and both drives are showing problems. The chances of sudden problems with two different SAS cables are slim.

If the iSCSI bridge is causing the problem, I'm assuming there's no real way to fix it without replacing the chassis?

I guess it's also possible that the host system's SAS card is the problem. I could get a replacement and try that, but hate to just throw parts at the problem if the logs will give more info.

December 11th, 2017 09:00

I agree cabling is unlikely; more often than not it's the drive or the tapes.

The iSCSI bridge does come out, so you may be able to get hold of one. The units I see never have iSCSI bridges, but everyone on here seems to use them.

Moderator • 6.9K Posts

December 11th, 2017 13:00

Hello Clark_W_Griswold,

Thanks for the logs; they help in looking into your issue. What I am seeing is Medium Excessive Wrt Recovery errors and Number of Dataset CQ. That is why you are getting CRC errors. I can see that this is happening on a number of different media as well, and I notice that some media show up on both drives. Since you have swapped drives and are getting the same result, I can say that it is the media that is causing the CRC errors. When a drive shows both errors, it normally means the media is too abrasive and is damaging the read and write heads of the drive. It can also mean a servo issue in the drive. However, I am not seeing any servo errors in the log, which is why I tend to say it is the media rather than your drive. I will send you an email with the logs showing where I am seeing these errors.

Please let us know if you have any other questions.

December 11th, 2017 14:00

Thanks for looking at the logs, Sam. However, I'm confused as to how this could be a problem with the media. I've had errors on multiple different media types, including Dell's own LTO5 tapes. Most of the tapes that are showing errors have very few loads on them. My latest failed tape (HP brand Ultrium LTO) only has 4 media loads. On the other hand, I have other HP tapes from the same batch with 30+ loads and no errors. Likewise, I have some Dell tapes with 50+ media loads, and others failing with fewer. My very first failed tape was a Dell tape. Unfortunately, I can't tell how many media loads were on the first failed tapes, since that info doesn't seem to be retained by the library if the media is removed and added back later.
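One way to make this argument concrete is to tally CRC failures by drive, by cartridge, and by brand, and see where they cluster. The Python sketch below does that over a hand-built list of job records. The records, field names, and barcodes are entirely hypothetical placeholders, not data from this library; you would fill them in from your own backup job history.

```python
from collections import defaultdict


def failure_rates(jobs, key):
    """Return {group: (failed, total)} for jobs grouped by the given field."""
    totals = defaultdict(lambda: [0, 0])
    for job in jobs:
        bucket = totals[job[key]]
        bucket[1] += 1          # every job counts toward the total
        if job["crc_error"]:
            bucket[0] += 1      # only failed jobs count toward failures
    return {group: tuple(counts) for group, counts in totals.items()}


# Hypothetical job history (placeholder values, for illustration only).
jobs = [
    {"drive": "Drive 1", "tape": "TAPE01", "brand": "Dell", "crc_error": True},
    {"drive": "Drive 2", "tape": "TAPE01", "brand": "Dell", "crc_error": True},
    {"drive": "Drive 1", "tape": "TAPE02", "brand": "HP", "crc_error": False},
    {"drive": "Drive 2", "tape": "TAPE03", "brand": "HP", "crc_error": True},
    {"drive": "Drive 1", "tape": "TAPE04", "brand": "Dell", "crc_error": False},
]

for field in ("drive", "tape", "brand"):
    print(field, failure_rates(jobs, field))
```

If failures are spread evenly across drives, cartridges, and brands, that points away from any single drive or batch of media and toward something they all share, such as the host adapter or the bridge.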

Is there anything else that could be the culprit here? I've purchased new media twice since this problem started and continue to get the same results. I could understand if I was using garbage tapes, but these are name brand tapes, including your own company's.

Moderator • 6.9K Posts

December 12th, 2017 10:00

Hello Clark_W_Griswold,

CRC errors come down to either a tape drive issue or a media issue. Looking at the logs, I see a lot of write retries and lost seek position errors. Those errors are normally due to the read and write heads losing position on the media. When it happens on multiple media, you would lean towards the drive needing to be replaced, as the heads or the control arm are wearing out.

Since you stated that you had already replaced the drive and were still getting the same errors, that is what leads me to think the media might be causing the issue.

What you can do is check whether you are getting any errors or warnings for your SAS HBA. If you are getting any event ID 7, 11, or 129, then there could be an issue with the SAS card. In a few cases I have seen this caused by the SAS card, but in most cases it is the drive or media.
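If the host's system event log has been exported to CSV, a short Python sketch like the one below can count how often those three event IDs appear and which source logged them. The column names "Event ID" and "Source" are assumptions about the export format, and the sample rows and source names are invented; adjust both to match your actual export.

```python
import csv
import io
from collections import Counter

# Event IDs named in the post above as possible signs of a SAS HBA problem.
SUSPECT_EVENT_IDS = {7, 11, 129}


def count_suspect_events(csv_text, id_column="Event ID", source_column="Source"):
    """Count rows whose event ID is suspect, grouped by (source, event ID).

    Column names are assumptions about the exported log, not a fixed format.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            event_id = int(row[id_column].strip())
        except (KeyError, ValueError, AttributeError):
            continue  # skip rows with a missing or non-numeric event ID
        if event_id in SUSPECT_EVENT_IDS:
            source = (row.get(source_column) or "").strip()
            counts[(source, event_id)] += 1
    return counts


# Invented sample export; "hba" and "disk" are placeholder source names.
sample = (
    "Level,Date and Time,Source,Event ID\n"
    "Warning,x,hba,129\n"
    "Error,x,disk,11\n"
    "Information,x,svc,7036\n"
    "Warning,x,hba,129\n"
)
for (source, event_id), n in sorted(count_suspect_events(sample).items()):
    print(source, event_id, n)
```

A non-zero count does not prove the card is at fault, since these IDs are generic, but a burst of them around the times of the failed jobs would be worth following up before buying more drives or tapes.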

Please let us know if you have any other questions.
