H730p Raid 6 Unexpected Sense, Unknown Sense Code

Question

We just built an array, and during bench testing I noticed I am getting these:

Controller ID: 0 Unexpected sense: PD = 00:1:11-Unknown Sense Code, CDB = 0x28 0x00 0x00 0x0b 0x56 0x70 0x00 0x00 0x10 0x00, Sense = 0xf0 0x00 0x0b 0x00 0x0b 0x56 0x7a 0x18 0x00 0x00 0x00 0x00 0x4b 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x0b 0x4b 0x04 0x00 0x00 0x1a 0xfc 0x04 0x00 0x73 0x00 0x00
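For anyone trying to read an event like the one above: the CDB and Sense fields are standard SCSI structures and can be decoded by hand. Below is a minimal Python sketch. It assumes the sense data is in SCSI fixed format (response code 70h, which matches the leading 0xf0 byte with the VALID bit set) and that the CDB is a READ(10) (opcode 28h). The helper names are hypothetical and are not a Dell or LSI API. With the bytes from this event, it shows sense key Bh (ABORTED COMMAND) with ASC/ASCQ 4Bh/04h, and an information field (0xb567a) that falls inside the 16-block read starting at LBA 0xb5670. In other words, an ordinary read was aborted; this is not reported as a medium error (sense key 3h).

```python
# Minimal sketch: decode the CDB and sense bytes from a PERC
# "Unexpected sense" event. Assumes fixed-format sense data and a
# READ(10) CDB; helper names are hypothetical, not a Dell/LSI API.

SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}


def parse_hex(text: str) -> bytes:
    """Turn '0x28 0x00 ...' or '28 00 ...' into raw bytes."""
    return bytes(int(tok, 16) for tok in text.split())


def decode_read10(cdb: bytes) -> dict:
    """READ(10): opcode 28h, LBA in bytes 2-5, transfer length in bytes 7-8."""
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


def decode_fixed_sense(sense: bytes) -> dict:
    """Pull the standard fields out of fixed-format sense data."""
    response_code = sense[0] & 0x7F  # 70h = current error, 71h = deferred
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F
    return {
        "response_code": response_code,
        "info_valid": bool(sense[0] & 0x80),  # VALID bit
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "other"),
        "information": int.from_bytes(sense[3:7], "big"),
        "asc": sense[12],
        "ascq": sense[13],
    }


cdb = parse_hex("0x28 0x00 0x00 0x0b 0x56 0x70 0x00 0x00 0x10 0x00")
sense = parse_hex(
    "0xf0 0x00 0x0b 0x00 0x0b 0x56 0x7a 0x18 0x00 0x00 0x00 0x00 "
    "0x4b 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x0b 0x4b 0x04 0x00 "
    "0x00 0x1a 0xfc 0x04 0x00 0x73 0x00 0x00"
)

read = decode_read10(cdb)
info = decode_fixed_sense(sense)
print(f"READ(10) at LBA {read['lba']:#x}, {read['blocks']} blocks")
print(f"sense key {info['sense_key']:X}h ({info['sense_key_name']}), "
      f"ASC/ASCQ {info['asc']:02X}h/{info['ascq']:02X}h")
print(f"information field {info['information']:#x} (valid={info['info_valid']})")
```

The same two functions also decode the TTY log entry quoted later in the thread (CDB `28 00 00 00 5a d4 00 00 04 00`, raw sense `f0 00 0b 00 00 5a d7 ...`). That entry yields the same B/4B/04 triple, with the information field again inside the requested read range.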

I get a few during the bench test, and I'm getting a fair number of them when doing test copies to and within the array. I copied about 150 GB and got around 15-20 of them. I read that these are informational only and not indicative of a problem, but I would like some confirmation if possible. It seems odd to be seeing this many.

Our sister station has a nearly identical setup. When he was building his array, he was getting similar messages every other second, and the array was only at 71 percent after a week of building, so we assumed the drive was bad. We stopped the build, pulled the drive in slot 11, and rebuilt the array with no errors.

I'm thinking I may have a similar issue with our bay 11 drive. These are refurbished systems that ran for months without drives in the front bays, so I'm wondering if there is dust on the backplane connections. I did have one drive report as a bad physical disk; I reseated it and had no further issues. I think I will pull the drive, reseat it, let it rebuild, and see what happens.

Beyond that, though, how many of these do you guys see or would expect to see? I see one every once in a long while on my home R510 system.

DELL-Erman O · Answer

Hello,

There may be a few different causes for the problem you are experiencing. I tried to understand what's going on here: https://dell.to/37CyfYt As you said, you can try reseating the cables on the backplane while reseating drive 11. Please let us know the result. If the array is working properly and you only got this error during testing, it will probably be fine. However, I have to say that R510 servers are not compatible with the PERC 9 series. You can see it here: https://dell.to/37CDLKH It may still work, but the warning may be due to this compatibility issue. If the drive has no errors, it would be good to keep the controller's firmware up to date; if there is no hardware error, the warning may be one that has been fixed in the current firmware.

Let us know if this helps!

pcorwin85 · Answer

I'm not using the R510 with that RAID controller; I was just mentioning that I see them rarely on that server. I reseated the drive and let it rebuild. I ran tests and they were clean, but I was concerned something else was still wrong, so I shut the system down and reseated all the drives a few times in case there were any other connection issues. I powered it back up, reran a copy test, and started seeing the sense errors again. I got approximately 32 of them over the course of creating a 132 GB 7-Zip archive on the volume.

I will reseat the backplane cables and retest.

Dell-DylanJ · Answer

What drives are installed in the system, specifically in drive bay 11? The times I've seen unknown sense keys, it was for one of two reasons: either 1) the hard drives needed firmware updates, or 2) the drives were unsupported. Drives being unsupported doesn't mean they won't work, but it does mean that you may sometimes see unexpected behavior. Updating the PERC firmware, as Erman mentioned, would also be a good idea.

Is the unknown sense code the only thing you're seeing, or are you seeing any other storage issues? If you'd like to export a PERC log and make it available to download, I'd be happy to give it a look.

pcorwin85 · Answer

So far it's only the drive in bay 11 (12) that is reporting these. I updated the firmware and reseated the cables on the backplane, and the first test set came back clean. I restarted it and ran ATTO Disk Benchmark again and got a couple of sense codes, but only 2 this time. I ran a 140 GB single-file copy on the array and got 8. These drives do come up as not certified. We are running the ST8000NM0095 drives: https://www.newegg.com/seagate-enterprise-capacity-3-5-st8000nm0095-8tb/p/N82E16822178845?Item=9SIAD5GBY78376

I did check, and there are no firmware updates for these drives; they are relatively new.

So far the only thing I am seeing is the unknown sense code; the drives appear to be functioning normally. A bench test shows what I would assume are normal speeds for an array of this size and drive type.

I will go into the BIOS, export the log, and report back in a few minutes.

I have 16 drives, 12 in the main bays and 4 in the midplane. I am also going to swap the drive with another tested drive.
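Because the copy tests in this thread were different sizes, raw event counts are hard to compare. A rough way to compare them is to normalize to events per 100 GB. The sketch below uses only the approximate counts reported so far (about 15-20 in the initial 150 GB copy, 32 in the 132 GB archive after reseating the drives, and 8 in the 140 GB copy after the firmware update and backplane cable reseat). With samples this small, treat the result as a rough indicator, not a measurement.

```python
# Rough sense-event rates per 100 GB, using the approximate counts
# reported in this thread. The first run was "around 15-20", so it is
# kept as a (low, high) range rather than collapsed to one number.
RUNS = [
    ("initial 150 GB test copy", 150, (15, 20)),
    ("132 GB 7-Zip archive, after reseating drives", 132, (32, 32)),
    ("140 GB copy, after FW update and backplane cable reseat", 140, (8, 8)),
]


def per_100gb(events: int, gigabytes: float) -> float:
    """Normalize an event count to events per 100 GB transferred."""
    return 100.0 * events / gigabytes


for label, gb, (low, high) in RUNS:
    lo_rate, hi_rate = per_100gb(low, gb), per_100gb(high, gb)
    rate = f"{lo_rate:.1f}" if low == high else f"{lo_rate:.1f}-{hi_rate:.1f}"
    print(f"{label}: {rate} events per 100 GB")
```

With these figures the rate goes from roughly 10-13 to about 24 and then to about 6 per 100 GB. The reseat and firmware update reduced the events but did not eliminate them.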

pcorwin85 · Answer

Okay, it's rebuilding the drive, and I was running a test, and I'm seeing them again while it's rebuilding?

Same drive, and it shouldn't be in use during the test... Could there be a problem with the backplane, cable, or controller? Does the log tell you anything?

pcorwin85 · Answer

I was unable to find a place to attach the text log, so I put it on Dropbox. Here is the link to view it.

https://www.dropbox.com/s/d4j9mlafiyaiije/RaidLog12212020SenseErrorDriveBay12%2811%29ttyLog.txt?dl=0

I noticed right off the bat that the log entries at the sense code locations appear to show an error:

12/20/20 10:03:48: C0:IMMED_CTIO err; cmdId=40, pd=b, iob=c00f9c00, iocStatus=45, scsiStatus=2
12/20/20 10:03:48: C0:EVT#131909-12/20/20 10:03:48: 113=Unexpected sense: PD 0b(e0x20/s11) Path 5000c50094d8b515, CDB: 28 00 00 00 5a d4 00 00 04 00, Sense: b/4b/04
12/20/20 10:03:48: C0:Raw Sense for PD b: f0 00 0b 00 00 5a d7 18 00 00 00 00 4b 04 00 00 00 00 00 00 0b 4b 04 00 00 1a 93 02 01 86 00 00
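With a full TTY log it can help to count how many "Unexpected sense" events each slot logs, to confirm whether the problem really is confined to one bay. Below is a rough Python sketch. It assumes every event line looks like the `EVT#...` line above: a PD id, `e0x<enclosure>/s<slot>`, and then `Sense: key/ASC/ASCQ`. The regex and function name are hypothetical, and other firmware versions may word the event differently.

```python
import re
from collections import Counter

# Pattern inferred from the TTY log lines quoted above; other PERC
# firmware versions may format these events differently.
EVENT_RE = re.compile(
    r"Unexpected sense: PD (?P<pd>[0-9a-f]+)\(e0x(?P<encl>[0-9a-f]+)/s(?P<slot>\d+)\)"
    r".*?Sense: (?P<sense>[0-9a-f]+/[0-9a-f]+/[0-9a-f]+)",
    re.IGNORECASE,
)


def tally_unexpected_sense(lines):
    """Count 'Unexpected sense' events per (slot, key/ASC/ASCQ)."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            counts[(int(m["slot"]), m["sense"].lower())] += 1
    return counts


# The three log lines quoted above; only the EVT line is an event.
SAMPLE = [
    "12/20/20 10:03:48: C0:IMMED_CTIO err; cmdId=40, pd=b, iob=c00f9c00, iocStatus=45, scsiStatus=2",
    "12/20/20 10:03:48: C0:EVT#131909-12/20/20 10:03:48: 113=Unexpected sense: PD 0b(e0x20/s11) "
    "Path 5000c50094d8b515, CDB: 28 00 00 00 5a d4 00 00 04 00, Sense: b/4b/04",
    "12/20/20 10:03:48: C0:Raw Sense for PD b: f0 00 0b 00 00 5a d7 18 00 00 00 00 4b 04 "
    "00 00 00 00 00 00 0b 4b 04 00 00 1a 93 02 01 86 00 00",
]

for (slot, sense), n in sorted(tally_unexpected_sense(SAMPLE).items()):
    print(f"slot {slot}: sense {sense} x{n}")
```

To scan an exported log, pass an open file object (for example, one opened with `errors="replace"` in case the log contains stray bytes) instead of `SAMPLE`. If every hit lands on slot 11, that points at the path to that one bay rather than at the array as a whole.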

I have replaced the drive with a spare and will continue testing when it finishes rebuilding. I appreciate your help; let me know what you think.

Pat

pcorwin85 · Answer

Ironically, because I've been burned in the past, I ordered a used backplane and cabling from eBay shortly after I got the drives in stock. I will swap them out next week, the whole kit and caboodle. The strange thing is that our sister station has an almost identical setup with the same potential issue; his, however, was drastically worse. When the controller was building the RAID, we checked in on it after a week and it was only at 71 percent. It was a fresh test install, so I put MegaRAID on it, and it was reporting that sense error every other second during the build, so I'm inclined to believe you. These systems are both refurbs and were running for months before getting drives. He also manipulated his cables quite a bit, as did I, while installing the midplane we both purchased. He even had controller resets that threw it offline, which I can only assume came from the number of events being reported. I doubt the controller is faulty, as it hadn't had a single issue until then.

I appreciate your help in this matter; I will get to testing.

Dell-DylanJ · Answer

Sense key B / ASC 4B / ASCQ 00: Aborted Command - data phase error

That is what I'm finding in the logs. This error would generally indicate a signal error between the controller and a drive, disk 11 in this case. This could theoretically be anything from the connector on the controller, to the cabling to the backplane, to the backplane itself, or the drive. If you haven't already, I'd reseat the whole storage chain. If you have, then you might look at ordering replacement cables as a next step; they should be much less expensive than jumping straight to a backplane, drive, or controller. I'm not seeing anything in the log that indicates any other problems, so I suspect that either the cabling or the backplane is the most likely fault here.

EDIT: I saw your second reply after I posted. I'd be most inclined to look at the cabling. The backplane and the controller both have more ways of reporting problems, and none of those reports appear in the logs. I still wouldn't rule them out, but I do believe that cabling would be the best place to start.

pcorwin85 · Answer

Welp, I reseated the controller side of the cable (pain-in-the-butt connector, by the way), and so far I've run the bench test around 10 times and copied around 3 TB of data drive to drive without seeing the sense error pop up once.

I did, however, get two random overcurrent warnings from the front USB port, which is odd considering nothing is connected. I got one when I pulled a USB thumb drive from the port while exporting the logs last week. I did notice the system had started the battery relearn process before I got the warnings.

Is it possible I just bumped the USB/communications cable while working on the SAS cable? I read that the recommended fix is to power down, disconnect power, and drain the remaining power before powering back up. I saw some recommend an NVRAM clear.

My thought was to reseat the USB/comms cable first and go from there, maybe adding the drain procedure. I had a desktop that would have weird USB errors from time to time, and discharging the board power would clear them.

pcorwin85 · Answer

Well, it appeared to be fixed through multiple reboots and large data copies; I had done around 10 TB of copying and multiple bench tests with ATTO.

I shut it down, disconnected power, and drained it by holding the power button for 30 seconds.

I restarted it, and I'm getting just as many errors as before, if not more. After I reseated the cable it was clean through all of that testing, and now the errors are back. I do have a new cable on the way, and I have a used backplane I can test with if that doesn't fix it. I find it interesting that a cable reseat fixed it through all of that and then it went back to dropping. If it is the cable, that may make sense if the cable settled back into its old position. All I can do, I guess, is put the new cable in and keep working on it until I don't get any more.

DELL-Young E · Answer

Hi, we will need to take a look at your TTY logs (the RAID controller log). There are many things that can cause the same issue.
Could you please refer to the article below for how to export them?
https://dell.to/3nVdFrR

pcorwin85 · Answer

For the record, it's in the logs, but it's running 16 x 8 TB Seagate SAS drives for the main array (4 in the midplane, 12 in the front bays) and 2 SSDs in the rear flex bay for the operating system.

pcorwin85 · Answer

I posted the log earlier in the thread. I keep trying to reply, but my messages aren't showing up??

The only difference I've seen in the sense messages is that the physical device used to show as PD 0:1:11 and now it's PD = -:-:11.

pcorwin85 · Answer

The removal was when I replaced the drive with another one. I'm starting to think the backplane is bad, but I'm not positive. The odd thing is that we have another business with an almost identical setup: same server model and same hard drives; the only difference is that he lacks the midplane and the additional 4 drives. His server actually had so many errors that the RAID array was still building a week later. We weren't aware because the system had a fresh install and the MSM utility wasn't installed yet. I installed it only after we noticed how long the array was taking to build; we were initially just preparing for testing.

I may get a handful; he got thousands of the sense errors. It's only that bay, too, bay 11 (12). I guess it's possible the backplane is bad on both systems. I do have a used one I picked up a while back that I can swap in; I was waiting on a replacement PERC-to-backplane cable. I guess I'll just replace both in one fell swoop and see what happens.

pcorwin85 · Answer

And yes, I know they weren't certified drives. We're a small business, so we're forced to buy refurbished systems and can't afford the insane prices the Dell-branded, certified drives usually run.
