Unsolved

14 Posts

October 11th, 2021 14:00

R7515 with HBA330: RHEL 8.4 mpt3sas errors and device power-on/reset

Hi,

I have a new R7515 with an HBA330 controller connected to 4x SSDs. The OS is RHEL 8.4.

Under "heavy" load, where "heavy" is just cp -a /some/big/directory /some/other/place, I see many messages like these in the system log:

mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)

This repeats a few times, and is then followed by

sd 0:0:0:0: Power-on or device reset occurred

This message is seen for all 4 drives.

0x31110e03 is by far the most common error code (about 30,000 instances so far this month); this utility decodes it as PL_LOGINFO_SUB_CODE_DISCOVERY_SATA_ERR.
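For readers who want to see how the 32-bit log_info value splits into the fields shown in the log line, here is a minimal sketch. The bit layout (bus type, originator, code, sub-code) and the originator names are assumptions taken from the loginfo union in the Linux mpt3sas driver; the function name `decode_log_info` is hypothetical, and the sketch does not map sub-codes to symbolic names such as the one quoted above.

```python
# Assumed layout (from the mpt3sas driver's loginfo union):
#   bits 31-28: bus type, bits 27-24: originator,
#   bits 23-16: code,     bits 15-0:  sub_code
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(value: int) -> dict:
    """Split a 32-bit mpt3sas log_info word into its fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


fields = decode_log_info(0x31110E03)
print(
    f"originator({fields['originator']}), "
    f"code(0x{fields['code']:02x}), "
    f"sub_code(0x{fields['sub_code']:04x})"
)
# prints: originator(PL), code(0x11), sub_code(0x0e03)
```

The decoded fields match the text the driver already prints, so this only confirms how the number is built; it does not explain the cause of the error.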

Apart from these log messages, everything seems to be functioning ok, but I would be very grateful for any insight anyone has... Thanks!

 

Moderator

 • 

3.6K Posts

October 11th, 2021 20:00

Hi, thanks for choosing Dell. May we have the part numbers for all 4 SSDs?

14 Posts

October 11th, 2021 20:00

Hi,

2 drives are 480GB S4610 drives, Dell part number 400-BDSD. The other 2 drives are 4TB Samsung 870 EVOs.

 

Moderator

 • 

3.6K Posts

October 12th, 2021 00:00

Hi, since we have learned that you are also using non-Dell parts, we have to ask you to try with Dell parts only and see whether you still face the same issue (try with just one of the Dell drives, or with both). Regardless, I'd also like to point out that you may have put a heavy workload on the system, as it is reporting.

14 Posts

October 12th, 2021 13:00

Hi,

I removed the 2 non-Dell drives from the system and it made no difference; the issue still occurs just the same. Reseating all drives also made no difference, nor did applying all pending firmware updates (these were for unrelated components like the NIC and iDRAC). Running the built-in hardware diagnostics shows no errors.

This system is completely unloaded. Starting a single "cp" command produces the error within seconds.

2.9K Posts

October 12th, 2021 13:00

Hello,

 

Bus resets occur in order to reestablish device communication. This message alone isn't something I've ever seen be a problem. You might also consider reseating the cabling between the HBA and the backplane at all connection points, seeing as the bus reset is for communication issues. To really get a handle on what's happening, reviewing OS logs may be a good idea, but I'd have to refer you to phone support for that sort of more detailed dive. I can offer to look at a hardware log for you, if you'd like, though. I wouldn't expect to identify a hardware problem, but looking can't hurt.

 

As for the output from the script you linked, I can't really speak to it specifically. That having been said, and assuming it's correct, the output would make me more inclined to reseat cabling in the storage chain. I did a bit of googling and found a somewhat similar page hosted by Oracle that also recommended checking cabling.

 

The part number you shared, the 400-BDSD number, is the SKU sales uses. The disks should have a DPN (Dell part number) that is 5 characters long. This is the code support uses and may be helpful. For example, W347K is a part number we have on one of the platter HDDs we used for another model.

14 Posts

October 12th, 2021 17:00

Hi,

Looks like the drive part number may be KCT7J? I don't think the issue is anything specific to the drive though.

Reseating all the backplane cables hasn't made a difference.

Thanks for the offer to look at logs, but I don't think they show much more than was in my original message.

I'll probably open a formal support case. It would be good to know if this is something that can just be ignored. As I said, I haven't noticed any actual problems with the system, just a lot of these log messages.
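When preparing a support case, a per-code tally of these messages may be easier to share than raw logs. The following is only a sketch: the regular expression is based on the single log line quoted earlier in this thread, and the function name `tally_log_info` is hypothetical.

```python
import re
from collections import Counter

# Pattern assumed from the log line quoted in this thread, e.g.
# "mpt3sas_cm0: log_info(0x31110e03): originator(PL), ..."
LOG_INFO_RE = re.compile(r"mpt3sas_cm\d+: log_info\((0x[0-9a-fA-F]{8})\)")


def tally_log_info(lines):
    """Count occurrences of each log_info code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


sample = [
    "mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)",
    "sd 0:0:0:0: Power-on or device reset occurred",
    "mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)",
]

for code, count in tally_log_info(sample).most_common():
    print(code, count)
```

In practice the input could be the lines of a saved kernel log file rather than the hard-coded sample above.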

Moderator

 • 

3.6K Posts

October 12th, 2021 21:00

Hi GlennMorris, at this point I'd say, by all means, please get all the assistance you need. Before that, make sure your HBA330 firmware and driver are up to date, as well as your OS version. Unfortunately, the HBA330 does not store controller logs...

2 Posts

August 15th, 2022 09:00

Hi,

We have the same issue on multiple R7515s (HBA330). Did you ever find a solution?

Thanks!

14 Posts

August 15th, 2022 10:00

Hi,

I replaced the HBA330 controller with a PERC H730P, and reinstalled the OS using hardware RAID rather than software RAID. Since then it has been fine. That was the only solution I could find!

2 Posts

August 17th, 2022 12:00

Hi Glenn,

What did support say?

We have an internal SR open and they keep saying:

"As our global team is still working on it, once I will get update we will inform you."

In our case, we need a true HBA, so hopefully the Dell team can find a solution. We use Ubuntu 20.04 LTS.

Thanks!
