PowerEdge HDD/SCSI/RAID

Last reply by 08-17-2022 Unsolved
2 Bronze
1564

R7515 with HBA330: RHEL 8.4 mpt3sas errors and device power-on/reset

Hi,

I have a new R7515 with an HBA330 controller connected to 4x SSDs. The OS is RHEL 8.4.

Under "heavy" load, where "heavy" is just cp -a /some/big/directory /some/other/place, I see many messages like these in the system log:

mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)

This repeats a few times, and is then followed by

sd 0:0:0:0: Power-on or device reset occurred

This message is seen for all 4 drives.

0x31110e03 is by far the most common error code (about 30,000 instances so far this month). This utility decodes it as PL_LOGINFO_SUB_CODE_DISCOVERY_SATA_ERR.
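For anyone who wants to check the breakdown in the log line by hand, here is a minimal Python sketch of how the 32-bit log_info value splits into fields. The bit layout (4-bit bus type, 4-bit originator, 8-bit code, 16-bit sub-code) and the originator names are my reading of the mpt3sas driver source, so treat them as assumptions. The function name `decode_log_info` is just for illustration. Mapping the sub-code to a symbolic name like the one above needs the vendor's tables, which are not reproduced here.

```python
# Assumed originator names, as printed by the mpt3sas driver.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(value: int) -> dict:
    """Split a 32-bit SAS log_info word into its four fields."""
    return {
        "bus_type": (value >> 28) & 0xF,      # 3 is expected for SAS
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


info = decode_log_info(0x31110E03)
print(
    f"originator({info['originator']}), "
    f"code(0x{info['code']:02x}), sub_code(0x{info['sub_code']:04x})"
)
```

The printed fields can be compared directly against the originator, code, and sub_code values in the kernel message.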

Apart from these log messages, everything seems to be functioning ok, but I would be very grateful for any insight anyone has... Thanks!

 

Replies (10)
1436

Hi, thanks for choosing Dell. May we have the part numbers for all four SSDs?


DELL-Young E
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#IWork4Dell

Did I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!

2 Bronze
1435

Hi,

2 drives are 480GB S4610 drives, Dell part number 400-BDSD. The other 2 drives are 4TB Samsung 870 EVOs.

 

1426

Hi, since we learned that you are also using non-Dell parts, we'd have to ask you to try with Dell parts only and see whether you still face the same issue (try with one or both of the Dell drives). Regardless, I'd also like to point out that you may have put a heavy workload on the system, as your logs suggest.


DELL-Young E

2 Bronze
1408

Hi,

I removed the 2 non-Dell drives from the system and it made no difference: the issue still occurs just the same. Reseating all drives also made no difference, nor did applying all pending firmware updates (these were for unrelated components such as the NIC and iDRAC). Running the built-in hardware diagnostics shows no errors.

This system is completely unloaded. Starting a single "cp" command produces the error within seconds.
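To put numbers on how often this happens, a small Python sketch like the following could tally the log_info codes and the per-device reset messages from kernel log text. The regular expressions are written against the two message formats quoted earlier in this thread. The `tally` function and the sample lines are illustrative only, and real log lines would come from wherever your system keeps its kernel log.

```python
import re
from collections import Counter

# Patterns modelled on the two messages quoted in this thread.
LOG_INFO_RE = re.compile(r"mpt3sas_cm\d+: log_info\((0x[0-9a-fA-F]{8})\)")
RESET_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred")


def tally(lines):
    """Count log_info codes and resets per SCSI address."""
    codes, resets = Counter(), Counter()
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            codes[match.group(1).lower()] += 1
            continue
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
    return codes, resets


# Sample input; in practice, pass an open log file or any iterable of lines.
sample = [
    "mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)",
    "mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)",
    "sd 0:0:0:0: Power-on or device reset occurred",
]
codes, resets = tally(sample)
print(codes.most_common())
print(resets.most_common())
```

Comparing the counts before and after a change (different drives, reseated cables) gives a more objective picture than eyeballing the log.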

1406

Hello,

 

Bus resets occur in order to reestablish device communication. This message alone isn't something I've ever seen be a problem. You might also consider reseating the cabling between the HBA and the backplane at all connection points, seeing as the bus reset is for communication issues. To really get a handle on what's happening, reviewing OS logs may be a good idea, but I'd have to refer you to phone support for that sort of more detailed dive. I can offer to look at a hardware log for you, if you'd like, though. I wouldn't expect to identify a hardware problem, but looking can't hurt.

 

As for the output from the script you linked, I can't really speak to it specifically. That having been said, and assuming it's correct, the output would make me more inclined to reseat cabling in the storage chain. I did a bit of googling and found a somewhat similar page hosted by Oracle that also recommended checking cabling.

 

The part number you shared, the 400-BDSD number, is the SKU sales uses. The disks should have a DPN (Dell part number) that is 5 characters long. This is the code support uses and may be helpful. For example, W347K is a part number we have on one of the platter HDDs we used for another model.

#Iwork4Dell
2 Bronze
1393

Hi,

Looks like the drive part number may be KCT7J? I don't think the issue is anything specific to the drive though.

Reseating all the backplane cables hasn't made a difference.

Thanks for the offer to look at logs, but I don't think they show much more than was in my original message.

I'll probably open a formal support case. It would be good to know if this is something that can just be ignored. As I said, I haven't noticed any actual problems with the system, just a lot of these log messages.

1380

Hi GlennMorris, at this point I'd say, by all means, please get all the assistance you need. Before that, make sure your HBA330 firmware and driver are up to date, as well as your OS version. Unfortunately, the HBA330 does not store controller logs...
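As a rough way to see which driver and firmware versions are currently in use on Linux, the sketch below reads them from sysfs. The paths are assumptions: `module/mpt3sas/version` for the driver and a `version_fw` attribute under each `class/scsi_host/host*` entry for the controller firmware. If a file is missing on your system, the function simply reports nothing for it. The name `mpt3sas_versions` is made up for this example.

```python
from pathlib import Path


def read_text(path):
    """Return the stripped contents of a file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def mpt3sas_versions(sysfs_root="/sys"):
    """Collect the driver version and any per-host firmware versions."""
    root = Path(sysfs_root)
    report = {
        # Assumed location of the loaded module's version string.
        "driver": read_text(root / "module" / "mpt3sas" / "version"),
        "hosts": {},
    }
    # Assumed attribute name; hosts without it are skipped.
    for host in sorted(root.glob("class/scsi_host/host*")):
        firmware = read_text(host / "version_fw")
        if firmware is not None:
            report["hosts"][host.name] = firmware
    return report


print(mpt3sas_versions())
```

The reported versions can then be compared with the latest ones listed on the support site for the server.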


DELL-Young E

2 Bronze
446

Hi,

We have the same issue on multiple R7515 servers (HBA330). Did you ever find a solution?

Thanks!

2 Bronze
444

Hi,

I replaced the HBA330 controller with a PERC H730P, and reinstalled the OS using hardware RAID rather than software RAID. Since then it has been fine. That was the only solution I could find!
