Unsolved
9 Posts
0
1843
June 27th, 2022 06:00
R720xd mass disk failure
Hello,
I have a very weird issue with our second R720XD.
After a reboot, almost all the disks were marked as failed.
When I reseat the disks, they come back as healthy (and foreign), but after the next reboot they are failed again.
This is probably a disk-related issue, because when I swap the disks with another server the problem follows the disks.
The servers had been running for a while before the issue, and there were no recent firmware updates.
Has anyone seen such a problem before?
regards,
Adam
DELL-Charles R
Moderator
•
4.7K Posts
0
June 27th, 2022 11:00
Hello csecsi,
You may have identified the issue as you state the issue follows the drives.
Are these Dell-branded drives or non-Dell drives (Western Digital, Toshiba, Seagate)?
Dell should be printed on the label if they are Dell drives.
Can you provide pictures of drive labels?
I would recommend starting with this action plan:
*Update firmware
*Run diagnostics to test the hardware
R720xd Support page for firmware and drivers:
https://dell.to/39ZugJt
Update the iDRAC, BIOS, controller, backplane and hard drive firmware (for non-Dell drives you may have to get the firmware from the manufacturer).
Diagnostics:
Press F11 at the Dell splash screen, then select Boot Manager -> System Utilities -> Launch Dell Diagnostics. Note any messages and continue testing.
DELL-Charles R
Moderator
•
4.7K Posts
0
June 27th, 2022 12:00
Hello csecsi,
You indicated you could reseat the drives and then import a foreign configuration, but after the next reboot they are failed again.
My proposal was to update firmware when they were online, not in a failed state.
The second image showing Revision: D1R7 is a valid Dell firmware.
The first image you posted showing WD4001FYYG-01SL3 firmware VR07 does not come up as Dell firmware.
Could some disks have been replaced with non-Dell drives?
Are you able to run the diagnostics?
Can you look over the System Event Log in the DRAC for any storage-related issues and let me know what you see?
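As a rough aid for that log review, here is a minimal Python sketch. It assumes the System Event Log has already been saved as a plain-text file with one entry per line; the export format, the file name passed on the command line, and the keyword list are all assumptions for illustration and are not taken from iDRAC documentation.

```python
import re
import sys

# Hypothetical keyword list: adjust it to the wording your log actually uses.
STORAGE_PATTERN = re.compile(
    r"\b(?:pdisk|vdisk|disk|drive|raid|perc|backplane|foreign)s?\b",
    re.IGNORECASE,
)


def filter_storage_events(lines):
    """Return the log lines that mention a storage-related keyword."""
    return [line.rstrip("\n") for line in lines if STORAGE_PATTERN.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is a placeholder): python filter_sel.py exported_sel.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for entry in filter_storage_events(handle):
            print(entry)
```

This only narrows a long log down to the entries worth reading first. It does not interpret them, so anything it prints still needs to be read in context.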
csecsi
9 Posts
0
June 27th, 2022 12:00
Hi Charles,
Thank you for your reply.
We only use certified Dell drives (I cannot take photos as I am about 100 miles from the DC, but I swear they are).
All the firmware is up to date.
It would be quite difficult to update the SAS drive firmware anyway, as the drives are marked as failed after a reboot.
regards,
Adam
csecsi
9 Posts
0
June 27th, 2022 14:00
Hi Charles,
In theory it is possible that the third party who covers our out-of-warranty servers supplied a non-Dell drive, but that does not explain the faults of the Dell drives (I suppose).
As I have no running OS on the server, I can only update the firmware via the Lifecycle Controller, which requires a reboot, and after a reboot the drives are already faulty.
The diagnostics complain about the faulty drives, but that is all I have.
Interestingly, when some of the drives are replaced, the new ones are perfectly fine after a reboot.
regards,
Adam
DELL-Young E
Moderator
•
5.3K Posts
0
June 27th, 2022 19:00
Hi, since there is a third-party component involved in your system, there is a limit to how much I can offer in terms of troubleshooting at this point. Sorry I couldn't be of help.
DELL-Charles R
Moderator
•
4.7K Posts
0
June 28th, 2022 05:00
Hello csecsi,
I can look over the SupportAssist report if you can run it and upload it.
Export a SupportAssist Collection via iDRAC7 and iDRAC8
https://www.dell.com/support/kbdoc/en-us/000126803/export-a-supportassist-collection-via-idrac7-and-idrac8?dgc=SM&cid=376139&lid=spr7150637457&refid=sm_LITHIUM_spr7150637457&linkId=170337681
Upload it here under the R720XD service tag: https://upload.dell.com/
Then send me a private message with the service tag for me to retrieve the report.
DELL-Charles R
Moderator
•
4.7K Posts
0
June 30th, 2022 05:00
Hello csecsi,
I received the private message.
I will need a little time to collect the report, review it, and update you.
DELL-Charles R
Moderator
•
4.7K Posts
0
June 30th, 2022 12:00
Hello csecsi,
I'm still working to get a look at the report.
Was there any data loss?
Can you also let me know your business name so I can attach it to the case?
csecsi
9 Posts
0
June 30th, 2022 12:00
Hi Charles,
One of the nodes was a fresh install, so that is not a huge pain. The other one is a production box, which would be good to rescue (we can restore, so there is no data loss, just work). My major concern is how to rely on the remaining R720XDs after this incident.
I work for LexisNexis.
regards,
Adam
DELL-Charles R
Moderator
•
4.7K Posts
0
July 1st, 2022 10:00
Hello csecsi,
On these drives I can't say what caused it. The controller log only went back to 6/17/22, and the issue had been occurring before that, so the history of what happened up to that point is lost. We don't do RCA, we do break/fix. You were able to determine that the issue followed the drives, and installing replacement drives, which remain online, fixed the issue.
You may have already addressed this, but I wanted to note that I saw this in the log.
As of the SupportAssist log date, 2022-06-29, PD11 shows as failed.
Reseat the drive to see if it comes online, then update the firmware. If it does not come online, replace the drive.
Seagate PSE7 for model number ST8000NM0135.
https://dell.to/3y6Myk8
If you suspect other issues you could give the system a diagnostic run to test the hardware.
Press F11 at the Dell splash screen, then select Boot Manager -> System Utilities -> Launch Dell Diagnostics. Note any messages and continue testing.
DELL-Charles R
Moderator
•
4.7K Posts
0
July 5th, 2022 06:00
Hello csecsi,
I received the private message for a second R720XD. I will gather the log, review it, and update you.
csecsi
9 Posts
0
July 5th, 2022 07:00
Hi Charles,
This is the production box with different drives.
Moreover, it has not even been touched since the mass failure, in the hope that we can revive it.
regards,
Adam
DELL-Charles R
Moderator
•
4.7K Posts
0
July 5th, 2022 07:00
Hello csecsi,
While I gather the log, I wanted to check: are these the same hard drives that were in the first R720xd?
DELL-Charles R
Moderator
•
4.7K Posts
0
July 5th, 2022 11:00
Hello csecsi,
Would you be able to have the DC send you pictures of the drive labels on the affected drives?
csecsi
9 Posts
0
July 6th, 2022 05:00
Hi Charles,
Here they are.
regards,
Adam