Unsolved

1 Rookie

 • 

9 Posts

1119

January 19th, 2022 02:00

Dell R540: iDRAC 5 upgrade causes server hang

Hi!
 
We have been using HGST SN260 HHHL drives in PowerEdge R540 servers for a couple of years (HGST no longer exists as a company).
After we upgraded iDRAC from 4. to 5. and updated "Dell 64 Bit uEFI Diagnostics", our Dell R540 started hanging during boot or in the BIOS Setup menu. When we remove the HHHL drive from its slot, the server boots fine.
 
For now we have worked around the problem by downgrading iDRAC to 4. and the uEFI Diagnostics to "version 4301, 4301A61, 4301.62".
 
The stack traces:
 
[Attached images: 1.jpeg, 2.jpeg]

The drive info from inventory:
BusNumber 175
DataBusWidth 16x or x16
Description Ultrastar SN200 Series NVMe SSD
DeviceDescription PCIe SSD in Slot 4
DeviceNumber 0
DeviceType PCIDevice
FQDD PCIeSSD.Slot.4-1
FunctionNumber 0
InstanceID PCIeSSD.Slot.4-1
LastSystemInventoryTime 2022-01-09T12:42:43
LastUpdateTime 2022-01-09T03:47:27
Manufacturer HGST, Inc.
PCIDeviceID 0023
PCISubDeviceID 0023
PCISubVendorID 1C58
PCIVendorID 1C58
SlotLength Short Length
SlotType PCI Express Gen 3

Thanks,
k

7 Practitioner

 • 

9.7K Posts

 • 

48K Points

January 19th, 2022 06:00

K0ste,

 

The issue may be that those drives are not compatible with that server. If my assumption is correct, the part number for those drives is 2Y12T, in which case they are compatible only with the R940 and a couple of FC series systems.

 

Let me know if the part number differs from that and I can confirm them.

 

 

7 Practitioner

 • 

9.7K Posts

 • 

48K Points

January 19th, 2022 07:00

If you like you can Private Message me the svc tag and I can confirm the hardware it shipped with. Also, do you have the specific part number shown on the drives in question?

 

 

1 Rookie

 • 

9 Posts

January 19th, 2022 07:00

I think these are OEM drives shipped directly from the manufacturer.
But I also think this is a software regression; maybe some field is parsed incorrectly. Other HHHL drives (Plextor, Intel) work fine in this system alongside it.

 

Thanks

1 Rookie

 • 

9 Posts

January 19th, 2022 07:00

The service tag is {{Svc tag removed by Moderator}}

The drives' PN and FW:

 

 

Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     CVFT7356007Z400BGN   INTEL SSDPEDMD400G4                      1         400.09  GB / 400.09  GB      4 KiB +  0 B   8DV101H0
/dev/nvme1n1     CVFT7360000J400BGN   INTEL SSDPEDMD400G4                      1         400.09  GB / 400.09  GB      4 KiB +  0 B   8DV101H0
/dev/nvme2n1     P02946119493         PLEXTOR PX-256M9PeY                      1         256.06  GB / 256.06  GB    512   B +  0 B   1.07
/dev/nvme3n1     SDM00007F5EB         HUSMR7632BHP301                          1           3.20  TB /   3.20  TB      4 KiB +  0 B   KNGND110

 

 

SMART data for the SN260:

 

 

smartctl 7.2 2021-01-17 r5170 [x86_64-linux-3.10.0-1160.15.2.el7.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       HUSMR7632BHP301
Serial Number:                      SDM00007F5EB
Firmware Version:                   KNGND110
PCI Vendor/Subsystem ID:            0x1c58
IEEE OUI Identifier:                0x000cca
Total NVM Capacity:                 3,204,045,602,816 [3.20 TB]
Unallocated NVM Capacity:           0
Controller ID:                      35
NVMe Version:                       1.2.1
Number of Namespaces:               128
Namespace 1 Size/Capacity:          3,200,631,791,616 [3.20 TB]
Namespace 1 Formatted LBA Size:     4096
Namespace 1 IEEE EUI-64:            000cca 0c030ab680
Local Time is:                      Wed Jan 19 22:55:38 2022 +07
Firmware Updates (0x0b):            5 Slots, Slot 1 R/O
Optional Admin Commands (0x000e):   Format Frmw_DL NS_Mngmt
Optional NVM Commands (0x003f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Resv
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     87 Celsius
Namespace 1 Features (0x04):        Dea/Unw_Error

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W       -        -    0  0  0  0    15000   15000
 1 +    24.00W       -        -    1  1  1  1    15000   15000
 2 +    23.00W       -        -    2  2  2  2    15000   15000
 3 +    22.00W       -        -    3  3  3  3    15000   15000
 4 +    21.00W       -        -    4  4  4  4    15000   15000
 5 +    20.00W       -        -    5  5  5  5    15000   15000
 6 +    19.00W       -        -    6  6  6  6    15000   15000
 7 +    18.00W       -        -    7  7  7  7    15000   15000
 8 +    17.00W       -        -    8  8  8  8    15000   15000
 9 +    16.00W       -        -    9  9  9  9    15000   15000
10 +    15.00W       -        -   10 10 10 10    15000   15000
11 +    14.00W       -        -   11 11 11 11    15000   15000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         0
 1 -     512       8         2
 2 +    4096       0         0
 3 -    4096       8         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    83,533,039 [42.7 TB]
Data Units Written:                 313,028,185 [160 TB]
Host Read Commands:                 1,099,612,891
Host Write Commands:                4,826,801,960
Controller Busy Time:               121,663
Power Cycles:                       24
Power On Hours:                     3,168
Unsafe Shutdowns:                   18
Media and Data Integrity Errors:    0
Error Information Log Entries:      2
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               41 Celsius
Temperature Sensor 2:               33 Celsius
Temperature Sensor 3:               37 Celsius
Temperature Sensor 4:               41 Celsius

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
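
For anyone who wants to pull the model and firmware revision out of a saved smartctl report like the one above, here is a minimal Python sketch. It assumes the plain-text "Key:   value" layout shown in this post; the function name parse_smartctl_info is made up for illustration and is not part of smartmontools.

```python
import re


def parse_smartctl_info(text):
    """Collect 'Key:   value' lines from smartctl text output into a dict.

    Hypothetical helper: it only understands the plain-text layout shown
    in this thread, not every report smartctl can produce.
    """
    info = {}
    for line in text.splitlines():
        # Key is everything before the first colon; value follows the padding.
        m = re.match(r"^([^:]+):\s+(.*)$", line)
        if m:
            info[m.group(1).strip()] = m.group(2).strip()
    return info


# Sample lines copied from the report above.
sample = """\
=== START OF INFORMATION SECTION ===
Model Number:                       HUSMR7632BHP301
Serial Number:                      SDM00007F5EB
Firmware Version:                   KNGND110
NVMe Version:                       1.2.1
"""

info = parse_smartctl_info(sample)
print(info["Model Number"], info["Firmware Version"])
```

Lines without a "Key: value" shape (banners, table rows) are simply skipped, so the whole report can be passed in unchanged.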

 

 

Thanks

1 Rookie

 • 

9 Posts

January 19th, 2022 08:00

Chris, this is a software regression introduced by the version upgrade.

Please escalate this issue as a ticket for the developers.

The servers have worked for two years with this hardware and received updates without any issue until iDRAC 5.

Thanks

7 Practitioner

 • 

9.7K Posts

 • 

48K Points

January 19th, 2022 08:00

The drives that shipped with the server were 1TB Hitachi Rainiers (part number HNWHH); the drives you have installed now are 3.2 TB Hitachi Ultrastars, which I am not seeing listed for the R540.

 

Also, for security reasons, I removed the svc tag you provided from the posting.

 

 

7 Practitioner

 • 

9.7K Posts

 • 

48K Points

January 19th, 2022 09:00

I understand your concern, but if those drives aren't supported on that platform, and the issue occurs only with those drives on the iDRAC 5.x firmware, I am not certain it would be considered an iDRAC issue.

 

The fact that the drives worked previously doesn't guarantee they will work after the firmware is updated, as they aren't intended or validated for that platform.

 

If you still want to, you can call in and work with them to analyze the issue, but with the drives not being supported it may not proceed.

 

 

1 Rookie

 • 

9 Posts

January 19th, 2022 10:00

Call whom? I need to reach the developers and tell them: if you can no longer support this drive, please state that in the change log, because it is impossible to roll back the iDRAC version while this drive is installed. We had to involve remote hands and pay for their work (it took hours) to debug what we needed to revert in software just to get the server to boot. This is a software regression, not a hardware issue and not a brand-new installation.

 

Thanks

1 Rookie

 • 

9 Posts

January 19th, 2022 11:00

We tried appealing to support; our case was 1080552072.

Support told us: "Do you have a hardware problem? No! Go to the software developers!" 🤷

7 Practitioner

 • 

9.7K Posts

 • 

48K Points

January 19th, 2022 11:00

I apologize for the misunderstanding. I meant that you can call in to support for them to analyze and escalate, but with the drives being unsupported/unvalidated it likely won't proceed, since the issue occurs only with those specific drives installed and does not occur with them removed and other supported drives installed.

 

 

2 Intern

 • 

63 Posts

April 8th, 2023 13:00

KNGND means it's a genuine HGST drive and not an OEM one. Feel free to PM me for the v122 firmware.

2 Intern

 • 

63 Posts

April 8th, 2023 13:00

Hi,

Sorry for the very late reply.

Those drives work fine on iDRAC 5.y+ if you have firmware v122, not KNGND110 (which you posted in one of your replies).

See my post on STH here:

https://forums.servethehome.com/index.php?threads/the-quest-for-the-hgst-ultrastar-sn260-firmware-updates.34135/

Here's my T640:


# nvme list|grep -i 122
/dev/nvme0n1 SDM0000XXXXX HUSMR7676BHP3Y1 1 7.68 TB / 7.68 TB 4 KiB + 0 B KNGND122
/dev/nvme1n1 SDM0000XXXXX HUSMR7676BHP3Y1 1 7.68 TB / 7.68 TB 4 KiB + 0 B KNGND122
# /usr/bin/ipmitool bmc info|grep Firmware.Revi
Firmware Revision : 5.10
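
If you have several machines to check, the two outputs above can be compared automatically. The following is only a rough Python sketch under stated assumptions: the function names are made up for illustration, the firmware revision is assumed to be the last column of each `nvme list` row (as in the listings in this thread), and the cutoff of 122 is taken from the claim in this post, not from any Dell or HGST document.

```python
import re


def nvme_firmware(nvme_list_output):
    """Map device node -> firmware revision from `nvme list` text output."""
    result = {}
    for line in nvme_list_output.splitlines():
        fields = line.split()
        # Data rows start with the device node; FW Rev is the last column.
        if fields and fields[0].startswith("/dev/nvme"):
            result[fields[0]] = fields[-1]
    return result


def bmc_major_version(bmc_info_output):
    """Major number from the 'Firmware Revision' line of `ipmitool bmc info`."""
    m = re.search(r"Firmware Revision\s*:\s*(\d+)\.", bmc_info_output)
    return int(m.group(1)) if m else None


def needs_drive_update(fw, minimum=122):
    """True for 'KNGND' firmware whose numeric suffix is below `minimum`.

    Assumption: the suffix is a plain build number that can be compared
    numerically. Firmware from other vendors is never flagged.
    """
    m = re.fullmatch(r"KNGND(\d+)", fw)
    return bool(m) and int(m.group(1)) < minimum


# Rows copied from the listings in this thread.
nvme_out = """\
/dev/nvme2n1 P02946119493 PLEXTOR PX-256M9PeY 1 256.06 GB / 256.06 GB 512 B + 0 B 1.07
/dev/nvme3n1 SDM00007F5EB HUSMR7632BHP301 1 3.20 TB / 3.20 TB 4 KiB + 0 B KNGND110
"""
bmc_out = "Firmware Revision : 5.10"

if (bmc_major_version(bmc_out) or 0) >= 5:
    for node, fw in nvme_firmware(nvme_out).items():
        if needs_drive_update(fw):
            print(f"{node}: firmware {fw} is older than KNGND122")
```

In practice you would feed it the captured output of `nvme list` and `ipmitool bmc info` from each host; the sample strings here only mirror what was posted in this thread.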
