August 23rd, 2014 22:00

PE T300 PERC6/I RAID Stalls

SBS 2003 SP2 RAID-5 hangs. Outage lasts 2-3 minutes. Recovers itself.
No evidence of HDD failure.

Interferes with WIN 2003 / Exch 2003 work jobs. Various tasks timing out.

Unable to find any malware.

This has been going on for some months but I think it's gradually getting worse.

I have seen some isolated Windows logging related to BBU on PERC6/i but DELL Server Administrator claims Backup Battery is fine.  I didn't order one so I guess the PERC comes with it by default.

I'd appreciate any helpful suggestions.

7 Technologist

 • 

16.3K Posts

August 24th, 2014 16:00

From what you posted, the Learning Cycle just completed, so if that fits the time frame of the issue you described, that is likely the cause.

BIOS firmware is obvious - it is the foundation of all the system hardware.

ESM (Embedded Server Management)/BMC (Baseboard Management Controller) works together with the BIOS to manage server hardware, and is primarily responsible for monitoring components.

PERC firmware is the interface between the system and the drives.  Drivers should be updated before firmware.

HDD firmware is the interface between the controller and the actual storage medium.  PERC firmware and HDD firmware should always be at the latest to provide the smoothest operation between the two.

From your screenshot, your BIOS and PERC driver and firmware are up to date.  The latest version of the BMC is 2.50:
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=3TMY6&fileId=3078114375&osCode=WNET&productCode=poweredge-t300&languageCode=EN&categoryId=ES

The latest firmware version for the drive appears to be MA0D.  The update is a bootable utility that flashes any Dell drives to the latest firmware:
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=N36TG&fileId=3368488753&osCode=WNET&productCode=poweredge-t300&languageCode=EN&categoryId=AS
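
To make the comparison concrete, here is a minimal Python sketch that lists which components are behind. The installed values are the ones from the Server Administrator output posted in this thread (the drive's "MA08MA08" field is read as MA08, which is an assumption), and the latest values are the ones named above. The helper name `outdated` is made up for illustration.

```python
# Installed versions as posted from Server Administrator in this thread.
# Reading the drive firmware field "MA08MA08" as "MA08" is an assumption.
installed = {
    "BIOS": "1.5.2",
    "BMC": "2.46",
    "HDD ST3250310NS": "MA08",
}

# Latest versions as named in this reply (BIOS was stated to be up to date).
latest = {
    "BIOS": "1.5.2",
    "BMC": "2.50",
    "HDD ST3250310NS": "MA0D",
}


def outdated(installed, latest):
    """Return the names of components whose installed version differs from the latest."""
    return sorted(
        name
        for name, version in installed.items()
        if latest.get(name, version) != version
    )


if __name__ == "__main__":
    for name in outdated(installed, latest):
        print(f"{name}: {installed[name]} -> {latest[name]}")
```

A plain equality check is used on purpose: strings like MA08 and MA0D are not ordinary version numbers, so the sketch only reports "different from latest" rather than trying to sort them.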

7 Technologist

 • 

16.3K Posts

August 24th, 2014 23:00

Support could certainly walk you through updating the firmware, but you should be able to create a bootable USB flash drive by running /Nautilus_efi_A12/UsbMake.exe.

50 Posts

August 25th, 2014 10:00

[quote user="theflash1932"]

Support could certainly walk you through updating the firmware, but you should be able to create a bootable USB flash drive by running /Nautilus_efi_A12/UsbMake.exe.

[/quote]

Thanks.  No problem there.

BRPv

7 Technologist

 • 

16.3K Posts

August 25th, 2014 10:00

I had no problem running it on Windows 8.1.  Run it on a normal/modern workstation/OS, and it should be fine.

50 Posts

August 25th, 2014 13:00

[quote user="theflash1932"]

I had no problem running it on Windows 8.1.  Run it on a normal/modern workstation/OS, and it should be fine.

[/quote]

You're right. I ran it on Windows 7 Pro. I've always been afraid to run DELL packages on anything other than the intended target machine.  Wish they'd be more helpful in their instructions.

Thanks Flash.  I'll try this after hours.

BRPv

7 Technologist

 • 

16.3K Posts

August 24th, 2014 07:00

What do the log entries about the BBU say?  The PERC comes with - and requires - the battery unit.  It is used to keep uncommitted writes alive in the cache.  The write cache significantly improves performance.  If the cache is disabled (battery loses charge, manually disabled, etc.) and the system is trying to send a large number of writes to the disk, it is very possible for it to hang or slow while waiting for the writes to complete.

Every 90 days, the battery goes through a "learning cycle", where the controller disables write caching, drains the battery completely, recharges the battery, and re-enables write caching to determine the health of the battery.  During this time, the system may experience sluggish performance.  If you noticed this one time, or several times over a 24 hour period, then that is likely why, since OMSA is indicating the battery is healthy.  You can check to see if this learning cycle coincided with your "outage" by looking at the controller's log (OMSA, Storage, PERC, Information/Configuration (link at top of page), Export Log from dropdown menu of Available Tasks for the controller).  Attach it or post it somewhere for review if you need assistance with it.

My suggestion (since the battery appears to be fine) is to make sure that all the system firmware (BIOS, ESM/BMC, PERC (driver first), HDD, etc.) is up to date.  If it is something other than the learning cycle to blame, but the battery is healthy, then it may be a glitch that can be corrected with up to date firmware.  It could also be an intermittent issue with the battery (the controller log could also confirm that), an issue with the controller (unlikely), or it could even be faulty disks (you could run diagnostics to ensure all disks are healthy).  I would also suggest running a Consistency Check on your array if you have not done so recently.
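
Once you have the exported log, the check for whether a stall coincided with a learning cycle can be sketched roughly as below. This is only an illustration in Python: the timestamps are hypothetical (not taken from any real log), the function names are made up, and the 3-minute stall length is taken from the outage duration described in the first post.

```python
from datetime import datetime, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """True if the two time intervals share any moment."""
    return start_a < end_b and start_b < end_a


def stalls_during_learn(stalls, learn_windows, stall_length=timedelta(minutes=3)):
    """Return the stall start times that fall inside any learn-cycle window."""
    hits = []
    for stall_start in stalls:
        stall_end = stall_start + stall_length
        if any(overlaps(stall_start, stall_end, s, e) for s, e in learn_windows):
            hits.append(stall_start)
    return hits


# Hypothetical timestamps, for illustration only.
learn_windows = [(datetime(2014, 8, 20, 1, 0), datetime(2014, 8, 20, 9, 0))]
stalls = [
    datetime(2014, 8, 20, 3, 15),   # inside the learn window
    datetime(2014, 8, 22, 14, 40),  # two days later, outside it
]

if __name__ == "__main__":
    for hit in stalls_during_learn(stalls, learn_windows):
        print("Stall during learn cycle:", hit)
```

If most of the stalls land outside the learn windows, the learning cycle alone does not explain them and the firmware, battery, and disk checks above become more relevant.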

 

50 Posts

August 24th, 2014 15:00

OK, that's a laundry list.  I don't recognize ESM/BMC. I'll check the PERC driver. 
It's not clear to me where HDD firmware fits into this though.

Here's what I've collected from the Server Administrator App:


PHYSICAL DISKS

     HDD (all three) SATA ST3250310NS 232.25GB MA08MA08

FIRMWARE/DRIVER Information for PERC6/i Adapter

      Firmware Version ........  6.3.0-0001
      Driver Version ..........  2.24.00.32
      Storport Driver Version .  5.2.3790.4173

BATTERY on CONTROLLER PERC6/i Adapter

      Name ....................  Battery 0
      State ...................  Charging
      Predicted Capacity Status  Ready
      Learn State .............  Idle
      Next Learn Time .........  89 days 23 hours
      Maximum Learn Delay .....  7 days 0 hours

System BIOS

      Manufacturer ...........  DELL
      Version ....................  1.5.2
      Release Date ...........  11/02/2010


FIRMWARE INFORMATION

      Baseboard Management Controller 2.46

PROCESSORS

      Xeon 3323 @ 2500 MHz Model 23 Stepping 10 Core Count 4

I'll go see about drivers / firmware.  I'm not sure how I'm supposed to update the firmware on the SATA drives though.

Thanks for the come-back.

50 Posts

August 24th, 2014 20:00

Please excuse me. I replied before I saw your reply.

It helps a bit.

Thanks,

BRPv

50 Posts

August 24th, 2014 20:00

[quote user="theflash1932"]

What do the log entries about the BBU say?  The PERC comes with - and requires - the battery unit.  It is used to keep uncommitted writes alive in the cache.  The write cache significantly improves performance.  If the cache is disabled (battery loses charge, manually disabled, etc.) and the system is trying to send a large number of writes to the disk, it is very possible for it to hang or slow while waiting for the writes to complete.

Every 90 days, the battery goes through a "learning cycle", where the controller disables write caching, drains the battery completely, recharges the battery, and re-enables write caching to determine the health of the battery.  During this time, the system may experience sluggish performance.  If you noticed this one time, or several times over a 24 hour period, then that is likely why, since OMSA is indicating the battery is healthy.  You can check to see if this learning cycle coincided with your "outage" by looking at the controller's log (OMSA, Storage, PERC, Information/Configuration (link at top of page), Export Log from dropdown menu of Available Tasks for the controller).  Attach it or post it somewhere for review if you need assistance with it.

My suggestion (since the battery appears to be fine) is to make sure that all the system firmware (BIOS, ESM/BMC, PERC (driver first), HDD, etc.) is up to date.  If it is something other than the learning cycle to blame, but the battery is healthy, then it may be a glitch that can be corrected with up to date firmware.  It could also be an intermittent issue with the battery (the controller log could also confirm that), an issue with the controller (unlikely), or it could even be faulty disks (you could run diagnostics to ensure all disks are healthy).  I would also suggest running a Consistency Check on your array if you have not done so recently.

[/quote]

 I've run a Consistency Check - Turned out OK.
I forced a "Learn" cycle which ended up OK.

Tried running "live" diagnostics and each spindle would go from 10% done to 90% done, then recycle back and repeat. This was a "quick" diagnostic and it was anything but.  I finally gave up and aborted these diagnostics.  FWIW, all three spindles behaved the same.  I didn't think of running them separately until now but it'll have to wait.


I've downloaded drivers and firmware but I haven't a clue exactly which I should install. 
Getting proper drivers and firmware at DELL has always been a puzzle for me.
I guess I'll have to put in a call and see if they'll talk to me and tell me which ones apply.
I don't really have a clue.  Here are my downloads:

Bcom_LAN_17.8c.4.3_DOSUtilities_18.2.0.51.exe
ESM_Firmware_PTWX7_WN32_1.06_A00.exe
SAS-RAID_Firmware_3P52K_WN32_6.3.3-0002_X00.exe
Serial-ATA_Firmware_WM0W0_WN32_MA10_A0.exe
T300_ESM_Firmware_3TMY6_WN32_2.50_A00.exe

The first one is a NIC driver package. 
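
For what it's worth, four of the five file names look like they follow a common pattern: a description, a 5-character package ID, WN32, a version, and a revision. That pattern is only a guess based on these file names, not something DELL documents here, and the helper below is a hypothetical Python sketch for pulling the parts out so they can be compared against the versions reported by Server Administrator.

```python
import re

# Assumed pattern (inferred from the file names above, not from DELL documentation):
#   <name>_<5-char package ID>_WN32_<version>_<revision>.exe
PACKAGE_RE = re.compile(
    r"^(?P<name>.+)_(?P<package_id>[A-Z0-9]{5})_WN32_"
    r"(?P<version>[^_]+)_(?P<revision>[AX]\d+)\.exe$"
)


def parse_package(filename):
    """Split a package file name into its parts, or return None if it does not fit."""
    match = PACKAGE_RE.match(filename)
    return match.groupdict() if match else None


downloads = [
    "Bcom_LAN_17.8c.4.3_DOSUtilities_18.2.0.51.exe",
    "ESM_Firmware_PTWX7_WN32_1.06_A00.exe",
    "SAS-RAID_Firmware_3P52K_WN32_6.3.3-0002_X00.exe",
    "Serial-ATA_Firmware_WM0W0_WN32_MA10_A0.exe",
    "T300_ESM_Firmware_3TMY6_WN32_2.50_A00.exe",
]

if __name__ == "__main__":
    for filename in downloads:
        print(filename, "->", parse_package(filename))
```

The NIC package does not fit the pattern and comes back as None. The 3TMY6 in the last file name is the same ID that appears as driverId in the BMC download link in this thread, which at least ties that file to the 2.50 BMC update.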

I have the original config but it's like a BOM and the driver lists just don't really correlate for me.

BRPv

50 Posts

August 24th, 2014 21:00

[quote user="theflash1932"]

From what you posted, the Learning Cycle just completed, so if that fits the time frame of the issue you described, that is likely the cause.

BIOS firmware is obvious - it is the foundation of all the system hardware.

ESM (Embedded Server Management)/BMC (Baseboard Management Controller) works together with the BIOS to manage server hardware, and is primarily responsible for monitoring components.

PERC firmware is the interface between the system and the drives.  Drivers should be updated before firmware.

HDD firmware is the interface between the controller and the actual storage medium.  PERC firmware and HDD firmware should always be at the latest to provide the smoothest operation between the two.

From your screenshot, your BIOS and PERC driver and firmware are up to date.  The latest version of the BMC is 2.50:
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=3TMY6&fileId=3078114375&osCode=WNET&productCode=poweredge-t300&languageCode=EN&categoryId=ES

The latest firmware version for the drive appears to be MA0D.  The update is a bootable utility that flashes any Dell drives to the latest firmware:
http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=N36TG&fileId=3368488753&osCode=WNET&productCode=poweredge-t300&languageCode=EN&categoryId=AS

[/quote]

OK, the BMC update ran as expected.

But the firmware in the NAUTILUS package didn't behave as expected.
It did unzip the downloaded file and opened the containing folder, but nothing was launched.
There were no error messages, and I've no idea how to proceed from here. There appear to be some script files but I've no idea how to launch them or which ones to launch.  The folder it creates is named "N36TG", which contains a folder named "Nautilus_efi_A12" and a small text file named "version.txt".  Should I contact DELL tech support on this?

Thanks,

BRPv

50 Posts

August 25th, 2014 10:00

[quote user="theflash1932"]

Support could certainly walk you through updating the firmware, but you should be able to create a bootable USB flash drive by running /Nautilus_efi_A12/UsbMake.exe.

Thanks.  No problem there.

BRPv

[/quote]

Guess I spoke too soon.  USBMAKE.EXE gives an error message:

USBMake.exe - Ordinal Not Found
The ordinal 344 could not be located in the dynamic link library COMCTL32.dll.
OK

My system is "Microsoft Windows Server 2003 for Small Business Server SP2"

Is this Nautilus package for some later version of Windows server?

Thanks,

BRPv

7 Technologist

 • 

16.3K Posts

August 25th, 2014 13:00

[quote]

I've always been afraid to run DELL packages on anything other than the intended target machine.

[/quote]

No need to be afraid ... some utilities, like 32-bit Diagnostics, Nautilus HDD utility, Repository Manager, etc., are not intended to actually run on the target machine.  Update packages for firmware or drivers are run directly on the target machine, and will abort if they aren't relevant to the hardware.

50 Posts

August 25th, 2014 15:00

[quote user="theflash1932"]

I had no problem running it on Windows 8.1.  Run it on a normal/modern workstation/OS, and it should be fine.

[/quote]

I did that on a Fat32 Flash.  I also took the opportunity to build an .ISO -- just in case. :)

The Flash drive was recognized by the PE T300 but using the boot menu to boot from it resulted in a big nothing.

So, I burned a CD from the ISO and tried it that way.

I got two (2) spaces followed by the letters "NK". (no quotes)

Nothing else happened.

I don't know what to do from here.

BRPv

7 Technologist

 • 

16.3K Posts

August 25th, 2014 16:00

I haven't used this utility for a while (I use Repository Manager now), so I can't remember the command you need (if it is even still needed).

50 Posts

August 25th, 2014 16:00

[quote user="theflash1932"]

I haven't used this utility for a while (I use Repository Manager now), so I can't remember the command you need (if it is even still needed).

[/quote]

OK, I'll try to contact DELL for some help with this.

Thanks,

BRPv
