PE 1900 w/PERC 5i SAS RAID Will not boot.
I'm working on a client's PowerEdge 1900 that suddenly and without warning stopped serving about 3 days ago. The client reported LAN and WAN problems Friday but did not request an onsite visit until Saturday. She said that someone in her office power cycled the server Friday, but that person did not select the correct server on the KVM switch, so I have no information about any error messages. When I arrived Saturday (yesterday), I found the server powered on but stalled, still displaying the BIOS messages.
I tried several troubleshooting steps and isolated the problem to the PERC 5/i SAS controller. As you see in the photo, the RAID 5 logical drive is still found on the controller, and when I checked the three-drive RAID configuration it reported the RAID in Optimal condition. No errors. If I unplug the RAID controller from the system board, the machine will boot past the BIOS messages but then stop, looking for a hard drive. If I plug in a bootable USB "thumb drive" with Ubuntu Linux, I can boot to it with or without the PERC controller plugged in, but if the PERC controller is plugged in, I first have to press F11 and specify booting from the USB drive. The boot menu does offer the PERC as a device to boot from, but if I select it the machine does not boot; it just stalls.
I know that the PERC card has a DIMM and an external battery. I've been reading other posts, and it appears that neither the DIMM nor the external battery would cause the situation I have described. I figure that if the controller were bad, it would not report the logical drive and would not be manageable, but a bad controller is the only thought I have.
I tried to upload a photo of the BIOS messages, but the site would not accept it.
Thanks for your time,
Tom
TCWNetOps
12 Posts
1
March 21st, 2012 11:00
Just to close out this post with my final results.
As it turns out, the PE 1900 that I was working on had a double failure. While I was still attempting to recover data from the RAID 5 partitions (booted via USB with Ubuntu Linux), I tried to wake the video after the machine had been running for 3 days; the monitor turned on but nothing would display. I then ran the machine (Linux) overnight to run a Deep Scan, and the next morning I reset the partition types of the lost partitions, which requires a reboot. This time, upon reboot, a hardware error appeared. The front display began flashing orange and reporting:
W1228 ROMB Batt < 24hr
Which according to the owner’s manual means:
Warns predictively that the RAID battery has less than 24 hours of charge left. Replace RAID battery. See "Replacing the SAS RAID Controller Daughter Card Battery"
So, this could explain the crash and the resulting corruption of the drive partitions. At this point the machine would not let me boot to my USB thumb drive, but instead ran Dell Diagnostics, which confirmed the displayed error with codes 2900:0221 and 2900:0325.
I ordered and replaced the battery. I attempted again to recover the partition data but eventually gave up, so I started a system rebuild, which is when the second failure occurred.
The second failure was one of the three hard drives that made up the RAID 5. I isolated and confirmed which drive was failing. While the drive itself may still be good, its SMART circuitry was not reporting, according to the Western Digital diagnostics. I purchased and installed a replacement drive, rebuilt the OS, and have had the machine back in service for the last 2 weeks.
Thank you to all who tried to help and hopefully this post may help someone else who experiences the same or similar combinations of strange events.
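For readers wondering why a three-drive RAID 5 can survive the loss of a single drive, as in the second failure described above, here is a minimal sketch of XOR parity. It is an illustration only, not how the PERC 5/i firmware is implemented: the helper name `xor_blocks` and the sample data are hypothetical, and a real controller works on stripes and rotates the parity block across the drives.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical three-drive RAID 5: two data blocks, one parity block.
data_a = b"PERC"
data_b = b"5/i!"
parity = xor_blocks(data_a, data_b)

# Simulate losing the drive that held data_b: rebuild it from the survivors.
rebuilt_b = xor_blocks(data_a, parity)
print("rebuilt block matches original:", rebuilt_b == data_b)
```

The same XOR works whichever of the three blocks is lost, which is why one replacement drive can be rebuilt from the remaining two. A second simultaneous failure cannot be recovered this way.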
theflash1932
7 Technologist
16.3K Posts
0
February 19th, 2012 21:00
What exactly does your system do? Where does it stop? What "BIOS message" is being displayed?
Check the BIOS Setup (F2), Integrated Devices, and make sure that the Integrated RAID Controller is Enabled.
You might also try the card in another slot (use slot 3) to see if the integrated/dedicated slot is the problem.
TCWNetOps
12 Posts
0
February 20th, 2012 08:00
In the BIOS configuration (F2) it reports the PERC SAS as being in place, and it has assigned it an IRQ. As originally reported, the server stalls/locks up/stops as the BIOS is handing off to the hard drive to boot the OS. The last line showing is the 5-second countdown (with 5 or 6 periods) to boot into Back Plane Management. No beeps, no other messages on screen, no reboot; it just stops.
I've considered trying a different slot to "test", but I am puzzled: if it were the slot, why does the BIOS see the card, and why is the card manageable? I can go into the controller card's configuration (CTRL-R).
I tried to upload a photo of the screen, but this site fails to accept the upload.
Thanks,
Tom
theflash1932
7 Technologist
16.3K Posts
0
February 20th, 2012 08:00
I've seen many degrees of failure ... it may be completely non-functional or may be partially functional.
However, given what you have told me, I would suspect it is the OS at this point. Is the LCD panel amber and scrolling an error message or is it blue?
I would boot to your 2003 CD (you will need to slipstream the PERC driver at F6 from floppy or use nLiteOS.com), Recovery Console, and start with a CHKDSK /R and FIXBOOT.
TCWNetOps
12 Posts
0
February 20th, 2012 10:00
Thanks, theflash1932, for giving some suggestions.
Regarding your last post.
- No amber, just blue, unless I have the cover off, then an intrusion alert - normal operations.
- I downloaded the PERC SAS driver from Dell's support site and booted from the Win 2003 CD, doing the F6 step. The PERC controller was located and setup continued to the install-or-repair point. I chose (R) to repair using Recovery Console, but it could not find the hard drive (aka the logical drive). The actual message is "Setup did not find any hard disk drives installed in your computer. Make sure any hard disk drives are powered on and properly connected to your computer, and that any disk-related hardware configuration is correct. This may involve running a manufacturer-supplied diagnostic or setup program."
- With that step failed, rebooted and ran Dell's System Test on the controller, systemboard, and hard drives - all passing.
Any further suggestions would be helpful.
Thanks,
theflash1932
7 Technologist
16.3K Posts
0
February 20th, 2012 11:00
Describe the F6 step ... it should have brought you back to a screen where you selected a driver to load. Windows may have told you that it already had a driver you could use and asked which one you wanted to use. Follow the prompts very carefully, as the very last question is worded oddly, such that even experienced techs hit the wrong key here (skipping use of the loaded driver).
I would suggest using nLite to integrate the driver into the installation media. Just make sure, at the appropriate times, that you choose MULTIPLE DRIVERS, select ALL drivers presented, then choose TextMode.
Make sure this is the driver you are using (says PERC 6, but the driver is the same):
www.dell.com/.../DriverDetails
If that doesn't work, then what version is your PERC 5 firmware at (CTRL-R, CTRL MGMT screen)?
TCWNetOps
12 Posts
0
February 20th, 2012 11:00
Pressed F6, then pressed S. It displayed Dell SAS 5x and SAS 6x Controller Driver (Windows Server 2003 32-bit); pressed Enter. It then displayed "Setup will load support for the following mass storage device(s): Dell SAS 5x and SAS 6x Controller Driver (Windows Server 2003 32-bit)." I had choices of S=Specify Additional Device, Enter=Continue, F3=Exit. I chose Enter.
At the Welcome to Setup page, I chose R=Repair (the other options being Enter=Continue or F3=Quit). Then I get the "no hard disk drives" message.
Product Name: PERC
Package: 5.0.2-0003
FW Version: 1.00.02-0157
BIOS Version: MT23
CtrlR Version: 1.02-007
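The CTRL MGMT values above use a hyphen in the package version (5.0.2-0003), while later in the thread the same package is written with dots (5.0.2.0003). A minimal sketch of how the two notations could be compared when checking firmware levels is shown below. The function names `parse_fields` and `version_key` are hypothetical helpers for illustration, not part of any Dell tool.

```python
import re

# Values copied from the CTRL-R / CTRL MGMT screen as posted above.
CTRL_MGMT_TEXT = """\
Product Name: PERC
Package: 5.0.2-0003
FW Version: 1.00.02-0157
BIOS Version: MT23
CtrlR Version: 1.02-007
"""


def parse_fields(text):
    """Turn 'Key: Value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def version_key(version):
    """Split a numeric version on '.' or '-' so both notations compare equal."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


fields = parse_fields(CTRL_MGMT_TEXT)
same_package = version_key(fields["Package"]) == version_key("5.0.2.0003")
print("Package", fields["Package"], "is the same as 5.0.2.0003:", same_package)
```

Note that `version_key` only handles purely numeric versions, so a value like the BIOS version MT23 would need separate handling.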
theflash1932
7 Technologist
16.3K Posts
0
February 20th, 2012 12:00
Then I would try nLite, making sure to use the driver I supplied a link for.
Your firmware is very old. According to the latest driver 2.24.0.32, you should have:
So, if that still doesn't work, try version 1.21.0.32 which is the driver package released with firmware 5.0.2.0003; for this one, you should have:
theflash1932
7 Technologist
16.3K Posts
0
February 21st, 2012 09:00
CHKDSK /R is what you want to run ... I don't see /P as a valid option. Typo maybe - or maybe only in RC? CHKDSK /F is fine, but CHKDSK /R goes one step further in fixing errors.
CHKDSK often requires 2-3 passes for substantial errors. I would recommend running it again. If still nothing, try FIXBOOT from RC.
TCWNetOps
12 Posts
0
February 21st, 2012 09:00
Your drivers, in combination with the Windows 2003 Standard R2 CD that came with the machine, got me to the Recovery Console. I was unable to run DIR on the volume. CHKDSK /F is not one of the options, only /P and /R. I ran CHKDSK /P, which reported that CHKDSK found one or more errors on the volume. Now I can run DIR on the C drive. I rebooted the server and the Windows logo came up, and now I see a grey page with a solid (occasionally blinking) drive light; it has been that way for 20 minutes.
theflash1932
7 Technologist
16.3K Posts
0
February 21st, 2012 10:00
No, it isn't doing a repair ... it is likely bluescreening and is set to reboot on error. You might be able to disable this automatic restart by hitting F8 BEFORE the Windows screen comes up (just after the "within 5 seconds" ESM prompt), then choosing "Disable automatic restart on system failure". If it blue screens, it will stop so you can see what the message is - or at least to confirm that is what is happening.
I would just boot to the CD now and run CHKDSK /R.
TCWNetOps
12 Posts
0
February 21st, 2012 10:00
Thank you again for your advice. While I have done IT work for over 12 years, most of the time systems just work, and I have not had many failures like this, so it is helpful to communicate with someone who has seen something like this before.
Regarding the latest status: since my last message, the machine has rebooted itself about 10 times. The reboot cycles are getting shorter (now about every 2 minutes). Is it running its own repair? I'll give it a little more time on its own, then I will proceed with your further recommendation of running CHKDSK 2 or 3 more times.
At this point, I believe that there was a problem with the OS before Friday. The client manually power cycled / restarted the server on Friday, which I believe further damaged / corrupted the boot sector, which is why the machine could not see the drive to boot from.
TCWNetOps
12 Posts
0
February 21st, 2012 10:00
Yep, you are correct, blue screen with:
STOP: c000021a {Fatal System Error}
The Windows Logon Process System process terminated unexpectedly with a status of
0x00000080 (0x0000000000 0x0000000000)
The system has been shut down.
I'll run CHKDSK /R now.
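When logging blue screens like the one above, it can help to pull the stop code out of the first line so it can be looked up. The sketch below does that for the line as posted; `parse_stop` and the one-entry `KNOWN_CODES` table are hypothetical helpers written for this illustration. The only mapping included is 0xC000021A, which is the NTSTATUS value STATUS_SYSTEM_PROCESS_TERMINATED.

```python
import re

# First line of the blue screen, as posted above.
STOP_LINE = "STOP: c000021a {Fatal System Error}"

# Minimal lookup table; only the code seen in this thread is listed.
KNOWN_CODES = {0xC000021A: "STATUS_SYSTEM_PROCESS_TERMINATED"}


def parse_stop(line):
    """Return (code, description) from a 'STOP: <hex> {text}' line, or None."""
    match = re.match(r"STOP:\s*(?:0x)?([0-9a-fA-F]{8})\s*(?:\{(.*?)\})?", line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


code, description = parse_stop(STOP_LINE)
name = KNOWN_CODES.get(code, "unknown")
print(f"0x{code:08X} {name}: {description}")
```

This stop code means a critical user-mode system process ended unexpectedly (here the Windows Logon Process), which is consistent with damaged or missing system files rather than a boot-sector problem alone.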
TCWNetOps
12 Posts
0
February 21st, 2012 11:00
Ran CHKDSK /R three more times - the first time through it reported that there were errors that were fixed, the two other times through came through clean. Upon rebooting, received the same Stop - Blue screen as my 10:39 post. I'll try FIXBOOT.