
November 19th, 2011 22:00

SC1435 Hard Disk Error on SAS 5/IR

I have a SC1435 with 2 Seagate 146GB SAS hard disks attached to a SAS5/IR PCI-E card. The server had been running fine for 11 months since its last reboot. My hosting provider needed to move the cabinet it was housed in, so I powered it down remotely. It's a LAMP server with full specs below. My provider confirmed it had powered off, made the move and powered it back up. Since then, I've been getting "Device Not Available at HBA0" and "Hard Disk Error" every time I boot. The boot stops on the hard disk error.

My LAMP server uses software RAID, so in the SAS adapter BIOS I don't have any RAID arrays set up -- just 2 independent SAS disks.
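
For reference, with Linux software RAID the mirror state lives in /proc/mdstat when the OS is up. A healthy two-disk RAID1 pair looks something like this (device names and block counts are illustrative, not my exact layout):

    $ cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sdb1[1] sda1[0]
          104320 blocks [2/2] [UU]
    md1 : active raid1 sdb2[1] sda2[0]
          142087232 blocks [2/2] [UU]

    # [2/2] [UU] means both mirror members are present;
    # a dropped member shows [1/2] with a _ in place of its U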

I'm trying to determine if the SAS5/IR card has gone bad or if one or both of the hard disks are now dead.

I've done some diagnostics and trial-and-error testing with these results:

- SAS adapter BIOS utility (Ctrl-C) saw both disks. A test on the first disk failed immediately. A test on the second disk passed fine over the course of about 25 minutes.

- Dell diagnostics on the first disk failed on the Confidence Test & Drive Self Test with Error Code 4400:011A Target Not Ready. The same tests passed fine on the second disk.

- I have removed the SAS5/IR card and reseated/reconnected it. No change in error messages.

- I have removed both hard disks and re-installed/reconnected them. No change in error messages.

- I have disconnected the cable from the first drive but left the second drive connected. I no longer get the "Device Not Available at HBA0" message, but *still* get the "Hard Disk Error" and no boot.

- I have switched the cables between the hard drives. Still get the "Hard Disk Error" and no boot.

I am looking for suggestions for pinpointing the source of the problem before I purchase a replacement SAS5/IR card or hard disks.

Thanks in advance for any help,

Chris

Server Specs:

  • 1U Dell PowerEdge SC1435 running Fedora 10 Linux OS
  • (2) AMD Opteron 2212 2.0 GHz Dual-Core
  • 24 GB RAM ECC DDR2 667
    • 4 x 2GB and 4 x 4GB modules PC2-5300P DDR2-667 5-5-5 240-pin Registered DIMM w/ cmd/addr Parity, 1.8V
  • (2) 146GB Seagate Cheetah T10 Serial Attached SCSI (SAS) 10K RPM (ST3146755SS T107)
  • SAS5/IR PCI-E Controller Card
  • (2) Gig Ethernet
  • ATI RN50 PCI video controller with 16MB RAM
  • CD-RW/DVD-ROM
  • 600-W power supply


December 8th, 2011 13:00

After some additional troubleshooting, a poster in another forum suggested I try again to access the 2nd drive from a bootable Linux CD and examine the boot partition. That reminded me of something I'd read a few years back about setting up RAID1 on /boot partitions for Linux.
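
In case it helps anyone else, this is roughly what that inspection looks like from the Fedora install CD's rescue mode (type "linux rescue" at the boot prompt and skip mounting the existing system to get a bare shell). Treat /dev/sda1 below as a placeholder for whatever the surviving drive shows up as -- check fdisk's output first:

    # list the partition tables the rescue kernel can see
    fdisk -l

    # see whether any md (software RAID) arrays were auto-assembled
    cat /proc/mdstat

    # inspect the RAID superblock on the suspected /boot member
    mdadm --examine /dev/sda1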

So I removed the presumably bad first drive and booted from the Fedora install CD with just the good drive connected. That gave me access to a shell where I could see the hard drive partitions. From there it was a matter of reassembling the logical RAID device for the boot partition: a few mdadm commands to assemble it, then mounting the /boot partition and reinstalling the boot loader (GRUB).
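
Roughly the commands, from memory -- a sketch rather than a verbatim transcript, and the device names (/dev/md0 for the /boot mirror, /dev/sda1 for its surviving member) are assumptions to adapt to your own layout:

    # force-assemble the degraded RAID1 /boot array from its one good member
    mdadm --assemble --run /dev/md0 /dev/sda1

    # mount it and confirm the kernels and GRUB files are intact
    mkdir -p /mnt/boot
    mount /dev/md0 /mnt/boot
    ls /mnt/boot/grub

    # reinstall GRUB (legacy, which Fedora 10 uses) into the disk's MBR
    grub
    grub> root (hd0,0)
    grub> setup (hd0)
    grub> quit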

I restarted the system and sure enough Fedora booted. All the partitions looked good.

So the 2nd drive was fine after all. I apparently just missed a step when I initially set up the RAID1 for the /boot partition -- the one that would have made the 2nd drive bootable on its own if the 1st drive failed.
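
For anyone setting up RAID1 on /boot: mdadm mirrors the partition contents but not the master boot record, so GRUB has to be installed on each disk separately. With GRUB legacy the usual trick is to remap each disk as (hd0) in turn and run setup on it -- a sketch, assuming the two disks are /dev/sda and /dev/sdb:

    grub
    # first disk
    grub> device (hd0) /dev/sda
    grub> root (hd0,0)
    grub> setup (hd0)
    # second disk, remapped as (hd0) so its MBR points at its own copy of /boot
    grub> device (hd0) /dev/sdb
    grub> root (hd0,0)
    grub> setup (hd0)
    grub> quit

With that done, either disk can boot the box on its own.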

I'm going to contact Dell for a replacement for the 1st drive, as I believe it's still under warranty (only 4 years old). That should get me back up and running at full capacity.
