
April 9th, 2016 17:00

2850 w/ PERC 4 - How do I properly terminate the SCSI drives?

We have a legacy PowerEdge 2850 with what appears to be a two-channel PERC in it.


The boot drives for this ESXi host are in slots 0 and 1 and work great. However, we had a failure on a pair of 143GB SCSI drives on the second channel. Being largely clueless about replacing the failed disk, we fumbled around for a bit and thought we had it up and running, but now the volume seems to disappear intermittently when it's under load.


I now think that this is because the SCSI chain isn't properly terminated. After days of looking at technical docs, I'm still unable to determine if the drives are terminated on the drive unit, or if the termination is on the actual drive cables (I suspect the latter).

Could someone please point me to some docs or toss me a clue that tells me:

1. Where the termination is made - on the cable or on the disk.

2. If it's on the cable, how to get to the drive cables to check where they are terminated. If it's on the disk, do I have to take the drive out of the sled to find the jumper?


Sincere thanks in advance.


Bob

Moderator • 6.2K Posts

April 10th, 2016 17:00

Hello

The terminator is at the end of the cable.

Thanks

8 Posts

April 10th, 2016 20:00

I just watched a take-apart video on YouTube. It appears that, in fact, there are no "drive cables". The hot-swap disks plug directly into the backplane board, and the backplane mates with the logic board of the system.

So, are there jumpers or DIP switches on the backplane that need setting?

8 Posts

April 10th, 2016 20:00

Cool, thank you. Any idea how I find the end of the cable? Is there a take-apart guide we can follow?

7 Technologist • 16.3K Posts

April 10th, 2016 22:00

No. With the 2850 backplane, NO termination is required. If you suspect a termination issue, you'll need to replace the backplane.

Check the controller log for punctures - double faults, i.e. bad blocks in the same stripe across multiple disks. The array may have been damaged during the "fumbling" to replace the failed drive. To replace a failed drive, simply remove the drive "HOT" - do NOT power down to replace it - wait 60 seconds, insert the replacement "HOT", then wait 60 seconds for the rebuild to start. If it doesn't start, you'll need to assign the replacement as a hot spare or "rebuild" it manually.
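If you end up with OMSA on the box (more on that below), the same step can be driven from its command line. A rough sketch - the controller and disk IDs here are placeholders, so read the real ones off the omreport output first, and note the exact syntax varies a bit between OMSA versions:

    # List physical disks and their states; find the replacement's ID
    omreport storage pdisk controller=0

    # Kick off a rebuild of the replacement disk (ID is illustrative)
    omconfig storage pdisk controller=0 pdisk=1:2 action=rebuild

    # ...or assign it as a global hot spare instead
    omconfig storage pdisk controller=0 pdisk=1:2 action=assignglobalhotspare spare=yes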

If no punctures, make sure all the system firmware is up to date (BIOS first, then ESM/BMC, then backplane, PERC, HDD, etc.).
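If OMSA is installed, you can inventory what's currently on the box before flashing anything - a sketch, assuming the standard OMSA CLI tools:

    # BIOS, ESM/BMC, and other versions known to OMSA
    omreport system version

    # PERC firmware and driver versions
    omreport storage controller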

8 Posts

April 11th, 2016 13:00

Thank you for the confirmation, theflash1932. But if that's correct, we're looking at a different issue.

In our ignorance, we simply rebuilt the RAID 1. We had a good backup of the only VM on it, and the ESXi boot RAID 1 was intact, so we just nuked the RAID config and started over.

So, the boot RAID 1 has always been fine. However, we've now rebuilt the second RAID 1 on the second channel twice. Each time, the array is fine for a few days and then fails - I think when it's under load. The second time it happened, we swapped *both* disks and rebuilt the array again, and this weekend it seems to have failed once more.

I don't know how to check the controller log - can this be done from the PERC Utility at boot time?

We *do* have a second 2850 as a parts spare, so I believe we have a spare backplane we can swap in.

Suggestions on how to proceed?

Sincere thanks for the help with this.

8 Posts

April 12th, 2016 09:00

Bump.

7 Technologist • 16.3K Posts

April 12th, 2016 10:00

Yes, that does mean you have a different problem. It is nice to be able to blame hardware and package the problem nice and neat, but that's not always what happens.

Have you tested the drives?

What make/model of drives are you using? This type of behavior can be normal if you are using desktop/laptop drives or even non-certified drives.

7 Technologist • 16.3K Posts

April 12th, 2016 10:00

Sorry ... forgot we are talking about the 2850, so desktop/laptop drives are out.

Still, I would test the drives, as there are many drives with many, many hours of use on them circulating the Internet.

7 Technologist • 16.3K Posts

April 12th, 2016 22:00

You can pull the log using OMSA - either the managed node installed in the OS (although you are likely running too old a version of ESXi for the OMSA VIB) or an OMSA live disc.
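For reference, once you have a managed node (or live disc) running, pulling the logs from the OMSA CLI looks roughly like this - the export location and file name vary by OMSA version and OS:

    # View the ESM/hardware log
    omreport system esmlog

    # Export the PERC's internal (TTY) controller log to a file
    # (on Linux it typically lands under /var/log, e.g. lsi_MMDD.log)
    omconfig storage controller action=exportlog controller=0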

8 Posts

April 12th, 2016 22:00

theflash1932: We've got a mittful of spare drives too (143GB SCSI). Unless I can find a log somewhere to narrow this down, I'm thinking of replacing the backplane and then rebuilding the failed RAID.

Unless anyone has a better idea.

8 Posts

April 13th, 2016 08:00

Downloading the OMSA Live .iso now. Thank you!

8 Posts

April 13th, 2016 21:00

The 2850 isn't supported by OMSA Live.

Swapped the backplane for our spare. Now we wait.

7 Technologist • 16.3K Posts

April 13th, 2016 22:00

Yes it is, but probably just not by the latest Support Live discs. Looks like the older versions (like 5.1-6.4) have been pulled again.

Chris or Daniel ... you guys have the pull to tell somebody to put the older OMSA live images back up?!

You could boot a CentOS 5.4 (or even 6.4) live disc, then download and install OMSA manually (OMSA Live is just a CentOS live disc with OMSA and Online Diagnostics pre-installed and ready to go). Since you are just after the logs, you could probably pull the "TTY" logs from any Linux live disc.
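For what it's worth, the manual install usually boils down to something like this - the file name is illustrative (grab the actual Linux web package from Dell's support site), and the tarball layout varies by OMSA release:

    # Unpack the OMSA managed-node web package
    tar xzf OM-SrvAdmin-Dell-Web-LX-*.tar.gz
    cd linux/supportscripts

    # Install the managed-node components and start the services
    ./srvadmin-install.sh
    ./srvadmin-services.sh start

    # Then export the controller's TTY log as noted earlier
    omconfig storage controller action=exportlog controller=0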
