We have a legacy PowerEdge 2850 with what appears to be a two-channel PERC in it.
The boot drives for this ESXi host are in slots 0 and 1 and work great. However, we had a drive failure in a pair of 143GB SCSI drives on the second channel. Being largely clueless about replacing a failed disk, we fumbled around for a bit and thought we had it up and running, but now the volume seems to intermittently disappear when it's under load.
I now suspect that the SCSI chain isn't properly terminated. After days of digging through technical docs, I still can't determine whether the drives are terminated on the drive unit itself or on the drive cables (I suspect the latter).
Could someone please point me to some docs or toss me a clue that tells me:
1. Where the termination is made: on the cable or on the disk.
2. If it's on the cable, how to get to the drive cables to check where they are terminated. If it's on the disk, do I have to take the drive out of the sled to find the jumper?
Sincere thanks in advance.
I just watched a take-apart video on YouTube. It appears that there are, in fact, no "drive cables": the hot-swap disks plug directly into the backplane board, and the backplane mates with the system's logic board.
So, are there jumpers or DIP switches on the backplane that need setting?
No. On the 2850, termination is handled by the backplane itself, so NO termination is required on the drives or cables. If you suspect a termination issue, the fix is to replace the backplane.
Check the controller log for punctures (double read faults across multiple disks). The array may have been damaged during the "fumbling" to replace the failed drive. To replace a failed drive: remove the drive "HOT" (do NOT power down to replace it), wait 60 seconds, insert the replacement "HOT", then wait 60 seconds for the rebuild to start. If it doesn't start, you'll need to assign the new drive as a hot spare or "rebuild" it manually.
If no punctures, make sure all the system firmware is up to date (BIOS first, then ESM/BMC, then backplane, PERC, HDD, etc.).
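If you can get Dell OpenManage or LSI's command-line tool onto the host (or boot a live environment against the box), the controller event log and per-drive error counts can be pulled from a shell rather than rebooting into the PERC BIOS utility. This is only a sketch: whether these tools are installable on your ESXi 2850 and whether they support this PERC generation are assumptions, and the controller number may differ on your box.

```shell
# Dell OpenManage Server Administrator (if installed):
# list controllers, then check virtual-disk and physical-disk status
omreport storage controller
omreport storage vdisk controller=0
omreport storage pdisk controller=0

# LSI MegaCLI (support for this older PERC generation is an
# assumption -- it may require Dell's own utilities instead):
MegaCli -AdpEventLog -GetEvents -f events.log -aALL   # dump controller event log to a file
MegaCli -LDInfo -Lall -aALL                           # logical (virtual) disk state
MegaCli -PDList -aALL                                 # per-drive media/error counts

# In the dumped log, repeated media errors on more than one drive
# in the same array is the "puncture" pattern to look for:
grep -i -E 'error|punctur' events.log
```

A non-zero media error count on *both* members of the mirror would point at a punctured array rather than the backplane.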
Thank you for the confirmation, theflash1932. But if that's correct, we're looking at a different issue.
In our ignorance, we simply rebuilt the RAID 1. We had a good backup of the only VM on it, and the ESXi boot RAID 1 was intact, so we just nuked the RAID config and started over.
So the boot RAID 1 has always been fine. However, we've now rebuilt the second RAID 1 on the second channel twice. Each time the array is fine for a few days and then fails (I think while it's under load). After the second failure we swapped *both* disks and rebuilt the array again, and this weekend it seems to have failed once more.
I don't know how to check the controller log - can this be done from the PERC Utility at boot time?
We *do* have a second 2850 as a parts spare, so I believe we have a spare backplane we can swap in.
Suggestions on how to proceed?
Sincere thanks for the help with this.
Yes, that does mean you have a different problem. It is nice to be able to blame hardware and package the problem nice and neat, but that's not always what happens.
Have you tested the drives?
What make/model of drives are you using? This type of behavior can be normal if you are using desktop/laptop drives or even non-certified drives.
Sorry ... forgot we are talking about the 2850, so desktop/laptop drives are out.
Still, I would test the drives, as there are many drives with many many hours of use on them circulating the Internet.
theflash1932: We've got a mittful of spare drives too (143GB SCSI). Unless I can find a log somewhere to narrow this down, I'm thinking of replacing the backplane and then rebuilding the failed RAID.
Unless anyone has a better idea.