
6 Posts

March 17th, 2021 18:00

Swapping drives in identical servers (hot spare from one to replace predicted failure in another)

I have two identical PowerEdge 420 servers. Each has three drives running RAID 1: two active drives and one hot spare. One is a test server and one is a live server. Note that they are out of warranty.

I have a predicted failure on a drive in the live server. I know I can take the drive "offline" in Dell OMSA and the array will rebuild onto the hot spare; then I can replace the drive showing the predicted failure with a new drive.

I have ordered a new drive from Dell that I was told is a valid replacement for my current drive. My question is: if it doesn't work for some reason, can I take the hot spare drive from my test server and put it in my live server? Or will the live server not recognize it because it was technically part of another RAID?

The hot spare drive in the test server has been written to in the past: I've tested taking drives offline to make sure the hot spare setup worked, and afterwards I returned the drive to hot spare status.

Thanks in advance for any help!

Moderator

3.1K Posts

March 18th, 2021 00:00

Hi,

 

You are right, it will not recognize the drive, because it was part of another server's configuration. I would say you may need to clear the drive's configuration. I have seen this issue before, where the RAID controller's metadata is stored on the drive. From my experience, you can try a few things. Use the CLEAR function on the drive in Lifecycle Controller. If a foreign config shows up when you install the drive in the live server, clear it. Then check whether the RAID is able to rebuild.


PS: Remember not to initialize the data disk. 

 

Let me know if it works.

Moderator

8.4K Posts

March 18th, 2021 12:00

Aprilhyd,

 

If the rebuild doesn't start automatically shortly after you insert the replacement, then you will need to assign the drive as a hot spare, and it will then proceed to rebuild.

 

Let me know how it goes, or if you have any other questions.

 

 

6 Posts

March 18th, 2021 12:00

OK, thanks for the info; I just wondered if that's an option. I'm hoping the drive I just bought will work. It is Dell certified, so when I put this brand new drive in, will my system automatically see it and start rebuilding? Or is there something I will need to do in Dell OMSA, like bring it online?

6 Posts

March 18th, 2021 13:00

OK, thank you. I will be swapping this out tomorrow evening after work hours. I will post any more questions or, hopefully, my successful results!

6 Posts

March 19th, 2021 12:00

I'm putting the new drive in and it still shows the predicted failure status. It sees a disk but doesn't rebuild, and it doesn't list the disk as a physical disk under the virtual disk menu, so I can't add it as a hot spare. The only option I have in the drop-down menu is "Convert to RAID capable", and when I do that it still just sits.

I still see a message under Virtual Disks: "Virtual disk has bad blocks". Could this be why it's not seeing anything new?

 

 

2.9K Posts

March 19th, 2021 12:00

Hello,

 

Is this the H310 controller? If so, would you mind installing your replacement drive, rebooting, then converting it to RAID capable? I believe this should help correct the issue, as it will force the controller to reacquire the drive on reboot, since the H310 doesn't have cache to preserve the config. If it seems like the behavior remains the same, try one final reboot.

 

I'd also ask if you could share the firmware version on the controller. There may be a relevant update.

 

EDIT: Sorry, I missed your message about the bad virtual blocks. Yes, those can create problems. Bad blocks can potentially lead to faults in an array.

6 Posts

March 19th, 2021 12:00

Yes, it is the H310 controller. I have PTSD when it comes to rebooting a server with a drive not in place. Is that a huge risk? I'm just concerned because it's running two live VMs. Yes, I have backups, but a full recovery from backup is not a fun task!

The firmware is definitely out of date, but I will have to look up the exact version. The server is marked to be replaced within the next three months, so from what I understand not much has been updated for quite a while.

2.9K Posts

March 19th, 2021 13:00

This is a behavior I've seen with the H310, particularly with older firmware. The process I'm describing typically does resolve the issue. Regarding rebooting the host, though, I can really only speak to what I've experienced. My experience has been that rebooting a server, virtual host or not, hasn't been an issue. If you mean specifically with a virtual disk issue, then I'd say no. Rebooting won't do any more damage to the array than what is already there, because the controller has no battery to preserve volatile storage (the H310 doesn't have controller cache, anyway) and, more importantly, the OS already can't access the data. The RAID configuration is stored on the drives and has to be read at each boot. Considering the data is already inaccessible, I feel that you're not exposing yourself to any data problems beyond what you have already.

 

I did specify data problems because, if the array is punctured, you can see accelerated hard drive failure. I'll provide a link describing punctures below.

 

https://dell.to/2QmqQGy

 

Regarding your concern about rebooting, the above really only addresses the storage specifically. I can't promise that you won't run into issues for some other reason, or even with the storage for a reason I didn't account for. I can say that I would be surprised, and I wouldn't expect this to make anything worse, but this is general information over a community thread. If your server is out of warranty and you want to take the extra bit of caution, I believe there are instance-based paid support options, as opposed to time-based contracts. If that is something you think you might be interested in, I can help you get in touch with Sales, but we'll want to take the conversation to PMs at that point.

 

Sorry if that was a bit wordy. I think I touched on your concerns, but please let me know if I missed anything.

6 Posts

March 19th, 2021 14:00

It's not wordy at all; I appreciate any info. I always learn more than I intended from reading these posts!

I have already contacted Dell regarding an extended warranty, so I should be able to call Dell support on Monday with this issue.

I'm becoming more convinced it's a firmware issue, since you say you have seen the H310 have this problem before. I do not have extensive experience updating firmware in an ESXi environment, so I will wait to do that.

On a very interesting note, while I was trying all this on my live server this afternoon (I have a live and a test server, bought at the same time with the exact same build), my test server started showing the same issue on the exact same drive! So now I suspect even more that it's firmware, since they have the exact same version.

I will post more when I figure it out.   Thanks for the responses and the information.

 
