
June 8th, 2013 21:00

How do you reformat a boot drive back to factory defaults?

How do you reformat a boot drive back to factory defaults when it has been used in one node, to be used as a replacement in another node?

34 Posts

June 11th, 2013 08:00

Hello Tanya,

We appreciate your ingenuity. We strongly discourage any plan or attempt to format the boot disk and add it to another node.

The best course of action is to allow the system's hardware and software protection mechanisms to run as designed.

Best practice for this situation: if the node is in a less-than-optimal state, we recommend initiating a smartfail on the affected node. This will re-protect the data and allow the cluster to resume its intended operation. First, confirm that the cluster has more than four nodes of the same type and that sufficient capacity is available to smartfail out the node successfully.
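For reference, a minimal sketch of that sequence from the CLI, assuming OneFS 7.x syntax (the node number is a placeholder; verify the exact commands with support or the release documentation before running them):

    # Confirm cluster health and available capacity first
    isi status

    # Smartfail the affected node by its logical node number (here, node 3);
    # this re-protects the data across the remaining nodes before removal
    isi devices -a smartfail -d 3

    # Watch the FlexProtect job that restripes the data
    isi job status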

The affected node can be assessed at a later date and brought back up to par once it is out of theater, or when replacement parts or spares can be obtained.

Any attempt to circumvent the system's protection mechanisms will end poorly. The adapter for the boot drive is a specialized connection, and as you mentioned, once the boot drive is out of its original node it is unusable for any other purpose.

Pat

1 Message

June 10th, 2013 07:00

Boot drives are a part that is not recommended to be replaced by the customer. The reason is that in some situations, if a good drive is pulled by accident, it could result in a potential data-loss scenario. Please feel free to contact me via email to discuss in detail.

Isilon Technical Support

5 Practitioner • 274.2K Posts

June 10th, 2013 08:00

TanyaLB,

Travis is a technical support engineer, and he can help you if you private message him. We just want to make sure this is done right, because mishandling it could lead to data loss. You can also email me and I will put you in touch with him.

Cordially,

Ron Steed

Isilon Technical Support Manager

ron.steed@isilon.com

7 Posts

June 10th, 2013 21:00

I realize that. However, there is a special circumstance with the NGA (BAE Systems), which has a cluster in the middle of a war zone. They are authorized to do all parts replacements, and they keep nodes on site strictly for pulling parts. When these nodes are delivered, they are powered on to determine that they are working. Consequently, the boot drives get partitioned, rendering them unusable unless they are formatted back to factory defaults.

I was advised that if they have a PC with a SATA connection, they can insert the boot drive and run a regular format. Is this accurate?

1.2K Posts

June 11th, 2013 10:00

Tanya and Pat, not sure whether I got this right, but is the failed drive a boot drive or a normal drive?

My understanding so far is that you seek to replace a normal drive with a (now) root drive.

If this is correct, my question would go to Pat: wouldn't Tanya be better off with just having that single drive (smart)failed and the cluster reprotected, rather than having a full node smartfailed and removed?

Peter

34 Posts

June 11th, 2013 11:00

Hello Peter,

Great questions, thank you for asking for clarification.

The recommendation previously communicated was specific to critical event IDs regarding actual boot drives, as discussed in article ID emc14003516, “How to identify if a boot flash drive has failed on 108NL, NL400, S200, X200, or X400 nodes.” A quick way to check the boot-mirror state from the node itself is sketched after the event list below.

Event ID: 400120001 – Only one boot disk

Event ID: 100010040 - Device root0, provider ad4p4 disconnected. Boot mirror is critical.

Event ID: 100010040 - Device var0, provider ad4p6 disconnected. Boot mirror is critical.
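Those messages have the shape of FreeBSD gmirror output, so, assuming OneFS exposes the boot mirrors through gmirror (an assumption on my part, not from the article), the mirror state can be read directly on the node:

    # Show the state of the mirrored boot devices (root0, var0, ...)
    gmirror status

    # Two providers per device is healthy; a DEGRADED device with a single
    # provider matches the "Boot mirror is critical" events above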

Re-purposing these drives from one node to another would be disastrous, despite any prior attempts to format them.

If this scenario were solely for a hard drive or an SSD, I’d definitely concur with the recommendation to smartfail the drive; a sketch of the drive-level commands follows the article list below. Article ID emc14002894 outlines how to format a standard drive. This process won’t work on a boot drive.

emc14000713 – How to smartfail a drive using the command line

emc14002894 – New SSD drive will not add to node and shows state in isi devices
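For the drive-level case, a minimal sketch of the commands, assuming the OneFS 7.x syntax from emc14000713 (the node:bay numbers are placeholders, and the exact action names can vary by release):

    # List the drives and their states on the node
    isi devices

    # Smartfail the suspect drive, here node 2, bay 4
    isi devices -a smartfail -d 2:4

    # Once the replacement is inserted, format/add it per emc14002894
    isi devices -a format -d 2:4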

Pat

1.2K Posts

June 12th, 2013 07:00

Thanks a lot, Pat -- and Good Luck to Tanya!

3 Posts

October 17th, 2013 14:00

Late to the party, but the original question has not been answered:

Is there a way to restore/recover both 8Gig boot "drives" to the factory or shipped state (on an X200 node)?

Let's say that a rogue dd sourced from /dev/zero overwrote the partition tables on both of the boot drives (or worse, zeroed all 8Gig of each drive).
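For concreteness, the kind of command I mean would look like this (device name assumed; obviously not something to run on a live node):

    # DANGER - illustration only: zeroes the partition table of one boot flash
    dd if=/dev/zero of=/dev/ad4 bs=512 count=1
    # or, the worse case, the whole 8Gig device
    dd if=/dev/zero of=/dev/ad4 bs=1m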

Re-imaging from 7.0.1.10 (as an example) doesn't work; it dies in the middle, complaining of missing files and manufacturing info. Running isi_update_cto looks like it is collecting data, but in the end it doesn't help.

So, is there a way to reformat/reinit both of the flash drives, especially considering that one has an extra 2Gig partition, so they are not exactly identical?

Also, is there anything special in the mfg partition that is directly tied to any physical components of a particular box?

Would isi_reformat_node --factory help in above case?

thanks

Michael

3 Posts

October 18th, 2013 00:00

Thanks for getting back to me.  Here is more info.  I recovered the node with the erased flash boot drives, with a caveat.

I booted Ubuntu on another node and copied both 8G images to separate files. Then I booted Ubuntu on the erased node and copied the saved images back to the respective flash drives. That fixed it: I was able to re-image the 3rd node, and the cluster is fully operational.
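For anyone repeating this, a sketch of the dd commands I mean, with device names as assumptions (check lsblk first; the 8G boot flash devices may enumerate differently on your node):

    # On the healthy node, image both boot flash devices to an external disk
    dd if=/dev/sdb of=/media/usb/boot0.img bs=1M
    dd if=/dev/sdc of=/media/usb/boot1.img bs=1M

    # On the erased node, write the images back to the matching devices
    dd if=/media/usb/boot0.img of=/dev/sdb bs=1M
    dd if=/media/usb/boot1.img of=/dev/sdc bs=1M
    sync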

The question remains - are there any fields in the CTO info (mfg partition) that HAVE to be different between all 3 nodes? So far I haven't detected any problems with any of the cluster operations, so if there is no dependency on some UNIQUE CTO info then I'll just leave it alone.

Another item: how this whole thing started. The cluster originally had OneFS 7.0.1.6 on all 3 nodes. Without shutting down the cluster (as I probably should have), I started re-imaging to 7.0.1.10. The first two nodes re-imaged with no problems (other than having to use a different cluster name, but with the same IP config info), and the second node had no problem joining the new cluster. But when I tried to re-image the 3rd node, the process failed with an error that it could not start/mount var1 (or, I guess, enable that mirror). Several retries resulted in the same error. That's when I decided to zero the flash partition tables, thinking that the reimage process simply re-initializes all of the boot flash - I didn't know about the mfg/cto info.

Hope this info helps anyone encountering the same problem during upgrades.

October 18th, 2013 00:00

You will need to engage Isilon engineering, as you have wiped the CTO information. It would not be part of the generic image file that you download and boot from USB. As for resolving it in the field (don't quote me), since these are CTO nodes, unless you currently have running nodes exactly identical to this one from which the CTO information can be retrieved, they may need to be returned to Isilon to be reimaged. Either way, it is an undocumented process and requires engineering assistance. Again, follow up with support and explain your situation: "CTO nodes with a complete erasure of the boot drives that you are trying to reimage".

In fact, our own data erasure procedure specifically notes that if the nodes are CTO and will be repurposed/reimaged, the boot drives should not be wiped. On the other hand, if they are being erased because they are being returned to EMC, then there isn't any concern in doing so.

November 30th, 2020 08:00

Hello mishka991, I am encountering the same problem on the third X200 node of my cluster. I tried your procedure, but it didn't work; it was impossible to reimage the node. Which tool did you use to clone the boot disk? (diskdump?) Thanks

Moderator • 7.1K Posts

November 30th, 2020 13:00

Hello Olivier CRISTIANI,

Did you have an SP fail, or was it just a drive that failed? When it comes to reimaging, it is best to contact support for assistance, as doing it incorrectly can lead to data loss.

December 2nd, 2020 02:00

Hello,
In a cluster of 3 X200 12TB CTO nodes, we have 2 nodes which no longer boot (FreeBSD error at boot: Failed to find bootable partition).
I've booted with a Linux rescue USB stick and can see that the 12 partitions are still present on the 2 boot cards. If I try to mount the boot, mfg, keystore, root0 or root1 partitions, all the files are present.
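For reference, this is roughly how the partitions can be inspected from the rescue stick; the device and partition numbers are assumptions (check fdisk -l), and the OneFS partitions are FreeBSD UFS2, which Linux can mount read-only:

    # List the partitions on the first boot flash
    fdisk -l /dev/sdb

    # Mount one UFS2 partition read-only
    mkdir -p /mnt/root0
    mount -t ufs -o ro,ufstype=ufs2 /dev/sdb5 /mnt/root0
    ls /mnt/root0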
I have the impression that the boot loader has been corrupted, perhaps due to a power failure?
I've tried to reimage to OneFS 8.2.2.0 but could not; I got an error message saying "could not find serial number". We did not renew EMC support on these machines because production was migrated to a new A200 cluster; we just wanted to recycle this old X200 cluster to store non-critical archives.

Thanks

Moderator • 8.7K Posts

December 2nd, 2020 10:00

You may want to try an older version of the OS, before 8.1. Reimaging might also wipe all of the data.

3 Apprentice • 592 Posts

December 2nd, 2020 14:00

@Olivier CRISTIANI,

The X200 will not take OneFS 8.2.2.0 code; it can only take OneFS 8.1.x code, as Josh mentioned.
