Unsolved

2185

July 23rd, 2018 12:00

PS5000 stuck cancelling vacate

I have an array that was being decommissioned from production and re-purposed to a test environment. I removed it from the storage pool and... waited. Then I waited. Then I realized it had stalled for whatever reason. There were no network issues and the remaining member has more than adequate storage.

Regardless, I decided to cancel the process and approach it from a different angle. I cancelled the process only to find that it then stalled at 0% while "cancelling". I received advice from a fellow admin that I should leave it as it could take quite a bit of time. That was several months ago and I can safely assume that this isn't expected behavior.

I'm deathly scared that I'll lose data if I don't approach this properly and these arrays are running critical systems. Can anyone provide any insight or advice? I'm unsure of where to turn or how to proceed.

3 Apprentice

1.5K Posts

July 23rd, 2018 15:00

Hello,

 Was there replication going at the same time you tried to vacate the member?

 It's most likely stuck on a page waiting for it to move.  Replication can cause that.

 Or it could be a problem on the network, where the transfer keeps timing out.  A common cause of that is too little bandwidth between iSCSI switches.  The interswitch link (ISL) should have at least 75% of the total group bandwidth.  This can happen especially if your ISL is only a single 1GbE link.
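
 As a rough illustration of the 75% guideline, here is a minimal Python sketch. It assumes that "total group bandwidth" means the sum of the active iSCSI port speeds across all members, which this thread does not spell out. The function names and the member and port counts are made up for the example.

```python
# Hypothetical sizing check for the interswitch link (ISL).
# Assumption: "total group bandwidth" = sum of active iSCSI port speeds
# across all members in the group.

def required_isl_gbps(members, ports_per_member, port_gbps, fraction=0.75):
    """Return the ISL bandwidth (Gb/s) suggested by the 75% guideline."""
    total_group_gbps = members * ports_per_member * port_gbps
    return fraction * total_group_gbps

def isl_is_adequate(isl_gbps, members, ports_per_member, port_gbps):
    """True if the ISL meets or exceeds the guideline."""
    return isl_gbps >= required_isl_gbps(members, ports_per_member, port_gbps)

# Example numbers (made up): 2 members, 3 x 1GbE active ports each.
needed = required_isl_gbps(members=2, ports_per_member=3, port_gbps=1.0)
print(f"ISL should carry at least {needed:.1f} Gb/s")
print("single 1GbE ISL ok?", isl_is_adequate(1.0, 2, 3, 1.0))
print("10GbE ISL ok?", isl_is_adequate(10.0, 2, 3, 1.0))
```

 With these example numbers a single 1GbE ISL falls short of the guideline, while a 10GbE ISL clears it comfortably.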

 If not those, then you need to open up a support case.  If you are out of warranty you can open a one time call for a fee.  They can review the diags for a better idea of what's happening.

 Regards,

Don

 

9 Posts

July 23rd, 2018 15:00

Thanks for responding...

We're looking into replication currently but have never used it with these models before. As for networking, we run 10G between switches so I doubt that's it.

I know this sounds funny, but I've hunted everywhere for this as an option. Is it offered after calling in or is there a special spot on the site to create a 1 time SR?

Cheers,

!k

 

 

 

3 Apprentice

1.5K Posts

July 24th, 2018 08:00

Hello,

 Re: ISL. I agree that's unlikely but still should be checked.

 Re: 1 time SR.  I don't believe you can do it via the site, you have to call Dell support.

 Regards,

Don

 

9 Posts

December 13th, 2018 12:00

And after tons of back and forth, Dell won't offer even paid support for it. :|

Does anyone have any suggestions about who I can turn to? I desperately need some guidance and the systems that run on this array are for providing 911 emergency services 24/7/365. 

3 Apprentice

1.5K Posts

December 13th, 2018 13:00

Hello, 

What happened when you called in for a one time support call?   You are basically buying support time, just no parts.

 There is also a newer program you can ask about, Dell Extended Maintenance, that might help as well.  The PS5000 is long since EOL and not eligible for standard support contracts.

  Without looking at the internal diagnostic report, there's no sure way of knowing what's preventing the vacate.

  What versions of EQL firmware are you running? 

  Regards, 

Don  

9 Posts

January 7th, 2019 09:00

I actually didn't call it in. I had my account rep look into it for me and ended up on the phone with a regional account rep and a national rep who proceeded to tell me that they couldn't help and pointed me to an outside consulting company. I'll look into the program you mention. Thanks for the info.

Firmware is v7.0.10

Have you worked with these arrays for Dell before? I only ask as you seem more knowledgeable than your average customer.

 

 

 

3 Apprentice

1.5K Posts

January 7th, 2019 10:00

Hello, 

 Yeah they might not have been aware that you can in effect buy troubleshooting hours. How long ago did you call in? 

Re: Experience. Yeah you could say that.  I've been working for EQL/Dell for almost 15 years. I was with EQL since before Dell acquired it.  When Dell  moved this site over, my Dell userid got corrupted.  So until they fix that I'm using my old account here. 

Regards, 

Don 

9 Posts

January 11th, 2019 08:00

The last I spoke with them about this was probably 2 months ago.

Do any consulting? ;)

EDIT: 

I've also picked up a new secondary controller to replace the one that caused the initial issue. It's horribly out of date (v.4.3.6) so I'll need to schedule some downtime to bring it up to the same firmware version as the primary.

Are there any undocumented commands I can use to 'clear' out the group operation? I'm assuming the EQL uses its own internal DB to manage / persist information and there must be a way to access it. If so, I'm betting I can restart it with the new controller and come out alright.

3 Apprentice

1.5K Posts

January 11th, 2019 09:00

Hello, 

 No, sorry I don't do any consulting as I work for Dell. :)  

  I would still call again, and let them know there is such a service.  I'm helping on a case right now for an out-of-warranty array. 

  Even if there were undocumented commands I could not give them out. 

 Even if you could, just clearing it out would not likely resolve the problem.  Something is preventing a page from being processed.  There could be a hidden (orphaned) snapshot for example.

  The array actually has several DBs by the way. 

 RE: Update.  You won't be able to use the normal process to update that SD card. I.e. you can't boot from that CM and try to update the firmware.  It won't match what's on the RAIDset and won't allow you to boot up. 

So you have the SD card from the bad controller?    You should take it out of that CM and install it in the new one.  That's the normal process.  

 Regards, 

Don 

9 Posts

January 14th, 2019 07:00

Re: Consulting - My mistake. I thought you no longer worked there. I'd much rather they pay you to help us. ;)

I've migrated the SD card over and it shows as being a newer firmware than the active controller, at v7.1.9. Any advice to get them matched up again? I assume failover and then 'update'?

As a software engineer, I've always wondered about the inner workings of these appliances. Pretty amazing feature set for their time.

I've reached out to the rep and await a response. I would assume your name is Don Williams?

 

3 Apprentice

1.5K Posts

January 14th, 2019 09:00

Hello, 

 Re: Firmware.  Do NOT failover the controller.  It won't boot up if the SD card firmware version doesn't match what is on the array already.  Instead, go to the Group CLI and run 'update'.    Do NOT transfer a firmware kit file.  The firmware from the active controller will be copied to the passive CM, downgrading it to match firmware levels.  Then the passive CM will be rebooted automatically.   This will bring them back into sync and allow for failover to occur.
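
 To make the direction of the sync concrete, here is a minimal Python sketch that compares the two versions mentioned in this thread (v7.0.10 on the active controller, v7.1.9 on the migrated SD card). It only illustrates a numeric, field-by-field version comparison; the function names are hypothetical and this is not how the array itself performs the check.

```python
def parse_version(text):
    """Turn 'v7.0.10' (or 'v.4.3.6') into a tuple of ints, e.g. (7, 0, 10)."""
    return tuple(int(part) for part in text.lstrip("vV.").split("."))

def passive_cm_action(active, passive):
    """Describe what matching the passive CM to the active one implies."""
    a, p = parse_version(active), parse_version(passive)
    if p == a:
        return "in sync"
    # Tuples compare field by field, so 7.1.9 sorts above 7.0.10.
    return "downgrade passive" if p > a else "upgrade passive"

print(passive_cm_action("v7.0.10", "v7.1.9"))   # versions quoted in this thread
print(passive_cm_action("v7.0.10", "v.4.3.6"))  # the new controller's original card
```

 Comparing the fields as integers matters because the last fields here are 10 and 9, which a naive text comparison of those fields would order the wrong way.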

 Regards, 

Don

9 Posts

January 15th, 2019 07:00

Thanks for the additional info about Dell Services... my rep sent me a number to call.

I've never been out of sync with CM firmware versions beforehand so excuse my ignorance. Will the update cause any outage / downtime?

3 Apprentice

1.5K Posts

January 15th, 2019 10:00

Hello, 

 That's great news!   Hopefully you get it figured out quickly.

Re: Update.  Not at all, since it's only updating the passive CM, which does not have access to the drives or network ports until it becomes active.  Even the restart won't impact production, since it's technically offline anyways until the firmware matches.

 Regards, 

Don 

 

9 Posts

January 16th, 2019 07:00

I logged in via CLI to update the passive controller and when I ran the 'update' command the following message returned immediately:

 % Error - Member cannot be updated because a vacate is already in progress.

3 Apprentice

1.5K Posts

January 16th, 2019 08:00

Hello, 

Well that's annoying!   Only other option I can think of would require an outage on that member. 

 If you could shut down, you would remove the primary SD card, make an image of it, and overwrite the SD card from the secondary with that image.

 I would wait until you have the vacate issue resolved if at all possible.  Then run update.  But if you really wanted to get the passive back online that's what you'd have to do. 

Regards,

Don
