PhiNor1

June 19th, 2013 03:00

Celerra NS-120 / Storage processor ready to restore

Hi Guys,

My in-house EMC guy is away and I have what may be a very basic issue.

A long power outage drained our UPSs and caused a dirty shutdown of the SAN. Now that it is powered back on, I see the following message under 'Storage' - 'Systems' in the GUI:

"Storage processor is ready to restore"

Other than that, our SAN's status is "OK". However, if I run 'nas_storage -c -a' in the CLI I see the following. I assume the root disk failover is due to the SPA failover?

Discovering storage (may take several minutes)

Error 5017: storage health check failed

CKM00095100510 SPA is failed over

CKM00095100510 root_disk is failed over

CKM00095100510 d51 is faulted/removed

CKM00095100510 d52 is faulted/removed

I did have a failed disk '0' before the power outage, which has since been replaced but doesn't want to rebuild. That may be because SPA is failed over.
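For anyone reading along, here is a minimal sketch of how the health-check output above could be split into "failed over" items (trespass) and "faulted/removed" items (disks). It only parses text that has already been captured. The line shape "<serial> <object> is <state>" is an assumption based on the lines pasted in this post, and the function name is made up for illustration.

```python
import re

# Sample text copied from the post above; in practice it would be the
# captured output of `nas_storage -c -a` on the Control Station.
SAMPLE = """\
Discovering storage (may take several minutes)
Error 5017: storage health check failed
CKM00095100510 SPA is failed over
CKM00095100510 root_disk is failed over
CKM00095100510 d51 is faulted/removed
CKM00095100510 d52 is faulted/removed
"""

# Assumed line shape: "<serial> <object> is <state>"
FINDING = re.compile(r"^(?P<serial>\S+)\s+(?P<obj>\S+)\s+is\s+(?P<state>.+)$")


def group_findings(text):
    """Group reported objects by state, skipping lines that do not match."""
    groups = {}
    for line in text.splitlines():
        match = FINDING.match(line.strip())
        if match:
            groups.setdefault(match["state"], []).append(match["obj"])
    return groups


print(group_findings(SAMPLE))
# {'failed over': ['SPA', 'root_disk'], 'faulted/removed': ['d51', 'd52']}
```

Grouping this way makes it easier to see that there are two separate things to chase: the SP/LUN ownership and the two disk volumes.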

Any help would be greatly appreciated, guys. I am moving servers off this SAN but need it to keep running in a healthy state for now.

Responses(7)

dynamox

June 19th, 2013 04:00

Like you said, typically that means that LUNs have been trespassed to the non-default SP, but the fact that you are having issues with drive rebuilds could indicate something more serious. I would open a ticket with support.

l2adius

June 19th, 2013 06:00

Hey Phinor,

From those errors, I would assume that there is still a drive failure issue...

Can you provide the failed disk location? How was the drive replaced? When a drive fails, its data gets copied over to a hot spare (HS). Do you know if the transition was completed prior to the unexpected shutdown? Usually when a drive is replaced, it gets rebuilt from the HS or the RAID group. You can find out the current state of the drive by issuing:

/nas/sbin/navicli -h SPA getdisk (bus)_(enclosure)_(disk) -state -type -rb

e.g. /nas/sbin/navicli -h SPA getdisk 1_1_2 -state -type -rb

(I would do the same for the HS location that the data was copied to.)

Then you can compare the results to see the current state of each drive.
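As a rough sketch, the two getdisk calls (replaced drive and hot spare) can be built from the bus, enclosure and disk numbers like this. The helper name is made up, the 1_1_2 location is just the example above, and the hot spare location 0_0_14 is purely hypothetical. The code only builds the command line and does not run anything.

```python
def getdisk_cmd(bus, enclosure, disk, sp="SPA"):
    """Build the navicli getdisk command for one drive as an argument list."""
    for value in (bus, enclosure, disk):
        if not isinstance(value, int) or value < 0:
            raise ValueError("bus, enclosure and disk must be non-negative integers")
    disk_id = f"{bus}_{enclosure}_{disk}"  # (bus)_(enclosure)_(disk)
    return ["/nas/sbin/navicli", "-h", sp, "getdisk", disk_id,
            "-state", "-type", "-rb"]


# Example drive location from the post, plus a hypothetical hot spare.
for location in [(1, 1, 2), (0, 0, 14)]:
    print(" ".join(getdisk_cmd(*location)))
```

Running both commands on the Control Station and putting the state and rebuild fields side by side is the comparison meant above.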

As for the trespassed LUN,

I would double-check the LUNs that are affected and ensure there are no other hardware failures aside from that single drive, then issue:

/nas/sbin/navicli -h SPA getlun -trespass (to determine which LUNs are trespassed)

nas_storage -l (to determine the primary backend storage from the NAS perspective)

nas_storage -failback id=1 (the id is shown in the output of the command above)

nas_storage -c -a (you will probably still get a failure due to the drive)

I would still recommend opening a case with EMC, since the array was shut down unexpectedly and you may encounter some dirty cache, although the EMC SPS on the Celerra should give enough time for the cache to destage to disk.

Good luck

PhiNor1

June 19th, 2013 15:00

Hi guys,

Thanks for the replies,

The disk that had failed was removed, and one of the hot spares had fully taken over while we waited for the new disk to arrive.

The outputs are:

$ /nas/sbin/navicli -h SPA getlun -trespass

LOGICAL UNIT NUMBER 0

Default Owner: SP A

Current owner: SP B

$ nas_storage -l

id acl name serial_number

1 0 CKM00095100510 CKM00095100510
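For reference, a small sketch of how the getlun -trespass output above could be checked for LUNs whose current owner differs from the default owner. The field names are taken from the output pasted here; the function name is hypothetical, and the parser assumes each LUN block starts with a "LOGICAL UNIT NUMBER" line.

```python
import re

# Output copied from this post.
SAMPLE = """\
LOGICAL UNIT NUMBER 0
Default Owner: SP A
Current owner: SP B
"""


def trespassed_luns(text):
    """Return (lun, default_owner, current_owner) for each trespassed LUN."""
    result = []
    lun = default = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"LOGICAL UNIT NUMBER\s+(\d+)", line)
        if header:
            # A new LUN block starts; forget the previous default owner.
            lun, default = int(header.group(1)), None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "default owner":
            default = value
        elif key == "current owner" and lun is not None and default is not None:
            if value != default:
                result.append((lun, default, value))
    return result


print(trespassed_luns(SAMPLE))
# [(0, 'SP A', 'SP B')]
```

Here only LUN 0 shows up, owned by SP B instead of its default SP A, which matches the "SPA is failed over" message from the health check.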

If I do go ahead and restore, will the servers running from this storage temporarily lose their connection to the SAN?

The support contract on this SAN expired last month. I didn't want to renew as I am moving everything off this SAN over the next several weeks. Just my luck!

Thanks again

dynamox

June 19th, 2013 19:00

LUN 0 will be trespassed back to default SP, should be transparent to the server/datamover

l2adius

June 21st, 2013 05:00

Don't forget to mark the thread as answered if it is resolved, and mark any posts that were helpful to you, so others can refer to them if they are in the same situation. Happy Friday!

l2adius

June 21st, 2013 05:00

LUN 0 is the control LUN for the NAS side. You should be able to trespass it back to the default SP with little performance impact (depending on I/O) or no impact at all.

It should be transparent to the host, like dynamox mentioned above.

If I may recommend (safest way):

I would replace your disk first, using USM, which you can probably download from support.emc.com.
Wait until it has fully transitioned. Once that is complete, make sure there are no other hardware failures.
Trespass the LUN using nas_storage -f id=1 (wait a few minutes until the prompt comes back saying "done").
Then run nas_checkup to ensure the entire array is clean.
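The order above can be sketched as a simple decision helper: disk first, then failback, then the final checkup. This is only an illustration of the sequence suggested in this thread, not an official procedure. The function name and the shape of the input (a dict of state to affected objects, as you might build from the nas_storage -c -a lines) are assumptions, and nothing is executed.

```python
def next_step(findings):
    """Pick the next action from health-check findings.

    findings maps a state string (for example "failed over") to the list of
    objects reported in that state.
    """
    if findings.get("faulted/removed"):
        # Disk problems come first: replace and let the rebuild complete.
        return "Replace the faulted disk and wait for the rebuild to finish"
    if findings.get("failed over"):
        # Only then fail back to the default SP (command quoted from the thread).
        return "nas_storage -failback id=1"
    # Nothing outstanding: confirm the whole array is clean.
    return "nas_checkup"


print(next_step({"failed over": ["SPA", "root_disk"],
                 "faulted/removed": ["d51", "d52"]}))
# Replace the faulted disk and wait for the rebuild to finish
print(next_step({"failed over": ["SPA", "root_disk"]}))
# nas_storage -failback id=1
print(next_step({}))
# nas_checkup
```

Keeping the failback behind the disk check is the cautious choice here, since a trespass back onto an array that still has a hardware fault could hide or complicate the real problem.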

Cheers

Aswinkumar_bafd83

June 23rd, 2013 21:00

LUN 0 will be trespassed back to default SP, should be transparent to the server/datamover

Log in to Unisphere (GUI), double-click the connected storage (or right-click it and choose Properties), and click on Restore SP.
