
August 25th, 2015 17:00

Celerra file system creation failed and space is now allocated as raw

Hello,

I tried to create a new file system, as I've done many times in the past, but this time when I clicked Apply/OK it came back with an error, and now I don't see the new file system, yet the space is allocated in the storage pool. It does look like it created the file system, but as a 'raw' file system, because I see this message:

This is a raw file system.

This raw file system is used by replication and other such services. It cannot be changed or used for any other purpose without affecting the service for which it was created.

Errors:

Command failed: volume delete 58236.

Logical Volume 58236 not found.

Command failed: volume delete 58673.

Logical Volume 58673 not found.

Command failed: volume delete 58674.

Logical Volume 58674 not found.

Command failed: volume disk 58236 num_paths=2 c0t2I12 c16t2I12 size=1465753 disk_id=31

Basic Volume 58236 not created, can not read diskmark.

Warnings:

Device c16t2I12's Serial Number has changed.

Device c0t2I12's Serial Number has changed.

I just need to know how to go about deleting this raw file system so I can retry creating the file system.

Any suggestions are appreciated. I currently have 6.3 TB of space allocated that I can't use or get at.

8.6K Posts

August 26th, 2015 05:00

Please open a service request with support

There seems to be something inconsistent in your config, especially the 'can not read diskmark' message.

Try running nas_storage -check
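
If it helps, a first-pass check from the Control Station would look something like this. This is only a sketch using the command forms that appear elsewhere in this thread; output details vary by DART release.

nas_storage -check -all   # backend consistency check; a healthy system just returns "done"
nas_fs -list              # list file systems, including any half-created raw entry
nas_disk -list            # list the disk volumes (d##) the Celerra knows about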

89 Posts

August 27th, 2015 09:00

I would normally open an SR, but this is an old CX3 which is EOL and out of support.

I ran the nas_storage -check -all command and it came back with 'done' and nothing else so I'm assuming it didn't find anything wrong.

Any other suggestion?

This failed file system is on a storage pool consisting of 14 FC drives across 2 RAID groups and 5 LUNs. The good thing is that there is nothing else on these LUNs and RAID groups. So I'm wondering if I can just unbind the LUNs from the CLARiiON side, which should remove them from the Celerra NAS side, and then rebind the LUNs again. Do you think that would work?

8.6K Posts

August 28th, 2015 04:00

I don't think it will help. If you just unbind the LUNs, the config on the NAS side is still there, referencing now-missing objects and giving you more errors.

674 Posts

August 28th, 2015 07:00

fl wrote:

..

So I'm wondering if I can just unbind the LUNs from the CLARiiON side, which should remove them from the Celerra NAS side, and then rebind the LUNs again. Do you think that would work?

No, this will not work. You need to start the cleanup on the NAS.

Why has the serial number changed? That looks strange.

89 Posts

August 28th, 2015 08:00

I don't know why there were warning messages about the serial number changing, because I verified it and the serial number is still the same; it didn't change.

I can see the file system listed when I run the command "nas_fs -list".

id     inuse     type     acl     volume     name     server

23022     n     5     0     58674     ns20_dr_fs8

I can also see the file system when I run the command "server_mountpoint server_2 -list"

/ns20_dr_fs8

If I browse the C$ root of the NAS (i.e. \\ns20\c$), I do not see the file system there like all the other file systems I currently have listed there.

The command "server_mount server_2" does not list the file system either, so I tried to mount it with the command "server_mount server_2 ns20_dr_fs8 /ns20_dr_fs8" and I get the following error.

Error 4020: server_2 : failed to complete command

In the event log I still see the following errors right after the failed server_mount command.

command failed: volume delete 58236

logical volume 58236 not found

command failed: volume delete 58673

logical volume 58673 not found

command failed: volume delete 58674

logical volume 58674 not found

command failed: volume disk 58236 num_paths=2 c0t2I12 c16t2I12 size=1465753 disk_id=31

basic volume 58236 not created, can not read diskmark

I think the problem has something to do with volume 58236, which I don't see listed in the storage volumes view in Celerra Manager. In Celerra Manager, volume 58674 is shown and is used by the file system in question, ns20_dr_fs8.
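
For what it's worth, one way to inspect what's left behind from the Control Station is sketched below. nas_fs and nas_volume are standard Celerra commands, but treat the exact invocations as assumptions and check them against your DART man pages; ns20_dr_fs8 and the 58236/58674 IDs are just the names from the errors above.

nas_fs -info ns20_dr_fs8       # show which volume and pool back the raw file system
nas_volume -list | grep 58674  # does the metavolume the file system references still exist?
nas_volume -list | grep 58236  # 58236 is the basic/disk volume the errors complain about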

To me it seems like it started the creation process, failed part way through, and was then unable to clean up after the failed creation, so now the space is in limbo, orphaned.

Is there a command to check for or clean up failed operations like this?

Do you think rebooting the SPs would do anything?

674 Posts

August 30th, 2015 22:00

The diskmark of c(0|16)t2l12, which contains the serial number of the Celerra, needs to be fixed.

89 Posts

August 31st, 2015 19:00

Can you shed some light on how to fix the diskmark and the serial number?

674 Posts

September 1st, 2015 06:00

This type of cleanup should be done by support.

89 Posts

September 1st, 2015 21:00

I came to the community for help because the problem is on an end-of-life CX3 that EMC no longer supports, so I cannot engage EMC support. I guess I'm just SOL on this, which just gives me another reason to get rid of EMC. The only reason I kept it is that it still works and I'm using the storage for testing purposes only, but if I can't fix it myself then it's time to take the shotgun to it.

40 Posts

September 1st, 2015 21:00

Have you tried to perform a rescan, server_devconfig server_2 -create -scsi -disks, now?

September 2nd, 2015 00:00

I'm always amazed by how long these arrays keep plugging along. You're at least a year past EOL support, and that CX3 is probably at least 7 years old. You might get this issue resolved eventually; on the other hand, a CX3 DPE on eBay would probably be cheaper than hiring someone who can fix it.

Better yet, do what I did and turn your old storage into a Kegerator...much better than taking a shotgun to it!

[Image attachment: symmerator-2.jpg]

89 Posts

September 2nd, 2015 08:00

I did a server_devconfig server_2 -list -scsi -disks and I can see all the disks, including c16t2l12, which is the one generating the diskmark error.

d31     c0t2l12     APM....2267     002C

d31     c16t2l12     APM...2267     002C

From what I see in the list, the serial number is the same for all the disks.

Like I said, it looks like the file system creation process failed and stopped somewhere between allocating the space on disk and marking it as allocated, and the step that creates/presents the file system.

If someone can give me details on each of the steps the Celerra goes through when you tell it to create a new file system, then maybe I can isolate where it failed and focus my troubleshooting efforts.

Thinking back, I remember that this shelf of 15K FC drives had a bunch of drive failures (I think 5-6 in total), all within 24 hours. Luckily I had nothing important on the disks at the time, and I was able to move and delete all the file systems that were on them. But it was stuck in a rebuild/transitioning state, and I had to force-delete the LUNs and RAID groups from the CLARiiON side and recreate them after I replaced all the failed disks. I had not created any new file systems on these new LUNs and RAID groups, which are part of the same Celerra storage pool, until now. I don't know if this contributed to the problem I'm having now, as my other storage pools and disks are working fine and I can delete and create file systems in those pools just fine.

I'm wondering if there is a way to delete the raw allocation for this file system so I can try again, but I can't figure out how to do that.

September 14th, 2015 02:00

Firstly, yes, you have to start the cleanup from the Celerra.

And I wonder whether "14 FC drives across 2 RG and 5 LUNs" is a configuration supported by DART. I guess you may have to create 4+1 RAID 5 RG sets within the CX and then present the LUNs to the Data Movers.

To clean up, you should get help from the nas_fs and nas_disk commands.
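
A rough sketch of what that cleanup could look like is below. nas_fs -delete, server_mountpoint -delete and nas_disk -d are standard Celerra forms (the last is the same command the original poster ends up using later in this thread), but the file system name, mountpoint and disk ID here are just the examples from this thread, so verify against your own man pages before running anything.

nas_fs -delete ns20_dr_fs8                       # remove the half-created raw file system entry
server_mountpoint server_2 -delete /ns20_dr_fs8  # drop the stale mountpoint if it is still listed
nas_disk -list                                   # identify the d## entries backing the freed space
nas_disk -d d31                                  # delete the affected disk entry (repeat per d##)
nas_storage -check -all                          # confirm the backend is consistent afterwards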

Good Luck!
Sameer

674 Posts

September 14th, 2015 06:00

fl wrote:

I did a server_devconfig server_2 -list -scsi -disks and I can see all the disks, including c16t2l12, which is the one generating the diskmark error.

d31     c0t2l12     APM....2267     002C

d31     c16t2l12     APM...2267     002C

From what I see in the list, the serial number is the same for all the disks.

...

Instead of -list (which shows the LUNs as of the last time a -create was done), you should do a -probe; then you will get the current LUNs.
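
For example (a sketch following the same option pattern used earlier in the thread; verify the flags against your release's server_devconfig man page):

server_devconfig server_2 -probe -scsi -disks   # query the SCSI bus now rather than the saved table
server_devconfig server_2 -list -scsi -disks    # compare with what was stored at the last -create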

89 Posts

September 23rd, 2015 15:00

OK, after working on this off and on, I finally resolved the problem.

These were the steps I used; a consolidated command sketch follows the list.

1) Deleted any file systems on the affected LUNs/volumes/disks (I didn't have any because I had already deleted them beforehand).

2) Removed the LUNs from the storage group on the CLARiiON side.

3) Unbound the LUNs on the CLARiiON side.

4) Deleted the LUNs on the CLARiiON side.

5) Deleted the disks with the nas_disk -d d# command on the Celerra side.

6) Ran nas_storage -c -a to make sure there were no errors.

7) Ran server_devconfig server_2 -create -scsi -all and server_devconfig server_3 -create -scsi -all to update the data movers.

8) Ran nas_disk -list and nas_storage -c -a to verify the disks were deleted and there were no errors.

9) Recreated the LUNs on the CLARiiON side.

10) Bound the new LUNs and added them to the Celerra storage group on the CLARiiON side.

11) Ran server_devconfig server_2 -create -scsi -all and server_devconfig server_3 -create -scsi -all again to update the data movers.

12) Created the new file system and everything is happy again.
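
For anyone hitting the same problem, the Celerra-side portion condenses to roughly the sketch below, using the commands from the list above. The CLARiiON LUN work (steps 2-4 and 9-10) happens in Navisphere and is not shown, and d31 is just the disk ID from this thread.

# Celerra-side cleanup after the LUNs were removed on the CLARiiON side
nas_disk -d d31                                  # delete the orphaned disk entry (repeat per affected d##)
nas_storage -c -a                                # check the backend and make sure there are no errors
server_devconfig server_2 -create -scsi -all     # update data mover server_2
server_devconfig server_3 -create -scsi -all     # update data mover server_3
nas_disk -list                                   # verify the deleted disks are gone

# After recreating and presenting the LUNs on the CLARiiON side
server_devconfig server_2 -create -scsi -all     # pick up the new LUNs on server_2
server_devconfig server_3 -create -scsi -all     # pick up the new LUNs on server_3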

Thanks, all, for your suggestions, and thanks to Google, where I found bits and pieces that I used.
