10 Posts
0
4007
June 24th, 2013 19:00
Datamovers and Fail Back
Hi All,
I have had a problem for a week now that I just cannot resolve. I have a VNX 5300, and the Primary datamover failed over to the Secondary. The issue that caused the failover has been resolved, but now I need to "fail back" to the Primary.
How do I do this? The pressing issue for me is that no new folders can be created via our automated tool onto our storage. I am doing this manually at the moment.
I do not have AFM installed.
ANY suggestions would be greatly appreciated.
Thanks
darrell
cincystorage
2 Intern
•
467 Posts
0
June 24th, 2013 20:00
First I'd run "nas_server -info server_2" to verify it has a standby server, which it should...
To restore server_2 you simply run the following:
server_standby server_2 -restore mover
That will start the process which makes the Primary active again.. keep in mind the impact this may have to connected users, depending on your environment..
dynamox
9 Legend
•
20.4K Posts
•
87.4K Points
1
June 24th, 2013 20:00
server_standby server_2 -restore mover
christopher_ime
4 Operator
•
2K Posts
0
June 24th, 2013 20:00
darrellx,
You have piqued my curiosity. How did failing over to the standby data mover prevent the tool from creating folders? How exactly is the tool scripted to reference the data mover? As you know, when the standby data mover takes over, the entire "personality", filesystem (mount) configuration, VDM/CIFS server, etc. of the original is transferred over. The standby is even renamed with the original server_# so that even the system's own internal scripts that rely on the same don't break, and of course the physical and logical network configuration is identical. (well.. at least on the data mover itself, the assumption is that the switch configuration is identical as is required and had been tested before putting into production, is that possibly the issue?)
I can understand backups no longer working (temporarily) in a 2-way NDMP setup, but even that can be rectified if the standby data mover is expected to be active for an extended period of time. What I'm trying to understand is why your environment no longer works properly when it is failed over. I think that should be reviewed so it doesn't happen again in the future, if possible. While I won't agree or disagree that failing back should be a priority (the data movers are, after all, identical hardware, and neither should be considered inferior to its peer), you shouldn't necessarily feel pressure to fail back ASAP unless you have more than the 2 data movers and the standby is protecting other peers.
christopher_ime
4 Operator
•
2K Posts
0
June 24th, 2013 21:00
Even if the script used server_2, the standby when it takes over is reassigned that name; then as you've probably observed, the failed over data mover is reassigned: server_2.faulted.server_3. The system has its own internal scripts that would break if nothing responded to the name of the original active data mover.
Can you run:
nas_server -l
Let us ignore the details (-info ) for now. I am most interested in:
1) type (1=nas, 4=standby)
2) slot (physical location that never changes regardless of state)
3) state (0=enabled, 2=failed over)
4) name (current name)
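For anyone who wants to read those four columns without memorising the codes, here is a minimal sketch that decodes `nas_server -l` style text. It is not an EMC tool: the function names are made up for illustration, the state codes are the ones listed above, and the type codes use 1 for nas because that is what the nas_server output posted in this thread shows. Treat both mappings as assumptions and check them against your own system's documentation.

```python
# Code meanings quoted in this thread; any other value is reported as unknown.
TYPE_NAMES = {1: "nas", 4: "standby"}
STATE_NAMES = {0: "enabled", 2: "failed over"}


def parse_nas_server_list(text):
    """Parse `nas_server -l` style text into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header and anything that is not a data row.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        # groupID is often blank, so read state and name from the right.
        rows.append({
            "id": int(fields[0]),
            "type": int(fields[1]),
            "acl": int(fields[2]),
            "slot": int(fields[3]),
            "state": int(fields[-2]),
            "name": fields[-1],
        })
    return rows


def describe(row):
    """Render one parsed row as a short human-readable line."""
    kind = TYPE_NAMES.get(row["type"], f"unknown type {row['type']}")
    state = STATE_NAMES.get(row["state"], f"unknown state {row['state']}")
    return f"{row['name']} (slot {row['slot']}): {kind}, {state}"


# Sample text copied from the output posted in this thread.
SAMPLE = """
id type acl slot groupID state name
1 1 0 2 0 server_2
2 4 0 3 0 server_3
"""

for row in parse_nas_server_list(SAMPLE):
    print(describe(row))
```

Reading state and name from the right-hand end of each row is a deliberate choice here, since a blank groupID column would otherwise shift the fields.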
darrellx
10 Posts
0
June 24th, 2013 21:00
UNCLASSIFIED
Christopher
Well, this is a long story. We had a guy who wrote a script to automate our folder creation on our previous storage - MD1000 and MD1200 devices.
When we made the move to EMC a year ago, he re-wrote his script for the new hardware. I believe he has done some hard-coding of device names and referenced the server_2 by name. He is not here at the moment, and I have tried to communicate via text messaging. So I am a little confused as to exactly why, but I think this is it in a nutshell.
So when we run the script, it appears that the folder (sub-folder) structure is created, and the "okay" message is displayed. But when I look on the volume, it is not there.
When he returns (the script writer), I will be asking about this.
I am not that nervous about the fail-over. I suspect if something happened to server_3 (DataMover 3), it would just fail over to server_2 (DataMover2). It is really this folder creation issue that is making life hard.
Having read what you have posted, I do hope this isn't a coincidental issue and the problem remains once I do the fail back routine.
Thanks
Darrell
DARRELL BETTS
TEAM LEADER
FORENSIC AND DATA CENTRES
Tel +61(0) 7 52221222 Ext 172428 Mob +61(0) 419484720
www.afp.gov.au
UNCLASSIFIED
1 Attachment
image002.jpg
christopher_ime
4 Operator
•
2K Posts
0
June 24th, 2013 21:00
Assuming you have just the two data movers, the nas_server -info server_2 output you provided at Mark's request, suggests everything is normal believe it or not. I'll wait though for the output of nas_server -l.
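To make "looks normal" a bit more concrete, here is a rough sketch that parses `nas_server -info` style text and checks three fields. The function names and the definition of "normal" (type is nas, a standby is listed, actual status is "online, active") are assumptions made for this example, not an official health check.

```python
def parse_nas_server_info(text):
    """Parse `nas_server -info` style text into a flat dict of strings."""
    info = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # e.g. the bare "status :" heading or blank lines
        # Split on the first "=" only, so "policy=auto" stays in the value.
        key, _, value = line.partition("=")
        info[key.strip()] = value.strip()
    return info


def looks_normal(info):
    """Assumed rule of thumb: nas type, standby configured, online and active."""
    return (
        info.get("type") == "nas"
        and info.get("standby", "") != ""
        and info.get("actual") == "online, active"
    )


# Sample text copied from the output posted in this thread.
SAMPLE_INFO = """
id = 1
name = server_2
acl = 0
type = nas
slot = 2
member_of =
standby = server_3, policy=auto
status :
defined = enabled
actual = online, active
"""

info = parse_nas_server_info(SAMPLE_INFO)
print(info["name"], "standby:", info["standby"])
print("looks normal:", looks_normal(info))
```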
cincystorage
2 Intern
•
467 Posts
0
June 24th, 2013 21:00
You do not need to get the users to logoff - however the level of interruption they receive will vary from none to a lot - depending on a lot of things.. In theory, they should lose connectivity for a brief moment and it should restore itself.. in reality it might not work quite so perfectly..
But yes, it is that simple.. Some other useful commands are:
nas_server -l
/nas/sbin/getreason
christopher_ime
4 Operator
•
2K Posts
0
June 24th, 2013 21:00
Scrolling back to the top, if you have a VNX5300 as you already mentioned, then you could only have 2 data movers at most, so I answered my own question there. Go ahead and still run nas_server -l for me, though.
darrellx
10 Posts
0
June 24th, 2013 21:00
UNOFFICIAL
Mark
Thanks very much for the quick response - I should have done this last week.
Anyway, I have run the command -
nas_server -info server_2
I got the response -
id        = 1
name      = server_2
acl       = 0
type      = nas
slot      = 2
member_of =
standby   = server_3, policy=auto
status    :
  defined = enabled
  actual  = online, active
So now, once I get the users to logoff, I can run the command -
server_standby server_2 -restore mover
???
Seems too easy.
Darrell
UNOFFICIAL
1 Attachment
image002.jpg
christopher_ime
4 Operator
•
2K Posts
0
June 25th, 2013 00:00
So just to clarify, in its failed over state, the system would *not* failback automatically; that is always a manual process via the command that Mark had provided.
For instance, using the actual system naming:
1) server_2/slot 2 (server_3/slot 3 is its standby)
2) server_2/slot 2 itself has a fault that triggers an automatic failover (configurable)
3) System renames data movers as follows:
a) server_2: renamed server_2.faulted.server_3 (slot 2)
b) server_3: renamed server_2 (slot 3)
Then let's say you resolve the issue with the original server_2 (slot 2/currently named: server_2.faulted.server_3); however, in a very unlucky but possible scenario, the original server_3 (slot 3/currently named: server_2) now faults before you have a chance to fail back.
IMPORTANT: The system would not failback automatically for you. It is a manual process and unlike the failover process, can't be configured to be done automatically by the system.
So you should consider failing back when you have the chance. I'm thinking though if it weren't for the folder issue (but as discussed, I suspect that is something completely unrelated) and since you only have the two data movers and this standby isn't protecting more than one, I would assume you wouldn't be as eager to failback, since as noted above, you do have to plan for the brief downtime.
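The renaming described above can be modelled in a few lines. This is only a toy model of the naming convention from this post (slot numbers never change, names swap); the function names are invented, it does not talk to a VNX, and the ValueError is this sketch's own choice rather than anything the system reports.

```python
def fail_over(movers, primary, standby):
    """Return a new slot -> name map after `primary` fails over to `standby`."""
    renamed = {}
    for slot, name in movers.items():
        if name == primary:
            renamed[slot] = f"{primary}.faulted.{standby}"
        elif name == standby:
            renamed[slot] = primary  # the standby takes over the primary's name
        else:
            renamed[slot] = name
    return renamed


def restore(movers, primary):
    """Reverse fail_over: the faulted mover gets its original name back."""
    marker = f"{primary}.faulted."
    faulted = [name for name in movers.values() if name.startswith(marker)]
    if not faulted:
        raise ValueError(f"{primary} is not failed over")
    standby = faulted[0][len(marker):]  # original standby name follows the marker
    renamed = {}
    for slot, name in movers.items():
        if name == faulted[0]:
            renamed[slot] = primary
        elif name == primary:
            renamed[slot] = standby
        else:
            renamed[slot] = name
    return renamed


before = {2: "server_2", 3: "server_3"}
after = fail_over(before, "server_2", "server_3")
print(after)
print(restore(after, "server_2") == before)
```

Keying the map by slot mirrors the point that the slot is the physical location and never changes, while the name follows whichever data mover is currently active.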
EDIT:
I just wanted to quickly note that there is a whitepaper about standby data movers: "Configuring Standbys on VNX"
https://mydocs.emc.com/VNXDocs/Standbys.pdf
You can also search for it by name on support.emc.com, but to mix it up a bit I'll point you instead to the "My Documents" section (the whitepaper is available via the "Related documents" section).
https://mydocs.emc.com/VNX/requestMyDoc.jsp
darrellx
10 Posts
0
June 25th, 2013 13:00
UNCLASSIFIED
Sorry, I meant to include this.
Component Name: Control Station 0
Type: Control Station
Status: OK
Variant:
Version: N/A
Serial Number: N/A
History: N/A
Component Name: Control Station 1
Type: Control Station
Status: OK
Variant:
Version: 7.0.52-1
Serial Number: FCN00122200001
History: CPU_VENDOR_ID:GenuineIntel
CPU_FAMILY:6
CPU_MODEL_NUMBER:22
CPU_MODEL_NAME:Intel(R) Celeron(R) CPU 440 @ 2.00GHz
CPU_SPEED_MHZ:2000 MHz
CPU_CACHE_SIZE:512 KB
UNCLASSIFIED
1 Attachment
image002.jpg
dynamox
9 Legend
•
20.4K Posts
•
87.4K Points
0
June 25th, 2013 13:00
can you post output from nas_server -l
darrellx
10 Posts
0
June 25th, 2013 13:00
UNCLASSIFIED
Mark
Sorry to keep bothering you, but I think I have an issue.
I ran the command -
server_standby server_2 -restore mover
And I got the response -
server_2 :
Error 4004: server_2 : standby is not available, is active
So I assume that somehow server_2 has failed back and server_3 is now my secondary again. I am a bit confused because I don't understand how that could happen, but I will accept it for the minute.
My concern is that normally I log onto the Control Stations via the browser-based Unisphere and go to IP 10.66.68.90 - which is Control Station 0. Since the fail over, I have not been able to get to this IP and have been using 10.66.68.91. I thought the fail over must have had something to do with this, but maybe not.
Any suggestions?
Thanks
Darrell
UNCLASSIFIED
1 Attachment
image002.jpg
dynamox
9 Legend
•
20.4K Posts
•
87.4K Points
0
June 25th, 2013 13:00
They are back to the normal state, so they were either failed back manually or they panicked again.
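A quick way to see why a restore has nothing to do in this state is to look at the names alone. The sketch below assumes the `.faulted.` naming convention described earlier in this thread; the function name is invented, and it does not reproduce whatever check the system itself performs behind Error 4004.

```python
def restore_makes_sense(mover_names, primary):
    """True only if some mover is currently named '<primary>.faulted.<standby>'."""
    prefix = f"{primary}.faulted."
    return any(name.startswith(prefix) for name in mover_names)


# Names as they appear in the nas_server -l output posted in this thread.
current = ["server_2", "server_3"]
print(restore_makes_sense(current, "server_2"))

# Names as they would look while failed over, per the renaming described earlier.
during_failover = ["server_2.faulted.server_3", "server_2"]
print(restore_makes_sense(during_failover, "server_2"))
```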
darrellx
10 Posts
0
June 25th, 2013 13:00
UNCLASSIFIED
Christopher
I have run the command -
nas_server -l
And I get the response -
id    type  acl  slot  groupID  state  name
1     1     0    2              0      server_2
2     4     0    3              0      server_3
As I have already replied to Mark -
I ran the command -
server_standby server_2 -restore mover
And I got the response -
server_2 :
Error 4004: server_2 : standby is not available, is active
So I assume that somehow server_2 has failed back and server_3 is now my secondary again.
Thanks
Darrell
UNCLASSIFIED
1 Attachment
image003.jpg