VNX datamover mock failover
Hi All,
We have an EMC VNX5500 (Unified) storage array and are planning to upgrade both the FLARE and DART code. I know that the FLARE upgrade is NDU (non-disruptive).
We have provisioned two NFS shares (500 GB and 2 TB). The 500 GB NFS share is mounted as the "/kdump" partition on more than 100 Linux servers; the 2 TB NFS share is used to store images/files.
What will happen to NFS share access during the Data Mover failover that occurs as part of the DART upgrade?
I want to simulate a Data Mover failover test before we go for the DART upgrade. How can I achieve this?
Also, please share best practices for upgrading DART without NFS downtime. Note that no CIFS shares are configured on this box.
Proposed upgrade plan:
FLARE: 05.32.000.5.201 to 05.32.000.5.207
DART: 7.1.65.8 to 7.1.72.1
Regards,
Dhakshinamoorthy Balasubramanian
dynamox • June 26th, 2014 18:00
NFS clients handle Data Mover failover very well. NFS (v3) is a stateless protocol, so clients may "pause" for 30-60 seconds, but as soon as the standby Data Mover takes over they keep running.
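For the Linux clients, whether applications ride through that pause transparently depends mostly on the mount options. A sketch of an /etc/fstab entry (the export path and Data Mover interface name here are hypothetical, not from the thread): a hard mount blocks and retries until the standby takes over, instead of returning I/O errors.

```shell
# /etc/fstab entry (illustrative names): "hard" mounts retry forever during a failover,
# while "soft" mounts would start returning errors to applications instead
datamover2-if0:/kdump_fs  /kdump  nfs  hard,intr,tcp,timeo=600,retrans=2  0 0
```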
Jeffey1 • June 26th, 2014 23:00
The sequence of operations involved in a Data Mover failover, and the length of time each takes to complete, is described below. These times are generally independent of the Data Mover hardware involved. In general, Data Mover failover time has three components:
1. The time for detecting the issue
a. For a manually initiated failover, this depends on the system configuration, since the "failed" Data Mover is shut down gracefully before the actual failover is initiated. The length of time required for a graceful shutdown depends on a number of factors, including the number of client connections, open files, locks held, file systems mounted, the amount of data that needs to be flushed from client and Data Mover buffers, and so on.
b. For a DART OS panic, the system spends 1 to 2 seconds detecting the issue, then up to 10 seconds for the panic handler and DART core dump initiation, for a total of up to 12 seconds.
c. For failure of both internal network connections: the connections are tested every 4 seconds, and if both fail to respond 3 consecutive times, a failover is initiated, so detection takes about 12 seconds.
d. For other causes, please refer to the high availability white paper.
2. The time for fixed cost operations
The fixed cost operations associated with a Data Mover failover take approximately 14 seconds in releases 5.6 and 6.0, and 5 seconds in versions 7.0 and 7.1.
3. The time for variable cost operations
There are a number of deployment-specific configuration items that the system must recover during a Data Mover failover, such as network interfaces, CIFS servers, NFS exports and CIFS shares, but the time is generally dominated by mounting the file systems and checkpoints on the standby Data Mover. The speed at which file systems and checkpoints are mounted depends on the Celerra/VNX Operating Environment version.
The expected time for a Data Mover failover to complete can be summarized by the following formula:
Data Mover failover time = time to detect the issue + time for fixed cost operations + time for variable cost operations
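As a rough sanity check of that formula (a sketch only: it assumes a DART 7.1 system, worst-case panic detection of ~12 seconds, and a made-up 20 seconds of variable cost for this deployment's file system mounts):

```shell
# Estimate total failover time for a DART 7.1 Data Mover (illustrative numbers only)
detect=12     # worst-case panic detection + core dump initiation (seconds)
fixed=5       # fixed cost operations on 7.0/7.1 (seconds)
variable=20   # assumed: time to mount this deployment's file systems/checkpoints
total=$((detect + fixed + variable))
echo "estimated failover time: ${total}s"
```

The only deployment-specific knob is `variable`; the other two terms come from the breakdown above.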
Jeffey1 • June 26th, 2014 23:00
Hi Dhakshinamoorthy,
During a Data Mover failover, clients behave as follows:
For example: Windows clients need to log in to the shared folder again (if the client accesses the share through an application or a mapped network drive, the re-login is completed automatically by the application or operating system). On Linux clients, the NFS client will continuously retry a failed request until it succeeds.
Anonymous User • June 27th, 2014 01:00
Hi Bala,
You don't need downtime to perform this upgrade. The only thing you need to check is the failover policy: is the policy on the standby Data Mover set to auto? If yes, the next requirement is that one of the Control Stations (if you have two) is up and running, because the failover is performed by the CS.
NFS clients will not be disconnected; they may only see a "Server not responding" message for a few seconds. They will be reconnected to the standby DM (if the policy is set to auto). Do not assign an IP to the standby DM; it will take over the same IP the primary DM is using.
FTP and NDMP sessions are lost and are not reconnected. CIFS sessions are redirected by the client's redirector to communicate with the standby DM. Designate the standby DM and link it to the primary DM.
Failover to the standby DM will happen only if these conditions are met:
1) The standby DM is operational.
2) The standby DM's NICs and IP capacity are equivalent to or greater than the primary DM's.
3) The standby DM has no file systems mounted and is not running any application.
When you activate it after configuration, it will assume the IP of the primary DM.
Do not override the policies of the primary DM by defining subsequent policies on the standby DM.
To configure the standby DM, type this command:
$ server_standby <primary_mover> -create mover=<standby_mover> [-policy <policy>]
Example:
$ server_standby server_2 -create mover=server_3 -policy auto (after this command, the standby DM will reboot)
Output:
server_2 : server_3 is rebooting as standby
The policy types are auto, retry and manual. Never set it to manual; it can cause an outage on the client machines.
You can test the failover manually as well, but I don't think it's needed if you set the policy correctly.
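Since the original question was how to simulate the failover before the DART upgrade, here is a minimal sketch of a manual test from the Control Station (assuming, as in the example above, that server_2 is the primary and server_3 is its configured standby; run this in a maintenance window and watch the NFS clients while it runs):

```shell
# Manually fail server_2 over to its standby; server_3 takes over server_2's identity
server_standby server_2 -activate mover

# ...verify NFS client behavior during the takeover, then fail back to the primary
server_standby server_2 -restore mover
```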
Failover will not occur in these cases:
1) The DM is manually restarted (physically).
2) The DM is removed from its slot.
Hope this info is useful to you.
Thanks
Rakesh Pandey
Rainer_EMC • June 30th, 2014 07:00
See the manual or do a "man server_cpu".
Dhakshinamoorthy • July 1st, 2014 14:00
Hi All,
I have completed the Data Mover failover test. Now I want to know how long "server_3" took for the takeover.
Which logs will help me find the exact timings?
Rainer_EMC • July 6th, 2014 02:00
SPcollect doesn't make sense - use server_log
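A sketch of pulling the takeover timestamps out of the Data Mover log from the Control Station (this assumes server_3 was the standby that took over, and that grepping for "failover"/"takeover"/"panic" matches the relevant messages on your DART release; the exact message text can vary):

```shell
# Dump the Data Mover's log and look for failover-related events; the timestamps
# on the surrounding messages show how long the takeover actually took
server_log server_3 | grep -iE 'failover|takeover|panic'
```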