
June 26th, 2014 10:00

VNX datamover mock failover

Hi All,


We have an EMC VNX5500 (Unified) storage array and we are planning to upgrade both the FLARE and DART code. I know that the FLARE upgrade is non-disruptive (NDU).

We have provisioned two NFS shares (500GB and 2TB). The 500GB NFS share is mounted as the "/kdump" partition on more than 100 Linux servers.

The 2TB NFS share is provisioned for storing images/files.


What will happen to NFS share access during the datamover failover that occurs during the DART upgrade?

I want to simulate the datamover failover before we go for the DART upgrade. How can I achieve this?

Also, please share best practices for upgrading DART without NFS downtime. Please note that no CIFS shares are configured on this box.


Proposed upgrade plan:

FLARE: 05.32.000.5.201 to 05.32.000.5.207

DART: 7.1.65.8 to 7.1.72.1

Regards,

Dhakshinamoorthy Balasubramanian

http://www.storageadmin.in/

2 Intern • 20.4K Posts

June 26th, 2014 18:00

NFS clients are stateless and handle datamover failover very well; they might "pause" for 30-60 seconds, but as soon as the standby datamover takes over they will keep on running.
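One thing worth checking on the client side is that the shares are mounted with the "hard" option (the Linux default for NFS), so clients block and retry during that pause instead of returning I/O errors. A minimal sketch of an /etc/fstab entry, assuming a hypothetical server name vnx-dm2 and the /kdump mount point from the original post:

vnx-dm2:/kdump  /kdump  nfs  hard,timeo=600,retrans=2  0 0

Here "hard" makes the client retry indefinitely, and timeo=600/retrans=2 are the usual Linux defaults for NFS over TCP, shown explicitly for clarity.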

2 Intern • 2.8K Posts

June 26th, 2014 23:00

The sequence of operations involved in a Data Mover failover, and the length of time required for each to complete, is described in more detail below. The time required is generally independent of the Data Mover hardware involved. In general, Data Mover failover time breaks down into three parts:

  1. The time to detect the issue.

      a. For a manually initiated failover, this depends on the system configuration, since the system shuts the "failed" Data Mover down gracefully before initiating the actual failover. The time required for a graceful shutdown of a Data Mover depends on a number of factors, including the number of client connections, open files, locks held, file systems mounted, the amount of data that needs to be flushed from client and Data Mover buffers, and so on.

      b. For a DART OS panic, the system spends 1 to 2 seconds detecting the issue, then up to 10 seconds on the panic handler and DART core dump initiation, for a total of up to 12 seconds.

      c. For failure of both internal network connections, the connections are tested every 4 seconds; if both fail to respond 3 consecutive times, a failover is initiated, so detection takes about 12 seconds.

      d. For other causes, please refer to the high availability white paper.

  2. The time for fixed-cost operations.

      The fixed-cost operations associated with a Data Mover failover take approximately 14 seconds in releases 5.6 and 6.0, and about 5 seconds in versions 7.0 and 7.1.

  3. The time for variable-cost operations.

      There are a number of deployment-specific configuration items that the system must recover during a Data Mover failover, such as network interfaces, CIFS servers, NFS exports, and CIFS shares, but the time required is generally dominated by mounting the file systems and checkpoints on the Data Mover. The speed at which file systems and checkpoints are mounted depends on the Celerra/VNX Operating Environment version.


The expected time required for a Data Mover failover to complete can be summarized by the following formula:

Data Mover failover time = time to detect the issue + time for fixed-cost operations + time for variable-cost operations.
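As a rough worked example for the configuration in this thread (DART 7.1, a panic-triggered failover, and only two file systems; the variable-cost figure is illustrative):

up to 12 seconds (panic detection) + about 5 seconds (fixed-cost operations on 7.1) + roughly 5 seconds (mounting two file systems and bringing up interfaces) = roughly 22 seconds.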

2 Intern • 2.8K Posts

June 26th, 2014 23:00

Hi Dhakshinamoorthy,

The overview of the sequence of events that occur during a Data Mover failover is as follows:

  1. A failure event occurs on the Data Mover.
  2. The failure is detected by the Control Station.
  3. The Control Station re-configures the system so that the standby Data Mover can assume the identity and configuration (for example: name, IP address, MAC address, etc.) of the failed Data Mover; the failed Data Mover will boot with a minimal configuration, thereby avoiding any risk of a split-brain scenario.
  4. The standby Data Mover initializes first its non-deployment-dependent configuration and then the deployment-dependent configuration (for example: CIFS servers, user data file systems, NFS exports, CIFS shares, etc.).
  5. At this point the Data Mover failover is complete and clients can connect to the Data Mover. The length of time required for the failover depends on numerous factors outside the control of Celerra/VNX.


For example, Windows clients need to re-login to the shared folder (if the client uses an application to access the shared folder or a mapped network drive, the re-login is completed automatically by the application or operating system). Linux NFS clients will continuously retry a failed mount attempt until the mount succeeds.
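If you want to measure that retry window from a Linux client during the mock failover, one illustrative approach (using the /kdump mount point from the original post) is to timestamp repeated reads of the mount:

$ while true; do ls /kdump > /dev/null && date +%T; sleep 2; done

With a hard mount, the ls simply blocks during the takeover, so the gap between consecutive printed timestamps shows how long the share was unavailable.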

June 27th, 2014 01:00

Hi Bala,

You don't need downtime to perform this upgrade. The only thing you need to check is the standby policy. Have you set the policy to Auto on the standby Data Mover? If yes, the next thing is that one of the Control Stations (in case you have two) has to be up and running, because the failover is performed by the CS.

NFS clients will not be disconnected; they might only see a "server not responding" message for a short time. They will get connected to the standby DM (if the policy is set to Auto). Do not assign an IP to the standby DM; it will take over the same IP the primary DM was using.

FTP, archive, and NDMP sessions are lost and are not reconnected. CIFS sessions are redirected by the client's redirector to communicate with the SDM (standby DM). Designate the SDM and link it with the PDM (primary DM).

Failover to the standby DM will happen only if these conditions are met:

1) The standby DM is operational.

2) The standby DM's NICs and network configuration are equivalent to or greater than the primary DM's.

3) The standby DM has no file systems mounted and is not running any applications.

When you activate it after configuration, it will assume the IP of the primary DM.

Do not override the primary DM's policies by defining subsequent policies on the SDM (standby DM).

To configure the SDM, type this command (the angle brackets denote placeholders, which the forum originally stripped):

$ server_standby <movername> -create mover=<standby_movername> [-policy <policy>]

Example:

$ server_standby server_2 -create mover=server_3 -policy auto             (after this command, the SDM will reboot)

Output :

server_2 : server_3 is rebooting as standby.

The policy types are auto, retry, and manual. Never set it to manual; with the manual policy the failover does not happen automatically, which can cause an outage on the client machines.

You can also test the failover manually (see the sketch below), though I don't think it's needed if you set the policy correctly.
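For completeness, a manual test is run from the Control Station using the activate and restore options of server_standby. A sketch using the server_2/server_3 pair from the example above (run this in a maintenance window):

$ server_standby server_2 -activate mover             (server_3 takes over server_2's identity)

$ server_standby server_2 -restore mover             (fails back to server_2 once the test is done)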

Failover will not be triggered in the following cases:

1) Manually restarting a DM (physically).

2) Removing a DM from its slot.

Hope this info is useful for you.

Thanks

Rakesh Pandey

8.6K Posts

June 30th, 2014 07:00

See the manual or do a "man server_cpu".
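For reference, a graceful Data Mover reboot from the Control Station looks something like this (a sketch of the standard syntax; note that an orderly reboot by itself does not trigger a failover, as mentioned above):

$ server_cpu server_2 -reboot now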

July 1st, 2014 14:00

Hi All,

I have completed the data mover failover test. Now I want to know how long "server_3" took for the takeover.

Which logs will help to find exact timings?

8.6K Posts

July 6th, 2014 02:00

SPcollect doesn't make sense - use server_log
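A minimal sketch for pulling the takeover timestamps out of the Data Mover's log (the grep pattern is illustrative; browse the full log if it matches nothing):

$ server_log server_2 | grep -i -E 'failover|takeover|panic'

The gap between the last message before the failure and the first message after the standby takes over gives a good estimate of the takeover time.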
