Question on firmware upgrade issues

Question

Would the number of volumes and/or connections to a member affect the length of time it takes for a controller failover to be successful when upgrading the firmware?

Reason I ask is this . . .

I have a PS6500M populated with 2TB drives. It hosts 138 volumes and lists 426 connections.

When I upgraded from 4.1.2 (?) to the current 5.0.4 it took nearly 12 minutes before connections were re-established . . . and this caused massive issues with our servers (all of which had been configured according to best practices documents, and all confirmed again after the disruption).

I have been advised by Dell Support to upgrade to the latest and greatest to resolve another issue I am having. However, basically turning off my company to upgrade firmware is going to get me in trouble . . . as is cratering 80+ servers if the firmware takes a noticeable fraction of an hour again.

Joe S586 · Answer

6500M? I think you mean the E or E/X.

The firmware update first updates the secondary controller. Once the secondary is updated, you are prompted to restart the array. After the restart, that controller becomes the primary and all connections move to it. The former primary is now the secondary, and the update continues until the firmware on it is updated as well. The restart (failover) itself only takes 15-30 seconds.

One of the best ways to do an update is to have a serial connection to each controller, so you can watch the process on both controllers at the same time while the update is running (two separate terminal screens open).

It’s important to ensure that all host servers (and VMs) have the proper iSCSI disk timeouts set so they can “ride out” the controller failover. Additionally, be sure to enable spanning tree portfast on all ports facing the iSCSI subnet (host and array ports). This is listed in the “iSCSI Initiator and Operating System Considerations” document on the FW download page.
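For Linux hosts, a rough way to spot-check the disk timeouts is to read the per-device SCSI command timeout from sysfs. This is only a sketch: the function name is made up for illustration, the 60-second threshold in the example run is a placeholder and not a figure from this thread (take the real value from the Dell document mentioned above), and it lists every SCSI block device on the host, local disks included.

```python
import os


def short_timeouts(minimum, sys_block="/sys/block"):
    """Return {device: timeout_seconds} for disks whose SCSI timeout is below minimum."""
    found = {}
    for name in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, name, "device", "timeout")
        try:
            with open(path) as handle:
                value = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # no SCSI timeout file for this device, or unreadable
        if value < minimum:
            found[name] = value
    return found


if __name__ == "__main__":
    # 60 is a placeholder threshold (assumption), not a Dell-confirmed value.
    for device, seconds in short_timeouts(60).items():
        print(f"{device}: timeout {seconds}s is below the threshold")
```

Windows and VMware hosts keep the equivalent setting elsewhere, so this only covers the Linux side.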

What you might be experiencing is a switch and/or host NIC/HBA configuration issue. This can be caused by the connection storm, or flood of packets, hitting the switch. If this is the case, you can enable unicast flood (if it is disabled, the switch will temporarily drop these packets, which may cause your servers to take longer to connect). But to be honest, 12 minutes for all your servers to reconnect indicates something other than just the unicast flood settings.

Review this link to see if you have missed anything in your configuration:

en.community.dell.com/.../3615.rapid-equallogic-configuration-portal-by-sis.aspx.

VMware has its own set of configuration settings, which Don has mentioned many times:

en.community.dell.com/.../19516886.aspx (see his comments)

Also ensure that the iSCSI-facing interfaces on your hosts do not have a default gateway set, and only have IP enabled (v4 or v6). On a Windows server, go to the NIC properties and uncheck Client for MS Networks, File and Print Sharing, QoS, Link-Layer, etc.

One surefire way to test whether the issue is related to the FW update is to fail over the array (issue a restart) and see how long your hosts take to reconnect. This would eliminate the FW update as the cause.
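To put a number on that test, something like the sketch below could be left running on a host while you restart the array. It probes TCP port 3260 (the standard iSCSI port) once a second and reports the longest stretch with no answer. The address 10.0.0.10 is a placeholder for your group IP, and the function names are made up for illustration. Note that this only measures when the portal answers again from one host; it does not tell you when each server's iSCSI sessions have logged back in, so check the initiators as well.

```python
import socket
import time


def port_open(host, port=3260, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Longest gap in seconds between a first failed probe and the next success.

    samples is a time-ordered list of (timestamp, reachable) pairs. An outage
    still in progress at the end is measured up to the last sample.
    """
    longest = 0.0
    down_since = None
    for stamp, ok in samples:
        if not ok and down_since is None:
            down_since = stamp
        elif ok and down_since is not None:
            longest = max(longest, stamp - down_since)
            down_since = None
    if down_since is not None and samples:
        longest = max(longest, samples[-1][0] - down_since)
    return longest


def watch(host, duration=300.0, interval=1.0):
    """Probe host for `duration` seconds and return the collected samples."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), port_open(host)))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    group_ip = "10.0.0.10"  # placeholder (assumption): replace with your group IP
    print(f"longest outage: {longest_outage(watch(group_ip)):.0f}s")
```

If the figure is close to the 15-30 seconds mentioned above while the servers still take minutes to recover, the delay is on the switch or host side rather than in the array.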

-joe
