3 Argentum

Firmware Update (v6.0.1) Error Messages

Hi,

While upgrading the firmware from v5.2.5/v5.2.2 to v6.0.1, I've received numerous error messages from the group.  They were:

  • Connection login timed out
  • Connection failed because target offline
  • Free space in pool default is low. Performance on thin provisioned volumes, if any, might be temporarily decreased.
  • The reported size of the volumes currently belonging to pool default exceeds the pool capacity.
  • Volume state transition is in progress
  • Initiator disconnected from target during login.
  • The maximum in-use space limit for the volumes currently belonging to pool default exceeds the pool capacity.

Some background information: We have two PS6000 (been in production for about 2 years) and a PS6100 (just purchased and installed).  The PS6100 was going through RAID verification during the firmware upgrade and the group automatically started moving data to the new array after setup (did not know about the delay-data-move command option during RAID initialization). 

Also, the PS6000 arrays are:

  • RAID 10 - 3.66 TB
  • RAID 50 - 5.23 TB

And the PS6100 array is: RAID 50 - 48.3 TB

I upgraded the firmware in this order: the PS6100 first, then the RAID 50 PS6000, and finally the RAID 10 PS6000.  During the upgrade, SAN HQ would disconnect me from the group.  The Group Manager GUI was also extremely slow to show me information about the group, the members, and just about everything else.

While I have completed quite a few firmware upgrades in the past, this was the first time the EqualLogic group sent this many error messages.  I thought the array would stay available during the restart, since it upgrades the secondary controller first, fails over to the secondary controller, and then updates the ex-primary controller, which should not have the arrays "screaming" at me.

Can someone please tell me what I'm doing wrong, so I can keep the EqualLogic group from yelling at me?

Thank you

17 Replies
Moderator

Re: Firmware Update (v6.0.1) Error Messages

Q1: Connection login timed out

Login goes through many stages before successful completion. If the login stays in any state for more than 15 seconds, this timeout occurs.

Recommended Action

None. The initiator will usually try to log in again and, usually, will be successful.

Q2: Connection failed because target offline

- The connection failed because the target (volume) is offline. First, confirm that the volume really is offline. If so, set the volume online and try to connect again. If the problem is still there, contact support.

Q3: Free space in pool default is low. Performance on thin provisioned volumes, if any, might be temporarily decreased.

- The “default” pool (you may have renamed it, but it is the original pool created by default when the group was first set up) is low on space.

Q4: The reported size of the volumes currently belonging to pool default exceeds the pool capacity.

- The total size of the volumes in the specified pool exceeds the pool capacity. In any pool, the total of the volume sizes cannot exceed the available space. There is not enough space in the pool to support the potential growth of all the thin-provisioned volumes to their total reported size.

Q5: Volume state transition is in progress

- 7.3.18: Volume state transition in progress

A new login came in while a management request to shut down was being processed.

Recommended Action

Retry login.

Q6: Initiator disconnected from target during login.

- The login process failed because the initiator broke the TCP connection to the array before logging in to a volume. A connection closure was received from the initiator before the login process completed.

- You should open a support case so we can determine what happened. It might be a network issue.

Q7: The maximum in-use space limit for the volumes currently belonging to pool default exceeds the pool capacity.

- You can do any of the following:

• Reduce the size of one or more thin-provisioned volumes until this warning no longer occurs.

• Add more space to the pool (for example, by adding another member to the group).

• Move thin-provisioned volumes to a pool with more capacity.

• Reduce the maximum in-use space warning limit for one or more thin-provisioned volumes in the pool.


-Joe

Social Media and Community Professional
#IWork4Dell
Get Support on Twitter - @dellcarespro

Follow me on Twitter: @joesatdell 

3 Argentum

Re: Firmware Update (v6.0.1) Error Messages

Thank you Joe.

I thought since the controllers are failing over (one controller is always online), then the pool space would never decrease and the connections would never drop.  To me, it seems like an entire member is offline while upgrading which is why the pool space decreased and the connections were dropping.  

As for the volume stating that it's offline, the volumes were not offline.  I had to verify this by going to the server using that volume since the Group Manager wasn't loading (which is another issue).  


Re: Firmware Update (v6.0.1) Error Messages

When you restarted one of the members in the pool,  that member no longer provided that space to the pool.  That's what the "free space in pool" messages are referring to.  Also, since volume data is striped across the members, when one is restarting, all the volumes they have in common will go offline while the member is failing over.
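To put rough numbers on this, here is a minimal Python sketch using the array sizes from the original post. It assumes those sizes are the usable space each member contributes and that all three members sit in the same "default" pool; neither is confirmed in the thread. The function names are made up for illustration.

```python
# Member capacities in TB, taken from the original post.
# Assumption: each figure is the usable space the member adds to the pool.
MEMBERS_TB = {
    "PS6000 RAID 10": 3.66,
    "PS6000 RAID 50": 5.23,
    "PS6100 RAID 50": 48.3,
}

def pool_capacity_tb(members, restarting=None):
    """Sum the capacity of every member except the one that is restarting."""
    return sum(size for name, size in members.items() if name != restarting)

def capacity_report(members):
    """Map each restarting member (or None) to the remaining pool capacity."""
    report = {None: pool_capacity_tb(members)}
    for name in members:
        report[name] = pool_capacity_tb(members, restarting=name)
    return report

if __name__ == "__main__":
    for name, remaining in capacity_report(MEMBERS_TB).items():
        label = "all members online" if name is None else f"{name} restarting"
        print(f"{label}: {remaining:.2f} TB")
```

Under those assumptions, the pool is left with under 9 TB while the PS6100 restarts, which would be consistent with the "exceeds the pool capacity" warnings appearing during the upgrade.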

Yes, the upgrade process updates the secondary, then restarts it.  However, the actual failover time before the passive controller becomes active and services I/O and login requests varies, based on load and model.

Is this a VMware ESX environment?   ESXi v5.x has a very short login timeout value of 5 seconds, by default.  That should be adjusted to 60 seconds.   That will typically prevent the "timeout during login" error messages.

Here's a KB article from VMware explaining how to change that.

kb.vmware.com/.../search.do

Regards,


3 Argentum

Re: Firmware Update (v6.0.1) Error Messages

Thank you, Don.

This is a Windows environment.  

So since one member (PS6100) is substantially bigger than the other two members (PS6000), I have to tweak the snapshot reserve space (since that's the only change I've made other than adding the new member) so that it'll reserve less space and fit within the two members.  Is this correct?

Is this upgrade (v6.0.1) a little different than past upgrades?  I just find it odd that this upgrade was much more messy than past upgrades.  Also, the information for the Group Manager GUI was extremely slow (I thought the entire group went down)...


Re: Firmware Update (v6.0.1) Error Messages

No.  Data is striped across the members, with more of it on the 6100 because it's larger.  What's changed is that you now have a multimember pool, so all of its members are required to provide the larger pool of space.  To get around that, you would need to move a member to a new pool, then move some volumes from the old pool to the new one.  Then it will work like it did in the past.

The benefit of multimember pools is that I/O is handled by all of the members, so they work cooperatively on every I/O request.  If you install the Windows HIT kit, it will also provide an enhanced MPIO algorithm that helps you better leverage I/O from the members compared to the standard Windows MPIO code.

re: slowness.  See how it goes once the build has completed.  The GUI now needs information from two members, not just one.

re: Windows.  The OS considerations guide covers how to extend the disk timeout value to 60 seconds.  That's a registry change, so it will require a reboot to become effective.


3 Argentum

Re: Firmware Update (v6.0.1) Error Messages

Thank you, Don.

We have SQL Server on the PS6000XV (RAID 10) and PS6000E/PS6100E (RAID 50), which we fail over to the backup site before updating the EqualLogic firmware.  We also have VMs on the PS6000/6100, as well as file shares and Windows clustering volumes.

Is there a process where I can upgrade the firmware without taking volumes offline?  


Re: Firmware Update (v6.0.1) Error Messages

If you set the disk timeout values on the servers and the VMs, they should "ride" out the restart process.  It needs to be done not only for upgrades, but also in case of a HW problem that causes the CM to fail over.

Otherwise, the only other option is to have another array in the group that's empty.  Add it to the pool and move out the members one at a time to a temporary pool.  Upgrade and restart that member, then move it back into the pool.  Move out the next one and repeat.  That prevents any downtime but requires another array and takes a lot longer.
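The ordering of that second option can be written out as a small Python sketch. It only builds a checklist of steps as text; it does not talk to the group and uses no real EqualLogic commands. The pool name "maintenance" and the function name are hypothetical.

```python
def rolling_upgrade_plan(members, production_pool="default", temp_pool="maintenance"):
    """Return the ordered steps for upgrading one member at a time.

    Pool names are placeholders; the steps are plain text, not CLI commands.
    """
    steps = [f"Add the empty spare array to pool '{production_pool}'"]
    for member in members:
        steps.append(
            f"Move {member} from '{production_pool}' to '{temp_pool}' "
            "and wait for the move to complete"
        )
        steps.append(f"Upgrade the firmware on {member} and restart it")
        steps.append(f"Move {member} back into '{production_pool}'")
    return steps

if __name__ == "__main__":
    # Same upgrade order as described in the original post.
    order = ["PS6100 (RAID 50)", "PS6000 (RAID 50)", "PS6000 (RAID 10)"]
    for number, step in enumerate(rolling_upgrade_plan(order), start=1):
        print(f"{number}. {step}")
```

Each member adds three steps, which is why this approach takes a lot longer than an in-place upgrade.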


3 Argentum

Re: Firmware Update (v6.0.1) Error Messages

Thank you, Don.  

I have the iSCSI Initiator and Operating System Considerations document open.  Just to be certain that I'm looking at the same thing you're talking about, I will go into the registry and:

Increase the value of the TimeOutValue parameter (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk\TimeOutValue) to at least 60 seconds (the default is 10 seconds). Make sure to set the type to DWORD and enter the decimal value (60).  Then reboot the server (and VMs).

Is that the only setting?


Re: Firmware Update (v6.0.1) Error Messages

Yes, that's the important one.  It has to be done on the server AND inside any VMs on your Hyper-V servers.  So the host and VMs require a restart.
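For anyone who has to apply this to many hosts and VMs, one possible approach is to generate a .reg file instead of editing the registry by hand. The Python sketch below only builds the file text for the key and value discussed above; the function name is made up for illustration, and importing the file and rebooting are still manual steps.

```python
def disk_timeout_reg(seconds=60):
    """Build .reg file text that sets the Disk TimeOutValue to `seconds`.

    .reg files store DWORD values as 8 hex digits, so 60 becomes 0000003c.
    """
    if not 0 < seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must fit in a positive 32-bit DWORD")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Disk]\r\n"
        f'"TimeOutValue"=dword:{seconds:08x}\r\n'
    )

if __name__ == "__main__":
    print(disk_timeout_reg(60))
```

The printed text can be saved to a file with a .reg extension and imported on each Windows host and guest. Test it on a non-production machine first.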
