Has anyone else experienced a disruptive upgrade, specifically from 4.0.15-20 to 4.0.15-24?
I am trying to figure out if I should be asking the community before proceeding with upgrades.
I have a support case open, and I am sure Dell EMC will soon have answers on what caused the issue, but I would like to avoid anything that affects production.
The XtremIO upgrade went well last week, and all AIX servers involved stayed up. Successful NDU! The fix is below.

EMC added the following line to the Customer Upgrade Preparation Guide - XtremIO:
* If using AIX, please review KB491002 prior to NDU.

KB491002 references the IBM fix:
* Apply the IBM Authorized Program Analysis Report (APAR) mentioned in IBM IV84862 - Improve Handling of Aborted Commands on the host side.

Note that the fix has been rolled into a service pack. The AIX systems that made it through the successful NDU were running the following AIX OS level:
# oslevel -s
7100-04-03-1642
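If you have many AIX hosts to check before an NDU, the level check can be scripted. The sketch below is a hypothetical helper (the names `parse_oslevel`, `at_or_above` and `REPORTED_GOOD_LEVEL` are made up for illustration) that parses the `oslevel -s` string format and compares it against the level reported in this thread as surviving the upgrade. That reference level is an assumption taken from one poster's environment, not an official minimum from Dell EMC or IBM. The authoritative check remains the APAR referenced in KB491002 (AIX's `instfix -ik` command reports whether a given APAR is installed).

```python
import re
from typing import Tuple

# Assumption: the level one poster reported as surviving the NDU.
# This is NOT an official minimum published by Dell EMC or IBM.
REPORTED_GOOD_LEVEL = "7100-04-03-1642"

# `oslevel -s` prints: base release - technology level - service pack - build id
_OSLEVEL_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(\d{4})$")


def parse_oslevel(level: str) -> Tuple[int, int, int, int]:
    """Split a string such as '7100-04-03-1642' into
    (base release, technology level, service pack, build id)."""
    match = _OSLEVEL_RE.match(level.strip())
    if match is None:
        raise ValueError(f"unrecognised oslevel -s string: {level!r}")
    release, tl, sp, build = (int(group) for group in match.groups())
    return release, tl, sp, build


def at_or_above(level: str, reference: str = REPORTED_GOOD_LEVEL) -> bool:
    """True if `level` is on the same base release as `reference` and its
    technology level / service pack is the same or newer.

    A different base release (e.g. 6100 vs 7100) returns False, because
    TL/SP numbers are not comparable across releases.
    """
    rel, tl, sp, _ = parse_oslevel(level)
    ref_rel, ref_tl, ref_sp, _ = parse_oslevel(reference)
    if rel != ref_rel:
        return False
    # Tuple comparison: technology level first, then service pack.
    return (tl, sp) >= (ref_tl, ref_sp)


if __name__ == "__main__":
    # Made-up host levels, for illustration only.
    for host_level in ("7100-04-03-1642", "7100-04-01-1543", "7100-05-00-1731"):
        print(host_level, at_or_above(host_level))
```

Only the technology level and service pack are compared; the trailing build identifier is ignored to keep the rule simple. A host that passes this check still needs the APAR confirmed separately.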
4.0.15-20 to 4.0.15-24 is an NDU (non-disruptive upgrade) process. Dell EMC Support will run pre-upgrade checks in your environment to make sure we are aware of any potential issues before the NDU process starts.
Avi, thank you for your comment. Yes, my earlier updates were NDUs, but the 4.0.15-20 to 4.0.15-24 upgrade was NOT an NDU.
EMC did all the pre-checking and everything passed, but a production database crashed during the update.
I will update this post when I get the RCA results, because many things can cause issues during an update.
In my case the previous update was done less than 90 days earlier without issues.
I believe my hosts were all configured correctly, which is why I was asking the community whether anyone else has experienced issues with the specific version jump from 4.0.15-20 to 4.0.15-24.
I was very surprised the upgrade had issues because the previous upgrade had no issues at all.
We completed this upgrade (same versions) because an advisory recommended it, and we didn't face any issues. Maybe EMC can say more about this.
4.0.15-20 to 4.0.15-24 is about as simple an upgrade as they come. There are no "firmware" changes in this version, so there's no need to reboot any of the storage controllers - just a quick blip as we reload the new XIOS code, and that's it.
There is one additional step that the person carrying out the upgrade performs because there is no reboot, but it is completely transparent.
There's certainly no expectation of any problems for this (or any other) upgrade. Most of the time when we see issues during upgrades, it's down to things like multipathing or timeouts not being set correctly on the host, but I'm sure support will be working with you to work out exactly what went wrong in this case and get it fixed. We're actually working on a set of scripts that will validate the host-side configuration before an upgrade (or at any other time) to help avoid such issues. The one for VMware is in final testing, and (physical) Windows and Linux will follow shortly.
From a supportability standpoint, NDU from 4.0.15-20 -> 4.0.15-24 is incredibly stable. This is a minor release and does not include a kernel update. This is good because it means that no reboots are required.
The only issues we've encountered thus far originated from "noisy" SAN environments. If you are concerned about the possibility of hitting an issue resulting in a disruption of service, I would start by vetting your fabric infrastructure.
Hi Mdeitrick and Scotthoward, thank you for your input.
EMC Support suggested I open a support case with the switch vendor, which I have done, and that investigation is under way. No root cause has been identified yet, but we are still digging.
Just an update: the switch vendor did not find any FC switch issues in the period shortly before, during, or after the upgrade.
Did the fabric analysis mention anything about credit starvation or slow-draining devices? Are there any ISLs present, and were all switches reviewed (not just those with links to the XtremIO)? I had one case recently where a reset of unrelated switch ports, on a switch that was connected via ISL, resulted in excessive noise. If you would like a second pair of eyes to review the case, please provide the service request number and I'll have a look.