I would engage EMC Support to help with this one. We just did a FLARE 24 to 26 upgrade on a new array we put on the floor with two AIX hosts attached, to test out the NDU requirements for going to 26. It was an incredibly painful process involving unmounting file systems, changing settings, and remounting. The worst part was that after we had the HEAT reports generated, a CLARiiON specialist told us the HEAT didn't actually highlight everything we had to do, and gave us more APARs, HBA settings, and various other settings to change.
When you run an EMCGrab or EMCReports from the host and send it in to EMC, they run it through a tool that generates a "HEAT" report. I can't remember what the acronym stands for off the top of my head, but it is supposed to tell you the state of the host and whether everything is configured to the required specifications. If there are missing patches, incorrect settings, etc., they should be highlighted in the HEAT report.
Amen to that. This really simplifies the process. Now if they could just make the HEAT report comprehensive I'd be ecstatic. It catches a lot of the issues, but the last time we had problems a manual review by a CLARiiON specialist found half a dozen things that hadn't been identified in the HEAT report.
Neither did mine. I think it became available a couple of weeks ago; I can't find the thread now, but somebody posted it in the Symmetrix forum. I was excited as hell. I don't have to beg my CE to run them for me or open tickets with support anymore.
Almost all the problems we have had were related to AIX. There has been the occasional Windows or Solaris host that had trouble because something was misconfigured on the host side, but those were always obvious errors when we investigated.
I agree. I use it mostly to check software versions for PowerPath and drivers. I did a CX600 upgrade two weeks ago and it did not catch an incorrect DMP configuration on 3 Solaris boxes, so all three lost access to their drives during the SP failover. So much for NDU. You are not alone.
I'm glad to hear you were able to get an NDU FLARE upgrade in with four AIX hosts up, but I have to wonder if we can still call it an NDU when you had to take a two hour outage before the upgrade to make sure the NDU would work. I'm thinking that we are just going to have to bite the bullet in the future, take all our AIX hosts down for FLARE upgrades, and just have the single outage.
tim.koopman
April 28th, 2008 11:00
I do not think anyone has answered your original question.
Shut your AIX system down. In NaviSphere, use the Tools > Failover Setup Wizard to change the failover mode from 1 to 3, then power your AIX system back up.
I do not know if an online process exists. You would have to open a support call with EMC to see whether one exists for changing the failover mode online.
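The offline change described above (shut down, flip the failover mode, power up) can be sketched as a dry-run script. Everything here is an assumption for illustration: the host name, SP address, and the exact `naviseccli storagegroup -sethost` syntax should be verified against the Navisphere CLI reference for your FLARE release. The script only prints the commands; it executes nothing against the array.

```shell
#!/bin/sh
# Dry-run sketch of an offline failover-mode change (mode 1 -> 3).
# Host and SP names are hypothetical; verify the naviseccli syntax
# against your Navisphere CLI reference before running anything for real.
AIX_HOST="aixhost01"     # hypothetical AIX host name
SP="spa.example.com"     # hypothetical storage processor address

plan_failover_change() {
    echo "# 1. On the AIX host: stop applications, unmount, shut down"
    echo "umount /data; varyoffvg datavg; shutdown -F"
    echo "# 2. From a management host: set failover mode 3 for this host's initiators"
    echo "naviseccli -h $SP storagegroup -sethost -host $AIX_HOST -failovermode 3 -o"
    echo "# 3. Power the AIX host back up and rediscover devices"
    echo "cfgmgr && powermt display dev=all"
}

plan_failover_change
```

Because it only echoes the steps, a script like this is safe to keep with the change record and walk through line by line during the outage window.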
Allen Ward
April 25th, 2008 06:00
I don't think our account team finds it as amusing as I do when I point out that the concept of NDU isn't really valid in the real world on CLARiiONs. And they especially don't like it when I point out that 2/3 of NDU is DU (otherwise known as Data Unavailable). Of course, it doesn't help that the last three major FLARE code upgrades we did caused outages on large numbers of hosts (even after EMC checked everything and said it would be OK).
ovivier
April 25th, 2008 08:00
Allen, could you please explain what "HEAT" is?
Does anyone have a better experience with NDUs involving AIX servers?
Allen Ward
April 25th, 2008 10:00
dynamox
April 25th, 2008 10:00
https://servicetools.emc.com/heat.php
AranH1
April 25th, 2008 10:00
Allen, were the NDU (I believe it stands for Non-Disruptive Upgrade) issues you had isolated to AIX hosts? I have never supported AIX on a SAN and am not familiar with the platform dependencies.
I am curious because I have rarely seen issues resulting from NDU FLARE upgrades (all Windows hosts), or even from upgrades from one platform to another (CX600 to CX700, for example). Of course that does not mean they don't happen, but it doesn't appear to be a common occurrence to me.
Allen Ward
April 25th, 2008 11:00
Allen Ward
April 25th, 2008 11:00
My CE doesn't even know about this!
SKT2
April 25th, 2008 11:00
https://forums.emc.com/forums/thread.jspa?threadID=66560&tstart=0
dynamox
April 25th, 2008 11:00
Allen Ward
April 25th, 2008 11:00
AIX has never gone well for us.
dynamox
April 25th, 2008 12:00
tim.koopman
April 25th, 2008 15:00
My environment contains multiple AIX systems, and I have had my share of bad FLARE code updates where I have had problems with AIX. The good news is that my last FLARE code update, from FLARE 19 to the latest 24 code, went well for the four systems I did not shut down.
Getting everything configured is not easy; I had to use E-Lab, the EMC support matrix for Veritas, and the AIX host connectivity guide. I did have to take a two hour outage to get my system updated to fit into the matrix those three documents created. Good luck; I just wanted you to know it is possible. In my case the system had only a single HBA and I was using PowerPath SE. I changed the failover mode to 3 while I was updating PowerPath to 5.0, part of the work I did in the two hour outage I referenced above.
So a planned outage was required to get my system to a supported configuration.
The four AIX systems that I did get to a supported configuration had no problems while the FLARE code update took place. I tested the configuration before the update by doing trespasses at the array and disabling initiators on the array. If you mess with the initiators on the array, I recommend rebooting your server before you put it back into production. In my tests I was not running production data to the disk; my production application was down, and I was running a large copy while I did the trespass tests.
I would change the failover mode from 1 to 3. An EMC tech note explains the difference between failover modes 1 and 3; I just do not remember the number.
All my systems were down when I changed the failover mode. I do NOT recommend changing the failover mode on a running system.
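The pre-upgrade test described above (trespass at the array, disable initiators, reboot before returning to production) can be sketched the same way as a dry-run script. The SP address, LUN number, and trespass subcommand syntax are assumptions, so check them against your Navisphere CLI and PowerPath documentation; again, the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of a pre-upgrade failover test on a NON-production host.
# SP address, LUN number, and the trespass syntax are hypothetical;
# verify them against your Navisphere CLI and PowerPath documentation.
SP_A="spa.example.com"   # hypothetical SP A address
TEST_LUN=12              # hypothetical LUN used for the test

plan_failover_test() {
    echo "# 1. Confirm every path is alive before the test"
    echo "powermt display dev=all"
    echo "# 2. Trespass the test LUN to the peer SP and watch host I/O"
    echo "naviseccli -h $SP_A trespass lun $TEST_LUN"
    echo "# 3. Return paths to their default owners"
    echo "powermt restore"
    echo "# 4. If initiators were disabled on the array, reboot before production"
    echo "shutdown -Fr"
}

plan_failover_test
```

Running the test against a scratch LUN while production is down, as described above, keeps a misbehaving failover from becoming a real outage.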
dynamox
April 25th, 2008 17:00
Are all of your AIX systems running Veritas? I do not have Veritas on mine, so I wonder if the process would be any different.
Allen Ward
April 27th, 2008 07:00
There is no guarantee that EMC isn't going to come back with more changes required to support the next FLARE upgrade... and they will probably require more AIX outages to ensure that AIX won't take an outage!!!