Also, if you right-click the array and select Faults, it should tell you whether a component such as a fan or SPS is faulted, as RRR mentions.
The helpful replies and explanations from Rob and Mike may have provided the answers you were looking for. If so, please mark the question as answered and mark the replies as "Correct" and/or "Helpful". If you need any more details, please feel free to post back to this forum.
What version of FLARE are you running on the CX500? There are known bugs in older versions of FLARE that kept setting off false alerts in our environment until we upgraded to FLARE 16 or later.
(This is 159deka, formerly known as 159eka.) Thank you all for the responses. I had a small issue with my account, but I am back, alive and kicking.
The latest development is that SP B needs to be replaced, and we plan to do it this evening.
The WWNs do not change when an SP is replaced, so don't worry about that.
Replacing an SP will not change the WWN. The World Wide Name is associated with the Storage Processor Enclosure, so the new SP will get the same WWN. No worries at all.
Thanks, Sandip. Yes, the SP was replaced last evening. However, it has not yet come online, so we are working with EMC to see whether anything else is wrong. It has been at the "POST" level for the last 13 hours or so.
Generally, how long does it take to update the new SP and bring it online?
RRR
4 Operator
•
5.7K Posts
0
May 7th, 2008 05:00
Drill down in the Physical section and expand every + node you can see. I suspect an SPS, fan, or PS has an issue.
Please remember that each SPS runs a self-test every Sunday morning at around 2 AM.
Of course, you can also check the event log (right-click SPA/SPB and choose Event Log).
If in doubt, you can open a case with EMC and have them check the machine.
the_san_man
40 Posts
0
May 7th, 2008 06:00
Also, you could use naviseccli -h <SP address> faults -list, which will give you the same information.
If it reports "the array is operating normally" yet the array still shows as faulted, I would open a case with support. You may just need to restart the management server on each SP.
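For anyone who wants to script that check, here is a minimal Python sketch built around the command quoted above. It assumes naviseccli is on the PATH and that credentials are already set up for it. The helper names and the substring test on "operating normally" are my own assumptions, taken from the phrase quoted in this post rather than from any documented output format.

```python
import subprocess


def faults_command(sp_address):
    """Build the argument list for the faults query quoted in the post."""
    return ["naviseccli", "-h", sp_address, "faults", "-list"]


def array_reports_normal(output):
    """Return True if the output contains the 'operating normally' phrase.

    Assumption: a healthy array prints a sentence containing this phrase.
    """
    return "operating normally" in output.lower()


def check_faults(sp_address, timeout=60):
    """Run the query against one SP and return (is_normal, raw_output)."""
    result = subprocess.run(
        faults_command(sp_address),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return array_reports_normal(result.stdout), result.stdout
```

Calling check_faults once per SP address and comparing the two results is one way to notice when the CLI says the array is fine while the GUI still shows a fault.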
nandas
4 Operator
•
1.5K Posts
0
May 8th, 2008 08:00
Cheers,
Sandip
Mpalumbo
35 Posts
0
May 8th, 2008 08:00
Mike
159deka
18 Posts
0
May 9th, 2008 04:00
Thank you all for the responses. I had a small issue with my account, but I am back, alive and kicking.
1- On expanding the tree, NO ITEM was detected as faulty!
2- After business hours the whole system was rebooted. The fault was then mirrored to both Navisphere windows, an SPS was identified as faulty and then turned to "T", and the system then returned to normal operation without any further errors.
3- Since then (2200 hrs. Tuesday) the system has been behaving itself.
4- However, we have taken the precaution of logging a call, and the event log was collected. A filtered event log pertaining to the error only is copied below.
5- FLARE version is 2.19.
Thanks again for the kind interest.
=============
1.
Date:05/07/2008
Time:09:51:35 PM
Event Code:0xfd5
Description:WSAGetLastError() returned error: An address incompatible with the requested protocol was used.
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:TlntSvr
Category:NT Application Log
Log:NT Application Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error
2.
Date:05/07/2008
Time:09:51:09 PM
Event Code:0x36d
Description:There was error [DATABASE OPEN FAILED] processing the driver database. 00 00 00 00 02 00 64 00 00 00 00 00 6d 03 00 c0 00 00 00 00 6d 03 00 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:Application Popup
Category:NT System Log
Log:NT System Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error
3.
Date:05/07/2008
Time:09:51:09 PM
Event Code:0x4
Description:Dynamic strings:AMLI0xcfc0xcf8 - 0xcff 00 00 00 00 04 00 52 00 00 00 00 00 04 00 05 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:ACPI
Category:NT System Log
Log:NT System Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error
4.
Date:05/07/2008
Time:09:51:09 PM
Event Code:0x5
Description:Dynamic strings:AMLI0xcf80xcf8 - 0xcff 00 00 00 00 04 00 52 00 00 00 00 00 05 00 05 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:ACPI
Category:NT System Log
Log:NT System Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error
5.
Date:05/07/2008
Time:09:51:09 PM
Event Code:0x36d
Description:There was error [DATABASE NOT LOADED] processing the driver database. 00 00 00 00 02 00 64 00 00 00 00 00 6d 03 00 c0 00 00 00 00 6d 03 00 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:Application Popup
Category:NT System Log
Log:NT System Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error
6.
Date:05/07/2008
Time:09:48:43 PM
Event Code:0x904
Description:VSC Shutdown/Removed
Subsystem:CK200061400543
Device:Enclosure 0 Power A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x4
Type:Error
7.
Date:05/07/2008
Time:09:48:40 PM
Event Code:0x941
Description:Battery Online
Subsystem:CK200061400543
Device:Enclosure 0 SPS A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x1
Type:Error
8.
Date:05/07/2008
Time:09:48:39 PM
Event Code:0x941
Description:Battery Online
Subsystem:CK200061400543
Device:Enclosure 0 SPS B
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x1
Type:Error
9.
Date:05/07/2008
Time:09:48:39 PM
Event Code:0x908
Description:Fault - Cache Disabling
Subsystem:CK200061400543
Device:SP A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x0
Type:Error
10.
Date:05/02/2008
Time:04:59:27 PM
Event Code:0x2580
Description:Storage Array Faulted Bus 0 Enclosure 0 : Faulted Bus 0 Enclosure 0 SPS B : Removed SP B : Removed
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Application
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error
11.
Date:05/02/2008
Time:04:59:21 PM
Event Code:0x944
Description:Hard Peer Bus Error
Subsystem:CK200061400543
Device:SP A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x2
Ext Code1:0x0
Ext Code2:0x0
Type:Error
12.
Date:05/02/2008
Time:04:59:21 PM
Event Code:0x944
Description:Hard Peer Bus Error
Subsystem:CK200061400543
Device:SP A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x1
Ext Code1:0xaa975cf4
Ext Code2:0x0
Type:Error
13.
Date:05/02/2008
Time:04:59:12 PM
Event Code:0x908
Description:Fault - Cache Disabling
Subsystem:CK200061400543
Device:SP A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x0
Type:Error
kelleg
4 Operator
•
4.5K Posts
0
May 9th, 2008 11:00
Description:Hard Peer Bus Error
This generally indicates that the SP reporting the error could not contact its peer SP. This could be caused by a number of things, but most likely one of the SPs rebooted.
Service will be able to tell what caused this; it may be a patch level issue.
regards,
glen kelley
159deka
18 Posts
0
May 12th, 2008 21:00
Awaiting further details/instructions
159deka
18 Posts
0
May 13th, 2008 00:00
However, we have two concerns before we do that.
We have implemented soft zoning, so the zoning configuration will need fixing the moment we have a new SP with two new WWNs, since soft zoning is based on WWNs. The easiest remedy I can see is to edit the aliases to use the new WWNs. Can that be done?
The second concern: can this be done online, as we plan? Will there be an inherent disabling/enabling of zone configurations that would hinder work?
A response ASAP is appreciated.
RRR
4 Operator
•
5.7K Posts
0
May 13th, 2008 01:00
The replacement can be done online as long as all hosts are HA connected (each host with 2 HBAs, connected to both SPs, and with failover software implemented, such as VMware or PowerPath).
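As a pre-check before the swap, the HA condition above can be written down as a small test over a path inventory. This Python sketch is only an illustration: the host names, HBA names, and the shape of the inventory are hypothetical and would have to be filled in by hand or from your own tooling. It does not verify that failover software is installed.

```python
def hosts_not_ha(paths, required_sps=("SPA", "SPB")):
    """Return {host: [reasons]} for hosts that fail the HA condition.

    paths maps a host name to a list of (hba, sp) tuples, one per path.
    A host passes if it uses at least 2 HBAs and reaches every required SP.
    """
    problems = {}
    for host, links in paths.items():
        hbas = {hba for hba, _ in links}
        sps = {sp for _, sp in links}
        reasons = []
        if len(hbas) < 2:
            reasons.append("fewer than 2 HBAs")
        missing = set(required_sps) - sps
        if missing:
            reasons.append("no path to " + ", ".join(sorted(missing)))
        if reasons:
            problems[host] = reasons
    return problems


# Hypothetical inventory: host -> (HBA, SP) paths.
inventory = {
    "host1": [("hba0", "SPA"), ("hba1", "SPB")],
    "host2": [("hba0", "SPA")],
}

for host, reasons in hosts_not_ha(inventory).items():
    print(host, "is not HA connected:", "; ".join(reasons))
```

Any host this flags would lose access to its LUNs while SP B is out, so it is worth fixing or scheduling around before the replacement.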
nandas
4 Operator
•
1.5K Posts
0
May 13th, 2008 08:00
The activity is online. However, SP B will be removed from the system, which means all LUNs owned by SP B will be trespassed to SP A, so ensure all connected hosts are running proper failover software. It may be a good idea to do this during a low I/O period.
The EMC Customer Engineer who performs this activity can guide you properly.
Finally, we are all glad to see that your post on this forum helped to identify the issue.
Cheers,
Sandip
159deka
18 Posts
0
May 13th, 2008 21:00
Yes, the SP was replaced last evening. However, it has not yet come online, so we are working with EMC to see whether anything else is wrong. It has been at the "POST" level for the last 13 hours or so.
Generally, how long does it take to update the new SP and bring it online?