'F' on Navisphere

Question

When I log on to Navisphere using the IP of SP A, a 'Fault' is listed under the DPE. However, on expanding it, none of the components report faults. Further, when I look at Navisphere via the second IP (i.e. the IP of SP B), no fault is reported. The storage system is a CX-500. Has anyone experienced a similar incident? Thanks in advance for any assistance.

RRR · Answer

Yup

Drill down in the physical section and open all physical +'s you can see. I suspect an SPS, fan or PS has an issue.

Please remember that each SPS will do a self test each Sunday morning at around 2AM.

Of course you can also check the event log (right-click SPA/SPB and choose Event log).

If in doubt you can open a case with EMC and have them check the machine.

the_san_man · Answer

Also, if you right-click on the array and select faults, it should tell you if something is faulted, like a fan or SPS as RRR mentions.

Also, you could use naviseccli -h <SP IP address> faults -list, which will tell you the same information.

If the output says "the array is operating normally" yet the array still shows as faulted, I would open a case with support. You may just need to restart the management server on each SP.
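A minimal sketch of scripting that check against both SPs, in Python. It assumes naviseccli is on the PATH, that the SP address is passed with -h, and that a healthy array prints the "operating normally" message quoted above. The exact output wording may differ between releases, so the string match is an assumption, and the function names are only illustrative.

```python
import subprocess

# Assumed wording of the healthy message, taken from the reply above.
HEALTHY_MARKER = "operating normally"


def is_healthy(faults_output: str) -> bool:
    """Return True if the faults listing contains the healthy message."""
    return HEALTHY_MARKER in faults_output.lower()


def list_faults(sp_address: str) -> str:
    """Run `naviseccli -h <sp_address> faults -list` and return its text output."""
    result = subprocess.run(
        ["naviseccli", "-h", sp_address, "faults", "-list"],
        capture_output=True,
        text=True,
        check=False,  # inspect the text ourselves instead of raising
    )
    return result.stdout


def views_disagree(output_a: str, output_b: str) -> bool:
    """Return True if one SP reports a healthy array and the other does not."""
    return is_healthy(output_a) != is_healthy(output_b)
```

Calling views_disagree(list_faults(ip_of_spa), list_faults(ip_of_spb)) would flag the situation described in the question, where only one SP shows the fault.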

nandas · Answer

The replies and explanations from Rob and Mike may have provided the answers you were looking for. If so, please mark the question as answered and select the replies as "Correct" and/or "Helpful". If you need any more details, please feel free to post back to this forum.

Cheers,
Sandip

Mpalumbo · Answer

What version of Flare are you running on the CX500? There are some known bugs in older versions of Flare that always set off false alerts in our environment until we upgraded to Flare 16 or later.

Mike

159deka · Answer

(this is 159deka, formerly known as 159eka)
Thank you all for the responses. I had a small issue with my account, but am back, alive & kicking.

1- On expanding, NO ITEM was detected as faulty!
2- After business hours the whole system was rebooted. The fault was then mirrored in both Navisphere windows, an SPS was identified as faulty, it then turned to "T", and then the system returned to normal operation without any further errors.
3- Since then (2200 hrs Tuesday) the system has been behaving itself.
4- However, we have taken the precaution of logging a call, and the event log was collected. A filtered event log, pertaining to the errors only, is copied below.
5- Flare version is 2.19.

thanks again for the kind interest

=============

1.
Date:05/07/2008
Time:09:51:35 PM
Event Code:0xfd5
Description:WSAGetLastError() returned error: An address incompatible with the requested protocol was used.
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:TlntSvr
Category:NT Application Log
Log:NT Application Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error

2.
Date:05/07/2008
Time:09:51:09 PM
Event Code:0x36d
Description:There was error [DATABASE OPEN FAILED] processing the driver database. 00 00 00 00 02 00 64 00 00 00 00 00 6d 03 00 c0 00 00 00 00 6d 03 00 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:Application Popup
Category:NT System Log
Log:NT System Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error

3.
Date:05/07/2008
Time:09:51:09 PM
Event Code:0x4
Description:Dynamic strings:AMLI0xcfc0xcf8 - 0xcff 00 00 00 00 04 00 52 00 00 00 00 00 04 00 05 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:ACPI
Category:NT System Log
Log:NT System Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error

4.
Date:05/07/2008
Time:09:51:09 PM
Event Code:0x5
Description:Dynamic strings:AMLI0xcf80xcf8 - 0xcff 00 00 00 00 04 00 52 00 00 00 00 00 05 00 05 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:ACPI
Category:NT System Log
Log:NT System Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error

5.
Date:05/07/2008
Time:09:51:09 PM
Event Code:0x36d
Description:There was error [DATABASE NOT LOADED] processing the driver database. 00 00 00 00 02 00 64 00 00 00 00 00 6d 03 00 c0 00 00 00 00 6d 03 00 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:Application Popup
Category:NT System Log
Log:NT System Log
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error

6.
Date:05/07/2008
Time:09:48:43 PM
Event Code:0x904
Description:VSC Shutdown/Removed
Subsystem:CK200061400543
Device:Enclosure 0 Power A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x4
Type:Error

7.
Date:05/07/2008
Time:09:48:40 PM
Event Code:0x941
Description:Battery Online
Subsystem:CK200061400543
Device:Enclosure 0 SPS A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x1
Type:Error

8.
Date:05/07/2008
Time:09:48:39 PM
Event Code:0x941
Description:Battery Online
Subsystem:CK200061400543
Device:Enclosure 0 SPS B
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x1
Type:Error

9.
Date:05/07/2008
Time:09:48:39 PM
Event Code:0x908
Description:Fault - Cache Disabling
Subsystem:CK200061400543
Device:SP A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x0
Type:Error

10.
Date:05/02/2008
Time:04:59:27 PM
Event Code:0x2580
Description:Storage Array Faulted Bus 0 Enclosure 0 : Faulted Bus 0 Enclosure 0 SPS B : Removed SP B : Removed
Subsystem:CK200061400543
Device:N/A
SP:N/A
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Application
Sense Key:N/A
Ext Code1:N/A
Ext Code2:N/A
Type:Error

11.
Date:05/02/2008
Time:04:59:21 PM
Event Code:0x944
Description:Hard Peer Bus Error
Subsystem:CK200061400543
Device:SP A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x2
Ext Code1:0x0
Ext Code2:0x0
Type:Error

12.
Date:05/02/2008
Time:04:59:21 PM
Event Code:0x944
Description:Hard Peer Bus Error
Subsystem:CK200061400543
Device:SP A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x1
Ext Code1:0xaa975cf4
Ext Code2:0x0
Type:Error

13.
Date:05/02/2008
Time:04:59:12 PM
Event Code:0x908
Description:Fault - Cache Disabling
Subsystem:CK200061400543
Device:SP A
SP:SPA
Host:DFCC-SPB
Source:N/A
Category:N/A
Log:Storage Array
Sense Key:0x0
Ext Code1:0x0
Ext Code2:0x0
Type:Error

Message was edited by:
159deka

kelleg · Answer

The key entry is the message:

Description:Hard Peer Bus Error

This generally indicates that the SP reporting the error could not contact its peer SP. This could be caused by a number of things, but most likely there was a reboot of one of the SPs. Service will be able to tell what caused this; it may be a patch level.

regards,
glen kelley
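For anyone sifting a longer filtered log for entries like this, here is a rough Python sketch that parses the pasted format (a numbered line followed by Key:Value lines) and picks out records by description. The function names are made up for illustration, and the sample is two records abbreviated from the log posted above.

```python
import re


def parse_events(text: str) -> list[dict]:
    """Split a pasted event log into one dict per numbered record."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if re.fullmatch(r"\d+\.", line):  # "11." starts a new record
            current = {}
            events.append(current)
            continue
        if current is not None and ":" in line:
            # Split on the first colon only, so "Time:04:59:21 PM" stays intact.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return events


def find_by_description(events: list[dict], phrase: str) -> list[dict]:
    """Return the records whose Description contains the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return [e for e in events if phrase in e.get("Description", "").lower()]


SAMPLE = """
11.
Date:05/02/2008
Time:04:59:21 PM
Event Code:0x944
Description:Hard Peer Bus Error
Device:SP A
Sense Key:0x2

13.
Date:05/02/2008
Time:04:59:12 PM
Event Code:0x908
Description:Fault - Cache Disabling
Device:SP A
Sense Key:0x0
"""

peer_errors = find_by_description(parse_events(SAMPLE), "hard peer bus error")
for event in peer_errors:
    print(event["Date"], event["Time"], event["Event Code"], event["Device"])
```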

159deka · Answer

A case was opened, and the requested SP collects have been sent. Awaiting further details/instructions.

159deka · Answer

The latest development is that SP B needs to be replaced. And we plan to do it this evening.

However two concerns before we do that.
We have implemented soft zoning, so the zoning configuration will break the moment we have a new SP with two new WWNs, as soft zoning is based on WWNs. The easiest remedy that I can see is to edit the alias names to have the new WWNs. Can that be done?

The second concern: can this be done online, as we plan to? Will there be an inherent disabling/enabling of zone configurations which would hinder work?

A response ASAP is appreciated.

RRR · Answer

The WWNs do not change with a new SP!!! So don't worry about that.

The replacement can be done online as long as all hosts are HA connected (each host with 2 HBAs, connected to both SPs, and with failover software implemented, e.g. VMware or PowerPath).
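A small Python sketch of that pre-check, assuming you keep your own inventory of which host HBA is connected to which SP. The inventory layout, host names and function name are hypothetical, and the sketch does not verify that failover software is installed; that still has to be checked on each host.

```python
def hosts_at_risk(paths: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Return hosts that lack two HBAs or a path to both SP A and SP B.

    `paths` maps a host name to a list of (hba_name, sp) pairs,
    where sp is "A" or "B". This layout is an assumption for illustration.
    """
    at_risk = []
    for host, links in sorted(paths.items()):
        hbas = {hba for hba, _ in links}
        sps = {sp for _, sp in links}
        if len(hbas) < 2 or not {"A", "B"} <= sps:
            at_risk.append(host)
    return at_risk


# Hypothetical inventory: host2 has a single HBA that only reaches SP B.
inventory = {
    "host1": [("hba0", "A"), ("hba1", "B")],
    "host2": [("hba0", "B")],
}

print(hosts_at_risk(inventory))
```

Any host the function returns would lose access to its LUNs while SP B is out, so it should be fixed or shut down before the swap.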

nandas · Answer

Replacing an SP will not change the WWN. The World Wide Name is associated with the Storage Processor Enclosure, so the new SP will get the same WWN. No worries at all.

The activity is online. However, SP B will be removed from the system, which means all LUNs owned by SP B will be trespassed to SP A, so ensure all connected hosts are running proper failover software. It may be a good idea to do this activity during a low I/O period.

EMC Customer Engineer who will be doing this activity may guide you properly.

Finally, we are all glad to see that your post on this forum helped to successfully identify the issue.

Cheers,
Sandip

159deka · Answer

Thanks Sandip. Yes, the SP was replaced last evening. However, it has not yet come online, so we are working with EMC to see whether there is anything else wrong. It has been at 'POST' level for the last 13 hours or so. Generally, how long does it take to update the new SP and bring it online?

RRR · Answer

Minutes, not even 1 hour.

dynamox · Answer

i had an instance where Dell shipped two replacement SPs and both were DOA.

RRR · Answer

if Georgia then if Atlanta then if no Jamaica then sp := fail; end_if end_if end_if

dynamox · Answer

ahahaha ...you need to stop hanging out with Stefano

CLARiiON
