Disk Jockey

1 Rookie

92 Posts

January 13th, 2009 12:00

Several hosts not logging into CX700 SPA2, R26

Affected: 6 W2k3 hosts, 6 ESX 3.0.2 hosts (intermittent), and a Celerra running 5.6 that is missing a path.

Brocade switches.

Connectivity Status shows the initiators as not logged in. PowerPath shows the path to SPA2 as dead, or only 3 paths instead of 4. The Celerra trespassed its control LUNs. 5 ESX hosts are intermittently logging out.

SPA0 is fine.

Not isolated to one switch and other hosts on this fabric are talking to SPA2 and SPA0 just fine.

Have set these ports to standby in PPath.

Tried to unregister and reregister one of these initiators. The result was that the initiator logged into the array but refuses to register.

Anyone have ideas? Access Logix confused? Thinking of toggling the SPA2 switch port.

I have a case open, but it's a head scratcher.

Responses(26)

dynamox

1 Rookie

20.4K Posts

January 21st, 2009 11:00

Try these two things. First, re-enable the active zone set on the switch, even though you have not made any changes to it. Second, you might need to "tickle" your hosts to log back in. On Windows, go to Disk Management and hit Rescan. On ESX, rescan in VirtualCenter/VIC as if you were presenting new LUNs.

Disk Jockey

January 21st, 2009 13:00

I'll try those tasks. I spent some time reading Brocade release notes and found "Bug 6336972: Brocade 3800 and 3850 can incorrectly route frames, causing loss of storage access."

We have a pair of old 3800s and this array is attached to them. The bug was fixed in FOS 3.2.1 and we have 3.2.0 running on these 3800s.

We plan to upgrade all of the Brocades to the latest EMC approved code on the 31st. I think 3.2.1c is the latest for the 3800.

I've wanted to get rid of these 3800s for some time, but our HP-UX guy never has time to export his VGs and reimport...

Disk Jockey

January 21st, 2009 13:00

Yes, replaced the SP. The Celerra logged in; the rest did not.

AranH1

2.2K Posts

January 21st, 2009 13:00

Good idea. Wonder if it might be something that simple?

On Windows, if you want to use the CLI, just go to Run > diskpart and type rescan.

AranH1

January 21st, 2009 13:00

Well then, make sure the HP-UX guy's ports "don't" work until he migrates off the 3800s.

Disk Jockey

January 31st, 2009 13:00

OK, Upgraded all of the switches on the B fabric this morning to latest EMC approved fw and still no joy. We were a few revs back, so it took about 5 hours to do them all.

All of the switches except for the 3800 were HA reboots. I hope I don't need to go in and hard reboot all of them.

I was told that the Clariion is seeing resets:

- Below ktrace snippet shows host has logged in successfully but immediately sends a logout command to the array.

From Support:

A 01/17/09 19:43:53 TCD5 843ba3a0 Initiator 50014380029A47E2 Logging in... <--- initiator logged in - Win2k3
A 01/17/09 19:43:58 FCDMTL 9 (FE2/SC) fb0a9e20 Abort received of type 31038 from node name 50014380 029a47e2
A 01/17/09 19:45:07 FCDMTL 9 (FE2/SC) 84eb9c70 DVM RSCN Device affected by RSCN- Id: 41600, Device State: 3
Device State: 3 >>> FC_DVM_S_3_DEVICE_READY << received RSCN from switch
A 01/17/09 19:45:08 FCDMTL 9 (FE2/SC) 844a8020 DVM DDB Shtdwn, ID: 41600, Pri/Sec Err: d0000, State: 3
A 01/17/09 19:45:08 TCD5 844a8020 Initiator 50014380029A47E2 Logging in...
A 01/17/09 19:45:08 FCDMTL 9 (FE2/SC) 844a8020 Abort received of type 31032 from node name 50014380 029a47e2
.....................................................
A 01/17/09 19:46:16 TCD5 84e653a0 Initiator 50060B0000EF2863 Logging in... <--- initiator logged in - host ESX3.0.2
A 01/17/09 19:46:16 FCDMTL 9 (FE2/SC) 84e653a0 CALLBACK: LOGIN - loop id 212., pend/login/cb 01000102, tgt_context a3413c40
A 01/17/09 19:46:16 FCDMTL 9 (FE2/SC) 845f2b30 CALLBACK: LOGIN_REQ'D - loop id 212., pend/login/cb 00000102, accept 01200001
A 01/17/09 19:46:16 FCDMTL 9 (FE2/SC) 845f2b30 IOCTL DECREMENT_EVENT_COUNT to 0
A 01/17/09 19:46:17 TCD5 845f2da8 CC 02\04\03 LUN 0x2 Initiator 50060B0000EF2863 OpCode 0x00
...
A 01/17/09 19:46:41 FCDMTL 9 (FE2/SC) 845f3da8 Abort received of type 31032 from node name 50060b00 00ef2863 << abort from host
A 01/17/09 19:46:41 FCDMTL 9 (FE2/SC) 845f3da8 CALLBACK: LOGOUT - loop id 212., pend/login/cb 00010102, reason 012003ff

- It seems that immediately after the login is received, the clariion is receiving an abort from the initiator as indicated by the ktrace message included below

Abort received of type 31032 from node name 50060b00 00ef2863
CALLBACK: LOGOUT - loop id 212., pend/login/cb 00010102, reason 012003ff
type 31032 = CPD_ABORT_EVENT_INITIATOR
reason 012003ff = CPD_LOGIN_NO_INFO_AVAILABLE >>> indicates that an abort occurred with no reason available.

END Support Synopsis---------------------------------------------------------
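
For anyone who wants to run the same login/abort correlation over a longer ktrace, here is a minimal Python sketch. It is not an EMC tool. It assumes the line layout seen in the snippet above: login lines carry the initiator WWN as 16 hex digits, and abort lines split the same WWN into two 8-digit halves. The function name `login_abort_pairs`, the regexes, and the `SAMPLE` list are my own illustration. Only abort type 31032 is decoded, since that is the only code support decoded above.

```python
import re
from datetime import datetime

# Abort type decoded in the support synopsis; other codes (e.g. 31038)
# are reported as raw numbers rather than guessed at.
ABORT_TYPES = {"31032": "CPD_ABORT_EVENT_INITIATOR"}

# Assumed layouts, taken only from the lines pasted in this thread.
LOGIN_RE = re.compile(r"^A (\S+ \S+) .*Initiator ([0-9A-Fa-f]{16}) Logging in")
ABORT_RE = re.compile(
    r"^A (\S+ \S+) .*Abort received of type (\d+) from node name "
    r"([0-9A-Fa-f]{8}) ?([0-9A-Fa-f]{8})"
)


def parse_time(stamp):
    """Parse a ktrace timestamp such as '01/17/09 19:43:53'."""
    return datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")


def login_abort_pairs(lines):
    """Yield (wwn, seconds_from_last_login_to_abort, abort_type)."""
    last_login = {}
    for line in lines:
        m = LOGIN_RE.match(line)
        if m:
            last_login[m.group(2).upper()] = parse_time(m.group(1))
            continue
        m = ABORT_RE.match(line)
        if m:
            # Rejoin the two WWN halves so they compare with login lines.
            wwn = (m.group(3) + m.group(4)).upper()
            if wwn in last_login:
                delta = parse_time(m.group(1)) - last_login[wwn]
                yield (wwn, delta.total_seconds(),
                       ABORT_TYPES.get(m.group(2), m.group(2)))


# A few lines copied from the ktrace snippet above.
SAMPLE = [
    "A 01/17/09 19:43:53 TCD5 843ba3a0 Initiator 50014380029A47E2 Logging in...",
    "A 01/17/09 19:43:58 FCDMTL 9 (FE2/SC) fb0a9e20 Abort received of type 31038 from node name 50014380 029a47e2",
    "A 01/17/09 19:45:08 TCD5 844a8020 Initiator 50014380029A47E2 Logging in...",
    "A 01/17/09 19:45:08 FCDMTL 9 (FE2/SC) 844a8020 Abort received of type 31032 from node name 50014380 029a47e2",
    "A 01/17/09 19:46:16 TCD5 84e653a0 Initiator 50060B0000EF2863 Logging in...",
    "A 01/17/09 19:46:41 FCDMTL 9 (FE2/SC) 845f3da8 Abort received of type 31032 from node name 50060b00 00ef2863 << abort from host",
]

for wwn, seconds, abort_type in login_abort_pairs(SAMPLE):
    print(wwn, seconds, abort_type)
```

A short gap between login and abort for the same WWN, repeated across several hosts, is the pattern support pointed out. The sketch only makes it easier to spot in a long trace.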

From the host side SPA2 status is offline in SANSurfer.
We had to remove the zones from the Celerra to SPA2 because it was causing the Celerra management agents to hang, making the Celerra unmanageable.
The celerra cannot see any devices on that chain.

I wish management had approved my Finisar sniffer. I would have had this solved in hours instead of weeks. I may get it after this fire drill.
Once again management gets a reminder about why we build dual redundant fabrics...

Does anyone know of an open source Fibre Channel packet capture tool? I know Wireshark will do FCIP. Too bad there is no native FC support.
Still Stumped.

Message was edited by:
spaceman

Disk Jockey

February 2nd, 2009 07:00

No resolution yet. The work described above was completed this past weekend with no change in results.

nandas

1.5K Posts

February 2nd, 2009 07:00

Hi,

I hope your issue has been resolved by now. Is there any update for us? Was it really a fabric issue or a faulty SP on the CLARiiON? Or are you yet to get a solution?

Regards,
Sandip

Disk Jockey

February 2nd, 2009 08:00

Escalated to SEV1. I'm going to ask for the Finisar.

Disk Jockey

February 18th, 2009 19:00

Thanks for all of your input. We were able to punch a problem host directly into the switch where SPA2 resides, and the host was able to log into SPA2 normally. The problem was loss of packets over the ISLs. It resolved itself while we were inserting the Finisar to capture traffic over the ISLs. Moral of the story: have your taps in place before the problem occurs. Otherwise you may end up chasing your tail, because as you toggle ISLs and cause FSPF to redirect traffic, the problem can resolve itself...
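
To illustrate why toggling an ISL can make the symptom vanish, here is a toy Python sketch of shortest-path route selection, which is the general idea behind FSPF. The three-switch topology, the switch names, and the link costs are all made up for illustration and are not our fabric. Real FSPF also handles equal-cost paths and link-state flooding, which this ignores.

```python
import heapq


def shortest_path(links, src, dst):
    """Dijkstra over an undirected graph given as {(a, b): cost}."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, link_cost in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + link_cost, nxt, path + [nxt]))
    return None  # no route between src and dst


# Hypothetical fabric: a direct ISL plus a two-hop detour via mid_sw.
FABRIC = {
    ("host_sw", "spa2_sw"): 500,  # suspect ISL (illustrative cost)
    ("host_sw", "mid_sw"): 500,
    ("mid_sw", "spa2_sw"): 500,
}

before = shortest_path(FABRIC, "host_sw", "spa2_sw")

# Disable the suspect ISL, as happens when you pull it to insert a tap.
without_isl = {k: v for k, v in FABRIC.items() if k != ("host_sw", "spa2_sw")}
after = shortest_path(without_isl, "host_sw", "spa2_sw")

print("before:", before)
print("after: ", after)
```

Once the suspect link is out of the route, traffic takes the detour and the logins succeed, so the evidence you wanted to capture is gone.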

nandas

February 19th, 2009 07:00

Thanks for the update and great to hear that the problem is solved and issue identified.
