5 Posts

February 13th, 2009 09:00

Boot from SAN

Hello All!

As this is my first post in the forum, I wanted to take a second to thank all of you who have answered previously. There is a TON of great information here, and you guys rock!

On to my question...our current setup: NS20, iSCSI only, no CIFS, NFS, or fibre. HP DL365 with QLogic QLA4052C-E. These are dual-port HBAs, with one port connected to each of two separate switches. Each port is on its own subnet (see below). The purpose is to use MS MPIO to provide redundancy against switch failure (the HP servers only have 1 PCI-X slot). The card can see the SAN, the SAN can see the card, and it *should* work, but Windows will not see the LUN. The LUN shows up correctly during boot, and when I look at server_log server_2 -f -s on the NS20, this is what it shows:

2009-02-13 11:58:08: ISCSI: 6: Accepted a connection from host: 192.168.150.80:3260
2009-02-13 11:58:08: ISCSI: 3: IscsiTargetPort::createSession: session reinstatement - new sess=0xdd664404, old sess=0xddb5a404 ini=iqn.2000-04.com.qlogic:qla4052c.gs10821a60080.1, ISID=T:0x01 A:0x00 B:0x000f C:0x21 D:0xeab0
2009-02-13 11:58:08: ISCSI: 3: IscsiSession::closeSession: I_T nexus lost, sess=0xddb5a404 from [iqn.2000-04.com.qlogic:qla4052c.gs10821a60080.1,i,0x40000f21eab0_iqn.1992-05.com.emc:apm000837044780000-1,t,0x1]
2009-02-13 11:58:32: ISCSI: 3: IscsiSession::closeSession: I_T nexus lost, sess=0xdd7db004 from [iqn.2000-04.com.qlogic:qla4052c.gs10821a60080.2,i,0x40000f21eab0_iqn.1992-05.com.emc:apm000837044780000-1,t,0x1]
2009-02-13 11:59:34: ISCSI: 3: IscsiSession::closeSession: I_T nexus lost, sess=0xdd664404 from [iqn.2000-04.com.qlogic:qla4052c.gs10821a60080.1,i,0x40000f21eab0_iqn.1992-05.com.emc:apm000837044780000-1,t,0x1]
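
For anyone wading through a longer server_log capture, a small script can tally the "I_T nexus lost" events per initiator. This is only a sketch: the regular expression is written against the line layout in the excerpt above, and the function name is made up for illustration. Other DART versions may format these lines differently.

```python
import re
from collections import Counter

# Pattern written against the "I_T nexus lost" lines quoted in the post.
# Assumption: other log lines follow the same "timestamp: ISCSI: level:" layout.
NEXUS_LOST = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): ISCSI: \d+: "
    r"IscsiSession::closeSession: I_T nexus lost, "
    r"sess=(?P<sess>0x[0-9a-f]+) from \[(?P<initiator>[^,\]]+),"
)


def count_nexus_lost(lines):
    """Return a Counter mapping initiator IQN to its number of nexus-lost events."""
    counts = Counter()
    for line in lines:
        match = NEXUS_LOST.match(line.strip())
        if match:
            counts[match.group("initiator")] += 1
    return counts


# Lines copied from the server_log output in the post.
SAMPLE = """\
2009-02-13 11:58:08: ISCSI: 6: Accepted a connection from host: 192.168.150.80:3260
2009-02-13 11:58:08: ISCSI: 3: IscsiSession::closeSession: I_T nexus lost, sess=0xddb5a404 from [iqn.2000-04.com.qlogic:qla4052c.gs10821a60080.1,i,0x40000f21eab0_iqn.1992-05.com.emc:apm000837044780000-1,t,0x1]
2009-02-13 11:58:32: ISCSI: 3: IscsiSession::closeSession: I_T nexus lost, sess=0xdd7db004 from [iqn.2000-04.com.qlogic:qla4052c.gs10821a60080.2,i,0x40000f21eab0_iqn.1992-05.com.emc:apm000837044780000-1,t,0x1]
2009-02-13 11:59:34: ISCSI: 3: IscsiSession::closeSession: I_T nexus lost, sess=0xdd664404 from [iqn.2000-04.com.qlogic:qla4052c.gs10821a60080.1,i,0x40000f21eab0_iqn.1992-05.com.emc:apm000837044780000-1,t,0x1]
"""

if __name__ == "__main__":
    for initiator, n in count_nexus_lost(SAMPLE.splitlines()).items():
        print(initiator, n)
```

Grouping by initiator IQN makes it easier to see whether one HBA port or both are dropping sessions.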

I am unsure why we are receiving the "I_T nexus lost" part. The card is set so only one port has the boot LUN (the 192.168.150.x network), and the other is disabled.

We have two Cisco 2960G 48-port switches that carry strictly iSCSI traffic (although there is a port for management to the LAN, which is why iSCSI is on VLAN 5). Here is the relevant config from one of those:

NS20 interface:
interface Port-channel1
 switchport access vlan 5
 switchport trunk allowed vlan 1
 switchport mode access
 flowcontrol receive desired
 spanning-tree portfast

Host interface:
interface GigabitEthernet0/25
 switchport access vlan 5
 switchport mode access
 flowcontrol receive desired
 spanning-tree portfast
 spanning-tree bpdufilter enable
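
When comparing many ports, it can help to split a pasted config into per-interface stanzas and look for leftovers such as trunk settings on an access port. The sketch below does that for the excerpt above. The function names are hypothetical, and the check is a tidiness aid rather than a diagnosis: trunk lines normally have no effect while a port is in access mode.

```python
def parse_interfaces(config_text):
    """Split a config excerpt into {interface name: [sub-commands]}."""
    stanzas = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line == "!":
            current = None  # a blank line or '!' ends the current stanza
        elif line.startswith("interface "):
            current = line.split(None, 1)[1]
            stanzas[current] = []
        elif current is not None:
            stanzas[current].append(line)
        # anything else (e.g. the "NS20 interface" labels) is ignored
    return stanzas


def access_ports_with_trunk_lines(stanzas):
    """Names of access-mode interfaces that also carry 'switchport trunk' lines."""
    return [
        name
        for name, cmds in stanzas.items()
        if "switchport mode access" in cmds
        and any(cmd.startswith("switchport trunk ") for cmd in cmds)
    ]


# Config copied from the post, including its two label lines.
CONFIG = """\
NS20 interface
interface Port-channel1
switchport access vlan 5
switchport trunk allowed vlan 1
switchport mode access
flowcontrol receive desired
spanning-tree portfast

Host interface
interface GigabitEthernet0/25
switchport access vlan 5
switchport mode access
flowcontrol receive desired
spanning-tree portfast
spanning-tree bpdufilter enable
"""

if __name__ == "__main__":
    print(access_ports_with_trunk_lines(parse_interfaces(CONFIG)))
```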

As you can see, the NS20 interface is configured as an EtherChannel group, and the NICs on the NS20 are also configured as such, as follows:

cge0, cge4 -> ec0 (etherchannel0) -> IP 192.168.150.1 (switch 1)
cge1, cge3 -> ec1 (etherchannel1) -> IP 192.168.160.1 (switch 2)
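
As a quick sanity check of which path a given initiator address should use, the two subnets can be modelled with Python's ipaddress module. This is a sketch under one loud assumption: the post gives only the interface addresses, so the /24 masks below are a guess, and the path labels are just illustrative names.

```python
import ipaddress

# Assumption: both iSCSI subnets use a /24 mask; the post does not state the masks.
PATHS = {
    "ec0 (switch 1)": ipaddress.ip_network("192.168.150.0/24"),
    "ec1 (switch 2)": ipaddress.ip_network("192.168.160.0/24"),
}


def path_for(host_ip):
    """Return the label of the EtherChannel whose subnet contains host_ip, or None."""
    addr = ipaddress.ip_address(host_ip)
    for name, network in PATHS.items():
        if addr in network:
            return name
    return None


if __name__ == "__main__":
    # 192.168.150.80 is the initiator address seen in the server_log output.
    print(path_for("192.168.150.80"))
```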

There is no fsn set up (at this point in time), and we have spent the past two weeks banging our heads against the wall. We've called QLogic, who told us to check the card - we've tried 3. We changed our settings on the switches, changed the settings on the cards, played with cables, and we're stuck.

If anyone has any ideas, I would greatly appreciate them. We have an outage scheduled for this evening, and any changes that would require a reboot / etc. will have to be performed then.

Please let me know if you need more information / clarification / etc.

Thank you in advance!!!

Tim

11 Legend • 20.4K Posts • 87.4K Points

February 14th, 2009 23:00

So you are about to load Windows?

5 Posts

February 16th, 2009 07:00

Correct. During the Windows setup process, the LUN does not show up.

As I stated earlier, we had an outage on Friday night to work on this, and this is what we found:

We disabled the layer 2 VLANs on the Cisco switches. Once we did that, the errors stopped (for the switch we modified), and we were able to boot from SAN. Not sure why adding the storage ports to a layer 2 VLAN would cause these issues, but there it is. Since these are strictly iSCSI switches, this is not a big deal. I just wanted to separate the management traffic (which does go to the LAN) from the storage traffic, even though they are on completely different subnets.

I would still be interested to know if anyone has any idea why these errors show up, though.

Thanks!

Tim

6 Operator • 8.6K Posts

February 16th, 2009 09:00

I'm not a Cisco expert, but maybe you had also enabled VLAN tagging by accident and the cards weren't set up to deal with it.

6 Operator • 8.6K Posts

February 16th, 2009 10:00

If you haven't already, also take a look at

EMC Host Connectivity with QLogic Fibre Channel and iSCSI Host Bus Adapters (HBAs)
and Converged Network Adapters (CNAs) in the Windows Environment
P/N 300-001-164


available from Powerlink

5 Posts

February 16th, 2009 10:00

We tried that too, to no avail. It wasn't until we removed the configs from the ports that things went back to normal. Either way, it's working (the boot part), and we didn't *really* need the VLANs, it was just to appease my boss a bit.

Regarding the above post about Layer 3 VLAN (or tagging), the config you see was copied directly out of the running config, so no layer 3 VLANs at all.

It's not much of an issue, except that I'm seeing some of our working non-SAN-boot servers show the above errors on the NS20, even though access to the LUNs is OK.

Thanks again guys for your input,

Tim

6 Operator • 8.6K Posts

February 16th, 2009 10:00

Or it could be a side effect of this comment in the support matrix:

QLA4010 iSCSI boards booting from external storage require "Spanning Tree" features to be disabled on network switches used in the boot path.

5 Posts

February 16th, 2009 11:00

Yeah, I've been through that about a hundred times in the last week, just to make sure that everything is correct (which it appears to be).

There is just something not quite right about the whole setup, as I'm seeing these errors on boot now from the box I'm working on, as well as from our file server (non-SAN-boot). Lots and lots of them, and looking at the MS iSCSI initiator details for the LUN (multipathing between the two networks), the ports keep logging in / out every three (3) seconds. Really odd...

Rainer, thanks again for your help, I really appreciate it,

Tim

6 Operator • 8.6K Posts

February 16th, 2009 13:00

You're welcome.

I couldn't find these "I_T nexus lost" messages in the knowledgebase or the documentation, so you will probably have to open a service request.

My best guess is that it could be an interop issue between the QLogic HBA firmware, the network infrastructure, and the MS iSCSI framework ....

Rainer

5 Posts

March 17th, 2009 13:00

Well, we finally found the cause of our boot-from-SAN issues and the nexus_lost error messages. Two different problems were causing them. The first was the boot-from-SAN failure, which ended up being the layer 2 VLANs. If we removed the VLAN, it worked great; when we put the servers back into the VLAN, they stopped working. Disabled all spanning tree on the VLAN, which still didn't work. Disabled spanning tree on the switch (or so we thought), and that didn't work. Disabled spanning tree on the VLAN, and that fixed our problem straight away. This is the command we used on the switch:

no spanning-tree vlan 5
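
To confirm the change stuck on both switches, one option is to save the running config to a text file and check it for that line. The sketch below is only an illustration: the function name is made up, and it assumes the command appears in the config text as typed, possibly with a comma-separated list or ranges of VLAN IDs.

```python
import re


def stp_disabled_vlans(config_text):
    """Return the set of VLAN IDs named in 'no spanning-tree vlan ...' lines."""
    vlans = set()
    for raw in config_text.splitlines():
        match = re.fullmatch(r"no spanning-tree vlan ([\d,-]+)", raw.strip())
        if not match:
            continue
        for part in match.group(1).split(","):
            if not part:
                continue
            low, _, high = part.partition("-")
            # A single ID such as "5" has no upper bound, so reuse the lower one.
            vlans.update(range(int(low), int(high or low) + 1))
    return vlans


if __name__ == "__main__":
    print(5 in stp_disabled_vlans("no spanning-tree vlan 5"))
```

A check like `5 in stp_disabled_vlans(text)` can then be run against each switch's saved config.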

On to the second issue - the nexus_lost error messages. We had been updating the firmware / boot code for all of the HBAs, which went flawlessly. The only problem was that most of the cards had already been set up prior to the updates, and we were not resetting them to factory defaults afterwards. Once we went through and reset all the cards to factory defaults and reconfigured them, there were no more errors, and the NS20 has been quiet ever since.

Thank you to everyone who helped with this issue!

Tim