January 22nd, 2010 06:00

Unable to format LUN on Solaris 10

Hello - I hope someone here can assist me with this issue. I have been beating my head against the wall over this and have exhausted all possibilities.

I have 2 x Sun V480 servers, each with an Emulex LP10000-E card, connected to a Clariion CX3-10f. I'm using the Sun-branded Emulex driver provided with the OS. This server is a new install with no patches whatsoever. One server has full access to the LUN that's assigned to it; the other can see its LUN device, but I am not able to format or mount it. I'll call the non-working server Server-B and the working server Server-A.

Server-A config (output of the fcinfo command):
HBA Port WWN: 10000000c946878a
OS Device Name: /dev/cfg/c1
Manufacturer: Emulex
Model: LP10000
Firmware Version: 1.91a1 (T2D1.91A1)
FCode/BIOS Version: Boot:5.00a7 Fcode:1.41a4
Serial Number: VM51733465
Driver Name: emlxs
Driver Version: 2.31p (2008.12.11.10.30)
Type: L-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 20000000c946878a
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 10
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 78894
Invalid CRC Count: 0

Server-B config (output of the fcinfo command):
HBA Port WWN: 10000000c9468c76
OS Device Name: /dev/cfg/c1
Manufacturer: Emulex
Model: LP10000
Firmware Version: 1.91a1 (T2D1.91A1)
FCode/BIOS Version: Boot:5.00a7 Fcode:1.41a4
Serial Number: VM51734366
Driver Name: emlxs
Driver Version: 2.31p (2008.12.11.10.30)
Type: L-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 20000000c9468c76
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 6
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 32
Invalid CRC Count: 0
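Both fcinfo listings use the same "Field: value" layout, so one quick way to see what actually differs between the two HBAs is to parse them into dictionaries and compare. This is a minimal Python sketch that assumes the output has been saved as plain text. The function names are hypothetical, and only a subset of the fields above is pasted in, to keep it short.

```python
def parse_fcinfo(text):
    """Turn 'Field: value' lines from fcinfo output into a dict.

    Splits on the first colon only, so values such as
    'Boot:5.00a7 Fcode:1.41a4' survive intact. Header lines with no value
    (e.g. 'Link Error Statistics:') are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:
            fields[key.strip()] = value
    return fields


def diff_fields(a, b):
    """Return {field: (value_a, value_b)} for every field whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


server_a = """HBA Port WWN: 10000000c946878a
Serial Number: VM51733465
Driver Version: 2.31p (2008.12.11.10.30)
Type: L-port
Current Speed: 2Gb
Node WWN: 20000000c946878a
Link Error Statistics:
Loss of Sync Count: 10
Invalid Tx Word Count: 78894"""

server_b = """HBA Port WWN: 10000000c9468c76
Serial Number: VM51734366
Driver Version: 2.31p (2008.12.11.10.30)
Type: L-port
Current Speed: 2Gb
Node WWN: 20000000c9468c76
Link Error Statistics:
Loss of Sync Count: 6
Invalid Tx Word Count: 32"""

for field, (a, b) in diff_fields(parse_fcinfo(server_a), parse_fcinfo(server_b)).items():
    print(f"{field}: A={a} B={b}")
```

Run against the full listings, the only differences are the WWNs, the serial numbers, and the link error counters. Model, firmware, driver, port type, and speed all match, which suggests the two HBAs are configured identically.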

Whenever I try to format the LUN on Server-B, I get lots of errors:

# format
Searching for disks...Jan 19 16:57:07 celhsol0200 scsi: WARNING: /pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016941e0ca40,0 (ssd2):
Jan 19 16:57:07 celhsol0200     drive offline
Jan 19 16:57:07 celhsol0200 scsi: WARNING: /pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016941e0ca40,0 (ssd2):
Jan 19 16:57:07 celhsol0200     drive offline


The device does not support mode page 3 or page 4,
or the reported geometry info is invalid.
WARNING: Disk geometry is based on capacity data.

The current rpm value 0 is invalid, adjusting it to 3600
done

c1t0d0: configured with capacity of 734.08GB


AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016941e0ca40,0
1. c2t0d0
/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e010a8ec21,0
2. c2t1d0   bootmirr
/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w2100000c50fd5457,0
Specify disk (enter its number):


The drive that is reporting offline, /pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016941e0ca40,0, is the LUN from the EMC Clariion.

Here is the mapped drive:
c1t0d0s0 -> ../../devices/pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016941e0ca40,0:a
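The symlink target above encodes which array port the LUN is reached through. In the `ssd@w<port WWN>,<LUN>` node, the digits after "w" are the target (array) port WWN and the number after the comma is the LUN. Below is a minimal Python sketch that pulls those two pieces out of such a path. The regex only handles the `ssd@w...,lun` form seen in this thread, and the LUN field is assumed to be hexadecimal. It is not a general Solaris device-path parser.

```python
import re

# Matches the "ssd@w<16-hex-digit target port WWN>,<LUN>" node used in this thread's paths.
SSD_NODE = re.compile(r"ssd@w([0-9a-fA-F]{16}),([0-9a-fA-F]+)")


def parse_ssd_path(path):
    """Return (target_port_wwn, lun) from a Solaris ssd device path.

    Assumes the LUN field is hexadecimal; raises ValueError if no ssd node is found.
    """
    m = SSD_NODE.search(path)
    if m is None:
        raise ValueError(f"no ssd@w...,lun node in {path!r}")
    return m.group(1).lower(), int(m.group(2), 16)


link = "../../devices/pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016941e0ca40,0:a"
print(parse_ssd_path(link))  # ('5006016941e0ca40', 0)
```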

It almost seems like I can read the drive but cannot write to it. There isn't any security in the EMC Navisphere tool restricting write access, so I'm stumped as to why I cannot format this LUN.

Any help is very much appreciated.

4.5K Posts

January 22nd, 2010 09:00

Todd,

The array LUN number is OK; it's the Host ID number that I was concerned about.

Open the storage group for the host. You should see three tree items listed; open the Host tree item and you should see the host listed. Right-click the host and select Connectivity Status. The paths should be listed as Logged In and Registered; make sure all paths are listed correctly.

For LUN 9, right-click the LUN and select Properties. The General tab lists the Default Owner and Current Owner. Make sure the Current Owner is the same as the Default Owner, and that the SP that owns the LUN is also on one of the paths from the host to the array shown in Connectivity Status.

Are you using PowerPath?

glen

7 Posts

January 22nd, 2010 09:00

Hi Glen, and thanks for the reply. On the Clariion side, the LUN is presented as LUN 9 to the non-working host, and the Host ID is '0'. I tried removing and re-adding the LUN, but it is still presented as LUN 9. The working server's LUN is presented as LUN 8.

I ran luxadm against both servers to compare. Server-B (non-working) has a Path status of Not Ready, while Server-A (working) has a Path status of O.K. Server-A also shows the Read Cache information, whereas Server-B does not. Is this an issue on the host side or the Clariion side? I'm not sure. How could I check the Clariion side to make sure everything there is OK?

Non-Working Server

luxadm display /dev/rdsk/c1t0d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c1t0d0s2
  Vendor:               DGC
  Product ID:           RAID 5
  Revision:             0326
  Serial Num:           APM00082001710
  Unformatted capacity: 751735.000 MBytes
  Device Type:          Disk device
  Path(s):

  /dev/rdsk/c1t0d0s2
  /devices/pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016941e0ca40,0:c,raw
    LUN path port WWN:          5006016941e0ca40
    Host controller port WWN:   10000000c9468c76
    Path status:                Not Ready

Working Server

luxadm display /dev/rdsk/c1t0d0s2
Password:
DEVICE PROPERTIES for disk: /dev/rdsk/c1t0d0s2
  Vendor:               DGC
  Product ID:           RAID 5
  Revision:             0326
  Serial Num:           APM00082001710
  Unformatted capacity: 819200.000 MBytes
  Read Cache:           Enabled
    Minimum prefetch:   0x0
    Maximum prefetch:   0x0
  Device Type:          Disk device
  Path(s):

  /dev/rdsk/c1t0d0s2
  /devices/pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016141e0ca40,0:c,raw
    LUN path port WWN:          5006016141e0ca40
    Host controller port WWN:   10000000c946878a
    Path status:                O.K.
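The telling difference between the two listings is the Path status line. A minimal Python sketch can pull each path's LUN path port WWN, host controller WWN, and status out of `luxadm display` text so two hosts can be compared at a glance. It assumes the field labels and indentation exactly as printed above, and only the path sections are pasted in here.

```python
def parse_luxadm_paths(text):
    """Collect per-path fields from 'luxadm display' output.

    Each path block starts at a '/devices/...' line; the indented
    'Label: value' lines that follow are attached to that path.
    Lines before the first path (Vendor, Read Cache, ...) are ignored.
    """
    paths = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/devices/"):
            current = {"device": line}
            paths.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return paths


non_working = """  /dev/rdsk/c1t0d0s2
  /devices/pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016941e0ca40,0:c,raw
    LUN path port WWN:          5006016941e0ca40
    Host controller port WWN:   10000000c9468c76
    Path status:                Not Ready"""

working = """  /dev/rdsk/c1t0d0s2
  /devices/pci@8,600000/lpfc@1/fp@0,0/ssd@w5006016141e0ca40,0:c,raw
    LUN path port WWN:          5006016141e0ca40
    Host controller port WWN:   10000000c946878a
    Path status:                O.K."""

for name, text in (("Server-B", non_working), ("Server-A", working)):
    for p in parse_luxadm_paths(text):
        print(name, p["LUN path port WWN"], p["Path status"])
```

Note that the two hosts reach the array through different port WWNs (5006016941e0ca40 versus 5006016141e0ca40), which becomes relevant later in the thread.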

4.5K Posts

January 22nd, 2010 09:00

On the Clariion, look in the Storage Group that contains the LUN and Server-B. Right-click the storage group name and select "Select LUNs"; this should give you a list of the LUNs assigned to this host. There is a column called Host ID, and the first LUN should be 0. If not, move the LUN out of the storage group, then add it back in, and before clicking Apply make sure that the Host ID is zero (you can click in the column to open a drop-down and select the Host ID number). If you do not have a Host ID 0, the array will present a LUNZ to the host; this looks like a LUN to the host, but you can't do anything with it.
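Glen's rule of thumb can be expressed as a small check over the storage group's LUN list. If no LUN in the group is mapped to Host ID 0, the host sees a LUNZ placeholder at LUN 0 instead of real storage. This Python sketch models the storage group as a hypothetical `{array LUN: Host ID}` dictionary. It only illustrates the rule as described in this reply and is not something read from Navisphere.

```python
def host_lun_zero_view(storage_group):
    """Describe what the host should see at host LUN 0.

    storage_group maps array LUN number -> Host ID (a hypothetical model of
    the 'Select LUNs' dialog). Per the rule described above, if no LUN uses
    Host ID 0 the array presents a LUNZ, which looks like a LUN but is unusable.
    """
    for array_lun, host_id in storage_group.items():
        if host_id == 0:
            return f"array LUN {array_lun}"
    return "LUNZ"


print(host_lun_zero_view({9: 0}))  # array LUN 9 (Server-B's group as reported in the thread)
print(host_lun_zero_view({9: 1}))  # LUNZ
```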

glen

7 Posts

January 22nd, 2010 09:00

Looking at the Storage Processor event log, I see a few events related to that host, but I'm not sure what they mean:

Initiator (20:00:00:00:C9:46:8C:76:10:00:00:00:C9:46:8C:76) on Server (celhsol0200) registered with the storage system is now inactive. It does not have a working physical connection. See Navisphere Manager for details.
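The initiator name in that event is the HBA's Node WWN followed by its Port WWN, written as colon-separated bytes. Comparing it with the fcinfo output above identifies it as Server-B's HBA (Node WWN 20000000c9468c76, Port WWN 10000000c9468c76). Here is a minimal Python sketch that converts the fcinfo values into that form. The node-then-port layout is taken from this one event message and should be treated as an assumption.

```python
def clariion_initiator_name(node_wwn, port_wwn):
    """Format node and port WWNs as 'NN:NN:...' the way the SP event log shows them.

    Assumes two 16-hex-digit WWNs as printed by fcinfo; the order (node WWN
    first, then port WWN) is taken from the event message above.
    """
    hexdigits = (node_wwn + port_wwn).upper()
    if len(hexdigits) != 32:
        raise ValueError("expected two 16-hex-digit WWNs")
    return ":".join(hexdigits[i:i + 2] for i in range(0, 32, 2))


name = clariion_initiator_name("20000000c9468c76", "10000000c9468c76")
print(name)  # 20:00:00:00:C9:46:8C:76:10:00:00:00:C9:46:8C:76
```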

7 Posts

January 22nd, 2010 10:00

The host connectivity status looks good. The default owner and current owner of the LUN are the same: SP-A. Here's a thought: the host is plugged into SP-B. Would that matter? And no, I am not using PowerPath.

Hmm, the working host is plugged into SP-A. Could this be the issue, since SP-B is not active at the moment?
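The array port WWNs in the luxadm output already point to this. On CLARiiON arrays, SP port WWNs are commonly documented as 50:06:01:6x:..., where the nibble x is 0 to 7 for SP A ports 0 to 7 and 8 to F for SP B ports 0 to 7. Assuming that convention holds for this CX3-10f, the Python sketch below decodes the SP and port. It shows Server-B's path going to SP B and Server-A's going to SP A, which matches what was found here. Verify the mapping against your own array before relying on it.

```python
def clariion_sp_port(port_wwn):
    """Decode (SP, port) from a CLARiiON array port WWN.

    ASSUMPTION: the commonly documented layout 5006016X..., where the nibble X
    is 0-7 for SP A ports 0-7 and 8-F for SP B ports 0-7.
    """
    wwn = port_wwn.replace(":", "").lower()
    if len(wwn) != 16 or not wwn.startswith("5006016"):
        raise ValueError(f"not a CLARiiON-style port WWN: {port_wwn!r}")
    nibble = int(wwn[7], 16)
    return ("SPB" if nibble >= 8 else "SPA", nibble % 8)


print(clariion_sp_port("5006016941e0ca40"))  # ('SPB', 1): Server-B's path
print(clariion_sp_port("5006016141e0ca40"))  # ('SPA', 1): Server-A's path
```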

4.5K Posts

January 22nd, 2010 10:00

Right-click the LUN and select Trespass; that will move the LUN to SPB.

glen

7 Posts

January 22nd, 2010 10:00

That was it. Thanks for your help!

61 Posts

January 22nd, 2010 10:00

Hello,

Your host now has access to the LUN, but only through SPB. Your problem looks like there was an issue with multipathing (especially considering you noted that there is no PowerPath), which means there is a high likelihood that you have a single point of failure. I would recommend you continue to troubleshoot until you identify and resolve the reason why the host could not access the LUN through SPA. Otherwise, a failure of SPB, the cable, the switch port, or the HBA would leave the host with no access to the LUN.

Also, if you have a switch in your environment (a single HBA connected to a switch, which is connected to both SPs) you may consider installing the free version of PowerPath to provide basic failover functionality.

4.5K Posts

January 22nd, 2010 10:00

Todd,

There are two issues at work - the zoning from the host to the array and the use of failover software.

If you only have a single HBA on the host you can zone the HBA to both SPA and SPB - that means you have some level of protection in case one of the paths fails but none if the HBA fails. The host will only see the actual LUN down the path that is the Current Owner - with one path to SPB and the LUN owned by SPA, you could not access the LUN until you trespassed the LUN to SPB. You should now change the Default Owner to SPB. This also means that on the array the failover mode is set to 1 - see below for more info on the failover.
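The ownership rule described here (the host only sees the real LUN down a path to the SP that currently owns it) can be modeled in a few lines. This Python sketch is a simplified, hypothetical model. It ignores failover modes and LUNZ presentation and only answers whether any of the host's paths reaches the current owner, before and after a trespass.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    number: int
    default_owner: str  # "SPA" or "SPB"
    current_owner: str

    def trespass(self):
        """Move current ownership to the peer SP (what a manual Trespass does)."""
        self.current_owner = "SPB" if self.current_owner == "SPA" else "SPA"


def lun_visible(lun, host_paths):
    """True if any of the host's paths (a set of SP names) reaches the current owner."""
    return lun.current_owner in host_paths


server_b_paths = {"SPB"}  # single connection to SP B
lun9 = Lun(9, default_owner="SPA", current_owner="SPA")
print(lun_visible(lun9, server_b_paths))  # False: SPA owns the LUN
lun9.trespass()
print(lun_visible(lun9, server_b_paths))  # True once SPB owns it
lun9.default_owner = "SPB"  # as suggested, so SPB stays the preferred owner
```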

For the above to work, you need failover software on the host. If you do not have failover software, you need to be aware of the failover mode settings on the array for the host: without failover software on the host, the failover mode must be set to 0 (zero); with failover software it is set to 1, depending on the type of failover software.

On PowerLink look for Knowledgebase Article emc99467 for more information on the array settings for different operating systems and failover software.

There is also a little-used setting in the LUN properties called "Auto-Trespass". It is only enabled when the host does not have any failover software, and it will trespass a LUN if one of the SPs dies. The failover mode would probably be set to 0 in this case.

glen

7 Posts

January 22nd, 2010 11:00

I was under the impression that if my HBA only had a single port, I could only connect to one SP. So you're saying that if I use mpath software, I can still have only one physical connection but be able to fail over to the other SP in the event the current one goes down?

4.5K Posts

January 22nd, 2010 12:00

If you have one HBA and you are connecting to the array using a switch, you can create two zones:

HBA <--> SPAx

HBA <--> SPBx

x = port number

Then, if you are not using failover software, set the failover mode on the array for this host to 0; that will allow the HBA to see the LUN down the path that owns the LUN. Then enable "auto-trespass" in the LUN properties; if an SP fails, the array will fail over (trespass) the LUN to the other SP. What happens on the operating system side is not clear to me. You probably need to test that by blocking the zone to one of the SPs and manually trespassing the LUN to the other SP. Auto-trespass will only trespass the LUN when the SP fails, so that's a bit hard to test.
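A rough way to reason about the single-HBA, two-zone layout is to list the zones, fail an SP, and check whether a surviving zone reaches the SP that owns the LUN after the array auto-trespasses it. This Python sketch is a hypothetical model that assumes auto-trespass simply moves the LUN to the surviving SP. As noted above, what the operating system does at that moment still has to be tested.

```python
def paths_after_failure(zones, failed_sp):
    """Return the zones whose SP port is still alive.

    zones is a list of (hba, sp_port) pairs such as ("HBA", "SPA0"); each
    SP port name starts with its SP ("SPA" or "SPB").
    """
    return [(hba, port) for hba, port in zones if not port.startswith(failed_sp)]


def survives_sp_failure(zones, owner, failed_sp, auto_trespass=True):
    """Can the host still reach the LUN after failed_sp dies?

    Assumed model: with auto-trespass enabled, a LUN owned by the failed SP
    moves to the surviving SP; without it, ownership stays on the dead SP.
    """
    if owner == failed_sp and auto_trespass:
        owner = "SPB" if failed_sp == "SPA" else "SPA"
    if owner == failed_sp:
        return False
    return any(port.startswith(owner) for _, port in paths_after_failure(zones, failed_sp))


zoned = [("HBA", "SPA0"), ("HBA", "SPB0")]  # two zones through a switch
direct = [("HBA", "SPB1")]                  # single direct cable to SP B
print(survives_sp_failure(zoned, owner="SPB", failed_sp="SPB"))   # True
print(survives_sp_failure(direct, owner="SPB", failed_sp="SPB"))  # False
```

The direct-attach case returns False even with auto-trespass enabled: with only one cable, there is no surviving path to the SP the LUN moves to.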

BTW, each array comes with a version called PowerPath Basic that supports one HBA and two paths; it should be on your user CDs. You also need to go to PowerLink and register all the software that came with the array, including the Navisphere Agent/CLI, the array's Navisphere, and PowerPath Basic. That way you can download the most current software from PowerLink.

glen

7 Posts

January 22nd, 2010 12:00

At the moment we are connecting directly to the Clariion, so other than an auto-trespass conducted by the Clariion, I don't see mpath being used. I assume I'd need either dual HBAs with each one connected to a different SP or, as you said, a fabric switch in the middle so I could create two connections, one to each SP. So I guess at this point I'll have to rely on auto-trespass and hope it works.

4.5K Posts

January 22nd, 2010 13:00

Todd,

Auto-trespass only works if you have two paths to the array, one to SPA and one to SPB. With a direct connection you can only have one path. If SPB dies, you would need to physically move the cable to SPA and have the host log in on that port. You could do this manually ahead of time: connect the cable to SPA, go into the Storage Group, and add the new path to the Storage Group. Then, if one SP failed, you should be able to move the cable to the other SP and the path would already be in the Storage Group.

This is not a very good solution. You might want to look at a dual-port HBA if you are slot-limited, or, if you have an empty slot, a second HBA connected to SPA would also work. Again, if you do not use failover software, you need to set the failover mode on the array to 0 for each path and enable auto-trespass on the LUN.

Best solution is two HBAs and using a switch along with failover software (like PowerPath) - then you could get automated failover and load balancing:

HBA1 <--> SPA0

HBA1 <--> SPB1

HBA2 <--> SPA1

HBA2 <--> SPB0

Then, when the host talks to LUN x owned by SPA, both HBAs would see LUN x and the host would send I/O down both HBAs.
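With the four zones above, you can count directly how many paths each HBA has to whichever SP owns the LUN. Here is a short Python sketch using the zone names from this reply and a hypothetical LUN. The actual failover and load balancing are done by the failover software (e.g. PowerPath) and are not modeled here.

```python
from collections import defaultdict

# The four zones from the reply above: (HBA, SP port).
ZONES = [("HBA1", "SPA0"), ("HBA1", "SPB1"), ("HBA2", "SPA1"), ("HBA2", "SPB0")]


def active_paths_by_hba(zones, owner):
    """Map each HBA to the SP ports it can use for a LUN owned by `owner`."""
    paths = defaultdict(list)
    for hba, port in zones:
        if port.startswith(owner):
            paths[hba].append(port)
    return dict(paths)


print(active_paths_by_hba(ZONES, "SPA"))  # {'HBA1': ['SPA0'], 'HBA2': ['SPA1']}
print(active_paths_by_hba(ZONES, "SPB"))  # {'HBA1': ['SPB1'], 'HBA2': ['SPB0']}
```

Whichever SP owns the LUN, both HBAs keep one active path to it. That is why both HBAs see LUN x in this layout.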

glen
