9 Legend

20.4K Posts

June 22nd, 2009 20:00

Interesting indeed. Jeff, can you take this 35G LUN out of the storage group and hit Apply? Then put it back into the storage group, but before you click Apply, scroll to the left and you will see a column called "Host id". By default Navisphere will try to use the lowest number available, but go ahead and set it to something high: whatever the highest Host id is, plus 1. I am just thinking that maybe something is stuck in the kernel and moving it to a higher host id will let it come in OK. Worth a shot.

26 Posts

June 23rd, 2009 18:00

I'll give it a try and see. It doesn't appear to be tied to the lun ID for the host though. The original 58 gig lun was lun ID 44 and had the errors. When it was destroyed and the 1 gig lun created first, the 1 gig lun became lun ID 44 and that worked. The remaining piece became lun 45 and both size creations of that have failed. The lun migrated from the other server worked on server A with the array ID and got a new lun ID when it went to server B. It has I/O errors on server B. When it was put back to server A, there were no errors for that same lun back on host A.

I've been more of a hpux guy so I don't know if the device file creation is correct for linux. I've noticed that each time we have introduced a lun to server B, it gets a new emcpowerxx device name. On hpux it would assign a new disk to the lowest available cxtxdx device name.

Is it possible that whatever is managing the lun mapping for PowerPath is somehow corrupted? If it's not updating mappings correctly I can see problems occurring. I just don't know how to trace or fix something like this :-)

Jeff

9 Legend

20.4K Posts

June 24th, 2009 09:00

Remove this LUN from server B. On server B, run "powermt check" and remove any dead LUNs/paths it finds, then present the LUN back (rescan the HBAs to make sure that the LUN is seen by Linux), and then run powermt config and powermt save.
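The sequence suggested above can be sketched as a small dry-run helper. This is only an illustration, not an EMC-provided tool: it assumes the `powermt` binary is on the PATH, that the host exposes the usual `/sys/class/scsi_host/host*/scan` files for rescanning (the thread itself uses a QLogic script for that step), and that it runs as root when `dry_run` is turned off. The function and constant names are made up for this example.

```python
import glob
import subprocess

# Run after the LUN has been removed from the storage group.
CLEANUP = [["powermt", "check"]]

# Run after the LUN has been presented again and the HBAs rescanned.
RECONFIGURE = [["powermt", "config"], ["powermt", "save"]]


def scsi_scan_files(pattern="/sys/class/scsi_host/host*/scan"):
    """Return the sysfs scan files, one per SCSI host adapter (assumed layout)."""
    return sorted(glob.glob(pattern))


def rescan_hosts(scan_files, dry_run=True):
    """Ask each adapter to rescan all channels, targets, and LUNs."""
    for path in scan_files:
        if dry_run:
            print(f"would write '- - -' to {path}")
        else:
            with open(path, "w") as handle:
                handle.write("- - -\n")


def run_steps(steps, dry_run=True):
    """Print each command and execute it unless dry_run is set."""
    shown = []
    for argv in steps:
        print("+ " + " ".join(argv))
        if not dry_run:
            # powermt check may prompt about dead paths; stdin is left attached.
            subprocess.run(argv, check=True)
        shown.append(" ".join(argv))
    return shown


if __name__ == "__main__":
    # Dry run: nothing is executed or written, the plan is only printed.
    run_steps(CLEANUP)
    print("(present the LUN to the host again in Navisphere, then continue)")
    rescan_hosts(scsi_scan_files())
    run_steps(RECONFIGURE)
```

Keeping the cleanup and reconfigure phases separate reflects the manual step in between: the LUN has to be put back into the storage group before the rescan and `powermt config` can find it.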

26 Posts

June 25th, 2009 09:00

The admin had already removed the lun from the server and run the powermt check. That worked fine and the pvscan showed no errors on any disks. I added a problem lun back in and ran the ql-scan-lun.sh script so the host and adapter could find the lun. I then ran the powermt config and the save, which ran with no errors. When I ran a pvscan I got this error on the new lun:

/dev/emcpowercg: read failed after 0 of 4096 at 104152891392: Input/output error

This is one of the luns that was working fine on server A previously. The local admin used another disk from server A that was previously in the same file system as this lun and it works fine. I guess I need to get a call open and have someone look at the SP collects. Nothing shows up in the SP event log.

Jeff
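One thing that can be read straight off the pvscan message quoted above is where the failed read landed. The sketch below only does the arithmetic: it parses the line and expresses the byte offset relative to the next whole-GiB boundary. The function names are invented for this example, and the interpretation is an assumption rather than something the thread confirms: if the failing read is a probe near the end of the device, the host may believe the pseudo device is about 97 GiB, which would be worth comparing against the LUN size shown in Navisphere.

```python
import re

# Shape of the LVM read error quoted in the thread.
READ_ERROR = re.compile(
    r"^(?P<device>\S+): read failed after (?P<done>\d+) of (?P<want>\d+) "
    r"at (?P<offset>\d+): (?P<reason>.+)$"
)

GIB = 1024 ** 3
KIB = 1024


def parse_read_error(line):
    """Return the fields of one 'read failed' line, or None if it does not match."""
    match = READ_ERROR.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("done", "want", "offset"):
        fields[key] = int(fields[key])
    return fields


def describe_offset(offset):
    """Return (next whole-GiB boundary, bytes the offset falls short of it)."""
    next_gib = -(-offset // GIB)          # ceiling division
    shortfall = next_gib * GIB - offset   # bytes below that boundary
    return next_gib, shortfall


if __name__ == "__main__":
    line = ("/dev/emcpowercg: read failed after 0 of 4096 "
            "at 104152891392: Input/output error")
    info = parse_read_error(line)
    gib, short = describe_offset(info["offset"])
    # prints: /dev/emcpowercg: offset is 64 KiB below 97 GiB
    print(f"{info['device']}: offset is {short // KIB} KiB below {gib} GiB")
```

The read fails after 0 of 4096 bytes, so nothing at that offset was readable at all, at a position exactly 64 KiB short of 97 GiB.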

2 Intern

1.3K Posts

June 25th, 2009 10:00

First of all, which exact version of RHEL do you have (which update), 32 or 64 bit? Also make sure the qlogic script you are running supports it. We have seen our Linux RHEL AS4 U6 64-bit servers rebooting when the qlogic script is used. (What we learned is that the script WE USED supports 32-bit.)

What you reported from dmesg is quite common, and it comes when new devices are scanned. But why you get an I/O error during pvscan is a mystery.

26 Posts

June 26th, 2009 08:00

The qlogic scripts we use do seem to work fine. It is the 1.6 release version of their tools. The OS on this system is RHEL WS 4.5.

The admin at that office is going to open a call with EMC and see if they can get to the bottom of it. We'll see if they are able to figure anything out.

Thanks,

Jeff