Unsolved

5 Posts

April 25th, 2008 01:00

Cluster validation failing

Help! I'm stuck. I've installed a Windows 2003 server to connect to a Clariion CX3-40 SAN; it's an HP server with an FC2143 HBA.

Installed the Storport fix.
Installed the HP HBA drivers from the HP web site (I tried the ones from the EMC website before).
Upgraded the BIOS to the level in the support matrix from the Emulex web site.
Installed the Navisphere agent.
Installed the PowerPath 5.1 agent.
I do not use any LVM other than the one provided by the OS.

Configured 2 nodes and Fibre disks to the same storage group.

I can see the disk storage. I configured the storage and then ran clusprep, or the Microsoft Cluster Configuration Validation Wizard.

Running the full tests takes a few hours (it is particularly slow on the Validate Existence of Reserves test), and it fails on the Validate Disk Access Latency test with something along the lines of:

Failed to access cluster disk 1, or disk access latency of 3000 ms from node servera is more than the acceptable limit of 1000 ms status 0

Do I have to use different Emulex driver parameters or are the ones provided by the HP drivers ok?

Would the LVM affect this result?

Any help?!

Alex

142 Posts

April 25th, 2008 02:00

The error message isn't very clear: it doesn't say whether it is an access problem or a performance problem.

I would first test whether access to disk 1 is OK: just create a volume on the LUN and check that you can copy data onto it. Then switch to the other node and see if it can access the LUN too. If all is OK, destroy the volume.
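To put a number on that copy test, here is a minimal Python sketch that times small flushed writes to a scratch file on the test volume and compares the worst case against the 1000 ms limit quoted in the error message. It is only an illustration: the function names and the drive letter in the usage note are made up, and it does not reproduce whatever clusprep measures internally.

```python
import os
import tempfile
import time

LIMIT_MS = 1000.0  # limit quoted in the validation error message


def measure_write_latency_ms(directory, samples=20, block_size=4096):
    """Time small flushed writes to a scratch file created in `directory`."""
    latencies = []
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # push the write past the OS cache
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)  # leave nothing behind on the test volume
    return latencies


def summarize(latencies, limit_ms=LIMIT_MS):
    """Report worst and average latency, and whether the limit was exceeded."""
    worst = max(latencies)
    return {
        "max_ms": worst,
        "avg_ms": sum(latencies) / len(latencies),
        "over_limit": worst > limit_ms,
    }
```

Run it on each node in turn (never on both at once), for example `summarize(measure_write_latency_ms("F:\\"))`, where `F:\` stands for whatever drive letter the test volume was given.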

What version are your drivers? I always download the EMC Storport drivers from the Emulex site (most recently emcstorportminiportkit_1-30a6-1b.exe).

Can you tell us more about the size of the LUN, the RAID type, and the disk type and speed?

In my opinion, if you use the LUN under an LVM it cannot be faster than using it directly in Windows Disk Management.

5 Posts

April 25th, 2008 02:00

I can access all the disks and copy data onto them no problem, and see the data on the other node separately.

I've tried version 130a9 from the Emulex site and 201a4 from the HP side. The later HP drivers allow me to see more of the SPs through PowerPath.

1st disk (quorum) is 2GB from a 15000RPM Fibre Channel drive, RAID 5
2nd-5th disks are 402GB each, 15000RPM Fibre Channel drives, RAID 5

410 Posts

April 25th, 2008 03:00

Are there any disk timeout errors in the Windows error log?

5 Posts

April 25th, 2008 03:00

What I can see are Emulex SvcMgr 1030 event messages
Device remove complete: re-started servers

at the end of all the tests, and elxstor 118 messages like:

The driver for device \Device\RaidPort0 performed a bus reset upon request.

On one of the nodes I get:

emcmpio event 100: Path Bus 3 Tgt 0 Lun 3 to CK200071800361 is dead.
then
elxstor event 118: The driver for device \Device\RaidPort1 performed a bus reset upon request.
then about 5 seconds later an
emcmpio event 101:Path Bus 3 Tgt 0 Lun 3 to CK200071800361 is alive.

The same happens on the other LUNs.

One of the tests is a device reset test - should I be concerned about this?
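Since the same dead/alive pattern shows up on several LUNs, a small script can tally it from exported event text. The sketch below is an assumption-laden illustration: it matches only the emcmpio message wording quoted above, expects one event per line, and its function names are invented here.

```python
import re

# Matches the emcmpio 100/101 messages as quoted in this post.
EVENT_RE = re.compile(
    r"emcmpio event (?:100|101):\s*"
    r"Path Bus (\d+) Tgt (\d+) Lun (\d+) to (\S+) is (dead|alive)\."
)


def path_history(lines):
    """Map (bus, target, lun, array) to the ordered list of reported states."""
    history = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # skip elxstor and any other unrelated events
        bus, tgt, lun, array, state = match.groups()
        key = (int(bus), int(tgt), int(lun), array)
        history.setdefault(key, []).append(state)
    return history


def still_dead(history):
    """Return the paths whose most recent reported state is 'dead'."""
    return [key for key, states in history.items() if states[-1] == "dead"]


events = [
    "emcmpio event 100: Path Bus 3 Tgt 0 Lun 3 to CK200071800361 is dead.",
    r"elxstor event 118: The driver for device \Device\RaidPort1 performed a bus reset upon request.",
    "emcmpio event 101:Path Bus 3 Tgt 0 Lun 3 to CK200071800361 is alive.",
]
print(path_history(events))
print(still_dead(path_history(events)))
```

A path that goes dead and comes back alive around a bus reset, as in the sequence above, ends with "alive" as its last state; any path listed by `still_dead` never recovered and would be worth a closer look.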

2.2K Posts

April 25th, 2008 08:00

It may be a problem with the clusprep tool. I wouldn't trust it 100% to validate the performance of a SAN configuration. If it reports latency issues on a SAN-attached LUN, a better way to analyze performance is to run something like the SQLIOSim stress test tool from Microsoft. It will tell you if you have latency issues with your drives.

Also be sure that you run stress tests with only one node online. If you have both nodes online accessing the same LUN that has a file system with clustering enabled, you could corrupt the file system.

4 Operator

4.5K Posts

April 25th, 2008 12:00

When connecting to EMC arrays, you should always use the HBA drivers from the EMC section of the HBA vendor's web page. Looking at that section, I do not see your HBA listed. This could mean that this HBA is not supported by EMC, either because it has not been tested yet or for some other reason.

You might want to check on PowerLink in the E-Lab section:

https://elabnavigator.emc.com/do/navigator.jsp

See if your HBAs are supported and what drivers are required.

regards,

glen kelley

5 Posts

April 25th, 2008 15:00

The problem is that although it's on the EMC support matrix, it isn't on the Microsoft supported list.

The clusprep tool should be run before setting up the cluster, and both nodes need to be online when you run it. It recommends zoning or LUN masking before running so that it can install a cluster agent on the nodes.

5 Posts

April 25th, 2008 15:00

I've looked at the EMC support matrix and the FC2143 was listed; it just didn't say which source I needed to install the driver from. Both are Emulex drivers...

The support company for the EMC said I had to use the HP ones, so I've been going with that. I've now found something on this web site telling me I should use the EMC equivalent from the Emulex web site (LP1150). I've tested that before but am testing again.

The strange thing is that the HP drivers see, if I recall correctly, 6 SPs whereas the EMC version sees 4.

When I tested with the version 2 HP HBA drivers, I was able to LUN mask all but one path, and it passed the tests that way. I've yet to retest to see whether it was just luck.

In addition, the test takes around 4 hours, and is especially slow on the 'Validate Existence of Reserves' test (which eventually passes).
