Unsolved

May 1st, 2013 03:00

Windows cluster failover fails with EqualLogic iSCSI volumes

Hi,

We have a Windows Server 2012 and SQL Server 2012 failover cluster with iSCSI EqualLogic storage. We tried to fail over to another cluster node, but the volumes reported many error messages in the Windows logs (no errors on the EqualLogic). One volume, K, did not even come online.

-------------------------

Windows Error messages:

L, M, N drives:
The default transaction resource manager on volume T: encountered an error while starting and its metadata were reset.  The data contains the error code.
The file system structure on volume T: has now been repaired.
The file system structure on volume N: has now been repaired.
The IO operation at logical block address 12dfdf00 for Disk 3 was retried.

K drive:
A corruption was discovered in the file system structure on volume K:.
The Master File Table (MFT) contains a corrupted file record.  The file reference number is 0x6000000000006.  The name of the file is " ".
A corruption was discovered in the file system structure on volume K:.
The exact nature of the corruption is unknown.  The file system structures need to be scanned and fixed offline.
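
As a side note on the file reference number quoted above: NTFS file references are 64-bit values whose low 48 bits give the MFT record number and whose high 16 bits give a sequence number. The Python sketch below decodes the value from the log. The helper name `decode_file_reference` and the lookup table are illustrative only and are not part of any Windows or Dell tool.

```python
# Well-known NTFS metadata files by MFT record number (standard NTFS layout).
SYSTEM_RECORDS = {
    0: "$MFT", 1: "$MFTMirr", 2: "$LogFile", 3: "$Volume",
    4: "$AttrDef", 5: ". (root directory)", 6: "$Bitmap", 7: "$Boot",
    8: "$BadClus", 9: "$Secure", 10: "$UpCase", 11: "$Extend",
}

def decode_file_reference(ref):
    """Split a 64-bit NTFS file reference into (record number, sequence number)."""
    record = ref & 0xFFFFFFFFFFFF      # low 48 bits: MFT record number
    sequence = (ref >> 48) & 0xFFFF    # high 16 bits: sequence number
    return record, sequence

if __name__ == "__main__":
    # Value copied from the event text for volume K:.
    record, sequence = decode_file_reference(int("0x6000000000006", 16))
    name = SYSTEM_RECORDS.get(record, "(ordinary file or directory)")
    print(f"MFT record {record}, sequence {sequence}: {name}")
```

Read this way, the value points at MFT record 6, which in the standard NTFS layout is the $Bitmap metadata file. That is only a reading of the number, not a diagnosis of the cause.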

P drive:
Volume P: (\Device\HarddiskVolume38) needs to be taken offline to perform a Full Chkdsk.  Please run "CHKDSK /F" locally via the command line, or run "REPAIR-VOLUME " locally or remotely via PowerShell.
The default transaction resource manager on volume P: encountered an error while starting and its metadata was reset.  The data contains the error code.
Cluster disk resource 'SQL2Logs' indicates corruption for volume '\Device\Harddisk6\ClusterPartition1'. Chkdsk is being run to repair problems. The disk will be unavailable until Chkdsk completes. Chkdsk output will be logged to file 'C:\Windows\Cluster\Reports\ChkDsk_ SQL2Logs _Disk6Part1.log'.
Chkdsk may also write information to the Application Event Log.
A corruption was discovered in the file system structure on volume P:.
The exact nature of the corruption is unknown.  The file system structures need to be scanned and fixed offline.
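
To see at a glance which volumes report corruption and which were only repaired, messages like the ones above could be grouped by drive letter. The following is a minimal Python sketch, assuming the event text has already been exported as plain strings; the function names and the `SAMPLE` list are illustrative.

```python
import re
from collections import defaultdict

# Matches "volume K:" style references as they appear in the event text.
VOLUME_RE = re.compile(r"\bvolume ([A-Z]):", re.IGNORECASE)

def group_by_volume(messages):
    """Return {drive_letter: [messages that mention that volume]}."""
    grouped = defaultdict(list)
    for msg in messages:
        letters = sorted({m.upper() for m in VOLUME_RE.findall(msg)})
        for letter in letters:
            grouped[letter].append(msg)
    return dict(grouped)

def corrupted_volumes(messages):
    """Drive letters with at least one message containing 'corruption'."""
    grouped = group_by_volume(messages)
    return sorted(
        letter for letter, msgs in grouped.items()
        if any("corruption" in m.lower() for m in msgs)
    )

# A few lines copied from the messages above.
SAMPLE = [
    "The file system structure on volume T: has now been repaired.",
    "The file system structure on volume N: has now been repaired.",
    "A corruption was discovered in the file system structure on volume K:.",
    "A corruption was discovered in the file system structure on volume P:.",
]

if __name__ == "__main__":
    print(corrupted_volumes(SAMPLE))
```

With the four sample lines, only K and P are flagged, which matches the volumes that needed an offline Chkdsk.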

-------------------------

Windows iSCSI initiator configuration:

Discovery targets:
All discovered targets show as connected. If I click Properties, I see 2 identifiers, which are not selected, and 1 target portal group, with Status shown as Connected and Authentication, Header Digest and Data Digest shown as None specified.

Discovery:
Target portals – IP is specified and port 3260.

Target Favorites:
I see all drives in the favorites.

Configuration:
iqn.1991-05.com.microsoft:sql2.company.local

Dell EqualLogic MPIO:
I found that 2 source IPs are connected to 2 target IPs on the same volume, but maybe this is for redundancy.

------------------------- 

EqualLogic configuration:

Group configuration:
VDS/VSS: CHAP user – administrator, IP address: *, iSCSI Initiator: *

Volumes settings:
Access: CHAP user: *, IP address: *, iSCSI initiator: both Windows servers.

------------------------- 

I would appreciate any recommendations on how to solve these errors.
Thanks

5 Practitioner

 • 

274.2K Posts

May 1st, 2013 05:00

Just to be clear, you are running the MS Cluster services on those servers, and the volumes are configured as cluster volumes, not just two servers connected to the same volume, correct?

If so, then please open a support case with Dell.  

Regards,

5 Practitioner

 • 

274.2K Posts

May 1st, 2013 09:00

What version of FW is running on the arrays?

There's no configuration issue that I can think of that would account for what you are seeing.  

89 Posts

May 1st, 2013 09:00

Hi Don,

Yes, the disks are connected to both servers, and the cluster service is configured to use them as cluster resources. It is strange because some disks came online without errors and some did not. Could it be a configuration issue, or maybe even something in the EQL HIT MPIO? Is there anything I could check?

We don't have primary support from Dell.

thank you

5 Practitioner

 • 

274.2K Posts

May 1st, 2013 10:00

No. It's not an authentication issue.

4 Operator

 • 

1.9K Posts

May 1st, 2013 10:00

If you are responsible for the unit, sign up for an account on the www.equallogic.com website. The information about the array (model number and service tag, together with the serial number of the backplane) is submitted during this step. More units can be registered later once your account is approved. It's a manual workflow and may take some time, so don't worry.

We do this for our customers during the installation of the unit.

Regards,

Joerg

89 Posts

May 1st, 2013 10:00

Is there anything else that could be checked? We use a PS6100, model 70-0400. I cannot even check whether a new firmware version is available, as that requires registering with some information which I don't have.

89 Posts

May 1st, 2013 10:00

Could it be solved if I enable CHAP authentication? Thanks

5 Practitioner

 • 

274.2K Posts

May 1st, 2013 10:00

Without diags and logs, there's not a known problem that explains what you are seeing.

The most current available firmware is 6.0.4.  

6100s are very new, and they come with a 1-year warranty as standard. Where did you get this array?

89 Posts

May 1st, 2013 10:00

Could it be jumbo frames? If so, is there any way to check that? I am not managing the network, but maybe that could cause the issues?

thanks

5 Practitioner

 • 

274.2K Posts

May 1st, 2013 10:00

Problems with jumbo frames typically result in dropped connections, which would be seen in the EQL events.
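
For a host-side check of the jumbo frame path, one common approach is a don't-fragment ping sized to the MTU. The Python sketch below computes the largest ICMP payload for a given MTU and builds the matching Windows `ping` command. The MTU of 9000 and the address 192.0.2.10 are assumptions for illustration and do not come from this thread; substitute the real iSCSI network values.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu <= IP_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small for an ICMP echo packet")
    return mtu - IP_HEADER - ICMP_HEADER

def windows_ping_command(target_ip, mtu=9000):
    """Build a Windows ping command: -f sets Don't Fragment, -l sets payload size."""
    return f"ping -f -l {max_ping_payload(mtu)} {target_ip}"

if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder, not a real array address.
    print(windows_ping_command("192.0.2.10"))
    print(windows_ping_command("192.0.2.10", mtu=1500))
```

If the jumbo-sized ping fails while the standard-sized one succeeds, some hop between the host and the array is probably not passing jumbo frames.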

89 Posts

May 1st, 2013 10:00

FW version 5.2.

Also, on the Dell EqualLogic MPIO tab of the iSCSI initiator, I pressed MPIO Settings and Auto-Snapshot Manager started. It says another cluster host should be added to Auto-Snapshot Manager. Could some settings be incorrect there?

Thanks
