MD3000 Windows event log error - event 10 & 801

Question

The StorageManager reports the array as optimal and there are no critical errors in its log. However, in the Windows Server 2003 Event Viewer/System there are numerous warnings (several a minute) repeated as below:

Event Type: Warning

Event Source: md3dsm

Event Category: None

Event ID: 10

Date: 2/4/2008

Time: 8:13:46 AM

User: N/A

Computer: NACNAS-ABB02

Description:

NACNAS-ARRAY:1 Failover command issued.

Data:

0000: 00 00 00 00 03 00 52 00 ......R.

0008: 00 00 00 00 0a 00 04 80 .......

0010: 0a 00 00 00 00 00 00 00 ........

0018: 00 00 00 00 00 00 00 00 ........

0020: 00 00 00 00 00 00 00 00 ........

********************************************************

Event Type: Warning

Event Source: md3dsm

Event Category: None

Event ID: 801

Date: 2/4/2008

Time: 8:13:42 AM

User: N/A

Computer: NACNAS-ABB02

Description:

Failover succeeded to NACNAS-ARRAY:0:0:0.

Data:

0000: 00 00 00 00 05 00 52 00 ......R.

0008: 00 00 00 00 21 03 04 80 ....!..

0010: 21 03 00 00 00 00 00 00 !.......

0018: 00 00 00 00 00 00 00 00 ........

0020: 00 00 00 00 00 00 00 00 ........

There are similar warnings on the other node in the array.

I'm new to Dell StorageManager and to Oracle RAC, which we are running. I'm assuming it may have something to do with the MD3000, since md3dsm is the Dell MD3000 Device Specific Module.

Has anyone had these warnings and can offer some help?

dining_philosop · Answer

Does this happen only when you boot the server and then stops? Can you verify this?

ABB-GAP · Answer

Unfortunately they do not stop - just keep getting them several times a minute.

One hit I found while searching indicated that these events (10 & 801) in the event viewer are caused by a non-optimal array status and that a re-balance is needed. However, I never see a non-optimal status in the StorageManager, which is what would trigger the repair wizard.

dining_philosop · Answer

You may have an error in the way you have wired the hosts to the controllers. With the MD3000 you can have a maximum of two nodes in your cluster. When you connect the cluster nodes to the MD3000, you need to ensure that each node has a path to each of the controllers in the MD3000 array. This is because the MD3000 array is designed as an active/active asymmetric target device - each logical unit (virtual disk) has an owning controller through which IO is normally executed till a path failure causes the driver to move the virtual disk to the alternate controller and resume IO.

The reason you are constantly getting these messages is that Oracle is a shared-resource cluster where each node accesses the same virtual disk, with locks held by the cluster to control simultaneous access to the same block. If redundant paths are not provided from each node, you get a condition where each IO from a node causes a virtual disk to fail over, resulting in the back-to-back errors.
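The ping-pong effect described above can be illustrated with a toy model. This is only a sketch, not the md3dsm driver's actual logic: the `simulate` function, the node names, and the controller numbering are assumptions made up for the example. It models a single virtual disk with one owning controller, where an IO from a node that cannot reach the current owner forces the disk to move.

```python
from itertools import cycle, islice

def simulate(paths, io_order, owner=0):
    """Count ownership transfers for one virtual disk.

    paths: dict mapping node name -> set of controllers it is cabled to.
    io_order: iterable of node names, one entry per IO.
    owner: controller that owns the virtual disk at the start.
    """
    failovers = 0
    for node in io_order:
        if owner not in paths[node]:
            # The node cannot reach the owning controller, so the disk
            # has to move to a controller the node can reach.
            owner = min(paths[node])
            failovers += 1
    return failovers

# Two nodes alternate IO to the same shared virtual disk.
ios = list(islice(cycle(["node_a", "node_b"]), 10))

single_path = {"node_a": {0}, "node_b": {1}}        # one cable per node
redundant = {"node_a": {0, 1}, "node_b": {0, 1}}    # each node sees both

print(simulate(single_path, ios))  # 9
print(simulate(redundant, ios))    # 0
```

With one path per node, every IO after the first moves the disk; with a path from each node to each controller, the disk never has to move.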

In addition, there is a configuration parameter called "Auto-rebalance" for the failover driver that needs to be disabled in a cluster configuration. Please refer to your product documentation for supported cluster configurations and how to change the settings for the failover driver.

Krisk578 · Answer

We are also experiencing the same problem that you guys are facing. We have managed to find some information regarding it: we found a registry setting called disable LUN rebalance, and according to the documentation this should be set to 3.

We changed this setting, and it didn't have any effect on the issue, so we are very keen to find the solution to this problem.
Does anyone know where to turn the auto-rebalancing off?

Regards, Kris
PS. We do not use Oracle. We're just using the MD3000 as a straight SAN.

dining_philosop · Answer

To get a cluster to work with the MD3000 you need to ensure that the cluster is configured with redundant paths.

If you are in a non-clustered configuration, you could get this error message if your hosts are connected to the MD3000 with non-redundant paths (a single path from host to MD3000). In this case, if the preferred controller for the LUN is different from the controller to which the host is attached, you will get this error. A simple fix is to change the owning controller of the LUN. Check the CLI guide for the MD3000 under the set virtualDisk commands for the syntax to set the preferred controller.
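The non-clustered case above comes down to a mismatch between the controller the host is cabled to and the preferred owner of the LUN. A minimal sketch of that check follows; the `mismatched_luns` name, the data layout, and the LUN names are hypothetical and do not correspond to any output of the MD3000 tools.

```python
def mismatched_luns(host_controllers, preferred_owner):
    """Return LUNs whose preferred controller the host cannot reach.

    host_controllers: set of controller IDs the host is cabled to.
    preferred_owner: dict mapping LUN name -> preferred controller ID.
    """
    return sorted(
        lun for lun, ctrl in preferred_owner.items()
        if ctrl not in host_controllers
    )

# Hypothetical single-path host attached only to controller 0.
luns = {"data": 0, "logs": 1}
print(mismatched_luns({0}, luns))     # ['logs']
print(mismatched_luns({0, 1}, luns))  # []
```

Any LUN this reports would be a candidate for having its preferred controller changed, or for adding the missing path.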

Krisk578 · Answer

If you guys are interested, I think I may have found a workaround to the issue.

What I did was as follows: upgraded the MD3000 management software to the latest version, updated the firmware of the SAS 5/E RAID controller card, and made sure that we had the latest drivers for Windows. Then we updated the firmware on the RAID controller cards in the MD3000, and the NVRAM was also updated.

The thing that fixed the problem for us was to change the host type from Windows 2000/2003 clustered to Microsoft MSCS single path space clustered.

As soon as we made this change, the event 801 and 10 warnings stopped.

It would be nice to know if this workaround fixes your problem as well.

Regards, Kris
