Unsolved
5 Posts
0
35650
February 4th, 2008 14:00
MD3000 Windows event log error - event 10 & 801
The StorageManager reports the array as optimal and there are no critical errors in its log. However, in the Windows Server 2003 Event Viewer/System there are numerous warnings (several a minute) repeated as below:
Event Type: Warning
Event Source: md3dsm
Event Category: None
Event ID: 10
Date: 2/4/2008
Time: 8:13:46 AM
User: N/A
Computer: NACNAS-ABB02
Description:
NACNAS-ARRAY:1 Failover command issued.
Data:
0000: 00 00 00 00 03 00 52 00 ......R.
0008: 00 00 00 00 0a 00 04 80 .......
0010: 0a 00 00 00 00 00 00 00 ........
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
********************************************************
Event Type: Warning
Event Source: md3dsm
Event Category: None
Event ID: 801
Date: 2/4/2008
Time: 8:13:42 AM
User: N/A
Computer: NACNAS-ABB02
Description:
Failover succeeded to NACNAS-ARRAY:0:0:0.
Data:
0000: 00 00 00 00 05 00 52 00 ......R.
0008: 00 00 00 00 21 03 04 80 ....!..
0010: 21 03 00 00 00 00 00 00 !.......
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
There are similar warnings on the other node in the array.
I'm new to Dell StorageManager and to Oracle RAC, which we are running. I'm assuming the warnings have something to do with the MD3000, since md3dsm is the Dell MD3000 Device Specific Module.
Has anyone had these warnings and can offer some help?
dining_philosop
60 Posts
0
February 6th, 2008 13:00
ABB-GAP
5 Posts
0
February 6th, 2008 13:00
Unfortunately they do not stop. We just keep getting them several times a minute.
One hit I found while searching indicated that these events (10 & 801) in the event viewer are caused by a non-optimal array status and that a re-balance is needed. However, I never see a non-optimal status in the StorageManager, which is what would trigger the repair wizard.
dining_philosop
60 Posts
0
February 13th, 2008 12:00
You are constantly getting these messages because Oracle is a shared resource cluster, where each node accesses the same virtual disk and locks held by the cluster control simultaneous access to the same block. If redundant paths are not provided from each node, you get a condition where each IO from a node causes the virtual disk to fail over, resulting in the back-to-back errors.
In addition, there is a configuration parameter called "Auto-rebalance" for the failover driver that needs to be disabled in a cluster configuration. Please refer to your product documentation for supported cluster configurations and how to change the settings for the failover driver.
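To make the ping-pong effect concrete, here is a toy Python model of it. This is only a sketch of the behaviour described above, not the real md3dsm driver logic. The function name `count_failovers`, the node names, and the rule that ownership moves to the lowest-numbered reachable controller are all assumptions made for illustration.

```python
# Toy model only: it mimics the behaviour described in this post,
# not the actual failover driver.

def count_failovers(io_sequence, node_paths, initial_owner):
    """Count ownership transfers for one shared virtual disk.

    io_sequence   -- node names in the order they issue IO
    node_paths    -- maps each node to the set of controllers it can reach
    initial_owner -- controller (0 or 1) that owns the disk at the start
    """
    owner = initial_owner
    failovers = 0
    for node in io_sequence:
        reachable = node_paths[node]
        if owner not in reachable:
            # No path to the owning controller, so the virtual disk is
            # moved to a controller this node can reach (assumed rule).
            owner = min(reachable)
            failovers += 1
    return failovers


# Two cluster nodes taking turns writing to the same virtual disk.
alternating_io = ["node1", "node2"] * 5

# Each node cabled to a different controller, no redundancy.
single_path = {"node1": {0}, "node2": {1}}

# Each node cabled to both controllers.
redundant = {"node1": {0, 1}, "node2": {0, 1}}

print(count_failovers(alternating_io, single_path, initial_owner=0))  # 9
print(count_failovers(alternating_io, redundant, initial_owner=0))    # 0
```

With single paths almost every IO moves the disk to the other controller, which would match a pair of events every few seconds. With redundant paths the owner never has to change.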
Krisk578
2 Posts
0
February 14th, 2008 19:00
We are also experiencing the same problem that you guys are facing. We have managed to find some information regarding it: we found a registry setting called disable LUN rebalance, and according to the documentation this should be set to 3.
We changed this setting, and it didn't have any effect on the issue. So we are very keen to find the solution to this problem.
Does anyone know where to turn the auto rebalancing off?
Regards Kris
PS. We do not use Oracle. We're just using the MD3000 as a straight SAN.
dining_philosop
60 Posts
0
February 15th, 2008 02:00
If you are in a non-cluster you could get this error message if your hosts are connected to the MD with non-redundant paths (single path from host to MD). In this case if the preferred controller for the LUN is different from the controller to which the host is attached you will get this error. A simple fix is to change the owning controller of the LUN. Check the CLI guide for the MD under the set virtualDisk commands for syntax to set the preferred controller.
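As a quick way to see which LUNs are affected, the check can be sketched in Python. The inventory below is made up for illustration, and `disks_needing_owner_change` is a hypothetical helper, not part of any Dell tool. You would fill in the preferred owners from what Storage Manager shows for your array.

```python
def disks_needing_owner_change(preferred_owner, host_controller):
    """Return virtual disks whose preferred controller is not the one
    the single-path host is attached to."""
    return sorted(
        name
        for name, owner in preferred_owner.items()
        if owner != host_controller
    )


# Hypothetical inventory: virtual disk name -> preferred controller (0 or 1).
preferred_owner = {"data01": 0, "data02": 1, "logs01": 1}

# The host is cabled only to controller 0.
print(disks_needing_owner_change(preferred_owner, host_controller=0))
# ['data02', 'logs01']
```

Any disk in that list is a candidate for the owning controller change described above.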
Krisk578
2 Posts
0
February 21st, 2008 10:00
I think I may have found a workaround to the issue. What I did was as follows: upgraded the MD3000 management software to the latest version, updated the firmware of the SAS 5/E RAID controller card, and made sure that we had the latest drivers for Windows.
Then we updated the firmware on the RAID controller cards in the MD3000, and the NVRAM was also updated.
The thing that fixed the problem for us was to change the host type from Windows 2000/2003 clustered to Microsoft MSCS single path space clustered.
As soon as we made this change, the event errors 801 & 10 stopped.
Regards Kris
It would be nice to know if this workaround fixes your problem as well