July 8th, 2017 07:00

PowerVault MD3220i - Maybe the RAID controller is faulty?

I'm trying to configure an MD3220i to go into production soon, but I've hit an issue and I'm not sure whether it is hardware or configuration related. I've spent many days trying various changes without success. I hope one of you might be able to help :o)

I've connected the array as per Dell recommendations: all 8 ports are connected to a Dell PowerConnect 5324 switch configured to handle iSCSI traffic alone, again as per the recommendations. Six NICs of a Dell PowerEdge R710 are connected to the switch. I've configured 3 LUNs on the array, and only 2 of them are mapped to the ESX 6 host running on the R710. From the ESX host I can see that the 3 ports connected to the RAID controller in slot 1 reset and lose connection roughly every 13 to 15 minutes. As a result, the LUNs all degrade and then return to normal, and the cycle repeats.

From a usage point of view, any access to the LUNs from the ESX host seems to put the host under heavy load, and it fails to respond for a couple of minutes. Browsing the datastores on the array also takes a long time.

Looking at the array events, the RAID controller in slot 1 seems to reset itself roughly every 15 minutes, which drops all the NIC connections and degrades the ESX server, and the cycle continues. But there are no actionable errors.

To rule out cable and switch issues, I used a different set of network cables and connected the ESX host directly to the 6 ports at the back of the MD3220i. But the problem remains :o(

I've attached the support data to this post. I had to remove the trace buffers from the archive, as the original archive is 1.4 MB, which is well over the 1 MB attachment limit.

Can someone confirm whether this is a RAID controller fault or a configuration fault?

Many thanks,

Madhaha

1 Attachment

Moderator • 6.9K Posts

July 10th, 2017 11:00

Hello Madhaha,

I am going to send you a private message. Could you email me the full support bundle so that I can review it? I can't unzip the one you uploaded, as it states the archive is incomplete. Also, have you changed your timeout settings in ESX to longer than 30 seconds? And is your multipathing for iSCSI set to Round Robin?
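In case it is useful, here is one way to check both settings from the ESXi shell. This is just a sketch; vmhba33 is a placeholder for your software iSCSI adapter name, and your MD3220i LUNs will appear with their own naa.* identifiers:

    # Show the path selection policy (PSP) currently assigned to each device
    esxcli storage nmp device list

    # Show the current iSCSI parameters (including timeouts) for an adapter
    # (run "esxcli iscsi adapter list" to find your adapter name)
    esxcli iscsi adapter param get -A vmhba33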

Please let us know if you have any other questions.

4 Posts

July 10th, 2017 13:00

Hi Sam,

I've sent the complete support bundle in reply to your email.

I haven't touched the ESX timeout settings and the multipathing setting is MRU. Do you think these could cause the issues I'm seeing?

Thanks,

Madhaha

Moderator • 6.9K Posts

July 11th, 2017 10:00

Hello Madhaha,

I reviewed both support bundles that you sent over. I am only seeing informational messages; there are no errors or warnings reported. When we see this, it normally points to something in your setup not being correct.

You stated that you are using MRU for your multipathing. You will want to change that to Round Robin, as that is the best practice for MD3xxx systems. I would also adjust your timeout settings and see if you are still getting the same issue.
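For reference, here is a sketch of the relevant ESXi shell commands; naa.xxxx is a placeholder for your actual LUN identifier:

    # Set the path selection policy for one LUN to Round Robin
    esxcli storage nmp device set --device=naa.xxxx --psp=VMW_PSP_RR

    # Optionally make Round Robin the default for all LUNs claimed by the
    # LSI SATP, which MD3xxx arrays generally use, so new LUNs pick it up
    esxcli storage nmp satp set --satp=VMW_SATP_LSI --default-psp=VMW_PSP_RR

The per-device change should take effect without a reboot, though existing sessions may take a moment to begin using all paths.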

Please let us know if you have any other questions.

4 Posts

July 11th, 2017 14:00

Hi Sam,

I did see quite a few warnings before I cleared them all. All of them were about "Virtual Disk Not On Preferred Path".

I've increased the timeout to 30 secs, which didn't make any difference. I haven't tried Round Robin multipathing yet.

On a different note, another technician suggested using a 255.255.255.224 subnet mask instead of the 255.255.255.0 I'm using. While I was changing the iSCSI port settings using MDSM, the array became unresponsive from time to time. The RAID controller in slot 1 responds to ping requests, but MDSM is unable to establish communication with it at all; it always connects using the IP address of the controller in slot 0. Also, the amber battery LED on RAID controller 1 briefly comes on when it resets. I've attached a couple of screenshots. Is this normal?

Thanks,

Madhaha

1 Attachment

Moderator • 6.9K Posts

July 12th, 2017 14:00

Hello Madhaha,

The alert on the system is due to a virtual disk not being on the preferred path. When virtual disks are created, they are assigned to one of the two controllers as a preferred communication path. If that path is not available (due to a network issue, controller reset, etc.), the array will communicate via the alternate controller. In some cases the communication does not shift back to the primary path when it becomes available again. To correct this, you just need to manually redistribute the virtual disks back to their preferred paths.

To redistribute virtual disks:

1. Open Modular Disk Storage Manager (MDSM)

2. Click on the Support tab

3. Select Manage RAID Controller Modules

4. Click on Redistribute Virtual Disks

You will get an alert telling you that this will disrupt communications if you do not have the multipath drivers installed. You can ignore this message and proceed: if you still have access to a virtual disk that is not on its preferred path, the multipath drivers are installed.
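If you prefer the command line, the same redistribution can be done with SMcli. A minimal sketch, assuming SMcli is installed on your management station; MD3220i-1 is a placeholder for your array's name:

    # List the arrays this management station knows about
    SMcli -d

    # Move all virtual disks back to their preferred controllers
    SMcli -n MD3220i-1 -c "reset storageArray virtualDiskDistribution;"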

I would also try testing with Round Robin, as that should resolve your issue.

Are you using in-band management or out-of-band management? If you are using the management ports on the controllers, then you are using out-of-band management.

Please let us know if you have any other questions.
