MD3000i - Controller issues

Question

Over the last week we have been having issues with our MD3000i. It's currently being used with vSphere 4 and two attached hosts. The hosts become unreachable for about 2 minutes, then recover. The errors are below.

This issue starts with a Destination driver error

Date/Time: 3/8/17 10:18:54 AM
Sequence number: 46011
Event type: 1012
Description: Destination driver error
Event specific codes: 1ff00/0/0
Event category: Error
Component type: Physical Disk
Component location: None
Logged by: RAID Controller Module in slot 0

Then RAID Controller Module reset by its alternate

Date/Time: 3/8/17 10:18:54 AM
Sequence number: 46012
Event type: 400F
Description: RAID Controller Module reset by its alternate
Event specific codes: 0/0/0
Event category: Internal
Component type: RAID Controller Module
Component location: RAID Controller Module in slot 0
Logged by: RAID Controller Module in slot 0

Followed by Optimal wide port becomes degraded and then failed

Date/Time: 3/8/17 10:18:55 AM
Sequence number: 46013
Event type: 1706
Description: Optimal wide port becomes degraded
Event specific codes: 0/0/0
Event category: Error
Component type: Enclosure Component (EMM, GBIC/SFP, Power Supply, or Fan)
Component location: Enclosure 0, Slot 0
Logged by: RAID Controller Module in slot 0

Date/Time: 3/8/17 10:18:55 AM
Sequence number: 46014
Event type: 1707
Description: Degraded wide port becomes failed
Event specific codes: 0/0/0
Event category: Error
Component type: Enclosure Component (EMM, GBIC/SFP, Power Supply, or Fan)
Component location: Enclosure 0, Slot 0
Logged by: RAID Controller Module in slot 0

Do any of the Dell techs here still analyze gathered support information? This device is colocated, so I only have access to MDSM. I'd like to know what's actually wrong before scheduling a hands-on tech in the datacenter.

Thanks!

DELL-Sam L · Answer

Hello emmet.fc

Wide port errors are generally associated with a controller reset. It may be that the controller reset during its boot sequence after the battery replacement. This is not unusual when changes are made. The controller will see the changes, commit them to running memory, then reboot to commit them to permanent memory.

What you need to do is look at the event log to see if there are corresponding wide port optimal messages. You might also look for Start of Day Routine start and completed messages. If those are all completing, then you are good to go, as there is no error. If you have a support bundle, we can review it to see what the errors are.
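
As a rough aid for that check, here is a minimal Python sketch. It assumes the event log has been saved as plain text in the same one-field-per-line layout as the entries quoted in the question. The function names `parse_events` and `find_events` are hypothetical helpers, not part of MDSM, and the sample text is cut down from the two wide port events posted above.

```python
# Sketch only: assumes each event record starts with a "Date/Time:" line
# and that every field is written as "Key: value" on its own line.

SAMPLE = """\
Date/Time: 3/8/17 10:18:55 AM
Sequence number: 46013
Event type: 1706
Description: Optimal wide port becomes degraded

Date/Time: 3/8/17 10:18:55 AM
Sequence number: 46014
Event type: 1707
Description: Degraded wide port becomes failed
"""


def parse_events(text):
    """Split exported event log text into one dict per event record."""
    events = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(": ")
        if not sep:
            continue  # blank or free-text line, not a "Key: value" field
        if key == "Date/Time":
            # A Date/Time line marks the start of a new record.
            current = {}
            events.append(current)
        if current is not None:
            current[key] = value
    return events


def find_events(events, phrase):
    """Return events whose Description contains phrase, in sequence order."""
    wanted = phrase.lower()
    hits = [e for e in events if wanted in e.get("Description", "").lower()]
    return sorted(hits, key=lambda e: int(e.get("Sequence number", 0)))


if __name__ == "__main__":
    events = parse_events(SAMPLE)
    for event in find_events(events, "wide port"):
        print(event["Sequence number"], event["Description"])
    print("Start of Day entries:", len(find_events(events, "start of day")))
```

If the last wide port entry in the output is still a degraded or failed message, or the search for "start of day" comes back empty, the exported range contains no sign that the port returned to optimal, which is worth mentioning when the support bundle is reviewed.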

Please let us know if you have any other questions

emmet.fc · Answer

Thanks Sam,

There's definitely something going on. I have a battery failure on one controller that hasn't been replaced yet, and the unit hasn't been rebooted in well over a year. Every night I get several periods where VMs are unreachable for around 60 seconds.

How do I send you my support bundle?

emmet.fc · Answer

Hey Sam, I'm still trying to figure out who to send this bundle to?

I had a complete loss of communication today. All of these were in the Recovery Guru.

Port reporting problem: RAID Controller Module 0, Physical Disk Channel 2, Port: Expansion
Status: Failed

Component reporting problem: Host Board Left
Status: Optimal
RAID Controller Module: Slot 1
Service action (removal) allowed: No
Service action LED on component: No

Degraded physical disk channels: 0, 1

Component reporting problem: Thermal sensor
Status: Optimal
Location: Expansion enclosure 0
Component requiring service: Temperature sensor

Port reporting problem: RAID Controller Module 0, Internal connection
Status: Failed

Failed or Degraded SAS Port
