25 Posts
July 11th, 2015 09:00
MD3000i controller troubles
Hi,
My MD3000i keeps having controller problems. I'm receiving tons of critical errors via email notification, but every time I go into MDSM, the status is optimal. As far as I understand, the error messages refer to both controllers (enclosure 0, slot 0 and enclosure 0, slot 1). Does that mean that both controllers are having issues, or that one controller keeps going down (rebooting) and failing over to the other controller? The errors are:
-Optimal wide port becomes degraded (event error code 1706)
-Degraded wide port becomes failed (1707)
I would consider replacing a controller if I can confirm which of the two is failing. Is there a way to determine which of the two controllers is having trouble?
This MD3000i has reached its EOL, so we don't have support anymore...
Thank you, Alex.



DELL-Daniel Ca
243 Posts
July 15th, 2015 10:00
Hey, Alex.
After a deeper look into this bundle, it looks like the RAID controller in slot 0 is trying to go south on you. You might prepare yourself to look for another one. We can verify this if you can get your console cable, plug it into that controller, and capture the output via PuTTY.
If you need the settings for PuTTY and instructions on how to do that, let me know and I can send them.
BUT, it looks like controller 0 is continually being reset by the other controller. This generally means the controller needs to be replaced.
Edit: I forgot to mention also that the disk in Enclosure 0 slot 10 is on its way out. It would be good to replace that one.
Wish I had better news for you.
DELL-Daniel Ca
243 Posts
July 13th, 2015 13:00
Hello, akogan.
On the surface, it looks like you may have a controller rebooting, yes. I'd like to take a little deeper look though. I'll send you an email, and you can reply with the support bundle.
Let me know if you have any questions. Have a good day.
akogan
25 Posts
July 14th, 2015 07:00
Hi,
I sent you our support bundle yesterday. I saw some more controller errors last night, and this morning the server lost its connection to the MD3000i. MDSM shows the array status as "optimal", but the filesystem went stale on the server. Could that be the result of the controller issues? Below are the errors I got overnight.
Thanks, Alex.
Summary
Node ID: NUPIC_Raid1
Host IP Address:
Host ID: Out-of-Band
Event Error Code: 1707
Event occurred: Jul 13, 2015 11:23:58 PM
Event Message: Degraded wide port becomes failed
Event Priority: Critical
Component Type: Enclosure Component (EMM, GBIC/SFP, Power Supply, or Fan)
Component Location: Enclosure 0, Slot 0
Summary
Node ID: NUPIC_Raid1
Host IP Address:
Host ID: Out-of-Band
Event Error Code: 1706
Event occurred: Jul 14, 2015 4:29:19 AM
Event Message: Optimal wide port becomes degraded
Event Priority: Critical
Component Type: Enclosure Component (EMM, GBIC/SFP, Power Supply, or Fan)
Component Location: Enclosure 0, Slot 1
Summary
Node ID: NUPIC_Raid1
Host IP Address:
Host ID: Out-of-Band
Event Error Code: 6900
Event occurred: Jul 14, 2015 4:39:04 AM
Event Message: Diagnostic data is available
Event Priority: Critical
Component Type: RAID Controller Module
Component Location: RAID Controller Module in slot 0
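When alerts like the ones above arrive in large numbers, tallying them by component location and error code can make it easier to see where they cluster. The following is a minimal Python sketch, assuming the alert emails have been saved as plain text in the same "Summary" plus "Key: value" layout shown above. The names parse_events and tally_by_location are hypothetical helpers, not part of MDSM or any Dell tool, and a tally on its own does not prove which controller is failing; the support bundle and console output remain the real evidence.

```python
from collections import Counter

# Sample text copied from the three alerts quoted in this thread.
SAMPLE = """\
Summary
Node ID: NUPIC_Raid1
Host IP Address:
Host ID: Out-of-Band
Event Error Code: 1707
Event occurred: Jul 13, 2015 11:23:58 PM
Event Message: Degraded wide port becomes failed
Event Priority: Critical
Component Type: Enclosure Component (EMM, GBIC/SFP, Power Supply, or Fan)
Component Location: Enclosure 0, Slot 0
Summary
Node ID: NUPIC_Raid1
Host IP Address:
Host ID: Out-of-Band
Event Error Code: 1706
Event occurred: Jul 14, 2015 4:29:19 AM
Event Message: Optimal wide port becomes degraded
Event Priority: Critical
Component Type: Enclosure Component (EMM, GBIC/SFP, Power Supply, or Fan)
Component Location: Enclosure 0, Slot 1
Summary
Node ID: NUPIC_Raid1
Host IP Address:
Host ID: Out-of-Band
Event Error Code: 6900
Event occurred: Jul 14, 2015 4:39:04 AM
Event Message: Diagnostic data is available
Event Priority: Critical
Component Type: RAID Controller Module
Component Location: RAID Controller Module in slot 0
"""


def parse_events(text):
    """Return one dict per alert; a line reading 'Summary' starts a new alert."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Summary":
            current = {}
            events.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so timestamps keep their colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return events


def tally_by_location(events):
    """Count alerts per (component location, event error code) pair."""
    return Counter(
        (e.get("Component Location", "unknown"), e.get("Event Error Code", "unknown"))
        for e in events
    )


if __name__ == "__main__":
    for (location, code), count in sorted(tally_by_location(parse_events(SAMPLE)).items()):
        print(f"{count:3d}  code {code}  {location}")
```

To run it over a real mailbox export, replace SAMPLE with the contents of the saved alert file; the assumption is that every alert begins with a line containing only "Summary".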
akogan
25 Posts
July 26th, 2015 15:00
Hi,
Thank you. I have already replaced the disk at [0,10] and ordered a replacement controller. As far as I understand, the controller is hot-pluggable. To replace it, do I just need to pull the old one and plug in the new one while the device is on, or is there another procedure to follow? Should I reconfigure anything through MDSM?
Thank you, Alex.