14 Posts

December 11th, 2017 09:00

I forgot to include these errors, which appear frequently when the controller in slot 1 is online...

Date/Time: 12/9/17 8:01:39 AM
Sequence number: 54719
Event type: 2837
Event category: Internal
Priority: Informational
Description: Discrete lines diagnostic failure resolved
Event specific codes: 0/0/0
Component type: Interconnect-battery module pack
Component location: RAID Controller Module enclosure
Logged by: RAID Controller Module in slot 1

Raw data:
4d 45 4c 48 03 00 00 00 bf d5 00 00 00 00 00 00
37 28 40 02 b3 de 2b 5a 08 00 00 00 00 00 00 00
00 00 00 00 04 00 00 00 10 00 00 00 10 00 00 00
ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 02 00 01 01 08 00 00 00 04 00 02 00
00 00 00 00


Date/Time: 12/9/17 8:01:07 AM
Sequence number: 54718
Event type: 2836
Event category: Internal
Priority: Critical
Description: Discrete lines diagnostic failure
Event specific codes: 0/0/0
Component type: RAID Controller Module
Component location: RAID Controller Module in slot 1
Logged by: RAID Controller Module in slot 1

Raw data:
4d 45 4c 48 03 00 00 00 be d5 00 00 00 00 00 00
36 28 48 01 93 de 2b 5a 08 00 00 00 00 00 00 00
00 00 00 00 04 00 00 00 08 00 00 00 08 00 00 00
ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 02 00 01 01 08 00 00 00 04 00 02 00
01 00 00 00
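
The raw data appears to repeat some of the fields shown in the readable part of each entry. The sketch below is inferred only from the events quoted in this thread, not from any Dell documentation. It assumes a 4-byte ASCII signature at offset 0, a little-endian sequence number at offset 8, a 16-bit little-endian event type at offset 16, and a 32-bit little-endian Unix timestamp at offset 20. The function name `decode_mel_header` is made up for illustration.

```python
import struct
from datetime import datetime, timezone

# First 32 bytes of the raw data for sequence number 54718 (event type 2836).
RAW_HEX = """
4d 45 4c 48 03 00 00 00 be d5 00 00 00 00 00 00
36 28 48 01 93 de 2b 5a 08 00 00 00 00 00 00 00
"""


def decode_mel_header(raw_hex):
    """Decode the fields that can be cross-checked against the readable log.

    The offsets are an assumption inferred from the events in this thread.
    """
    # Drop all whitespace before converting the hex dump to bytes.
    data = bytes.fromhex("".join(raw_hex.split()))
    signature = data[0:4].decode("ascii")
    (sequence,) = struct.unpack_from("<I", data, 8)
    (event_type,) = struct.unpack_from("<H", data, 16)
    (timestamp,) = struct.unpack_from("<I", data, 20)
    return {
        "signature": signature,
        "sequence": sequence,
        # The log prints the event type as hex digits, so format it the same way.
        "event_type": f"{event_type:04x}",
        "logged_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
    }


header = decode_mel_header(RAW_HEX)
for key, value in header.items():
    print(f"{key}: {value}")
```

For this event the decoded time is 13:01:07 UTC on December 9th, 2017, which agrees with the logged 8:01:07 AM if the management station is five hours behind UTC.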

Moderator

 • 

7.8K Posts

December 12th, 2017 08:00

Hello jezmathers,

When you replaced the battery, did you let it charge for 48 hours? I ask because a replacement battery can take that long to charge. In addition, once the battery is charged it will normally want to run a battery test. You can modify the battery settings by clicking the Tools tab → Change Battery Settings.

If you are still getting the error after the battery test has completed then we would need to look at a support bundle to see what is going on.

Please let us know if you have any other questions.

14 Posts

December 12th, 2017 13:00

Yes, I believe I did charge it for at least 48 hours.  I do not have the "Change Battery Options" item in my Modular Disk Storage Manager.

I have attached the storage bundle...  Your help is appreciated!

1 Attachment

Moderator

 • 

7.8K Posts

December 13th, 2017 12:00

Hello jezmathers,

Thanks for the support bundle, as it helps to see what is going on. When you replaced the battery in controller 1, the battery install date was not reset. You will need to run the following command via SMcli.

reset storageArray batteryInstallDate [controller=(1)]

That should force the controller to reset the learn cycle and start a battery test. If it doesn't start the battery test, you can start it manually via SMcli.

set storageArray learnCycleDate (daysToNextLearnCycle=integer-literal | day=string-literal) time=HH:MM

Here is also a link to the SMcli guide in case you need it: http://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_powervault/powervault-md3000i_reference%20guide2_en-us.pdf
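
If the reset has to be repeated after each battery swap, the call can be wrapped in a small script. This is only a sketch: it assumes the SMcli executable is reachable as `smcli` and uses the `-n` (array name), `-p` (password) and `-c` (script command) options. The function names, array name and password below are placeholders, not part of any Dell tooling.

```python
import subprocess


def build_reset_battery_cmd(array_name, password, controller):
    """Build the argument list for the batteryInstallDate reset.

    Assumes an enclosure with two controller slots, numbered 0 and 1.
    """
    if controller not in (0, 1):
        raise ValueError("controller must be 0 or 1")
    script = f"reset storageArray batteryInstallDate controller={controller};"
    return ["smcli", "-n", array_name, "-p", password, "-c", script]


def run_smcli(cmd):
    """Run SMcli and report whether it printed its success line."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return "SMcli completed successfully." in result.stdout


# Placeholder values; nothing is executed here, the command is only printed.
cmd = build_reset_battery_cmd("MyArray", "example-password", 1)
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with the script command, which contains spaces and a trailing semicolon.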

Please let us know if you have any other questions.

14 Posts

December 15th, 2017 10:00

I put controller 1 back online and successfully ran the SMcli command to reset the battery install date.

C:\Program Files (x86)\Dell\MD Storage Manager\client>smcli -n Empire -p ****** -c "reset storageArray batteryinstallDate controller=1;"
Performing syntax check...

Syntax check complete.

Executing script...

Script execution complete.

SMcli completed successfully.

I did not see any evidence of the learn cycle starting, so I ran the second SMCLI command you suggested, but I got the following error:

C:\Program Files (x86)\Dell\MD Storage Manager\client>smcli -n Empire -p ***** -c "set storageArray learnCycleDate daystoNextLearnCycle=0;"
Performing syntax check...

Syntax check complete.

Executing script...

This storage array at line 1 is not SBD (Smart-Battery Data) capable and will not support the setting of learn cycles.
The command at line 1 that caused the error is:

set storageArray learnCycleDate daystoNextLearnCycle=0;

Script execution halted due to error.

SMcli failed.

The array is still non-optimal and I am seeing this error message in the array manager:

Storage array:  Empire
Component reporting problem:  Battery   
Status:  Unknown   
Location:  RAID Controller Module enclosure 0, RAID Controller Module in Slot 1
Component requiring service:  RAID Controller Module 1
Service action (removal) allowed:  No        
Service action LED on component:  No

And in the log, this error is repeating:

Date/Time: 12/15/17 11:45:02 AM
Sequence number: 54979
Event type: 2836
Description: Discrete lines diagnostic failure
Event specific codes: 0/0/0
Event category: Internal
Component type: RAID Controller Module
Component location: RAID Controller Module in slot 1
Logged by: RAID Controller Module in slot 1

Raw data:
4d 45 4c 48 03 00 00 00 c3 d6 00 00 00 00 00 00
36 28 48 01 0e fc 33 5a 08 00 00 00 00 00 00 00
00 00 00 00 04 00 00 00 08 00 00 00 08 00 00 00
ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 02 00 01 01 08 00 00 00 04 00 02 00
01 00 00 00

I have attached a new support bundle.  Any other suggestions/advice would be most welcome. 

Thank you!

1 Attachment

Moderator

 • 

7.8K Posts

December 19th, 2017 11:00

Hello jezmathers,

Looking at the support bundle you pulled after running the commands, it looks like there is still an issue with your battery. When you replaced the battery, was it a new battery or a used one from another controller? I ask because when we see this issue we replace the battery, as it is not functioning as it is supposed to. In the logs, your controller is not able to get any information on battery age, life span, or current charge status.

Please let us know if you have any other questions.

14 Posts

December 19th, 2017 11:00

I had a spare (refurbished) controller with a battery installed. I tried that controller with its own battery; when that didn't work, I tried the battery from the spare controller in the old controller.  When that didn't work, I purchased a new battery (which is what I have installed in Controller 1 now).  The battery is not Dell OEM, but is a new Zthy Tech battery (I have used these in the past with success; in fact, controller 0 may have one of these installed).  I could replace the controller/battery and try the SMcli commands again, if you think that is worth a try?

As no combination of controller and battery is solving my issue, I am concerned it is an issue with the enclosure!  I have not had a chance to completely shut down I/O and power cycle the MD3000; not sure if that could help?

What is my next move?

14 Posts

December 20th, 2017 13:00

I brought controller 1 online and ran the ping for 7 minutes, without a single timeout/fail.

Thank you for your help... What is my next move?

10 Elder

 • 

6.2K Posts

December 21st, 2017 10:00

Hello

When you pulled the support bundle was the spare controller inserted or the original? The battery is not detected properly by whichever controller was installed. Although possible, I think it is unlikely this is a chassis or slot issue. This is likely an issue with either the battery or controller.

According to the logs, the issues started on 11/17 when a normal learn cycle started on both controllers. Controller 1 failed the learn cycle and has been producing errors since. Without a functional battery, caching on the controller will be disabled. If controller 1 is brought into a redundant configuration with caching disabled, then caching will be disabled on both controllers. This will likely decrease performance.

I suggest testing with another battery or controller if possible. Whenever you insert a new battery, make sure to allow several hours (8+) for the battery to charge. It must meet a minimum charge threshold before it will start functioning correctly.

When you make changes you can pull a new support bundle and look at the statecapturedata.txt file to view the controller battery status. This is what the battery status looks like on controller 1 of your bundle. The status indicates that the controller is unable to provide information on the battery. The status is unknown.

Battery [1]
Parent Ctlr      = CTLR_B
Is Local         = false
Parent CRU       = CRU_2
BID Index        = 0
CapabilityChking = true
Over Temp Count  = 0
Install Time     = 0x565074CA 11/21/2015 13:42:34
Warning Time     = 0xFFFFFFFF
Expired Time     = 0xFFFFFFFF

Current Status
     Overall Sts = (0x0012) Unknown         
     Common Sts  = (0x0006) Unknown
     Working Sts = (0x00c2) I2CBusErr
     Config Sts  = (0x0060) AgeExpOff AgeWrnOff
     Smart Sts   = (0x0007) VrsnErr   ChargeOk  Smart     
     Learn Sts   = (0x0002) NotReady  
     Bkup Mode   = (0x0002) Disabled 
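
When several bundles need to be compared, the status lines can be pulled out of statecapturedata.txt with a short script. This is a rough sketch that assumes every status line follows the `Name = (0xNNNN) Flags` layout of the excerpt above. The function name `parse_battery_status` is hypothetical, and the format is taken only from this excerpt.

```python
import re

# Status block copied from the excerpt above.
SAMPLE = """\
Current Status
     Overall Sts = (0x0012) Unknown
     Common Sts  = (0x0006) Unknown
     Working Sts = (0x00c2) I2CBusErr
     Config Sts  = (0x0060) AgeExpOff AgeWrnOff
     Smart Sts   = (0x0007) VrsnErr   ChargeOk  Smart
     Learn Sts   = (0x0002) NotReady
     Bkup Mode   = (0x0002) Disabled
"""

# Matches "<name> = (0x<hex>) <flags>"; lines without that shape are ignored.
STATUS_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*=\s*\((?P<code>0x[0-9a-fA-F]+)\)\s*(?P<flags>.*)$"
)


def parse_battery_status(text):
    """Return {field name: (numeric code, list of flag words)}."""
    status = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            code = int(match.group("code"), 16)
            status[match.group("name")] = (code, match.group("flags").split())
    return status


status = parse_battery_status(SAMPLE)
for name, (code, flags) in status.items():
    print(f"{name}: 0x{code:04x} {' '.join(flags)}")
```

Running the same function on the block from a later bundle and comparing the two dictionaries shows which fields changed after a battery or controller swap.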

Thanks

14 Posts

December 27th, 2017 06:00

I exchanged Controller 1 for my spare controller (& battery) on 12/22 around 1:30pm and brought the controller online.  I grabbed a support bundle and this is what I see in statecapturedata.txt:

Battery [1]
Parent Ctlr      = CTLR_B
Is Local         = false
Parent CRU       = CRU_2
BID Index        = 0
CapabilityChking = true
Over Temp Count  = 0
Install Time     = 0x5A3CC927 12/22/2017 08:58:15
Warning Time     = 0xFFFFFFFF
Expired Time     = 0xFFFFFFFF

Current Status
     Overall Sts = (0x0011) I2C Bus Err    
     Common Sts  = (0x0001) Okay
     Working Sts = (0x0042) I2CBusErr
     Config Sts  = (0x0060) AgeExpOff AgeWrnOff
     Smart Sts   = (0x0007) VrsnErr   ChargeOk  Smart    
     Learn Sts   = (0x0002) NotReady 
     Bkup Mode   = (0x0002) Disabled 

I left the controller online (FYI, I had moved all my virtual disks to prefer Controller 0).  At around 3:56am on 12/24, one of my 2 host servers (both Server 2008 R2 running as a Hyper-V cluster) crashed and rebooted (bugcheck 1001).  It also appears that the other server lost connectivity with the SAN (I see an iScsiPrt error: Connection to Target lost).

I grabbed another support bundle (attached) and this is what I see in statecapturedata.txt:

Battery [1]
Parent Ctlr      = CTLR_B
Is Local         = false
Parent CRU       = CRU_2
BID Index        = 0
CapabilityChking = true
Over Temp Count  = 0
Install Time     = 0x5A3CC927 12/22/2017 08:58:15
Warning Time     = 0xFFFFFFFF
Expired Time     = 0xFFFFFFFF

Current Status
     Overall Sts = (0x0011) I2C Bus Err    
     Common Sts  = (0x0001) Okay
     Working Sts = (0x0042) I2CBusErr
     Config Sts  = (0x0060) AgeExpOff AgeWrnOff
     Smart Sts   = (0x0007) VrsnErr   ChargeOk  Smart    
     Learn Sts   = (0x0002) NotReady 
     Bkup Mode   = (0x0002) Disabled 
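
The hexadecimal values in the Install Time lines look like Unix timestamps in seconds. The sketch below decodes them on that assumption. Treating `0xFFFFFFFF` as "no time set" is also an assumption, based only on the Warning Time and Expired Time lines above, and `decode_time` is a hypothetical helper name.

```python
from datetime import datetime, timezone

UNSET = 0xFFFFFFFF  # assumption: all-ones means no time has been set


def decode_time(hex_value):
    """Convert a hex field such as '0x5A3CC927' to a UTC datetime, or None."""
    seconds = int(hex_value, 16)
    if seconds == UNSET:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


fields = [
    ("Install Time", "0x5A3CC927"),
    ("Warning Time", "0xFFFFFFFF"),
    ("Expired Time", "0xFFFFFFFF"),
]
for label, value in fields:
    print(f"{label}: {decode_time(value)}")
```

The decoded Install Time is 08:58:15 UTC on December 22nd, 2017, which matches the date printed next to the hex value, so the printed time appears to be in UTC rather than local time.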

I had my cluster crash on a previous occasion when I left Controller 1 Online (never had either server crash prior to this issue, or when running with Controller 1 Offline).

Is there anything else I can try?  Thanks.

1 Attachment

Moderator

 • 

7.8K Posts

January 2nd, 2018 10:00

Hello jezmathers,

Looking at the last support bundle that you supplied, I can see that the battery has been replaced. However, I am also seeing that the controller is still not able to provide any information about the battery charge status. I also looked in the state capture data, and the status still does not show the replacement battery.

When you bring RAID controller 1 online, does it fully boot up or not? Also, if you use the serial shell cable and watch the controller boot, does it present the start-of-day message?

Please let us know if you have any other questions.

14 Posts

January 23rd, 2018 05:00

Apologies for the delayed response (I had to find a service cable).  I now have a terminal log (when I bring Controller 1 online) - but do not see the option to attach a file anymore?  Can I email that to you?

Moderator

 • 

7.8K Posts

January 23rd, 2018 08:00

Hello jezmathers,

I will send you an email that you can reply to with your serial shell output so that we can review it to see what it says.

Please let us know if you have any other questions.
