Start a Conversation

Unsolved

This post is more than 5 years old

287740

February 15th, 2013 15:00

High I/O

Have been experiencing random periods of high disk IO & poor performance. Logs show the following Storage Service events happening during the same time...

Feb  9 02:54:53 Server Administrator: Storage Service EventID: 2188  The controller write policy has been changed to Write Through.:  Battery 0 Controller 0
Feb  9 02:54:53 Server Administrator: Storage Service EventID: 2199  The virtual disk cache policy has changed.:  Virtual Disk 0 (Virtual Disk 0) Controller 0 (PERC H700 Integrated)
Feb  9 02:54:53 Server Administrator: Storage Service EventID: 2199  The virtual disk cache policy has changed.:  Virtual Disk 1 (Virtual Disk 1) Controller 0 (PERC H700 Integrated)
Feb  9 03:09:30 Server Administrator: Storage Service EventID: 2242  The Patrol Read has started.:  Controller 0 (PERC H700 Integrated)
 
Initially thought this was the Battery Learn Cycle issue that is commonly reported, but Auto-Learn had been disabled after one of our previous outages. Can anyone explain what the above log entries mean in correlation to the poor disk performance?
Thanks.

Moderator

 • 

6.2K Posts

February 15th, 2013 16:00

Hello John

The messages don't mean a lot without context. You need to be able to review more of the log. The write policy change message occurs after any reset of the controller and does not necessarily mean the policy has changed, but it could mean the policy has changed because of a battery issue. Patrol reads happen on a regular basis, but do not typically affect performance. A patrol read checks the array data for errors and attempts to correct any errors it encounters. With write back cache disabled patrol reads may cause increased I/O.

Check above the write policy change message to see what triggered it. Also, find a section about the battery information. Find out what the relative and absolute percentages of charge are.

Thanks

12 Posts

February 15th, 2013 16:00

Hi Daniel,

I work with John. The server log messages file did not have any other events logged by the controller on that day. The only previous messages were from 5 days before the incident and they are -

Feb  4 02:43:32 db1.atl.nf.int Server Administrator: Instrumentation Service EventID: 1000  Server Administrator starting

Feb  4 02:43:32 db1.atl.nf.int Server Administrator: Instrumentation Service EventID: 1012  IPMI status #012Interface: OS

Feb  4 02:43:32 db1.atl.nf.int Server Administrator: Instrumentation Service EventID: 1001  Server Administrator startup complete

Feb  4 02:43:38 db1.atl.nf.int Server Administrator: Storage Service EventID: 2131  The current firmware version  12.3.0-0032 is older than the required firmware version 12.10.4-0001 for a controller of model 0x1F17:  Controller 0 (PERC H700 Integrated)

Feb  4 02:43:38 db1.atl.nf.int Server Administrator: Storage Service EventID: 2334  Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209000, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00:  Controller 0 (PERC H700 Integrated)

Feb  4 02:43:38 db1.atl.nf.int Server Administrator: Storage Service EventID: 2334  Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209000, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00:  Controller 0 (PERC H700 Integrated)

Feb  4 02:43:39 db1.atl.nf.int Server Administrator: Storage Service EventID: 2334  Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209000, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00:  Controller 0 (PERC H700 Integrated)

Feb  4 02:43:39 db1.atl.nf.int Server Administrator: Storage Service EventID: 2334  Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209000, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00:  Controller 0 (PERC H700 Integrated)

Feb  4 02:44:54 db1.atl.nf.int ntpd[4525]: synchronized to 10.22.101.62, stratum 2

Also, i ran the command to get current battery status and here is the result -

$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aAll

BBU status for Adapter: 0

BatteryType: BBU

Voltage: 4043 mV

Current: 0 mA

Temperature: 28 C

Battery State     : Operational

BBU Firmware Status:

 Charging Status              : None

 Voltage                                 : OK

 Temperature                             : OK

 Learn Cycle Requested                  : No

 Learn Cycle Active                      : No

 Learn Cycle Status                      : OK

 Learn Cycle Timeout                     : No

 I2c Errors Detected                     : No

 Battery Pack Missing                    : No

 Battery Replacement required            : No

 Remaining Capacity Low                  : No

 Periodic Learn Required                 : No

 Transparent Learn                       : No

 No space to cache offload               : No

 Pack is about to fail & should be replaced : No

 Cache Offload premium feature required  : No

 Module microcode update required        : No

GasGuageStatus:

 Fully Discharged        : No

 Fully Charged           : No

 Discharging             : Yes

 Initialized             : Yes

 Remaining Time Alarm    : No

 Discharge Terminated    : No

 Over Temperature        : No

 Charging Terminated     : No

 Over Charged            : No

Relative State of Charge: 93 %

Charger Status: Complete

Remaining Capacity: 999 mAh

Full Charge Capacity: 1075 mAh

isSOHGood: Yes

 Battery backup charge time : 0 hours

BBU Capacity Info for Adapter: 0

 Relative State of Charge: 93 %

 Absolute State of charge: 59 %

 Remaining Capacity: 999 mAh

 Full Charge Capacity: 1075 mAh

 Run time to empty: Battery is not being discharged.  

 Average time to empty: Battery is not being discharged.  

 Estimated Time to full recharge: Battery is not being charged.  

 Cycle Count: 11

Max Error = 0 %

Remaining Capacity Alarm = 170 mAh

Remining Time Alarm = 10 Min

BBU Design Info for Adapter: 0

 Date of Manufacture: 02/14, 2011

 Design Capacity: 1700 mAh

 Design Voltage: 3700 mV

 Specification Info: 33

 Serial Number: 3072

 Pack Stat Configuration: 0x0008

 Manufacture Name: SANYO

 Device Name: DLNU209

 Device Chemistry: LION

 Battery FRU: N/A

 Transparent Learn = 0

 App Data = 0

BBU Properties for Adapter: 0

 Auto Learn Period: 90 Days

 Next Learn time: None

 Learn Delay Interval:0 Hours

 Auto-Learn Mode: Disabled

Exit Code: 0x00

Moderator

 • 

6.2K Posts

February 15th, 2013 17:00

The battery is not in great condition, but it is not likely the cause of the write through policy change. The other messages do indicate communication errors with the controller. The 5/24/0 sense key is giving an error that the controller firmware is unable to interpret command descriptor blocks. There are also errors indicating that the current firmware version(12.3) is incompatible with the installed driver and should be updated to 12.10.

I would recommend updating the firmware. If the issue persists then pull a controller log and review it:

MegaCli -AdpEventLog -GetEvents -f log.txt -aALL

Thanks 

12 Posts

February 15th, 2013 19:00

Thanks. We'll try the firmware upgrade

12 Posts

February 19th, 2013 11:00

We applied the firmware upgrade and restarted the server and still get a sense error. I believe this means -

"Illegal Request - invalid field in CDB (Command Descriptor Block)"

Can you let us know if this could be problematic and if we need to try additional steps to get rid of this error?

Feb 18 12:20:49 db3 Server Administrator: Instrumentation Service EventID: 1000  Server Administrator starting
Feb 18 12:20:49 db3 Server Administrator: Instrumentation Service EventID: 1012 IPMI status #012Interface: OS
Feb 18 12:20:49 db3 Server Administrator: Instrumentation Service EventID: 1001 Server Administrator startup complete
Feb 18 12:20:49 db3 sudo: pam_ldap: error trying to bind as user "cn=Pavlo Deviatkin,ou=People,dc=niteflirt,dc=com" (Invalid credentials)
Feb 18 12:20:55 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Flashing image: APP : Controller 0 (PERC H700 Integrated)
Feb 18 12:20:55 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Flashing image: BIOS : Controller 0 (PERC H700 Integrated)
Feb 18 12:20:55 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Flashing image: PCLI : Controller 0 (PERC H700 Integrated)
Feb 18 12:20:55 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Flashing image: CTLR : Controller 0 (PERC H700 Integrated)
Feb 18 12:20:55 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Flashing image: NVDT : Controller 0 (PERC H700 Integrated)
Feb 18 12:20:56 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Flashing image: BTBL : Controller 0 (PERC H700 Integrated)
Feb 18 12:20:56 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Flashing image: BOOT : Controller 0 (PERC H700 Integrated)
Feb 18 12:20:56 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Flash of new firmware image(s)complete: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:56 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Policy change on VD 00/0 to [ID=00,dcp=0d,ccp=00,ap=0,dc=0,dbgi=0,S=0|0] from [ID=00,dcp=0d,ccp=0d,ap=0,dc=0,dbgi=0,S=0|0]: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:57 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Policy change on VD 01/1 to [ID=01,dcp=0d,ccp=00,ap=0,dc=0,dbgi=0,S=0|0] from [ID=01,dcp=0d,ccp=0d,ap=0,dc=0,dbgi=0,S=0|0]: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:57 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Shutdown command received from host: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:57 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Firmware initialization started (PCI ID 0079/1000/1f17/1028): Controller 0 (PERC H700 Integrated)
Feb 18 12:20:57 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Battery Present: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:58 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Package version 12.10.4-0001: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:58 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Board Revision A03: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:58 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Enclosure PD 20(c None/p0) communication restored: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:58 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: Encl PD 20: Controller 0 (PERC H700 Integrated)
Feb 18 12:20:59 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 00(e0x20/s0): Controller 0 (PERC H700 Integrated)
Feb 18 12:20:59 db3 puppet-agent[9178]: Reopening log files
Feb 18 12:20:59 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 01(e0x20/s1): Controller 0 (PERC H700 Integrated)
Feb 18 12:20:59 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 02(e0x20/s2): Controller 0 (PERC H700 Integrated)
Feb 18 12:20:59 db3 puppet-agent[9178]: Starting Puppet client version 2.7.6
Feb 18 12:20:59 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 03(e0x20/s3): Controller 0 (PERC H700 Integrated)
Feb 18 12:21:00 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 04(e0x20/s4): Controller 0 (PERC H700 Integrated)
Feb 18 12:21:00 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 05(e0x20/s5): Controller 0 (PERC H700 Integrated)
Feb 18 12:21:00 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 06(e0x20/s6): Controller 0 (PERC H700 Integrated)
Feb 18 12:21:00 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 07(e0x20/s7): Controller 0 (PERC H700 Integrated)
Feb 18 12:21:01 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Battery temperature is normal: Controller 0 (PERC H700 Integrated)
Feb 18 12:21:01 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Current capacity of the battery is above threshold: Controller 0 (PERC H700 Integrated)
Feb 18 12:21:01 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Battery started charging: Controller 0 (PERC H700 Integrated)
Feb 18 12:21:01 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Time established as 02/18/13 12:18:32; (244 seconds since power on): Controller 0 (PERC H700 Integrated)
Feb 18 12:21:02 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209d00, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)
Feb 18 12:21:02 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209d00, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)
Feb 18 12:21:02 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209d00, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)
Feb 18 12:21:02 db3 Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209d00, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)

Moderator

 • 

6.2K Posts

February 20th, 2013 09:00

It could be some bad blocks on a specific drive or a problem with a drives firmware. I would suggest updating all of the HDD firmware if you have not already, and run a consistency check on the array. A consistency check will degrade performance for a few hours so I would do it during off peak hours.

Thanks

No Events found!

Top