Unsolved
This post is more than 5 years old
2 Posts
0
287740
High I/O
Have been experiencing random periods of high disk IO & poor performance. Logs show the following Storage Service events happening during the same time...
Feb 9 02:54:53 Server Administrator: Storage Service EventID: 2188 The controller write policy has been changed to Write Through.: Battery 0 Controller 0 Feb 9 02:54:53 Server Administrator: Storage Service EventID: 2199 The virtual disk cache policy has changed.: Virtual Disk 0 (Virtual Disk 0) Controller 0 (PERC H700 Integrated) Feb 9 02:54:53 Server Administrator: Storage Service EventID: 2199 The virtual disk cache policy has changed.: Virtual Disk 1 (Virtual Disk 1) Controller 0 (PERC H700 Integrated) Feb 9 03:09:30 Server Administrator: Storage Service EventID: 2242 The Patrol Read has started.: Controller 0 (PERC H700 Integrated)
Initially thought this was the Battery Learn Cycle issue that is commonly reported, but Auto-Learn had been disabled after one of our previous outages. Can anyone explain what the above log entries mean in correlation to the poor disk performance?
Thanks.
DELL-Daniel My
Moderator
Moderator
•
6.2K Posts
0
February 15th, 2013 16:00
Hello John
The messages don't mean a lot without context. You need to be able to review more of the log. The write policy change message occurs after any reset of the controller and does not necessarily mean the policy has changed, but it could mean the policy has changed because of a battery issue. Patrol reads happen on a regular basis, but do not typically affect performance. A patrol read checks the array data for errors and attempts to correct any errors it encounters. With write back cache disabled patrol reads may cause increased I/O.
Check above the write policy change message to see what triggered it. Also, find a section about the battery information. Find out what the relative and absolute percentages of charge are.
Thanks
andy318
12 Posts
0
February 15th, 2013 16:00
Hi Daniel,
I work with John. The server log messages file did not have any other events logged by the controller on that day. The only previous messages were from 5 days before the incident and they are -
Feb 4 02:43:32 db1.atl.nf.int Server Administrator: Instrumentation Service EventID: 1000 Server Administrator starting
Feb 4 02:43:32 db1.atl.nf.int Server Administrator: Instrumentation Service EventID: 1012 IPMI status #012Interface: OS
Feb 4 02:43:32 db1.atl.nf.int Server Administrator: Instrumentation Service EventID: 1001 Server Administrator startup complete
Feb 4 02:43:38 db1.atl.nf.int Server Administrator: Storage Service EventID: 2131 The current firmware version 12.3.0-0032 is older than the required firmware version 12.10.4-0001 for a controller of model 0x1F17: Controller 0 (PERC H700 Integrated)
Feb 4 02:43:38 db1.atl.nf.int Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209000, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)
Feb 4 02:43:38 db1.atl.nf.int Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209000, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)
Feb 4 02:43:39 db1.atl.nf.int Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209000, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)
Feb 4 02:43:39 db1.atl.nf.int Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: Encl PD 20 Path 5882b0b026209000, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)
Feb 4 02:44:54 db1.atl.nf.int ntpd[4525]: synchronized to 10.22.101.62, stratum 2
Also, i ran the command to get current battery status and here is the result -
$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aAll
BBU status for Adapter: 0
BatteryType: BBU
Voltage: 4043 mV
Current: 0 mA
Temperature: 28 C
Battery State : Operational
BBU Firmware Status:
Charging Status : None
Voltage : OK
Temperature : OK
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
I2c Errors Detected : No
Battery Pack Missing : No
Battery Replacement required : No
Remaining Capacity Low : No
Periodic Learn Required : No
Transparent Learn : No
No space to cache offload : No
Pack is about to fail & should be replaced : No
Cache Offload premium feature required : No
Module microcode update required : No
GasGuageStatus:
Fully Discharged : No
Fully Charged : No
Discharging : Yes
Initialized : Yes
Remaining Time Alarm : No
Discharge Terminated : No
Over Temperature : No
Charging Terminated : No
Over Charged : No
Relative State of Charge: 93 %
Charger Status: Complete
Remaining Capacity: 999 mAh
Full Charge Capacity: 1075 mAh
isSOHGood: Yes
Battery backup charge time : 0 hours
BBU Capacity Info for Adapter: 0
Relative State of Charge: 93 %
Absolute State of charge: 59 %
Remaining Capacity: 999 mAh
Full Charge Capacity: 1075 mAh
Run time to empty: Battery is not being discharged.
Average time to empty: Battery is not being discharged.
Estimated Time to full recharge: Battery is not being charged.
Cycle Count: 11
Max Error = 0 %
Remaining Capacity Alarm = 170 mAh
Remining Time Alarm = 10 Min
BBU Design Info for Adapter: 0
Date of Manufacture: 02/14, 2011
Design Capacity: 1700 mAh
Design Voltage: 3700 mV
Specification Info: 33
Serial Number: 3072
Pack Stat Configuration: 0x0008
Manufacture Name: SANYO
Device Name: DLNU209
Device Chemistry: LION
Battery FRU: N/A
Transparent Learn = 0
App Data = 0
BBU Properties for Adapter: 0
Auto Learn Period: 90 Days
Next Learn time: None
Learn Delay Interval:0 Hours
Auto-Learn Mode: Disabled
Exit Code: 0x00
DELL-Daniel My
Moderator
Moderator
•
6.2K Posts
0
February 15th, 2013 17:00
The battery is not in great condition, but it is not likely the cause of the write through policy change. The other messages do indicate communication errors with the controller. The 5/24/0 sense key is giving an error that the controller firmware is unable to interpret command descriptor blocks. There are also errors indicating that the current firmware version(12.3) is incompatible with the installed driver and should be updated to 12.10.
I would recommend updating the firmware. If the issue persists then pull a controller log and review it:
MegaCli -AdpEventLog -GetEvents -f log.txt -aALL
Thanks
andy318
12 Posts
0
February 15th, 2013 19:00
Thanks. We'll try the firmware upgrade
andy318
12 Posts
0
February 19th, 2013 11:00
We applied the firmware upgrade and restarted the server and still get a sense error. I believe this means -
"Illegal Request - invalid field in CDB (Command Descriptor Block)"
Can you let us know if this could be problematic and if we need to try additional steps to get rid of this error?
DELL-Daniel My
Moderator
Moderator
•
6.2K Posts
0
February 20th, 2013 09:00
It could be some bad blocks on a specific drive or a problem with a drives firmware. I would suggest updating all of the HDD firmware if you have not already, and run a consistency check on the array. A consistency check will degrade performance for a few hours so I would do it during off peak hours.
Thanks