Unsolved
5 Practitioner
274.2K Posts
0
273948
"power_saving" in centos 6.2 deployed on the PowerEdge R620
Hi, everyone
I have 68 PowerEdge R620 servers, each with two E5-2650 CPUs and 32 GB of memory.
I have deployed CentOS 6.2 on the servers and left the BIOS settings at their defaults, except that I disabled the logical processors.
I am seeing a very strange problem.
After some servers boot, their CPUs are occupied by the "power_saving" and "watchdog" kernel threads, as shown below:
top - 09:56:54 up 10:52, 3 users, load average: 33.80, 33.71, 31.90
Tasks: 339 total, 40 running, 299 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 78.7%sy, 0.0%ni, 21.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32931132k total, 980388k used, 31950744k free, 72724k buffers
Swap: 67108856k total, 0k used, 67108856k free, 268520k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6951 root -2 0 0 0 0 R 96.7 0.0 210:54.71 power_saving/4
50 root RT 0 0 0 0 S 75.7 0.0 162:55.42 watchdog/11
14 root RT 0 0 0 0 S 61.1 0.0 168:48.67 watchdog/2
6960 root -2 0 0 0 0 R 25.6 0.0 262:19.47 power_saving/13
6958 root -2 0 0 0 0 R 21.3 0.0 241:30.96 power_saving/11
6957 root -2 0 0 0 0 R 18.9 0.0 237:24.04 power_saving/10
6956 root -2 0 0 0 0 R 16.9 0.0 236:43.27 power_saving/9
6953 root -2 0 0 0 0 R 10.6 0.0 218:47.43 power_saving/6
6952 root -2 0 0 0 0 R 8.6 0.0 246:29.83 power_saving/5
6948 root -2 0 0 0 0 R 6.0 0.0 199:38.34 power_saving/1
6949 root -2 0 0 0 0 R 2.3 0.0 204:18.63 power_saving/2
20073 nobody 20 0 154m 4792 2208 S 0.3 0.0 0:00.97 gmond
20666 root 20 0 15148 1428 940 R 0.3 0.0 0:00.05 top
1 root 20 0 19324 1620 1292 S 0.0 0.0 0:02.00 init
[root@compute-2-4 ~]# ps -aux |grep power_saving
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 6947 37.2 0.0 0 0 ? R 01:13 194:38 [power_saving/0]
root 6948 38.1 0.0 0 0 ? R 01:13 199:04 [power_saving/1]
root 6949 38.9 0.0 0 0 ? R 01:13 203:29 [power_saving/2]
root 6950 39.5 0.0 0 0 ? R 01:13 206:30 [power_saving/3]
root 6951 40.2 0.0 0 0 ? R 01:13 210:06 [power_saving/4]
root 6952 47.0 0.0 0 0 ? R 01:13 245:45 [power_saving/5]
root 6953 41.8 0.0 0 0 ? R 01:13 218:09 [power_saving/6]
root 6954 43.3 0.0 0 0 ? R 01:13 226:22 [power_saving/7]
root 6955 43.2 0.0 0 0 ? R 01:13 225:44 [power_saving/8]
root 6956 45.1 0.0 0 0 ? R 01:13 235:47 [power_saving/9]
root 6957 45.3 0.0 0 0 ? R 01:13 236:42 [power_saving/10]
root 6958 46.0 0.0 0 0 ? D 01:13 240:34 [power_saving/11]
root 6959 47.7 0.0 0 0 ? D 01:13 249:16 [power_saving/12]
root 6960 50.1 0.0 0 0 ? R 01:13 261:28 [power_saving/13]
root 6961 62.4 0.0 0 0 ? D 01:13 325:53 [power_saving/14]
root 6962 36.3 0.0 0 0 ? R 01:13 189:29 [power_saving/15]
[root@compute-2-4 ~]# ps -aux |grep watchdog
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 6 26.5 0.0 0 0 ? R Jul09 173:17 [watchdog/0]
root 10 26.1 0.0 0 0 ? R Jul09 170:29 [watchdog/1]
root 14 25.8 0.0 0 0 ? R Jul09 168:28 [watchdog/2]
root 18 25.5 0.0 0 0 ? R Jul09 166:28 [watchdog/3]
root 22 25.6 0.0 0 0 ? S Jul09 166:57 [watchdog/4]
root 26 25.2 0.0 0 0 ? R Jul09 164:44 [watchdog/5]
root 30 27.9 0.0 0 0 ? R Jul09 182:05 [watchdog/6]
root 34 24.6 0.0 0 0 ? R Jul09 160:52 [watchdog/7]
root 38 24.7 0.0 0 0 ? S Jul09 161:08 [watchdog/8]
root 42 24.6 0.0 0 0 ? S Jul09 160:51 [watchdog/9]
root 46 24.7 0.0 0 0 ? R Jul09 161:31 [watchdog/10]
root 50 24.9 0.0 0 0 ? R Jul09 162:37 [watchdog/11]
root 54 24.4 0.0 0 0 ? R Jul09 159:23 [watchdog/12]
root 58 24.5 0.0 0 0 ? R Jul09 159:59 [watchdog/13]
root 62 24.1 0.0 0 0 ? R Jul09 157:21 [watchdog/14]
root 66 17.0 0.0 0 0 ? S Jul09 111:22 [watchdog/15]
I can't find power_saving anywhere on the Linux system. After a reboot the server is fine, but then these processes reappear at random on any of the servers.
Can you help me figure out what the problem is?
Thank you very much!
DELL-Jonathan S
153 Posts
1
July 10th, 2012 07:00
The CentOS 6.2 kernel is 2.6.32-220. It looks like you might be encountering the issue that is documented here: bugzilla.kernel.org/show_bug.cgi which affects, at least, kernel 2.6.32-131 (in 6.1) and newer. The fix is committed for 3.5-rc2 and it may take some time to be backported into RHEL and reach CentOS. If I hear anything new about the progress of this I'll post here.
alvinc702
1 Message
0
July 13th, 2012 09:00
Is there a list of affected Dell servers (12G line only?) and operating systems?
Thank you
DELL-Jonathan S
153 Posts
0
July 13th, 2012 16:00
From the notes on that bug report (provided this is what you are actually seeing!), it seems this is an issue with the Linux ACPI 4.0 support when several cores (more than 2) are idled within a short time. From the bug conversation, it seems to me this can be observed on any system with many cores and ACPI 4.0 that is running an affected kernel.
Anonymous
5 Practitioner
274.2K Posts
0
July 14th, 2012 01:00
On some 12G servers I have installed CentOS 5.7, and everything is OK.
Anonymous
5 Practitioner
274.2K Posts
0
July 14th, 2012 01:00
Maybe I can try stopping the acpid service?
Anonymous
5 Practitioner
274.2K Posts
0
July 14th, 2012 02:00
Do you think it would work if I rmmod the acpi_pad module?
DELL-Jonathan S
153 Posts
0
July 17th, 2012 12:00
If you do not require ACPI support those may be reasonable workarounds. Does it run ok for several days like that?
Anonymous
5 Practitioner
274.2K Posts
0
July 21st, 2012 20:00
It does not work.
DELL-Garima K
1 Message
1
August 22nd, 2012 13:00
Some other options to try:
1) Disable Logical Processor in the BIOS
2) Unload the acpi_pad module by running “rmmod acpi_pad”
3) Use a kernel parameter to prevent the acpi_pad module from loading when the server boots: append “acpi_pad.disable=1” to the kernel line
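Before trying option 2, it can help to confirm that acpi_pad is actually loaded. The following is only a minimal sketch: the function name acpi_pad_loaded is made up for illustration, and it assumes the standard /proc/modules format in which each line starts with the module name. It only reports the state and does not unload anything.

```shell
#!/bin/sh
# Hypothetical helper (not from this thread): succeed if acpi_pad is listed
# in a /proc/modules-style file. The path argument exists so the check can
# be pointed at a saved copy; it defaults to the live /proc/modules.
acpi_pad_loaded() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] && grep -q '^acpi_pad ' "$modules_file"
}

# Report only; the actual unload has to be run as root by the admin.
if acpi_pad_loaded; then
    echo "acpi_pad is loaded; as root, 'rmmod acpi_pad' unloads it"
else
    echo "acpi_pad is not loaded"
fi
```

If the module is not listed, the power_saving threads should not be present either, since acpi_pad is the module that creates them.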
matt_fio
1 Message
0
September 25th, 2012 09:00
I'm having the same problem running CentOS 6.3 on Dell R720s. I've got six of them and they will randomly enter this state. My 'ps' output is pasted below. This sometimes happens immediately after reboot. I don't think the kernel bug mentioned by Jonathan is the problem, as I followed the steps to reproduce the problem given by Len Brown in comment #4 on bugzilla.kernel.org/show_bug.cgi and it didn't result in the same condition. It also sounds like the workaround didn't work for chufall. Any other suggestions for workarounds?
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 7407 43.7 0.0 0 0 ? R Sep24 570:19 \_ [power_saving/14]
root 7406 43.1 0.0 0 0 ? R Sep24 562:42 \_ [power_saving/13]
root 7404 42.7 0.0 0 0 ? D Sep24 556:54 \_ [power_saving/11]
root 7405 42.6 0.0 0 0 ? D Sep24 555:22 \_ [power_saving/12]
root 7403 41.4 0.0 0 0 ? R Sep24 539:50 \_ [power_saving/10]
root 7402 41.3 0.0 0 0 ? R Sep24 539:06 \_ [power_saving/9]
root 7401 40.6 0.0 0 0 ? D Sep24 529:13 \_ [power_saving/8]
root 7400 39.8 0.0 0 0 ? R Sep24 518:56 \_ [power_saving/7]
root 7399 39.2 0.0 0 0 ? D Sep24 510:50 \_ [power_saving/6]
root 7398 38.8 0.0 0 0 ? D Sep24 505:43 \_ [power_saving/5]
root 7397 38.0 0.0 0 0 ? D Sep24 495:58 \_ [power_saving/4]
root 7396 37.0 0.0 0 0 ? R Sep24 482:34 \_ [power_saving/3]
root 7395 36.2 0.0 0 0 ? D Sep24 472:25 \_ [power_saving/2]
root 7394 36.1 0.0 0 0 ? R Sep24 471:21 \_ [power_saving/1]
root 7393 35.0 0.0 0 0 ? R Sep24 456:08 \_ [power_saving/0]
root 6 33.6 0.0 0 0 ? S Sep24 447:03 \_ [watchdog/0]
root 7408 33.3 0.0 0 0 ? R Sep24 434:51 \_ [power_saving/15]
root 10 33.2 0.0 0 0 ? R Sep24 441:59 \_ [watchdog/1]
root 18 32.8 0.0 0 0 ? S Sep24 436:22 \_ [watchdog/3]
root 14 32.7 0.0 0 0 ? R Sep24 435:59 \_ [watchdog/2]
root 66 32.4 0.0 0 0 ? S Sep24 431:23 \_ [watchdog/15]
root 62 32.3 0.0 0 0 ? R Sep24 430:35 \_ [watchdog/14]
root 22 32.3 0.0 0 0 ? R Sep24 430:22 \_ [watchdog/4]
root 26 32.2 0.0 0 0 ? S Sep24 428:53 \_ [watchdog/5]
root 30 32.0 0.0 0 0 ? S Sep24 426:16 \_ [watchdog/6]
root 58 31.9 0.0 0 0 ? S Sep24 424:31 \_ [watchdog/13]
root 50 31.9 0.0 0 0 ? S Sep24 424:33 \_ [watchdog/11]
root 38 31.9 0.0 0 0 ? S Sep24 425:21 \_ [watchdog/8]
root 34 31.8 0.0 0 0 ? S Sep24 423:39 \_ [watchdog/7]
root 54 31.7 0.0 0 0 ? S Sep24 422:49 \_ [watchdog/12]
root 46 31.7 0.0 0 0 ? R Sep24 422:15 \_ [watchdog/10]
root 42 31.7 0.0 0 0 ? S Sep24 422:10 \_ [watchdog/9]
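To spot hosts in this state without reading a full ps listing, something like the sketch below could be used. The function name count_busy_power_saving and the 10% threshold are arbitrary choices for illustration, not anything from the bug report. Note that the %CPU reported by ps is an average over the thread's lifetime, so a freshly wedged thread may take a while to cross the threshold.

```shell
#!/bin/sh
# Hypothetical helper: count power_saving kernel threads at or above a
# %CPU threshold. Reads "pcpu comm" pairs on stdin, which is the output
# format of `ps -eo pcpu=,comm=`.
count_busy_power_saving() {
    threshold="${1:-10}"
    awk -v t="$threshold" \
        '$2 ~ /^power_saving\// && $1 + 0 >= t { n++ } END { print n + 0 }'
}

# Print the number of power_saving threads on this host using >= 10% CPU.
ps -eo pcpu=,comm= | count_busy_power_saving 10
```

On a healthy host this prints 0; on a host like the one above it would be close to the number of cores.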
mvpel
1 Message
0
November 6th, 2012 13:00
Are you sure you've disabled the acpi_pad module? Garima's workarounds seem to be the right ones.
I noticed this problem on one of my own Dell R720s running RHEL6u3, and it was a clue to me that hyperthreading was turned on, which I had intended to disable. Once I disabled it, the problem went away.
But if you don't want to disable virtual cores, you can unload the acpi_pad module with "rmmod acpi_pad". Once that module came out, the wedged power_saving threads disappeared. If you add "acpi_pad.disable=1" to your kernel options with grubby, it'll stick through a reboot.
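As a rough sketch of the grubby step: the helper below (its name, cmdline_disables_acpi_pad, is made up for illustration) checks whether the running kernel was already booted with the parameter, and otherwise prints a grubby invocation rather than running it. Applying the argument to all boot entries with --update-kernel=ALL is an assumption about what you want; a specific kernel path can be given instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line file contains
# acpi_pad.disable=1 as a whole word. Defaults to the live /proc/cmdline.
cmdline_disables_acpi_pad() {
    cmdline_file="${1:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 1
    # One argument per line, then match the full line literally.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF 'acpi_pad.disable=1'
}

if cmdline_disables_acpi_pad; then
    echo "running kernel was booted with acpi_pad.disable=1"
else
    # Printed, not executed: changing boot entries needs root and a reboot.
    echo "parameter not set; as root, this adds it to every boot entry:"
    echo "  grubby --update-kernel=ALL --args=acpi_pad.disable=1"
fi
```

The change only takes effect on the next boot, so on a host that is already wedged the rmmod step is still needed.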
student_j
1 Message
0
January 29th, 2013 01:00
Has the fix for the bug reached CentOS?
Stephan-thevalley
130 Posts
0
June 8th, 2013 08:00
I'm still seeing this issue with the latest kernel (2.6.32-358.6.2.el6.x86_64) on a T620 system.
Any hints?
Stephan-thevalley
130 Posts
0
June 8th, 2013 08:00
access.redhat.com/.../366273
Resolution:
• Disable C-states/SpeedStep in the system BIOS
• To immediately reduce load on the system without a reboot, run: modprobe -r acpi_pad
Not a real solution, more of a workaround?
jre-vast
2 Posts
0
January 27th, 2014 08:00
This problem still exists on the newest RHEL 6.5; I just bumped into it. It's a shame Red Hat hasn't backported the fix. Without gimping power management, you can just disable "Logical Processor Idling" in the BIOS. With OpenManage installed, you can just run the following command and reboot:
omconfig chassis biossetup attribute=DynamicCoreAllocation setting=Disabled
You don't have to disable logical processors (i.e. HyperThreading); the above setting is sufficient to avoid the problem.