Unsolved
6 Posts
0
1614
October 5th, 2020 08:00
Vostro 5391 randomly shuts down on Linux
Hello, I have a problem with my Vostro 5391-8672 laptop running Ubuntu 18.04.
When I run some 3D software (X-Plane 11, Unigine Test), the laptop shuts down after about 5 minutes of work. When I enter the BIOS with F12 after the shutdown, I can see a fresh Thermal Log item saying "Critical temperature shutdown".
I've already done everything obvious related to overheating:
- The laptop is new, so dust in the cooling system is definitely not a factor.
- The laptop sits on a desk, so the cooling should be OK.
- I borrowed a cooling pad and put the laptop on it, but this doesn't really help: it only takes a couple of minutes longer for the laptop to shut down.
- I took the laptop to an official service center, but their engineers couldn't reproduce the issue using their Windows 10 SSD.
- All updates to the BIOS, Ubuntu and the Nvidia driver are installed as of 29 September (BIOS is 1.10.x, Ubuntu is fully updated, Nvidia driver is 450.x).
I did some deeper research and found out the following:
- The GPU temperature stabilizes at 93±1 degrees Celsius (as shown by the "nvidia-settings" application).
- The temperature of the CPU cores doesn't exceed 100 degrees Celsius when Intel Turbo Boost is on (as shown by the "sensors" command-line application). When it approaches 100 degrees, throttling kicks in and the CPU cools down to about 76 degrees until the throttling ends.
- The temperature of the CPU cores doesn't exceed 88±1 degrees Celsius when Intel Turbo Boost is off. No throttling occurs in this case.
So automatic temperature regulation in both the CPU and the GPU appears to work as expected. As far as I know, the temperatures I mentioned above are OK for modern CPUs and GPUs, yet the laptop keeps shutting down. Are those temperatures really critical for this particular laptop?
My guess is that there's an extra sensor, not shown by the "sensors" application, which triggers the "Critical temperature shutdown" events. Or maybe the Ubuntu installation lacks some configuration entries that set reasonable threshold temperatures for the sensors. I need some assistance with this; thanks in advance to anyone who can help.
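One way to check the "extra sensor" guess is to look at the kernel's thermal zones directly, since they can include zones that "sensors" does not display. The following is only a minimal sketch, assuming the standard sysfs layout under /sys/class/thermal; the function name list_thermal_zones is made up for illustration, and which zones and trip points actually exist depends on the machine and its firmware.

```shell
#!/bin/sh
# Sketch: print every thermal zone with its type, current temperature
# and trip points. sysfs reports temperatures in millidegrees Celsius.
# The optional first argument overrides the sysfs directory (assumption:
# the standard location is /sys/class/thermal).
list_thermal_zones() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        [ -d "$zone" ] || continue            # glob matched nothing
        type=$(cat "$zone/type" 2>/dev/null)
        temp=$(cat "$zone/temp" 2>/dev/null)
        echo "$(basename "$zone") type=$type temp_mC=$temp"
        for trip in "$zone"/trip_point_*_temp; do
            [ -f "$trip" ] || continue        # zone has no trip points
            kind=$(cat "${trip%_temp}_type" 2>/dev/null)
            echo "  trip $kind at $(cat "$trip" 2>/dev/null) mC"
        done
    done
}

list_thermal_zones "$@"
```

A zone whose "critical" trip point is close to the temperatures seen under load would be a candidate for whatever triggers the shutdown, although the BIOS may also use sensors that the kernel does not expose at all.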


SoftCreator
6 Posts
0
October 6th, 2020 10:00
But the BIOS thermal log still shows critical temperature. Some sensor/hardware failure?
SoftCreator
6 Posts
0
October 6th, 2020 10:00
I just caught a critical temperature shutdown on video. Attached is the last video frame before the shutdown (with the hardinfo sensors shown). The temperatures are just right for a busy system, not critical at all to my mind.
DELL-Jesse L
Moderator
17.9K Posts
69K Points
0
November 2nd, 2020 09:00
SoftCreator
6 Posts
0
November 2nd, 2020 09:00
Due to the lack of support from Dell, I had to dig deeper. For those who face the same problem, I'll write up my results, as it is really unlikely that Dell will do anything.
I installed Windows 10 with all recent drivers and repeated the test. To my surprise, even under heavy load neither the CPU nor the GPU temperature exceeded 70 degrees Celsius. OK, so maybe there's some sensor in the laptop set to trigger at 80+ degrees.
Then I restored the factory Ubuntu Linux and dug into the thermald docs and the various thermal zones, cooling devices, etc. There are two types of cooling devices in /sys/class/thermal: Processor and intel_powerclamp. The latter is off by default (cur_state == -1 with max_state == 50). I haven't seen this driver working under load. But why?
AFAIK, it is thermald that can enable it, based on thermal-conf.xml. Open that file and you'll see that... it is only an example! No rules in the factory-installed Ubuntu enable intel_powerclamp. So, let's do the following:
$ sudo systemctl stop thermald
$ sudo sh -c "echo 50 >/sys/class/thermal/cooling_device8/cur_state"
Ta-da! The processor no longer gets hot: it stays below 70 degrees Celsius. But the laptop's performance becomes very poor. So let's find out which cur_state gives the best tradeoff between a slow laptop and a fried one. My tests showed that a setting of 39-40 leads to moderate heating (the CPU stabilizes at 80 degrees Celsius and the GPU at 83-85). So, the stopgap solution is:
$ sudo sh -c "echo 39 >/sys/class/thermal/cooling_device8/cur_state"
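The cooling_device8 index above is what this particular laptop happened to assign; the numbering can differ between machines and possibly between boots. A minimal sketch that looks the device up by its type instead, assuming the standard sysfs layout (the function names list_cooling_devices, find_cooling_device and set_powerclamp are made up for illustration):

```shell
#!/bin/sh
# Sketch: inspect cooling devices and set intel_powerclamp by type
# rather than by a hard-coded index. The optional last argument of each
# function overrides the sysfs directory (default: /sys/class/thermal).

# Print type, cur_state and max_state of every cooling device.
list_cooling_devices() {
    root="${1:-/sys/class/thermal}"
    for dev in "$root"/cooling_device*; do
        [ -d "$dev" ] || continue
        echo "$(basename "$dev") type=$(cat "$dev/type")" \
             "cur_state=$(cat "$dev/cur_state") max_state=$(cat "$dev/max_state")"
    done
}

# Print the path of the first cooling device whose type equals $1.
find_cooling_device() {
    wanted="$1"
    root="${2:-/sys/class/thermal}"
    for dev in "$root"/cooling_device*; do
        [ -f "$dev/type" ] || continue
        if [ "$(cat "$dev/type")" = "$wanted" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}

# Write state $1 to intel_powerclamp after checking it against max_state.
set_powerclamp() {
    state="$1"
    root="${2:-/sys/class/thermal}"
    dev=$(find_cooling_device intel_powerclamp "$root") || {
        echo "intel_powerclamp cooling device not found" >&2
        return 1
    }
    max=$(cat "$dev/max_state")
    if [ "$state" -lt 0 ] || [ "$state" -gt "$max" ]; then
        echo "state must be between 0 and $max" >&2
        return 1
    fi
    echo "$state" > "$dev/cur_state"
}
```

After sourcing this as root with thermald stopped as above, `set_powerclamp 39` would be the equivalent of the command above. Values written to sysfs do not survive a reboot, so the setting has to be applied again after each boot.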
P.S. Dell, can you at least comment on this?
SoftCreator
6 Posts
0
November 3rd, 2020 00:00
All tests pass correctly, see attached image.
DELL-Cares
Moderator
27.7K Posts
331 Points
0
November 3rd, 2020 04:00
Hi there,
We'd like to let you know that the Dell diagnostics will identify any hardware issues and report them through an error message pertaining to the specific component.
Also, to answer your original question: we have limited support for the Linux operating system. You may want to check the dedicated Dell Linux support page to see if it contains any workarounds (https://dell.to/2GuaEys), or the Canonical community website, where other developers and users may have fixes or solutions available (https://dell.to/34QjzDw).