June 11th, 2020 09:00

XPS 15-7590 with heavy GPU load crashes on Ubuntu 16.04

The problem in short:

The XPS 15-7590 crashes on Ubuntu 16.04 under heavy GPU load. I suspect the GPU is overheating because thermal throttling is not working: the system seems to crash as soon as the temperature reported by the nvidia-smi tool reaches 85-87 degrees Celsius. The crash is always reproducible and occurs within 5 minutes of heavy GPU usage. Does anyone know how to fix or avoid this issue?

The detailed version:

I recently bought an XPS 15-7590 (detailed specs below), primarily for developing robotics and AI software. Since such software is commonly developed on Ubuntu, I updated all the drivers, software packages, and the BIOS from Windows and then set up a dual boot with Ubuntu 16.04. The system worked like a charm after I fixed some minor issues related to the Wi-Fi drivers. I installed the NVIDIA drivers (version 430.64) from ppa:graphics-drivers/ppa and installed PyTorch. When I tried training a basic machine learning network, the fans started spinning really fast within a couple of minutes, and after a couple more minutes the system crashed mid-training.

After restarting the system and checking the syslog, I found the log message "thermald[1196]: critical temp reached". I then monitored CPU and GPU usage and temperatures more closely. With the charger connected, the performance state of the GPU during training changes from P8 (idle) to P0 (the highest performance state). With the charger disconnected, the state during training is P3 instead. In P3 the power draw drops from about 50 W to ~30 W and the temperature drops from ~85 degrees to ~74 degrees Celsius, and in this configuration the PC does not crash. I read some articles about setting a power limit with nvidia-smi -pl, but that does not seem to be possible with the NVIDIA GTX 1650 GPU.
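For anyone who wants to reproduce the monitoring described above, here is a minimal sketch of how the temperature, performance state, and power draw could be logged once per second. It assumes nvidia-smi is on the PATH and relies on its --query-gpu option; the helper names (parse_sample, read_sample, monitor) and the log file name gpu_log.csv are made up for illustration. Each sample is flushed to disk right away so that the last readings before a crash are more likely to survive.

```python
import os
import subprocess
import time

# Fields requested from nvidia-smi via --query-gpu.
QUERY_FIELDS = ["timestamp", "temperature.gpu", "pstate", "power.draw"]


def parse_sample(line):
    """Turn one CSV line of nvidia-smi output into a dict keyed by field name.

    Values are kept as strings, since some fields may be reported as
    unsupported on a given GPU.
    """
    values = [value.strip() for value in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def read_sample():
    """Query the first GPU once and return the parsed sample."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_sample(output.splitlines()[0])


def monitor(path="gpu_log.csv", interval_s=1.0):
    """Append one sample per interval to a CSV file until interrupted.

    The file name and interval are arbitrary choices for this sketch.
    """
    with open(path, "a") as log_file:
        while True:
            sample = read_sample()
            log_file.write(",".join(sample[f] for f in QUERY_FIELDS) + "\n")
            # Force the line to disk so it is not lost if the system crashes.
            log_file.flush()
            os.fsync(log_file.fileno())
            time.sleep(interval_s)
```

Calling monitor() in a second terminal while the training job runs should leave a log whose last lines show the temperature and performance state just before the crash.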

 

To isolate the issue, I then ran a GPU stress test (the FurMark test taken from here). With this test I observed the same behavior: the system crashed when the GPU temperature reached ~85 degrees Celsius.

I also stress tested the GPU and CPU with the FurMark testing suite in Windows, and the problem does not occur there. The GPU barely gets above 80 degrees, and thermal throttling kicks in to keep the system from crashing. Any help on how to fix this would be great!

 

System Specifications:

Model: Dell XPS 15-7590 (bought May 2020)

CPU: Intel® Core™ i7-9750H CPU @ 2.60GHz × 12

Memory: 16 GB

GPU: GeForce GTX 1650/PCIe/SSE2

Hard disk: 1 TB M.2 SSD

OS: Ubuntu 16.04

NVIDIA driver version: 430.64

 

 

Moderator

June 11th, 2020 09:00

Thank you! We have received the required details. We will work towards a resolution via private messages to ensure the security of your information. In the meantime, you may also receive assistance or suggestions from community members.
