
July 17th, 2022 16:00

T7820 AMD GPU BIOS regression

I have a T7820 with an AMD RX 5700XT GPU installed.  Recently I updated the BIOS, from 2.6.3 to 2.24.0, and the OS failed to start, reporting some errors related to the GPU:

[   14.256314] amdgpu 0000:b5:00.0: amdgpu: PSP runtime database doesn't exist
[   14.297174] [drm] Found VCN firmware Version ENC: 1.17 DEC: 5 VEP: 0 Revision: 2
[   14.297180] amdgpu 0000:b5:00.0: amdgpu: Will use PSP to load VCN firmware
[   15.882731] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
[   15.883123] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[   15.883454] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block  failed -22
[   15.883809] amdgpu 0000:b5:00.0: amdgpu: amdgpu_device_ip_init failed
[   15.883963] amdgpu 0000:b5:00.0: amdgpu: Fatal error during GPU init
[   15.884115] amdgpu 0000:b5:00.0: amdgpu: amdgpu: finishing device.
[   15.885471] amdgpu: probe of 0000:b5:00.0 failed with error -22

 

I tried updating the OS and linux-firmware to the latest versions as well as running the LTS and fallback kernels, with the same problem: black screen after the kernel attempted to start.  Attempting to boot from an ISO also yielded the same result.  However, manually installing BIOS 2.6.3 from a flash drive via UEFI enabled the system to boot properly again. 

I have since attempted to isolate when this bug was introduced in the BIOS, by installing various BIOS versions to see which ones work and which ones don't. 

  • 2.6.3: OK
  • 2.9.0: OK
  • 2.12.0: OK
  • 2.13.1: bad
  • 2.14.0: bad
  • 2.24.0: bad

So, it looks like I'm stuck on BIOS 2.12.0 until someone at Dell tracks down whatever is preventing the RX 5700XT from working correctly.
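For anyone repeating this kind of manual BIOS bisection, the bookkeeping can be sketched in a few lines of Python. This is only an illustration: the `RESULTS` table is copied from the list above, while `version_key` and `regression_window` are hypothetical helper names, not part of any Dell or fwupd tool. The one detail worth noting is that versions must be compared numerically, since a plain text sort would put 2.9.0 after 2.24.0.

```python
# Results copied from the list above; True means the OS booted with a working GPU.
RESULTS = {
    "2.6.3": True,
    "2.9.0": True,
    "2.12.0": True,
    "2.13.1": False,
    "2.14.0": False,
    "2.24.0": False,
}


def version_key(version):
    """Turn '2.13.1' into (2, 13, 1) so versions sort numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results):
    """Return (last_good, first_bad) from a {version: booted_ok} mapping."""
    ordered = sorted(results, key=version_key)
    good = [v for v in ordered if results[v]]
    bad = [v for v in ordered if not results[v]]
    last_good = good[-1] if good else None
    first_bad = bad[0] if bad else None
    # A clean regression has every good version before every bad one.
    if last_good and first_bad and version_key(last_good) > version_key(first_bad):
        raise ValueError("results are not monotonic; retest the boundary versions")
    return last_good, first_bad


print(regression_window(RESULTS))
```

With the results above this reports 2.12.0 as the last good version and 2.13.1 as the first bad one, so the change was introduced somewhere between those two releases.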

Has anyone else run into a similar issue?

11 Legend

47K Posts

July 17th, 2022 17:00

@alex.forencich 

The motherboard BIOS is saying there is no certificate for the video card, so Secure Boot must be off.

amdgpu 0000:b5:00.0: amdgpu: PSP runtime database doesn't exist

The T7820 does not come with any Radeon RX card.

Intel® Xeon® Bronze 3204
Windows 11 Pro for Workstations
AMD® Radeon™ Pro WX 3200
16 GB Memory
256 GB SSD



The PSP is the TPM processor.

ELAM is not letting the card load because it has no certificate. Third-party software and hardware will be prevented from loading without a certificate in the BIOS. Dell isn't going to figure this out because it's working as designed.


 


July 17th, 2022 17:00

@alex.forencich 

While there is Linux support for Dell workstations, it's not free and it's not for every version of Linux on earth. Specifically, it's for Red Hat, Ubuntu, SUSE, etc.

Support is not free, and it is available only if you purchased your Dell with Linux installed by Dell. So again, Dell won't be figuring this out.

Linux support 000138246

 

 

July 17th, 2022 17:00

Secure boot is disabled.  Is there another setting that needs to be adjusted?

For reference, after downgrading to 2.12.0, the firmware-related output looks like so:

[   14.312693] amdgpu 0000:b5:00.0: amdgpu: PSP runtime database doesn't exist
[   14.353929] [drm] Found VCN firmware Version ENC: 1.17 DEC: 5 VEP: 0 Revision: 2
[   14.353936] amdgpu 0000:b5:00.0: amdgpu: Will use PSP to load VCN firmware
[   14.409531] [drm] reserve 0x900000 from 0x81fe400000 for PSP TMR
[   14.451394] amdgpu 0000:b5:00.0: amdgpu: RAS: optional ras ta ucode is not available
[   14.457430] amdgpu 0000:b5:00.0: amdgpu: RAP: optional rap ta ucode is not available
[   14.457433] amdgpu 0000:b5:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[   14.457531] amdgpu 0000:b5:00.0: amdgpu: use vbios provided pptable
[   14.457533] amdgpu 0000:b5:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
[   14.492904] amdgpu 0000:b5:00.0: amdgpu: SMU is initialized successfully!
[   14.493067] [drm] Display Core initialized with v3.2.177!
[   14.947602] [drm] kiq ring mec 2 pipe 1 q 0
[   14.949575] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[   14.949708] [drm] JPEG decode initialized successfully.
[   14.952480] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[   15.008198] memmap_init_zone_device initialised 2097152 pages in 24ms
[   15.008207] amdgpu: HMM registered 8176MB device memory
[   15.008502] amdgpu: Virtual CRAT table created for GPU
[   15.008638] amdgpu: Topology: Add dGPU node [0x731f:0x1002]
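Comparing the two logs side by side, the "PSP runtime database doesn't exist" message appears on both the failing and the working boot, so the first real difference comes right after "Will use PSP to load VCN firmware". A rough Python sketch for finding that point automatically is below. It assumes both logs cover the same stretch of amdgpu output; `strip_timestamp` and `first_divergence` are made-up helper names, and the sample strings are just the first four lines of each log pasted above.

```python
import re

# Matches the leading "[   14.256314] " kernel timestamp on a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamp(line):
    """Drop the timestamp so lines from different boots can be compared."""
    return TIMESTAMP.sub("", line.strip())


def first_divergence(bad_log, good_log):
    """Return (index, bad_line, good_line) for the first differing message."""
    bad = [strip_timestamp(l) for l in bad_log.splitlines() if l.strip()]
    good = [strip_timestamp(l) for l in good_log.splitlines() if l.strip()]
    for index, (bad_line, good_line) in enumerate(zip(bad, good)):
        if bad_line != good_line:
            return index, bad_line, good_line
    return None  # No difference within the overlapping part of the logs.


BAD_BOOT = """\
[   14.256314] amdgpu 0000:b5:00.0: amdgpu: PSP runtime database doesn't exist
[   14.297174] [drm] Found VCN firmware Version ENC: 1.17 DEC: 5 VEP: 0 Revision: 2
[   14.297180] amdgpu 0000:b5:00.0: amdgpu: Will use PSP to load VCN firmware
[   15.882731] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
"""

GOOD_BOOT = """\
[   14.312693] amdgpu 0000:b5:00.0: amdgpu: PSP runtime database doesn't exist
[   14.353929] [drm] Found VCN firmware Version ENC: 1.17 DEC: 5 VEP: 0 Revision: 2
[   14.353936] amdgpu 0000:b5:00.0: amdgpu: Will use PSP to load VCN firmware
[   14.409531] [drm] reserve 0x900000 from 0x81fe400000 for PSP TMR
"""

print(first_divergence(BAD_BOOT, GOOD_BOOT))
```

On these samples the first mismatch is the fourth message: the failing boot reports "PSP load kdb failed!" where the working boot reserves memory for the PSP TMR.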

 

July 17th, 2022 19:00

This does not seem like an issue specific to one Linux distribution, but I should definitely see if I can boot into an Ubuntu ISO with the latest firmware installed. If the solution is something simple like changing a setting in the BIOS or even performing a VBIOS update on the card, then fair enough. And this is not some issue with esoteric hardware; this is an issue involving an off-the-shelf GPU from a major manufacturer. If this isn't something that Dell is interested in looking into, then perhaps all of the affected firmware releases should be flagged in LVFS so they are not automatically installed through fwupd on any version of Linux (explicitly supported by Dell or otherwise), since this issue renders an affected machine unable to boot into the OS in a usable way.


July 17th, 2022 20:00

@alex.forencich 

Dell is very selective about what it supports.

https://ubuntu.com/certified/201702-25401

 

Kernel: This system was tested with 20.04 LTS, running the 5.10.0-1013-oem kernel.

BIOS: Dell Inc.: 2.6.3 (UEFI)

"issue involving an off-the-shelf GPU from a major manufacturer."

NVIDIA INSTALL

This has been a known issue since 2012, over 10 years now.

It has nothing whatsoever to do with Dell, and it's not limited to NVIDIA.

https://nvidia.custhelp.com/app/answers/detail/a_id/3156/

 

When installing an after-market graphics card into a certified Windows
PC with UEFI enabled, the system may not boot.

July 17th, 2022 22:00

I replicated the issue on my 7920 with an RX 5700XT, system firmware 2.24.0 (latest), running Ubuntu 20.04.4 LTS.  So this is an issue even when running an explicitly supported OS.


July 18th, 2022 01:00

@alex.forencich 

Support from Dell or Canonical is not free.

https://ubuntu.com/certified/component/766

Dell support also requires the system to be purchased with Linux installed by Dell.

So again, this won't be looked into for third-party hardware.
