I see the following errors in dmesg after the amdgpu module loads:
[ 85.055286] [drm] amdgpu kernel modesetting enabled.
[ 85.055293] [drm] amdgpu version: 5.18.2.22.40
[ 85.055294] [drm] OS DRM version: 5.18.0
[ 85.055775] amdgpu: Ignoring ACPI CRAT on non-APU system
[ 85.055785] amdgpu: Virtual CRAT table created for CPU
[ 85.055920] amdgpu: Topology: Add CPU node
[ 85.069562] amdgpu 0000:c3:00.0: enabling device (0140 -> 0143)
[ 85.069662] [drm] initializing kernel modesetting (SIENNA_CICHLID 0x1002:0x73A3 0x1002:0x0E1E 0x00).
[ 85.069736] [drm] register mmio base: 0x91800000
[ 85.069738] [drm] register mmio size: 1048576
[ 85.071924] [drm] add ip block number 0
[ 85.071927] [drm] add ip block number 1
[ 85.071929] [drm] add ip block number 2
[ 85.071931] [drm] add ip block number 3
[ 85.071933] [drm] add ip block number 4
[ 85.071935] [drm] add ip block number 5
[ 85.071937] [drm] add ip block number 6
[ 85.071939] [drm] add ip block number 7
[ 85.071940] [drm] add ip block number 8
[ 85.071941] [drm] add ip block number 9
[ 85.109653] amdgpu 0000:c3:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 85.109675] amdgpu: ATOM BIOS: 113-D4300100-100
[ 85.109690] [drm] VCN(0) decode is enabled in VM mode
[ 85.109692] [drm] VCN(1) decode is enabled in VM mode
[ 85.109693] [drm] VCN(0) encode is enabled in VM mode
[ 85.109694] [drm] VCN(1) encode is enabled in VM mode
[ 85.109697] [drm] JPEG decode is enabled in VM mode
[ 85.109698] amdgpu 0000:c3:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 85.109731] [drm] GPU posting now...
[ 85.109762] amdgpu 0000:c3:00.0: amdgpu: MEM ECC is active.
[ 85.109763] amdgpu 0000:c3:00.0: amdgpu: SRAM ECC is not presented.
[ 85.109769] amdgpu 0000:c3:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[101] ras_mask[101]
[ 85.109779] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 85.109801] amdgpu 0000:c3:00.0: BAR 2: releasing [mem 0x10010000000-0x100101fffff 64bit pref]
[ 85.109805] amdgpu 0000:c3:00.0: BAR 0: releasing [mem 0x10000000000-0x1000fffffff 64bit pref]
[ 85.109823] pcieport 0000:c2:00.0: BAR 15: releasing [mem 0x10000000000-0x100101fffff 64bit pref]
[ 85.109826] pcieport 0000:c1:00.0: BAR 15: releasing [mem 0x10000000000-0x100101fffff 64bit pref]
[ 85.109828] pcieport 0000:c0:03.1: BAR 15: releasing [mem 0x10000000000-0x100101fffff 64bit pref]
[ 85.109838] pcieport 0000:c0:03.1: BAR 15: no space for [mem size 0xc00000000 64bit pref]
[ 85.109840] pcieport 0000:c0:03.1: BAR 15: failed to assign [mem size 0xc00000000 64bit pref]
[ 85.109843] pcieport 0000:c1:00.0: BAR 15: no space for [mem size 0xc00000000 64bit pref]
[ 85.109844] pcieport 0000:c1:00.0: BAR 15: failed to assign [mem size 0xc00000000 64bit pref]
[ 85.109846] pcieport 0000:c2:00.0: BAR 15: no space for [mem size 0xc00000000 64bit pref]
[ 85.109848] pcieport 0000:c2:00.0: BAR 15: failed to assign [mem size 0xc00000000 64bit pref]
[ 85.109851] amdgpu 0000:c3:00.0: BAR 0: no space for [mem size 0x800000000 64bit pref]
[ 85.109852] amdgpu 0000:c3:00.0: BAR 0: failed to assign [mem size 0x800000000 64bit pref]
[ 85.109854] amdgpu 0000:c3:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 85.109856] amdgpu 0000:c3:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 85.109858] pcieport 0000:c0:03.1: PCI bridge to [bus c1-c3]
[ 85.109860] pcieport 0000:c0:03.1: bridge window [io 0x2000-0x2fff]
[ 85.109863] pcieport 0000:c0:03.1: bridge window [mem 0x91800000-0x91afffff]
[ 85.109868] pcieport 0000:c0:03.1: PCI bridge to [bus c1-c3]
[ 85.109870] pcieport 0000:c0:03.1: bridge window [io 0x2000-0x2fff]
[ 85.109872] pcieport 0000:c0:03.1: bridge window [mem 0x91800000-0x91afffff]
[ 85.109874] pcieport 0000:c0:03.1: bridge window [mem 0x10000000000-0x100101fffff 64bit pref]
[ 85.109878] pcieport 0000:c1:00.0: PCI bridge to [bus c2-c3]
[ 85.109880] pcieport 0000:c1:00.0: bridge window [io 0x2000-0x2fff]
[ 85.109884] pcieport 0000:c1:00.0: bridge window [mem 0x91800000-0x919fffff]
[ 85.109887] pcieport 0000:c1:00.0: bridge window [mem 0x10000000000-0x100101fffff 64bit pref]
[ 85.109892] pcieport 0000:c2:00.0: PCI bridge to [bus c3]
[ 85.109894] pcieport 0000:c2:00.0: bridge window [io 0x2000-0x2fff]
[ 85.109898] pcieport 0000:c2:00.0: bridge window [mem 0x91800000-0x919fffff]
[ 85.109901] pcieport 0000:c2:00.0: bridge window [mem 0x10000000000-0x100101fffff 64bit pref]
[ 85.109911] [drm] Not enough PCI address space for a large BAR.
[ 85.109913] amdgpu 0000:c3:00.0: BAR 0: assigned [mem 0x10000000000-0x1000fffffff 64bit pref]
[ 85.109922] amdgpu 0000:c3:00.0: BAR 2: assigned [mem 0x10010000000-0x100101fffff 64bit pref]
[ 85.109944] amdgpu 0000:c3:00.0: amdgpu: VRAM: 30704M 0x0000008000000000 - 0x000000877EFFFFFF (30704M used)
[ 85.109946] amdgpu 0000:c3:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 85.109948] amdgpu 0000:c3:00.0: amdgpu: AGP: 267878400M 0x0000008800000000 - 0x0000FFFFFFFFFFFF
[ 85.109959] [drm] Detected VRAM RAM=30704M, BAR=256M
[ 85.109961] [drm] RAM width 256bits GDDR6
[ 85.110120] [drm] amdgpu: 30704M of VRAM memory ready
[ 85.110123] [drm] amdgpu: 64022M of GTT memory ready.
[ 85.110147] [drm] GART: num cpu pages 131072, num gpu pages 131072
[ 85.355685] Uhhuh. NMI received for unknown reason 2d on CPU 16.
[ 85.355690] Uhhuh. NMI received for unknown reason 2d on CPU 19.
[ 85.355694] Uhhuh. NMI received for unknown reason 2d on CPU 3.
[ 85.355695] Do you have a strange power saving mode enabled?
[ 85.355698] Do you have a strange power saving mode enabled?
[ 85.355697] Uhhuh. NMI received for unknown reason 2d on CPU 27.
[ 85.355698] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7
[ 85.355699] Uhhuh. NMI received for unknown reason 2d on CPU 11.
[ 85.355701] Do you have a strange power saving mode enabled?
[ 85.355700] Dazed and confused, but trying to continue
[ 85.355703] Dazed and confused, but trying to continue
[ 85.355702] Uhhuh. NMI received for unknown reason 2d on CPU 13.
[ 85.355704] Do you have a strange power saving mode enabled?
[ 85.355705] Uhhuh. NMI received for unknown reason 2d on CPU 29.
[ 85.355706] Do you have a strange power saving mode enabled?
[ 85.355706] Dazed and confused, but trying to continue
[ 85.355706] {1}[Hardware Error]: event severity: fatal
[ 85.355707] Uhhuh. NMI received for unknown reason 2d on CPU 15.
[ 85.355709] Dazed and confused, but trying to continue
[ 85.355710] Uhhuh. NMI received for unknown reason 2d on CPU 31.
[ 85.355712] Dazed and confused, but trying to continue
[ 85.355711] Do you have a strange power saving mode enabled?
[ 85.355711] Do you have a strange power saving mode enabled?
[ 85.355712] Uhhuh. NMI received for unknown reason 2d on CPU 7.
[ 85.355715] Uhhuh. NMI received for unknown reason 2d on CPU 23.
[ 85.355715] {1}[Hardware Error]: Error 0, type: fatal
[ 85.355717] Do you have a strange power saving mode enabled?
[ 85.355717] Do you have a strange power saving mode enabled?
[ 85.355717] Uhhuh. NMI received for unknown reason 2d on CPU 5.
[ 85.355720] Dazed and confused, but trying to continue
[ 85.355721] Dazed and confused, but trying to continue
[ 85.355721] Uhhuh. NMI received for unknown reason 2d on CPU 21.
[ 85.355724] Do you have a strange power saving mode enabled?
[ 85.355724] Do you have a strange power saving mode enabled?
[ 85.355726] Do you have a strange power saving mode enabled?
[ 85.355726] Do you have a strange power saving mode enabled?
[ 85.355726] Dazed and confused, but trying to continue
[ 85.355726] Dazed and confused, but trying to continue
[ 85.355728] Dazed and confused, but trying to continue
[ 85.355731] Dazed and confused, but trying to continue
[ 85.355725] Uhhuh. NMI received for unknown reason 2d on CPU 17.
[ 85.355727] Uhhuh. NMI received for unknown reason 2d on CPU 1.
[ 85.355730] Uhhuh. NMI received for unknown reason 2d on CPU 25.
[ 85.355733] Uhhuh. NMI received for unknown reason 2d on CPU 9.
[ 85.355732] Dazed and confused, but trying to continue
[ 85.355735] Uhhuh. NMI received for unknown reason 2d on CPU 6.
[ 85.355726] {1}[Hardware Error]: section_type: PCIe error
[ 85.355734] Dazed and confused, but trying to continue
[ 85.355738] Uhhuh. NMI received for unknown reason 2d on CPU 22.
[ 85.355740] Uhhuh. NMI received for unknown reason 2d on CPU 30.
[ 85.355743] Uhhuh. NMI received for unknown reason 2d on CPU 14.
[ 85.355742] Do you have a strange power saving mode enabled?
[ 85.355742] Do you have a strange power saving mode enabled?
[ 85.355742] Do you have a strange power saving mode enabled?
[ 85.355746] Uhhuh. NMI received for unknown reason 2d on CPU 10.
[ 85.355741] Do you have a strange power saving mode enabled?
[ 85.355750] Do you have a strange power saving mode enabled?
[ 85.355749] Uhhuh. NMI received for unknown reason 2d on CPU 26.
[ 85.355754] Do you have a strange power saving mode enabled?
[ 85.355747] {1}[Hardware Error]: port_type: 1, legacy PCI end point
[ 85.355754] Uhhuh. NMI received for unknown reason 2d on CPU 24.
[ 85.355751] Uhhuh. NMI received for unknown reason 2d on CPU 8.
[ 85.355754] Dazed and confused, but trying to continue
[ 85.355752] Do you have a strange power saving mode enabled?
[ 85.355752] Do you have a strange power saving mode enabled?
[ 85.355761] Do you have a strange power saving mode enabled?
[ 85.355757] Uhhuh. NMI received for unknown reason 2d on CPU 20.
[ 85.355761] Dazed and confused, but trying to continue
[ 85.355758] Dazed and confused, but trying to continue
[ 85.355759] Uhhuh. NMI received for unknown reason 2d on CPU 4.
[ 85.355758] Dazed and confused, but trying to continue
[ 85.355770] Dazed and confused, but trying to continue
[ 85.355767] Do you have a strange power saving mode enabled?
[ 85.355762] Dazed and confused, but trying to continue
[ 85.355765] Uhhuh. NMI received for unknown reason 2d on CPU 12.
[ 85.355767] Dazed and confused, but trying to continue
[ 85.355773] Do you have a strange power saving mode enabled?
[ 85.355773] Do you have a strange power saving mode enabled?
[ 85.355776] Do you have a strange power saving mode enabled?
[ 85.355781] Do you have a strange power saving mode enabled?
[ 85.355767] Dazed and confused, but trying to continue
[ 85.355767] Dazed and confused, but trying to continue
[ 85.355781] Dazed and confused, but trying to continue
[ 85.355767] Dazed and confused, but trying to continue
[ 85.355770] {1}[Hardware Error]: version: 3.0
[ 85.355786] Dazed and confused, but trying to continue
[ 85.355786] Dazed and confused, but trying to continue
[ 85.355767] Uhhuh. NMI received for unknown reason 2d on CPU 18.
[ 85.355769] Uhhuh. NMI received for unknown reason 2d on CPU 2.
[ 85.355789] Dazed and confused, but trying to continue
[ 85.355789] Dazed and confused, but trying to continue
[ 85.355762] Uhhuh. NMI received for unknown reason 2d on CPU 28.
[ 85.355788] Do you have a strange power saving mode enabled?
[ 85.355796] {1}[Hardware Error]: command: 0x0143, status: 0x0010
[ 85.355800] Dazed and confused, but trying to continue
[ 85.355800] Do you have a strange power saving mode enabled?
[ 85.355801] Do you have a strange power saving mode enabled?
[ 85.355801] Do you have a strange power saving mode enabled?
[ 85.355804] Dazed and confused, but trying to continue
[ 85.355803] {1}[Hardware Error]: device_id: 0000:c3:00.0
[ 85.355804] Dazed and confused, but trying to continue
[ 85.355806] Dazed and confused, but trying to continue
[ 85.355810] {1}[Hardware Error]: slot: 4
[ 85.355812] {1}[Hardware Error]: secondary_bus: 0x00
[ 85.355814] {1}[Hardware Error]: vendor_id: 0x1002, device_id: 0x73a3
[ 85.355816] {1}[Hardware Error]: class_code: 030000
[ 85.355818] {1}[Hardware Error]: aer_uncor_status: 0x00000000, aer_uncor_mask: 0x00010000
[ 85.355821] {1}[Hardware Error]: aer_uncor_severity: 0x004ef030
[ 85.355823] {1}[Hardware Error]: TLP Header: 40001001 c000000f 9187f000 00000000
[ 85.355830] Kernel panic - not syncing: Fatal hardware error!
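For anyone reading the BAR lines in this log, here is a rough Python sketch (not something posted in the thread) that converts the hex sizes into MiB so the mismatch is easier to see. The log excerpt is copied from the dmesg output; the function names and regular expressions are illustrative assumptions, not part of any kernel or amdgpu tooling.

```python
import re

# Excerpt copied from the dmesg output in this thread.
LOG = """\
[ 85.109838] pcieport 0000:c0:03.1: BAR 15: no space for [mem size 0xc00000000 64bit pref]
[ 85.109851] amdgpu 0000:c3:00.0: BAR 0: no space for [mem size 0x800000000 64bit pref]
[ 85.109854] amdgpu 0000:c3:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 85.109874] pcieport 0000:c0:03.1: bridge window [mem 0x10000000000-0x100101fffff 64bit pref]
[ 85.109913] amdgpu 0000:c3:00.0: BAR 0: assigned [mem 0x10000000000-0x1000fffffff 64bit pref]
"""

MIB = 1024 * 1024
BDF = r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])"  # PCI address, e.g. 0000:c3:00.0

# Hypothetical patterns written for the message shapes seen in this log only.
NO_SPACE = re.compile(BDF + r": BAR (\d+): no space for \[mem size (0x[0-9a-f]+)")
RANGE = re.compile(BDF + r": (bridge window|BAR \d+: assigned) \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)")


def requested_sizes(log):
    """Map (device, BAR number) to the size in MiB the kernel failed to place."""
    return {
        (m.group(1), int(m.group(2))): int(m.group(3), 16) // MIB
        for m in NO_SPACE.finditer(log)
    }


def range_sizes(log):
    """Map (device, kind) to the size in MiB of an address range in the log."""
    sizes = {}
    for m in RANGE.finditer(log):
        start, end = int(m.group(3), 16), int(m.group(4), 16)
        sizes[(m.group(1), m.group(2))] = (end - start + 1) // MIB  # range is inclusive
    return sizes


if __name__ == "__main__":
    for key, mib in requested_sizes(LOG).items():
        print("requested", key, mib, "MiB")
    for key, mib in range_sizes(LOG).items():
        print("present  ", key, mib, "MiB")
```

By this arithmetic the driver asked for a 32768 MiB (32 GiB) BAR 0 and the upstream bridges for 49152 MiB (48 GiB), while the 64-bit prefetchable bridge window is only 258 MiB, so the kernel fell back to the original 256 MiB BAR 0. That matches the "Not enough PCI address space for a large BAR" and "BAR=256M" lines. The numbers alone do not show whether this is related to the fatal PCIe error reported a moment later.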
DELL-Shine K
4 Operator
3K Posts
December 29th, 2022 00:00
What operating system do you have on the server?
The AMD W6800 is not a supported GPU for the PowerEdge R7425. Not supported does not mean it will not work, but because it is not supported it has not been validated, so we cannot guarantee that it will work.
Below are the GPUs that are supported in the PowerEdge R7425:
Number
JoJoMan25
3 Posts
December 29th, 2022 10:00
I've tried RHEL 9.1 and Ubuntu 22.04 LTS.
JoJoMan25
3 Posts
December 30th, 2022 10:00
I see the following errors in dmesg after the amdgpu module loads: