Linux Developer Systems

Last reply by 11-25-2021 Unsolved
2 Bronze

Precision 7560 won't boot with Network Enabled

Hey all,

Work just upgraded my laptop to the new Precision 7560 running Ubuntu 20.04.  It seemed to work fine the first day, but after upgrading the installed packages I stopped being able to boot.  I'd get past GRUB and just get a blank screen.

Variations of passing `nomodeset` and `acpi=off` to the kernel helped a bit, but not reliably.  After a couple OS recovery installs and trying the standard Ubuntu installer, I think I've isolated it to the Intel NIC.  If I disable it in the BIOS, I can boot into X every time.  This is under the 5.10.0-1034-oem kernel.

When I tried the standard Ubuntu installer (kernel 5.8.0), I was able to boot into X with the NIC enabled.  However, the Wi-Fi and Bluetooth devices are not supported under that kernel!  I have also updated to the latest BIOS, 1.2.2, without any luck.

Edit to Add:  The wired NIC worked fine under kernel 5.8.0 (and under 5.10 if it happens to boot successfully), and all the Onboard Diagnostic tests pass.

I am able to view some error messages from the previous stalled boot using journalctl.  It looks like it's related to the device firmware loading/unloading.

Jul 17 16:20:24 talos kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
Jul 17 16:20:24 talos kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [systemd-udevd:286]
Jul 17 16:20:24 talos kernel: Modules linked in: hid_sensor_custom hid_sensor_hub intel_ishtp_loader intel_ishtp_hid nvidia_drm(PO) nvidia_modeset(PO) hid_generic nvidia(PO) f>
Jul 17 16:20:24 talos kernel: CPU: 1 PID: 286 Comm: systemd-udevd Tainted: P W O 5.10.0-1034-oem #35-Ubuntu
Jul 17 16:20:24 talos kernel: Hardware name: Dell Inc. Precision 7560/0Y1R4H, BIOS 1.2.2 06/29/2021
Jul 17 16:20:24 talos kernel: RIP: 0010:e1000_flash_cycle_ich8lan.constprop.0+0x5c/0x90 [e1000e]
Jul 17 16:20:24 talos kernel: Code: 00 00 0b 76 42 c1 e0 10 89 42 04 bb 81 96 98 00 eb 0f bf c7 10 00 00 e8 32 e8 0b e7 83 eb 01 74 10 49 8b 44 24 10 66 8b 40 04 <41> 89 c5 a8>
Jul 17 16:20:24 talos kernel: RSP: 0018:ffffa40780def920 EFLAGS: 00000202
Jul 17 16:20:24 talos kernel: RAX: ffffa40780bc4028 RBX: 0000000000353713 RCX: 0000000000000001
Jul 17 16:20:24 talos kernel: RDX: 0000000000000a38 RSI: 0000000000000001 RDI: 0000000000000a1f
Jul 17 16:20:24 talos kernel: RBP: ffffa40780def938 R08: 0000001c8a9dd609 R09: 0000000000000001
Jul 17 16:20:24 talos kernel: R10: ffff8d6a56ef1030 R11: 0000000000000000 R12: ffff8d6a56ef0f38
Jul 17 16:20:24 talos kernel: R13: 0000000080bc4028 R14: 0000000000000000 R15: ffff8d6a56ef0f38
Jul 17 16:20:24 talos kernel: FS: 00007fa01c36e880(0000) GS:ffff8d798fc40000(0000) knlGS:0000000000000000
Jul 17 16:20:24 talos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 17 16:20:24 talos kernel: CR2: 000055644907169c CR3: 0000000118780002 CR4: 0000000000770ee0
Jul 17 16:20:24 talos kernel: PKRU: 55555554
Jul 17 16:20:24 talos kernel: Call Trace:
Jul 17 16:20:24 talos kernel: e1000_erase_flash_bank_ich8lan+0xa0/0x1a0 [e1000e]
Jul 17 16:20:24 talos kernel: e1000_update_nvm_checksum_spt+0x1f5/0x340 [e1000e]
Jul 17 16:20:24 talos kernel: e1000_validate_nvm_checksum_ich8lan+0xa1/0xd0 [e1000e]
Jul 17 16:20:24 talos kernel: e1000_probe+0x65f/0xc90 [e1000e]
Jul 17 16:20:24 talos kernel: local_pci_probe+0x48/0x80
Jul 17 16:20:24 talos kernel: pci_device_probe+0x10f/0x1c0
Jul 17 16:20:24 talos kernel: really_probe+0xfb/0x420
Jul 17 16:20:24 talos kernel: driver_probe_device+0xe9/0x160
Jul 17 16:20:24 talos kernel: device_driver_attach+0x5d/0x70
Jul 17 16:20:24 talos kernel: __driver_attach+0x8f/0x150
Jul 17 16:20:24 talos kernel: ? device_driver_attach+0x70/0x70
Jul 17 16:20:24 talos kernel: bus_for_each_dev+0x7e/0xc0
Jul 17 16:20:24 talos kernel: driver_attach+0x1e/0x20
Jul 17 16:20:24 talos kernel: bus_add_driver+0x152/0x1f0
Jul 17 16:20:24 talos kernel: driver_register+0x74/0xd0
Jul 17 16:20:24 talos kernel: ? 0xffffffffc0773000
Jul 17 16:20:24 talos kernel: __pci_register_driver+0x54/0x60
Jul 17 16:20:24 talos kernel: e1000_init_module+0x3b/0x1000 [e1000e]
Jul 17 16:20:24 talos kernel: do_one_initcall+0x48/0x1d0
Jul 17 16:20:24 talos kernel: ? _cond_resched+0x19/0x30
Jul 17 16:20:24 talos kernel: ? kmem_cache_alloc_trace+0x37a/0x430
Jul 17 16:20:24 talos kernel: ? do_init_module+0x28/0x250
Jul 17 16:20:24 talos kernel: do_init_module+0x62/0x250
Jul 17 16:20:24 talos kernel: load_module+0x11ac/0x1370
Jul 17 16:20:24 talos kernel: ? security_kernel_post_read_file+0x5c/0x70
Jul 17 16:20:24 talos kernel: ? security_kernel_post_read_file+0x5c/0x70
Jul 17 16:20:24 talos kernel: __do_sys_finit_module+0xc2/0x120
Jul 17 16:20:24 talos kernel: ? __do_sys_finit_module+0xc2/0x120
Jul 17 16:20:24 talos kernel: __x64_sys_finit_module+0x1a/0x20
Jul 17 16:20:24 talos kernel: do_syscall_64+0x38/0x90
Jul 17 16:20:24 talos kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:20:24 talos kernel: RIP: 0033:0x7fa01c8f089d
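
For anyone triaging a similar trace, here is a rough Python sketch that pulls the interesting parts out of saved journal lines like the ones above: the stuck CPU and task, the kernel-mode RIP symbol, and which modules show up in the call trace. The function names (`summarize_lockup`, `modules_in_trace`) are made up for this example, and the regular expressions only assume the line layout visible in this log; other kernels may format things differently.

```python
import re

# e.g. "watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [systemd-udevd:286]"
LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")
# e.g. "e1000_probe+0x65f/0xc90 [e1000e]" (the module tag is optional)
FRAME_RE = re.compile(r"^([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?$")
# In this log, kernel-mode RIP lines carry the 0010 code segment selector.
KERNEL_RIP_PREFIX = "RIP: 0010:"


def summarize_lockup(lines):
    """Collect CPU, task, RIP symbol and reliable call-trace frames."""
    summary = {"cpu": None, "seconds": None, "task": None, "pid": None,
               "rip": None, "frames": []}
    in_trace = False
    for line in lines:
        # Drop the "Jul 17 16:20:24 talos kernel: " prefix when present.
        head, sep, tail = line.partition("kernel: ")
        msg = (tail if sep else head).strip()

        m = LOCKUP_RE.search(msg)
        if m:
            summary.update(cpu=int(m.group(1)), seconds=int(m.group(2)),
                           task=m.group(3), pid=int(m.group(4)))
            continue
        if msg.startswith(KERNEL_RIP_PREFIX):
            f = FRAME_RE.match(msg[len(KERNEL_RIP_PREFIX):])
            if f and summary["rip"] is None:
                summary["rip"] = (f.group(1), f.group(2))
            continue
        if msg == "Call Trace:":
            in_trace = True
            continue
        if in_trace:
            # Frames prefixed with "?" are unreliable and do not match.
            f = FRAME_RE.match(msg)
            if f:
                summary["frames"].append((f.group(1), f.group(2)))
    return summary


def modules_in_trace(summary):
    """Return the sorted set of module names tagged on call-trace frames."""
    return sorted({mod for _, mod in summary["frames"] if mod})
```

Fed the log above, this should point at e1000e both for the RIP symbol and for the top frames of the call trace, which matches the suspicion that the NIC driver is where the boot stalls.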

Any suggestions would be appreciated!  In the meantime I guess I'm running Wi-Fi only, or maybe using a Thunderbolt dock.

Thanks,
Dean

Replies (8)
2 Bronze

I had some luck using a custom compiled version of the e1000e module, using the 3.8.7 version from sourceforge.net.  I did have to disable an Ubuntu version check and SecureBoot (until I enroll my own MOK).  But it boots, and the wired NIC seems to work!

2 Bronze

Found a bug report on the Linux Kernel bug tracker here: https://bugzilla.kernel.org/show_bug.cgi?id=213667

The included patches also seem to fix the issue.  Hopefully those make it into the OEM kernel soon!

2 Bronze

As an update, a fix did make it into the OEM kernel.  So now it won't hang on boot trying to write a correct checksum to the NIC NVM.  Yay!  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1936998

But, while it boots right up, it won't load the e1000e driver module if the NVM checksum is bad.  Booo!  I'm still running a custom patched e1000e module (through DKMS) until Dell and our cyber-support team sort this out.

2 Bronze

Is there any hope that someone will fix this? I don't think we can count on the next kernel patch to help while the NVM checksum itself is wrong.

Windows doesn't complain and the network card works there, so it probably ignores the bad checksum by default.

Hi,

I have the same problem on my Precision 7560 with Ubuntu 20.04.

The story: after first installing Ubuntu 20.04 (and after updating the kernel to 5.11.0-38-generic), the NIC worked but Wi-Fi did not (Bluetooth was OK). After some searching, I had to remove the file iwlwifi-ty-a0-gf-a0.pnvm, and then everything was OK, except for a problem that forced me to replace the motherboard.

After the motherboard replacement, SBAM! The NIC didn't work anymore. The problem was:

e1000e 0000:00:1f.6: The NVM Checksum Is Not Valid
e1000e: probe of 0000:00:1f.6 failed with error -5 
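
As a side note for readers, the `-5` in that message is a negated Linux errno value. A small Python sketch, assuming only the message layout shown above (the helper name `parse_probe_failure` is hypothetical), decodes it and extracts the driver and PCI address:

```python
import errno
import re

# e.g. "e1000e: probe of 0000:00:1f.6 failed with error -5"
PROBE_FAIL_RE = re.compile(
    r"(?P<driver>\w+): probe of "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed with error (?P<code>-?\d+)"
)


def parse_probe_failure(line):
    """Return driver, PCI address and errno name, or None if no match."""
    m = PROBE_FAIL_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    return {
        "driver": m.group("driver"),
        "pci_address": m.group("pci"),
        "code": code,
        # Kernel return codes are negated errno values, hence abs().
        "errno_name": errno.errorcode.get(abs(code), "unknown"),
    }
```

On Linux, errno 5 is EIO (input/output error), so the driver is reporting that it gave up on the device rather than that the device is missing.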

Then, after upgrading to kernel 5.11.0-41-generic, I installed the related DKMS fix https://github.com/koljah-de/e1000e-dkms-debian/releases and after a reboot the NIC started working again.

Here is the problem: not every reboot succeeds now. Sometimes the system doesn't boot (with the error BUG: soft lockup - CPU#1 stuck for 23s!) or starts very slowly. If I force a restart during this time instead (holding the power button for 10 seconds), everything proceeds and Ubuntu starts as usual, with the NIC working as expected.

Note that every time I enter the BIOS and exit, the next boot fails, ALWAYS! This behavior also stops me from booting an ISO, because the Ubuntu ISOs fail to start with the same problem (so I cannot test whether the NIC works with a different version of Ubuntu).

At the moment, as suggested by xerotope, I am forced to disable the NIC in the BIOS and work only with the Wi-Fi module.

Now some questions:

  • Why did the NIC work with the old motherboard but not with the replacement?
  • Assuming that something in the current Ubuntu installation broke after the replacement, how can I be sure that reinstalling Ubuntu will solve the problem, given that I cannot test a live ISO?
  • If I do reinstall Ubuntu, should I use the ISO downloadable from the Dell website (https://www.dell.com/support/home/it-it/drivers/osiso) or proceed with the latest LTS version available?
  • Does the Dell ISO use the OEM kernel where the problem was fixed (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1936998), or do I have to install the OEM kernel in my standard installation?
  • Any other suggestions?

Thanks in advance!

@pindi, here are my current configuration steps:

* Use the Ubuntu OEM kernel packages.  This is what the Dell ISO includes, but you can also install it just through apt-get.  I'm using linux-oem-20.04c which is based on the 5.13 kernel line.
* Additional Dell OEM packages include oem-somerville-meta and oem-somerville-factory-meta
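
To confirm which kernel flavour a machine actually booted, a quick check along these lines may help. It is only a sketch: it assumes Ubuntu-style release strings where the flavour is the last hyphen-separated part (as in 5.10.0-1034-oem or 5.11.0-41-generic from this thread), and the helper names are invented for the example.

```python
import platform


def kernel_flavour(release):
    """Return the trailing flavour of a release string, e.g. 'oem'."""
    # Assumption: the flavour is whatever follows the last hyphen.
    return release.rsplit("-", 1)[-1] if "-" in release else ""


def is_oem_kernel(release=None):
    """True if the given (or currently running) kernel is an OEM flavour."""
    if release is None:
        release = platform.release()  # same string as `uname -r`
    return kernel_flavour(release) == "oem"
```

Calling `is_oem_kernel()` with no argument checks the running kernel, which is handy after a reboot when GRUB may have picked a different entry than expected.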

Now, the OEM kernels include the fix referenced in launchpad.  However, if your NVM checksum is already bad, it will still return the error message when trying to load the e1000e module.  As far as I can tell, this problem is caused either by a bad version of the driver trying to write the checksum in some kernels or a factory error programming the NIC.

My current workaround is to manually patch the driver kernel module using the patches from the original kernel bug report, https://bugzilla.kernel.org/show_bug.cgi?id=213667.  To keep it up to date, I made a DKMS script to rebuild it.  However, I hacked this process together, so I don't currently have any step-by-step instructions.

If I get it fully automated I'll try and get a gist or something on Github and link it here.  Good luck!

Hi @xerotope 

If I understand correctly, the bugfix released in these kernel versions resolves the hang during OS boot but does not bypass the "The NVM Checksum Is Not Valid" error, right?

And for some reason, my old motherboard probably had a NIC module with a correct checksum, while the current one does not.

But do you think a complete fix for this problem might be released soon?

Unfortunately, it doesn't even seem possible to fix the checksum with ethtool or bootutil64e (writes are probably blocked: "Unable to write default configuration to EEPROM").

 

Yeah, the bugfix just stops it from hard locking, but the checksum is still checked and the module won't load.
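
For context on what "the checksum is still checked" means, here is a rough Python model of the rule as I understand it from the upstream e1000e sources: the 16-bit words at NVM offsets 0x00 through 0x3F must sum (modulo 2^16) to 0xBABA, with the last word acting as the checksum. Treat the constants as my recollection rather than a spec, and note that the ich8lan/SPT code path seen in the trace does extra work that this sketch does not model.

```python
NVM_SUM = 0xBABA          # expected 16-bit sum (assumed from e1000e sources)
NVM_CHECKSUM_REG = 0x3F   # word offset that stores the checksum (assumed)


def nvm_word_sum(words):
    """Sum words 0x00..0x3F inclusive, truncated to 16 bits."""
    return sum(words[:NVM_CHECKSUM_REG + 1]) & 0xFFFF


def checksum_is_valid(words):
    """Mimic the driver's validation: the sum must equal NVM_SUM."""
    return len(words) > NVM_CHECKSUM_REG and nvm_word_sum(words) == NVM_SUM


def expected_checksum_word(words):
    """Value word 0x3F would need so that the total comes out to NVM_SUM."""
    partial = sum(words[:NVM_CHECKSUM_REG]) & 0xFFFF
    return (NVM_SUM - partial) & 0xFFFF
```

If the stored word at 0x3F differs from what `expected_checksum_word` computes and the flash cannot be rewritten, validation will fail on every probe, which is consistent with the module refusing to load until the check itself is patched out.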

I'm not sure about a timetable for a fix.  My company's IT team has been in contact with Dell, and the issue has been escalated to a higher-level Linux engineering team and confirmed.  But there's no path to a fix yet (other than USB-C Ethernet dongles, yuck).

I also tried the ethtool/bootutil route, and it seems the consensus is the EEPROM is read-only, even if write protection is disabled in the module parameters.

So for now, patching the kernel module to bypass the checksum check entirely is my workaround.
