Hey all,
Work just upgraded my laptop to the new Precision 7560 running Ubuntu 20.04. It seemed to work fine the first day, but after upgrading the installed packages I stopped being able to boot. I'd get past GRUB and just get a blank screen.
Variations of passing `nomodeset` and `acpi=off` to the kernel helped a bit, but not reliably. After a couple OS recovery installs and trying the standard Ubuntu installer, I think I've isolated it to the Intel NIC. If I disable it in the BIOS, I can boot into X every time. This is under the 5.10.0-1034-oem kernel.
When I tried the standard Ubuntu installer (kernel 5.8.0), I was able to boot into X with the NIC enabled. However, the Wi-Fi and Bluetooth devices are not supported on that kernel! I have also updated to the latest BIOS (1.2.2) without any luck.
Edit to add: The wired NIC worked fine under kernel 5.8.0 (and under 5.10 if it happens to boot successfully), and all the onboard diagnostic tests pass.
I am able to view some error messages from the previous stalled boot using journalctl. It looks like it's related to the device firmware loading/unloading.
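For anyone who wants to pull the same messages, here is a minimal Python sketch. It assumes the systemd journal is persistent (so the previous boot is still stored) and that the interesting lines mention `e1000e`; both function names are made up for illustration.

```python
import subprocess

def filter_driver_lines(log_text, driver="e1000e"):
    """Return only the log lines that mention the given driver name."""
    return [line for line in log_text.splitlines() if driver in line]

def previous_boot_kernel_log():
    """Fetch kernel messages (-k) from the previous boot (-b -1) as text."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Printing each entry of `filter_driver_lines(previous_boot_kernel_log())` should show just the NIC driver's messages from the boot that stalled. Reading the journal may need root or membership in a group that is allowed to read it.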
Any suggestions would be appreciated! In the meantime I guess I'm running WiFi-only or maybe using a thunderbolt dock.
Thanks,
Dean
I had some luck using a custom compiled version of the e1000e module, using the 3.8.7 version from sourceforge.net. I did have to disable an Ubuntu version check and SecureBoot (until I enroll my own MOK). But it boots, and the wired NIC seems to work!
Found a bug report on the Linux Kernel bug tracker here: https://bugzilla.kernel.org/show_bug.cgi?id=213667
The included patches also seem to fix the issue. Hopefully those make it into the OEM kernel soon!
As an update, a fix did make it into the OEM kernel. So now it won't hang on boot trying to write a correct checksum to the NIC NVM. Yay! https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1936998
But, while it boots right up, it won't load the e1000e driver module if the NVM checksum is bad. Booo! I'm still running a custom patched e1000e module (through DKMS) until Dell and our cyber-support team sort this out.
Is there any hope that someone will fix this? I don't think we can count on another kernel patch for the case where the NVM checksum is already wrong.
Windows doesn't complain and the network card works, so probably it ignores the bad checksum by default.
Hi,
I have the exact same problem on my Precision 7560 with Ubuntu 20.04.
The story: after the first installation of Ubuntu 20.04 (and after updating the kernel to 5.11.0-38-generic), the NIC worked but Wi-Fi did not (Bluetooth was OK). After some searching I had to remove the file iwlwifi-ty-a0-gf-a0.pnvm, and then everything was fine, except for a problem that forced me to replace the motherboard.
After the motherboard replacement, SBAM! The NIC didn't work anymore. The problem was:
e1000e 0000:00:1f.6: The NVM Checksum Is Not Valid
e1000e: probe of 0000:00:1f.6 failed with error -5
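The "error -5" in the second line is a negated Linux errno value, and 5 is EIO (input/output error). As a small illustration, this Python sketch pulls the driver, PCI address and errno name out of such a line. The regex and the function name are my own and only assume the message format shown above.

```python
import errno
import re

# Matches lines like: "e1000e: probe of 0000:00:1f.6 failed with error -5"
PROBE_FAILURE = re.compile(
    r"(?P<driver>\w+): probe of (?P<device>\S+) failed with error -(?P<code>\d+)"
)

def decode_probe_failure(line):
    """Return (driver, device, errno name), or None if the line doesn't match."""
    match = PROBE_FAILURE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    # errno.errorcode maps numeric codes to names such as "EIO"
    name = errno.errorcode.get(code, "unknown")
    return match.group("driver"), match.group("device"), name
```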
Then, after upgrading to kernel 5.11.0-41-generic, I installed the related DKMS fix from https://github.com/koljah-de/e1000e-dkms-debian/releases and after a reboot the NIC started working again.
Here's the problem: not every reboot succeeds now. Sometimes the system doesn't boot (with the error BUG: soft lockup - CPU#1 stuck for 23s!) or starts very slowly. If I force a restart at that point (holding the power button for 10 seconds), everything proceeds and Ubuntu starts as usual, with the NIC working as expected.
Note that every time I enter the BIOS and exit, the next boot fails, ALWAYS! This also keeps me from booting from an ISO: the Ubuntu ISOs hang on the same problem, so I can't test whether the NIC works with a different version of Ubuntu.
At the moment, as suggested by xerotope, I am forced to disable the NIC in the BIOS and work only with the Wi-Fi module.
Now some questions:
Thanks in advance!
@pindi, here are my current configuration steps:
* Use the Ubuntu OEM kernel packages. This is what the Dell ISO includes, but you can also install it just through apt-get. I'm using linux-oem-20.04c which is based on the 5.13 kernel line.
* Additional Dell OEM packages include oem-somerville-meta and oem-somerville-factory-meta
Now, the OEM kernels include the fix referenced in launchpad. However, if your NVM checksum is already bad, it will still return the error message when trying to load the e1000e module. As far as I can tell, this problem is caused either by a bad version of the driver trying to write the checksum in some kernels or a factory error programming the NIC.
My current workaround is to manually patch the driver kernel module using the patches from the original kernel bug report https://bugzilla.kernel.org/show_bug.cgi?id=213667 . To keep it up to date, I made a DKMS script to re-build it. However, I hacked together this process so don't currently have any step-by-step instructions.
If I get it fully automated I'll try and get a gist or something on Github and link it here. Good luck!
Hi @xerotope
If I understand correctly, the bugfix released in these kernel versions resolves the hang during OS boot, but does not bypass the "The NVM Checksum Is Not Valid" error, right?
And for some reason, my old motherboard probably had a NIC with a correct checksum, while the current one does not.
But do you think a complete fix for this problem could be released soon?
Unfortunately it doesn't even seem possible to correct the checksum with ethtool or bootutil64e (writes are probably blocked --> "Unable to write default configuration to EEPROM").
Yeah, the bugfix just stops it from hard locking, but the checksum is still checked and the module won't load.
I'm not sure about a timetable for a fix. My company's IT team has been in contact with Dell, and the issue has been escalated to a higher-level Linux engineering team and confirmed. But there's no path to a fix yet (other than USB-C ethernet dongles, yuck).
I also tried the ethtool/bootutil route, and it seems the consensus is the EEPROM is read-only, even if write protection is disabled in the module parameters.
So for now, patching the kernel module to bypass the checksum check entirely is my workaround.
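For reference, here is a rough Python sketch of what the checksum test amounts to, as far as I understand the generic check in the e1000e source: the 16-bit words at offsets 0x00 through 0x3F must sum (with 16-bit wraparound) to 0xBABA, and word 0x3F holds whatever value makes that work. The function names are illustrative, and reading a raw dump as little-endian words is an assumption about how the bytes were captured.

```python
import struct

NVM_SUM = 0xBABA          # expected 16-bit sum (constant from the e1000e source)
NVM_CHECKSUM_WORD = 0x3F  # index of the word that stores the checksum

def words_from_bytes(raw):
    """Read the first 128 bytes of a dump as 64 little-endian 16-bit words."""
    return list(struct.unpack("<64H", raw[:128]))

def nvm_sum(words):
    """16-bit wrapping sum of words 0x00..0x3F inclusive."""
    return sum(words[: NVM_CHECKSUM_WORD + 1]) & 0xFFFF

def checksum_is_valid(words):
    """True when the first 64 words sum to NVM_SUM."""
    return nvm_sum(words) == NVM_SUM

def expected_checksum_word(words):
    """Value word 0x3F would need for the sum to come out to NVM_SUM."""
    partial = sum(words[:NVM_CHECKSUM_WORD]) & 0xFFFF
    return (NVM_SUM - partial) & 0xFFFF
```

Comparing `expected_checksum_word(words)` with the stored `words[0x3F]` shows how far off a given dump is. It does not help with writing the value back, which is the part that appears to be blocked on these machines.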