Unsolved

16 Posts

August 6th, 2022 00:00

XPS 8940, intermittent freeze, possible solution

Hi all,

Like many other owners of an XPS 8940/50 with an NVIDIA RTX card (a 3060 Ti in my case), I’ve been having problems with the machine randomly freezing every 2-5 days for many months. I have to hold down the power button to reset, and on the rare occasions where I’ve had a minidump, the error is always VIDEO_TDR_TIMEOUT. If I disable the NVIDIA card and run purely on the Intel onboard graphics (multi-monitor), there is no lockup. I’m on BIOS 2.8 with the latest drivers and Windows 11 updates.

Dell support have been responsive, but for those of us who’ve been in IT (dev) for several decades, it can be a painful process. I’ve had a replacement motherboard and graphics card (refurbished, yuck) to no avail. The only other step left, which I’ve resisted because it didn’t seem to have worked for anyone, was reinstalling Windows 11 (which would mean installing Windows 10 and then upgrading again).

As a desperate last hope, I determined to clear out as many warnings and errors from Event Viewer as possible. This was when I found that if I do a full Shutdown as opposed to a Restart, I end up with Event ID 14 from source nvlddmkm on the next startup. Watching the startup, I could see the screen flicker when this happened, as the NVIDIA driver instigated a recovery. This is essentially the same timeout issue which appears to precede the hang, so resolving it seemed likely to be important.
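For anyone who would rather pull these entries from a script than click through Event Viewer, here is a minimal sketch. It assumes the built-in `wevtutil` tool and the System log; the helper names `build_event_query` and `build_wevtutil_command` are made up for illustration and are not from the original post.

```python
import subprocess
import sys


def build_event_query(provider: str, event_id: int) -> str:
    """Return an XPath filter matching one provider and one event ID."""
    return f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"


def build_wevtutil_command(provider: str, event_id: int, count: int = 10) -> list[str]:
    """Build a wevtutil call listing the newest matching System-log events."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_event_query(provider, event_id)}",
        f"/c:{count}",  # limit the number of events returned
        "/rd:true",     # read newest events first
        "/f:text",      # human-readable output
    ]


if __name__ == "__main__":
    cmd = build_wevtutil_command("nvlddmkm", 14)
    print(" ".join(cmd))
    if sys.platform == "win32":
        # Only attempt the query on Windows, where wevtutil exists.
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```

Running it once after a Shutdown and once after a Restart should show whether new Event ID 14 entries only appear after the full Shutdown, as described above.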

Following this rabbit hole led to an interesting point: the error could mean there was some sort of device contention problem. Turning off the onboard Intel graphics in the BIOS (not recommended) didn’t get rid of the event error, however. I pondered the fact that this was only happening after a full Shutdown and (perhaps incorrectly!) wondered if it could be related to something I’d seen in passing on one of the many web sites I’d been digging through: Windows Fast Startup.

Once I finally found the option to disable this (see below) and did my 400th shutdown and start for the day (aaaaargh), I was blessed with a clean bill of health from Event Viewer! Something within the Fast Startup mechanism was causing it. Perhaps someone with more energy left can figure out the details for interest :)

So having resolved this, the machine has now been up for 6 days and counting, which is longer than any previous good spell. I’m hesitant to say the war is won, but it just might be and perhaps it’s worth letting the rest of you poor forlorn ‘freezers’ give it a go

Hope this helps - let me know how you get on, instructions below.

(@Dell - if this is the fix, do I get a reward? I think I’ve just saved you over $100k 🥳 hehe)

  1. Press Win + R to open Run.
  2. Type control and click OK to open the Control Panel.
  3. Go to System and Security and then click on Power Options.
  4. In the left pane, click on Choose what the power buttons do.
  5. Next, click the Change settings that are currently unavailable link.
  6. Under the Shutdown settings section, uncheck the Turn on fast startup option to turn it off.
  7. Click Save changes to apply the changes.
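To confirm the setting actually stuck after following the steps above, the state can also be read from the registry. This is a minimal sketch that assumes the checkbox is backed by the `HiberbootEnabled` value under the Session Manager Power key (1 = on, 0 = off). That location is not mentioned in the original post, so treat it as an assumption and check it on your own machine.

```python
import sys

# Assumed registry location of the Fast Startup flag (not from the post).
POWER_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Power"
VALUE_NAME = "HiberbootEnabled"


def describe_fast_startup(raw_value):
    """Translate the HiberbootEnabled DWORD into a readable state."""
    if raw_value is None:
        return "unknown (value not present)"
    return "on" if raw_value == 1 else "off"


def read_hiberboot_enabled():
    """Return the registry DWORD, or None when it cannot be read."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POWER_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except OSError:
        return None


if __name__ == "__main__":
    print("Fast startup:", describe_fast_startup(read_hiberboot_enabled()))
```

The script only reads the value. Changing it is left to the Control Panel steps, so nothing is modified by accident.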

4 Operator

 • 

1.7K Posts

August 6th, 2022 05:00

Well, let me tell you that I've done what you have and to no avail, long ago.

First of all, I used "powercfg.exe /hibernate off" in an Administrator CMD window to turn hibernation (and with it Fast Startup) off LONG ago. Because of this, you can't even see that option in the Power Options:

ispalten_0-1659785754398.png

Of course, to see it and be able to set it, you have to turn it on again and open the Power Settings again:

ispalten_2-1659785938380.png

 

ispalten_1-1659785915702.png

So you can now see it, and as above, I have it off.
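The on/off toggle described here can be wrapped in a small helper. This is only a sketch: `hibernate_command` and `run_on_windows` are hypothetical names, and the command still has to be run from an Administrator prompt, as in the post.

```python
import subprocess
import sys


def hibernate_command(enable: bool) -> list[str]:
    """Build the powercfg call that switches hibernation on or off."""
    return ["powercfg.exe", "/hibernate", "on" if enable else "off"]


def run_on_windows(cmd: list[str]) -> int:
    """Run the command on Windows and return its exit code; skip elsewhere."""
    if sys.platform != "win32":
        print("skipped (not Windows):", " ".join(cmd))
        return 0
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Print the command rather than running it, so nothing changes by accident.
    print(" ".join(hibernate_command(enable=False)))
```

Switching hibernation back on makes the Fast Startup checkbox reappear in Power Options, which matches the screenshots above.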

Secondly, I never reboot other than when Windows Update or a removal of a program requires it. I Shutdown every day and power up the next day.

I too have been through the Dell Support 'actions'.

  • Tried various Nvidia drivers, beta ones not released by Dell, Nvidia Game Ready and Studio Drivers and many versions of them all.
  • Had Motherboard AND RTX2060 card replaced.
  • Re-installed Windows 11 using the Windows Creation Media.
  • Located and installed the Dell programs that came with the XPS S.E., such as the CyberLink Media Suite and other needed programs.
  • Used every BIOS version past V2.3.0. All of them lock up the XPS, with V2.4.0 being the worst, as it seemed to happen every day.
  • Scoured the Event Viewer and NEVER found any entries for the lock-up time (easy to tell if you can see the desktop: the clock on the lower right stops at the time it happened).
  • Never got a .DMP file.

"Screen Flicker"? Not sure what you mean, but initially I did have the Desktop go black for a few seconds right after booting, just after it first appeared. This was actually the reason my video card and motherboard were replaced (with an old version of the original one that came with the XPS). After the blackout, all was OK... but it happened on every boot.

Later (months later), after Dell had suggested everything under the sun and replaced the h/w, I discovered that using DDU to completely remove all Nvidia files and then reinstalling the Driver, Control Panel, and GeForce Experience fixed it.

The problem was actually something in the Dell Image with respect to Nvidia. I got the idea to do that because I didn't have the problem on the Intel UHD 750 when I disabled the Nvidia card. Dell even had me move and exchange monitors, as they tried blaming it on the monitor as well (also a Dell).

By the way, the use of DDU was before BIOS V2.4.0 appeared... which is what kicked off the lockups.

Now back to your theory.

I'll assume you are on V2.8.0. If so, I have had as many as 15 days on it with NO LOCKUP. I've had it as short as 4 days as well.

Here, look at what my Reliability Monitor shows. The last Error 41 in the Event Viewer (Windows was not properly shut down) was on 7/28, and the one before that on 7/14, as I had to use the Power Button to recover.

ispalten_3-1659787239704.png
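The gap between the two Event 41 dates quoted above is easy to work out by hand, but for anyone logging a longer run of lock-ups a small sketch may help. The year is taken from the thread's timestamps, and `days_between_lockups` is a hypothetical helper name.

```python
from datetime import date


def days_between_lockups(lockup_dates):
    """Return the gap in days between each consecutive pair of dates."""
    ordered = sorted(lockup_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Event 41 dates quoted in the post (year assumed from the thread's timestamps).
lockups = [date(2022, 7, 14), date(2022, 7, 28)]
print(days_between_lockups(lockups))  # [14]
```

A 14-day gap fits the range of 4 to 15 days between lock-ups reported on V2.8.0.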

Oh, all those red x's on the top line... some are caused by Support Assist (Dell still has a problem with it), others are an MS problem with the Xbox code, which I need to run MS Flight Simulator 2020. They do no harm.

If you feel you've discovered the FIX, I suggest you go back to BIOS V2.4.0 and see if it still doesn't lock up.

So, in closing, I've had Fast Startup OFF for a very long time, and I always (well, sometimes I do have to leave it on for some program) Shutdown every day.

May I also ask you (my configuration in parenthesis after the request):

  • What RAM you have, how much (1 DIMM, 16GB).
  • Make and model of DIMMs (Dell Supplied Samsung).
  • If Single or Dual Rank (CPU-Z will show you this) (Single Rank)?
  • CPU Intel version (i7 11700)? 
  • XPS model, S.E. or Standard (S.E).
  • Any additions or alterations you made (External USB 4TB drive)?
  • Is your SSD set to AHCI or RAID (AHCI)?
  • Nvidia driver # and version, Game Ready or Studio (Game Ready, 516.59, 6/28/2022)
  • Nvidia card RAM amount (6GB).

I suspect if on V2.8.0 it will happen again though, it just will take some time.

16 Posts

August 6th, 2022 07:00

Hi Irv!

oh no, there goes my reward lol! I’ll be so disappointed if it freezes again, I can’t take any more tinkering

I’ve left the machine on this weekend so interesting to see if it’s still alive and if it makes it through another week. I’d already been on bios 2.8 for a while and had regular freezes and also done multiple DDU runs from safe boot, but none of that helped.
It would be good to hear from someone who knows about the workings of Fast Startup as it was interacting even from a cold boot (shutdown) on startup (which I suppose makes sense). But it was clearly causing issues with nvidia startup.

memory: 16GB, 8Gx2, DDR4, 2933MHz - not sure of rank yet, will have to try cpuz when back at machine.

cpu: i9-11900K

Gfx: NVIDIA GeForce RTX 3060 Ti 8GB GDDR

SSD set to raid (for some unknown reason?)

500W power supply

tried both game ready and studio drivers.

no external bits.

(I have removed the intel dynamic tuning firmware related to temp also I recall as it was causing a warning in event viewer - had to put in some reg entries to stop windows replacing it)

fingers crossed  

cheers

 

 

 

 

4 Operator

 • 

1.7K Posts

August 6th, 2022 07:00

@inoodle 

"oh no, there goes my reward lol!"

Only if I am right. However, your chances of collecting were slim anyway 🤨

"It would be good to hear from someone who knows about the workings of Fast Startup as it was interacting even from a cold boot (shutdown) on startup (which I suppose makes sense)."

Well, here is a link that explains it, https://www.howtogeek.com/243901/the-pros-and-cons-of-windows-10s-fast-startup-mode/ .

That is a holdover from when booting a PC with mechanical hard drives was slow. With today's SSDs and faster CPUs and RAM it is not really needed.

It is a form of hibernation: at shutdown the kernel session is saved to disk and restored on the next power-on, so Windows never does a completely fresh start.

I didn't see any after effects of Fast Startup though when I had it on.

If you got 2x8GB from Dell, it is Single Rank I'm sure.

Since you have a K series CPU, did you modify any stock settings? Not sure the Dell BIOS will allow that though?

Oh, your SSD will perform faster if you change to AHCI. You can search this forum for the steps, or use the instructions at the beginning of this link, https://gist.github.com/chenxiaolong/4beec93c464639a19ad82eeccc828c63

There are many of us trying various ways and trading thoughts on the 'fix' and 'why' this is happening. Every thought is drawing a blank.

Surprised you found something in your Event Viewer, as no one else has, but it seems you also had a different problem related to the Nvidia driver. It might be related to what a Linux user saw, who also captured an Nvidia-related message that the card was 'falling off the bus'. There are many reports of that in the Nvidia forums as well. Check this out, https://forums.developer.nvidia.com/t/gpu-fallen-off-bus/215675 . This however could be a Linux-only problem.

4 Operator

 • 

1.7K Posts

August 6th, 2022 08:00

Looking at your post a little closer I see this:

" Watching the startup I could see a screen flicker when this happened and caused the nvidia to instigate a recovery. This is essentially the same timeout issue which appears to precede the hang, so resolving this seemed likely to be important."

Nothing like that I had seen, nor recall others reporting that specifically.

I Google'd "VIDEO_TDR_TIMEOUT" and the event ID.

Check these links. Some may not apply, and most point to a driver problem:

https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x117---video-tdr-timeout-detected 

https://windowsreport.com/video-tdr-timeout-detected-windows-10/ 

https://www.fileinspect.com/blog/fix-video-tdr-timeout-detected/  (I'd do what Section 8 suggests)

Those are a few that I found, but they all are about the same.

Your subject calls it an 'intermittent freeze', however the problem most here have is best described as a 'solid lock up'. A 'freeze' would be temporary and resume shortly. Semantics maybe, but it describes the failure more accurately.

4 Operator

 • 

1.7K Posts

August 6th, 2022 11:00

@inoodle 

I am pretty sure you've got a different problem than those with the hard lock-up. No one has found any Event Viewer entries, seen a Dialog box, or had a BSOD or even a .DMP file on the lock-up. That sort of points to something so low in the OS that it didn't have the capability (or time) to log anything. All 'we' see is the Event 41 for not shutting down correctly in the Event Viewer.

Your 'flicker' seems to support that you have a different problem too as I don't think anyone else has reported that?

I don't find it hard to imagine that Dell can't fix it immediately. To REALLY get a good handle on it, especially if there is no 'clue' or error code/DMP to look at, they need to recreate it, usually with a Debug PC connected to it or some s/w tool. As far as I know they have not recreated it.

They did want to 'capture' my XPS, but what I'd get and what I had to do with my XPS were terms I could not accept.

Of course, one big problem is that no one knows how to recreate it, nor how long it will take to happen.

So I suspect they have to look at the source code for BIOS 2.3.0 and 2.4.0 and work from the changes between them (CVEs, it seems). I am not even sure Dell 'owns' that code. Usually the motherboard maker handles that code, and Dell might spec/design the motherboard but not produce it. A third party does.

I would rule out the Nvidia cards as for most of us, just going back to BIOS V2.3.0 'fixes' it.

Since each BIOS release seems to work 'longer', I would think it is safe to think they are working on it.

I would run the SFC and DISM commands to be sure the OS doesn't have corrupted files.
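A minimal sketch of running both checks in one go follows. The post does not spell out the arguments, so `sfc /scannow` and `DISM /Online /Cleanup-Image /RestoreHealth` are assumed here as the usual forms; `run_repairs` is a hypothetical helper, and both tools need an Administrator prompt.

```python
import subprocess
import sys

# Assumed to be the checks meant by "the SFC and DISM commands".
REPAIR_COMMANDS = [
    ["DISM.exe", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["sfc.exe", "/scannow"],
]


def run_repairs(commands, runner=subprocess.run):
    """Run each command in order and collect (command line, exit code) pairs."""
    results = []
    for cmd in commands:
        completed = runner(cmd)
        results.append((" ".join(cmd), completed.returncode))
    return results


if __name__ == "__main__":
    if sys.platform == "win32":
        for line, code in run_repairs(REPAIR_COMMANDS):
            print(f"{line} -> exit code {code}")
    else:
        print("These checks only exist on Windows.")
```

DISM is listed first so that the component store is repaired before SFC checks system files against it. That ordering is a common recommendation rather than something stated in this thread.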

16 Posts

August 6th, 2022 11:00

Thanks for the info, will definitely look into the AHCI switch! To be honest I hadn’t even realised I’d got the K variant so not done any tweaks

So the VIDEO_TDR_TIMEOUT only came up from WinDbg, and stopped at some point when the Dmp was no longer written. The freeze has always been the same power off required issue, but interesting that it has evolved slightly as I’ve moved to different mobo / gfx card with refurb replacements.

The flicker at startup after shutdown was pretty much as you’d imagine considering the video card was timing out and restarting, so black screen for a second then back to the login screen. 

It could be that the TDR event log has always been secondary to the actual problem and is essentially caused by the freeze, in which case fixing this startup issue won’t have helped at all sadly!

I do find it hard to imagine that dell haven’t found the root cause considering so many people seem to be affected. The worry then is that they do know what it is but know it’s actually not resolvable without replacing with different hardware and so are only trying to minimise the occurrence

I’m still living in hope that I can get through next week without a crash

16 Posts

August 6th, 2022 12:00

I don’t agree it’s a different problem. I’ve not mentioned a dialogue box or a bluescreen. It’s a full freeze; the only difference is that a few months ago it did manage to write a dmp, which may or may not be useful. It no longer manages to do that. However, if I run with only Intel gfx, it’s perfectly fine, which is exactly the same as everyone else, no?

4 Operator

 • 

1.7K Posts

August 6th, 2022 12:00

@inoodle 

OK, my mistake, I thought the main problem was the 'flicker' after boot?

I was also stating what no one who has the  problem has seen.

However, this is also what you first posted:

================

I’ve been having problems with the machine randomly freezing every 2-5 days for many many months. I have to hold down the power button to reset and on the rare occasions where I’ve had a mini dmp, the error is always VIDEO_TDR_TIMEOUT.

=========================

No one else reported it that way? The error indicates a driver problem or OS problem by the links I posted.

I do see that when I have to power off and then on after a lock-up, I have to press the power button twice. The first time, the power button light comes on, the CD light flickers, and then both turn off. Press it again (no need to hold it down long) and Windows comes up as it normally would. I assume the first time the BIOS is resetting the H/W that was probably 'stuck' in an odd state.

16 Posts

August 6th, 2022 14:00

Thanks Irv, 
yeah my power off is the same as you describe. When it used to write a partial mini dmp (It wasn’t complete afaik), the only way I could see this was to shutdown as described as it was frozen, then on restart, connect to the dmp using windbg preview app to see what was written - no dialogs were involved at freeze time.

Anyhow, let’s keep fingers crossed for a fix from some avenue as lots of people seem affected. 
I’ll update next week to say if it’s still running

16 Posts

August 8th, 2022 15:00

Just a quick update: my machine has been up for over 7 days now. This is a record, so the ‘fix’ has had some effect so far. Let’s see if it can make it through the week.

4 Operator

 • 

1.7K Posts

August 8th, 2022 16:00

Can you provide some info on your 8940 please?

  • Amount of RAM?
  • How Many DIMM's?
  • What physical Slots are they in (easy to see if the case is open or in the BIOS).
  • What are the RANK's of all the DIMM's (CPU-Z will show you this on the SPD tab).
  • Did you get ALL the RAM with the 8940 or did you add some?
  • Make/Manufacturer of the DIMM's (CPU-Z will show this)?
  • Have you made ANY BIOS changes over the DEFAULT values?
  • S.E. or Std. 8940?
  • Did you add any H/W or External drives to the 8940 (this includes changing the keyboard/mouse)?

 

4 Operator

 • 

1.7K Posts

August 9th, 2022 06:00

@inoodle 

BIOS V2.9.0 was just released. No details other than CVE updates, not even the normal 'boilerplate' of 'and other fixes'.

If I were you, I'd stay where you are to see if it works OK. Then upgrade once you think it is solved, or if it does fail.

Me, I've gone back to V2.4.0. I've upgraded my DIMMs to all DUAL Rank, as one of the theories was that the problem only occurs with Single Rank DIMMs.

On V2.8.0 I had my last lock up on 7/28, but I did swap out the RAM on 8/6. Still I have not had a lock up yet.

Going back to V2.4.0 should increase my chances of having a lock up in 3 days or less. If I don't have one by then, I'll 'assume' that DIMM Rank does matter, and I'll move to V2.9.0.

If/when you upgrade, go into the BIOS and verify all settings were retained, including that the option to allow Capsule Updates is still OFF.

16 Posts

August 10th, 2022 16:00

Ok, soooo disaster struck after 10 days - you were right to be unconvinced @ispalten! However, it’s definitely different now - for the first time ever, the system rebooted itself after locking up, and wrote a full mini dmp. 
The problem is once again VIDEO_TDR_TIMEOUT and part of the dmp might indicate which monitor is involved. I do feel like this is some small progress as it’s never stayed up this long before and the failure seems more controlled.

Not sure what to try next. Probably bios 2.9 and perhaps running with just one monitor.

bios settings have been default so far. Memory in 1/3 or 2/4, will clarify once I get cpuz.

 

cheers

 

4 Operator

 • 

1.7K Posts

August 10th, 2022 16:00

@inoodle 

I take no joy in having you fail.

However, I somehow feel you're in a slightly different category.

VIDEO_TDR_TIMEOUT is a known error, and if you Google, you'll find many ways to fix it.

Also, those of us locking up never see any such error: nothing in the Event Viewer, no DMP file, no BSOD, not even an auto-reboot. All are totally locked up and can't do anything but use the Power button to turn off and then on again.

If you REALLY think you have the 'fix', I suggest going back to BIOS V2.4.0. Lock ups can take a few days, and sometimes they happen more than once. If nothing else, it will speed up discovery of the problem as you make tweaks to the PC.

CPU-Z will not show you the physical slots. It does label them as Slots 1 to 4, but those are really DIMM positions in Channels 1 and 2. If you populated the White Clips, they will show as SLOT 1 and 2.

ispalten_0-1660175184740.png

While you are at it, can you fill in what you do have from the list I put up?

===================

  • Amount of RAM?
  • How Many DIMM's?
  • What physical Slots are they in (easy to see if the case is open or in the BIOS).
  • What are the RANK's of all the DIMM's (CPU-Z will show you this on the SPD tab).
  • Did you get ALL the RAM with the 8940 or did you add some?
  • Make/Manufacturer of the DIMM's (CPU-Z will show this)?
  • Have you made ANY BIOS changes over the DEFAULT values?
  • S.E. or Std. 8940?
  • Did you add any H/W or External drives to the 8940 (this includes changing the keyboard/mouse)?

=====================

There are so many different configurations and options, there has to be a 'key' here somewhere. Something in terms of h/w that the people who have the problem all share?
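One way to hunt for such a 'key' is to collect each reply to the list above as a small record and keep only the settings that match in every report. This is a rough sketch: the field names are made up for illustration, and the two records use the configurations as first posted earlier in this thread.

```python
def shared_settings(reports):
    """Return the key/value pairs that are identical in every report."""
    if not reports:
        return {}
    first, *rest = reports
    return {
        key: value
        for key, value in first.items()
        if all(other.get(key) == value for other in rest)
    }


# Values taken from the two configurations first posted in this thread.
reports = [
    {"user": "ispalten", "ram_gb": 16, "dimms": 1, "cpu": "i7-11700",
     "ssd_mode": "AHCI", "gpu_vram_gb": 6},
    {"user": "inoodle", "ram_gb": 16, "dimms": 2, "cpu": "i9-11900K",
     "ssd_mode": "RAID", "gpu_vram_gb": 8},
]
print(shared_settings(reports))  # {'ram_gb': 16}
```

With only two reports the result means very little. The idea only becomes useful once many affected owners have answered the same questions.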

Right now, as I said, I'm on V2.4.0 and in 3 days working fine. Used to be I couldn't go 2 days without a lock up. I'll stay like this for a week or two.

I swapped out the single 16GB Single Rank DIMM for two 16GB Dual Rank DIMMs. All in the single Channel White Clips.

So I went from 16GB to 32GB's. Doubt that was going to fix anything but give me more performance in some cases. Going from Single Rank to Dual, that should also offer a small gain in performance.

Oh, I've not used FAST STARTUP on ANY SSD I've had as a boot drive. It is hardly needed, as the savings are basically meant for Mechanical Drives. Also, Windows Update can have some problems if it is enabled (Google that too if you want).

4 Operator

 • 

1.7K Posts

August 10th, 2022 17:00

@inoodle 

BTW, this is also different from some of us, "perhaps running with just one monitor."

That error could either be caused by one of the monitors or the driver for it, or it started to crash before the real problem caused the reboot.

Just another reason why I think you have a different problem.

Try the various web pages with fixes for that error; they could help you out.
