12 Elder

 • 

274.2K Posts

October 21st, 2022 11:00

XPS 8930, underclocking Nvidia GPU as possible fix?

A fairly common issue I've found on both custom-built and pre-built OEM PCs is that the nvlddmkm display driver fails to respond within the TDR delay (2 seconds by default, configurable in the Windows registry), which causes the display to either go black or flicker and then recover. Even then, many users still find they have to reboot their system... or sometimes not.

What I don't want to spend too much time discussing is all the testing and troubleshooting I've already done. If asked "did you try this?" or "did you do that?", the answer will almost certainly be 'Yes. Been there, done that'.

The only thing that hasn't been tried is underclocking the Nvidia GPU. One might ask, who the heck would consider doing that? There have been users on other forums who reported that it worked (for them) to correct this issue.

It was my understanding that Dell GPU cards are clocked slightly lower than the reference specs. Nvidia's reference specs for my GPU are a 1506 MHz base clock and a 1708 MHz boost clock.

Perhaps using MSI Afterburner to apply a negative (-) core clock offset may work.

Both the GPU-Z (TechPowerUp) and HWiNFO64 sensors read a graphics clock of 1911 MHz. I wouldn't consider messing around with undervolting, since I'm informed enough to know of the possible risk of damage to the GPU.
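To put rough numbers on the underclock, here is a minimal Python sketch of the arithmetic, using only the clocks quoted above. The function name `core_clock_offset_mhz` is made up for illustration. It also assumes a core clock offset shifts the observed clock one-for-one, which is a simplification: the real boost clock also depends on temperature and power limits, so treat the result as a starting point only.

```python
def core_clock_offset_mhz(observed_mhz: int, target_mhz: int) -> int:
    """Return the offset (negative = underclock) that moves observed to target.

    Assumption: the offset shifts the observed clock one-for-one.
    """
    return target_mhz - observed_mhz


OBSERVED_MHZ = 1911         # graphics clock read by GPU-Z and HWiNFO64
REFERENCE_BOOST_MHZ = 1708  # Nvidia reference boost clock
REFERENCE_BASE_MHZ = 1506   # Nvidia reference base clock

print(core_clock_offset_mhz(OBSERVED_MHZ, REFERENCE_BOOST_MHZ))  # -203
print(core_clock_offset_mhz(OBSERVED_MHZ, REFERENCE_BASE_MHZ))   # -405
```

So capping at the reference boost would mean an offset of roughly -203 MHz, and going all the way down to the reference base clock roughly -405 MHz.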

Has anyone tried underclocking, and if so, did it fix the black screens and critical error 4101 "Display driver nvlddmkm stopped responding and has successfully recovered"?

[Images: HWInfo64, GPU-Z Tech Power-Up]

12 Elder

 • 

274.2K Posts

October 22nd, 2022 09:00

Figured out how to get Snip it to work with TM. You need to point the mouse at the desktop, outside of the TM window; when the mouse pointer is inside TM, it doesn't work. One main difference with TM is that I never see the dotted cut lines when dragging. You learn something new every day.

 

Running applications, background processes, and Windows processes using GPU resources after start-up:

[Attachments: GPU resources.JPG, GPU Usage.JPG, GPU Usage Resources.JPG]

 

 

12 Elder

 • 

274.2K Posts

October 22nd, 2022 09:00

NVIDIA System Information report created on: 10/22/2022 11:57:39


[Display]
Operating System:    Windows 10 Home, 64-bit
DirectX version:    12.0
GPU processor:        NVIDIA GeForce GTX 1060 6GB
Driver version:        516.94
Driver Type:        DCH
Direct3D feature level:    12_1
CUDA Cores:        1280
Core clock:        1506 MHz
Memory data rate:    8.01 Gbps
Memory interface:    192-bit
Memory bandwidth:    192.19 GB/s
Total available graphics memory:    14239 MB
Dedicated video memory:    6144 MB GDDR5
System video memory:    0 MB
Shared system memory:    8095 MB
Video BIOS version:    86.06.6A.00.5F
IRQ:            Not used
Bus:            PCI Express x16 Gen3
Device Id:        10DE 1C03 11D71028
Part Number:        G410 0030

[Components]

nvui.dll        8.17.15.1694        NVIDIA User Experience Driver Component
nvxdplcy.dll        8.17.15.1694        NVIDIA User Experience Driver Component
nvxdbat.dll        8.17.15.1694        NVIDIA User Experience Driver Component
nvxdapix.dll        8.17.15.1694        NVIDIA User Experience Driver Component
NVCPL.DLL        8.17.15.1694        NVIDIA User Experience Driver Component
nvCplUIR.dll        8.1.940.0        NVIDIA Control Panel
nvCplUI.exe        8.1.940.0        NVIDIA Control Panel
nvWSSR.dll        31.0.15.1694        NVIDIA Workstation Server
nvWSS.dll        31.0.15.1694        NVIDIA Workstation Server
nvViTvSR.dll        31.0.15.1694        NVIDIA Video Server
nvViTvS.dll        31.0.15.1694        NVIDIA Video Server
nvLicensingS.dll        6.14.15.1694        NVIDIA Licensing Server
nvDevToolSR.dll        31.0.15.1694        NVIDIA Licensing Server
nvDevToolS.dll        31.0.15.1694        NVIDIA 3D Settings Server
nvDispSR.dll        31.0.15.1694        NVIDIA Display Server
nvDispS.dll        31.0.15.1694        NVIDIA Display Server
NVCUDA64.DLL        31.0.15.1694        NVIDIA CUDA 11.7.101 driver
nvGameSR.dll        31.0.15.1694        NVIDIA 3D Settings Server
nvGameS.dll        31.0.15.1694        NVIDIA 3D Settings Server
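A couple of figures in the [Display] section above can be cross-checked with simple arithmetic. The sketch below is only a sanity check under the usual assumption that memory bandwidth is the data rate times the bus width divided by 8 bits per byte, and that total available graphics memory is dedicated plus shared memory. The variable names are illustrative.

```python
# Values copied from the [Display] section of the report
DATA_RATE_GBPS = 8.01   # memory data rate per pin
BUS_WIDTH_BITS = 192    # memory interface width
DEDICATED_MB = 6144
SHARED_MB = 8095
REPORTED_BANDWIDTH_GBS = 192.19
REPORTED_TOTAL_MB = 14239

# Bandwidth in GB/s = data rate (Gbps per pin) * bus width (bits) / 8
bandwidth_gbs = DATA_RATE_GBPS * BUS_WIDTH_BITS / 8
total_mb = DEDICATED_MB + SHARED_MB

print(f"computed bandwidth: {bandwidth_gbs:.2f} GB/s (reported {REPORTED_BANDWIDTH_GBS})")
print(f"computed total memory: {total_mb} MB (reported {REPORTED_TOTAL_MB})")
```

The small difference in bandwidth is consistent with the data rate being rounded to two decimals in the report; the memory totals match exactly.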

 

===================================================

 

I have exactly the same global settings in the Nvidia Control Panel. I just didn't capture every single setting in the two screenshots, to save digital ink.

 

[Attachments: Nvidia Control Panel Global Settings_Pg 1.JPG, Nvidia Control Panel Global Settings_Pg 2 .JPG]

 

 

October 22nd, 2022 09:00

This is a system issue unrelated to the video card, since it's getting a load.

With a video card swap, use a USB boot device to reinstall from a backup if possible. If not, you're going to have to start anew, as suggested before.

12 Elder

 • 

274.2K Posts

October 22nd, 2022 10:00



@Tesla1856 wrote:

Well, definitely some strange-ness going on there with your system.

It's most likely your Windows or its drivers.


I ruled out the OS

Also Windows drivers . . . Been there, done that. I doubt there are any corrupted files

 


@Tesla1856 wrote:

What Nvidia driver are you running? On my Nvidia GTX-1070 cards ... since they are not new and several generations behind now, I run on both ... Nvidia GTX-1070 Driver is still v457.51 WHQL (DCH) .


Windows Update installed 512.15 (DCH) on August 7th. In the words of an "expert" on Nvidia's GeForce forum, not mine, 512.15 is the most common version Microsoft is pushing onto users.

With 512.15, the system went almost two whole months without an issue. The black screens came back in late September. So I read up on Nvidia's May and August 2022 security bulletins and did a clean uninstall with DDU in safe mode. I then downloaded the next branch version from nvidia.com, v516.94 DCH, which is the version the August bulletin recommends for addressing all the CVEs. I selected a clean install of only the Game Ready graphics driver, with no other components, and with the internet disconnected. Ten days later: black screens. That's basically the summary. I also tried a version older than 512.15. It didn't work either.

 

 

     
 

 

12 Elder

 • 

274.2K Posts

October 22nd, 2022 10:00


@Tesla1856 wrote:

Yeah, until you can figure it out ... MSI-AfterBurner should be able to be used to set it back to stock-clocks (and you can set it to load Config #1 on Windows boot).

 


I'm not familiar with "loading configuration #1 on Windows boot". Can you explain ?

 

12 Elder

 • 

274.2K Posts

October 22nd, 2022 10:00


@BrandonGainey wrote:

This is a system issue unrelated to the video card, since it's getting a load.


How did you arrive at the conclusion that this is a system issue and that it's unrelated to the video card?

 


@BrandonGainey wrote:

With a video card swap, use a USB boot device to reinstall from a backup if possible. If not, you're going to have to start anew, as suggested before.


Dell actually replaced this graphics card back in April. It's pretty safe to rule out anything being wrong with either the Dell replacement graphics card or the original Dell graphics card.

Swap out the card and then re-install the OS ? Again, I'm not sure I'm understanding that correctly. I've performed three Windows re-installations since April.

 

I'm not following what "you're going to have to start anew, as suggested before" means.

 

10 Wizard

 • 

17.7K Posts

 • 

70.5K Points

October 22nd, 2022 12:00


@Anonymous wrote:


@Tesla1856 wrote:

Well, definitely some strange-ness going on there with your system.

It's most likely your Windows or its drivers.


1. I ruled out the OS. Also Windows drivers . . . Been there, done that. I doubt there are any corrupted files

 


@Tesla1856 wrote:

What Nvidia driver are you running? On my Nvidia GTX-1070 cards ... since they are not new and several generations behind now, I run on both ... Nvidia GTX-1070 Driver is still v457.51 WHQL (DCH) .


2. Windows Update installed 512.15 (DCH) on August 7th. In the words of an "expert" on Nvidia's GeForce forum, not mine, 512.15 is the most common version Microsoft is pushing onto users.

 

 


1. Must be nice. Personally, I've never had that luxury. I only "rule out" problem-vectors after I get it working 100% properly.

2. I'm not an Expert. Just a user who just so happens to be familiar with using older (but still capable) GPUs in older (but still capable) computer systems ... but on the latest OS.

AFAIK, a Nvidia v5xx.xx driver is quite-a-bit different from a v4xx.xx one.

10 Wizard

 • 

17.7K Posts

 • 

70.5K Points

October 22nd, 2022 12:00


@Anonymous wrote:

@Tesla1856 wrote:

Yeah, until you can figure it out ... MSI-AfterBurner should be able to be used to set it back to stock-clocks (and you can set it to load Config #1 on Windows boot).

 


I'm not familiar with "loading configuration #1 on Windows boot". Can you explain ?

 


I suggest you read the documentation that comes with MSI-AfterBurner that explains how to configure it. 

12 Elder

 • 

274.2K Posts

October 22nd, 2022 14:00

@Tesla1856 

First of all, it sounded like my comment may have come off as overly confident, or as if I was casting aspersions on your experience and knowledge of operating systems and system drivers. It wasn't meant to be taken that way, and I apologize if it sounded like it was. It's only that, after 8 months, I've been able to successfully rule those out.

I also want to say that I read everything that gets posted on my threads. Referring to the first comment you posted on this thread with the link you included,

https://www.dell.com/community/Alienware-Desktops/Aurora-R6-hardware-error-Nvidia-GTX-1070/m-p/8279402#M63101

I should mention that I performed every step you did in that thread (twice, in fact), including reseating the card. We must think alike, since all of the steps I took were identical to yours. Unfortunately for me, those attempts never worked.

You also asked a user in that linked thread the question "maybe I have a fundamental misunderstanding of how hardware, software and Operating Systems work together.

Let me ask you ... when YOU observe a possible hardware problem with a computer, where do YOU start looking for problems?"

Like @Vandual, I too found that the HWiNFO64 sensors are the first place to look, i.e., temperatures, voltages, and flags. Back around March and April, I found that many of the GPU PCIe +12 V input voltages and the GPU 6-pin input voltages were below the threshold of +11.72 V.

The low readings on the GPU 6-pin had me wondering whether the PSU was failing, since the 6-pin carries power straight from the PSU. The low GPU PCIe +12 V input readings made me wonder whether there was bad power from the motherboard or whether the card was just bad. But the caps, at least, all checked out fine under magnification, i.e., no bulging, cracking, etc.

These sensor readings were logged during benchmark testing that placed no tremendous stress on the GPU and never crashed the system.
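The check described above (flagging voltage readings below the +11.72 V threshold) can be sketched in a few lines of Python. The sensor labels and the sample values here are hypothetical placeholders, not taken from the actual logs, and the function name `below_threshold` is made up for illustration.

```python
THRESHOLD_V = 11.72  # threshold quoted in the post


def below_threshold(readings, threshold=THRESHOLD_V):
    """Return the (sensor, volts) pairs that fall below the threshold."""
    return [(name, volts) for name, volts in readings if volts < threshold]


# Hypothetical sample readings for illustration only
sample = [
    ("GPU PCIe +12V Input Voltage", 11.68),
    ("GPU 6-pin Input Voltage", 11.70),
    ("GPU PCIe +12V Input Voltage", 11.95),
]

for name, volts in below_threshold(sample):
    print(f"{name}: {volts:.2f} V is below {THRESHOLD_V} V")
```

The same function could be fed rows from a sensor log exported to CSV, provided the columns are first mapped to (sensor, volts) pairs.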

Then there were thermal performance limit readings throwing flags as well. Temperature issues can happen with any Nvidia driver, but they're not likely with v472.12, which I was running back in April.

The fan speed was only reaching 67%, so that was another thing I had planned to address with MSI Afterburner: bring it up to 75% and set up a fan curve so that, once the card hits 80°C, it ramps to 100% at 87°C.
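As a rough illustration of that fan curve, here is a small Python sketch. It assumes a flat 75% below 80°C and a straight-line ramp from 75% at 80°C to 100% at 87°C; the post only gives those two points, so the linear shape and the function name `fan_percent` are assumptions, and the curve itself would still be set by hand in MSI Afterburner.

```python
def fan_percent(temp_c: float) -> float:
    """Fan speed (%) for a GPU temperature, under the assumed curve."""
    base_pct, max_pct = 75.0, 100.0
    ramp_start_c, ramp_end_c = 80.0, 87.0

    if temp_c <= ramp_start_c:
        return base_pct
    if temp_c >= ramp_end_c:
        return max_pct
    # Linear interpolation between the two points given in the post
    fraction = (temp_c - ramp_start_c) / (ramp_end_c - ramp_start_c)
    return base_pct + fraction * (max_pct - base_pct)


for t in (60, 80, 83.5, 87, 90):
    print(f"{t} C -> {fan_percent(t):.1f}%")
```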

As for the errors reported by Reliability Monitor, aren't they pretty much just that? How much info do they give you compared to these types of sensor readings? Not really a lot by comparison, in my opinion.

SFC, the DISM commands, and chkdsk /r /f are all reliable tools for fixing a lot of issues. Sysnative even has a tool that has been found to fix some system driver corruption that even sfc /scannow can't. It has proven successful for me once.

I didn't even touch on all the other troubleshooting. For example, the memory sticks all passed testing with Memtest86+.

After 8 months of diligently searching for an answer to a fairly common issue, I didn't think a shot with MSI Afterburner would be the fix-all solution to this, but I also didn't think it would hurt to try.

12 Elder

 • 

274.2K Posts

October 22nd, 2022 17:00

My only hope now is hearing some refreshing explanation for the graphics clock, fan speed, and engine load reaching the levels circled. If there's a better idea than tweaking the clock down to the stock reference level of 1506 MHz, I'm all ears for that as well.

just found out voodoo witch doctors don't perform exorcisms on weekends . . . something to do with union rules

[Attachment: GPU-Z TechPowerUp.jpg]

October 22nd, 2022 19:00

While most of the info in this post is over my head, I'm wondering if the problem I am currently experiencing is related.  AFTER my computer enters sleep mode, it never gets anything back on the screen.  I can move the mouse and hit a key on the keyboard and the keyboard lights up like it is waking up, but the screen is black.  I have to reboot to get it back.

These are in the event log - have no idea what they mean but saw a reference to the error in this post.

  • Display driver nvlddmkm stopped responding and has successfully recovered.
  • The description for Event ID 0 from source nvlddmkm cannot be found.
  • The system firmware has changed the processor's memory type range registers (MTRRs) across a sleep state transition (S4).

I spent 2 hours with tech support running every test they have access to to no avail. I asked if the video driver might be a problem because this started when I did a BIOS update 2 days ago.  The date on my DELL video driver is 7/22 and I noticed NVIDIA has a much newer one.  Tech support says that's not the problem.

So if this is not related to this post, kindly let me know and I will start a new one.

PS - one more oddity:  when I hold in the power button to force a shutdown, the light goes off.  Then 2 seconds later, the power light comes on, the DVD light comes on, they both turn off and the machine is down.  I don't recall seeing that sequence before.....  

10 Wizard

 • 

17.7K Posts

 • 

70.5K Points

October 22nd, 2022 22:00


@Anonymous wrote:

@Tesla1856 

1. First of all, it sounded like my comment may have come off as overly confident, or as if I was casting aspersions on your experience and knowledge of operating systems and system drivers. It wasn't meant to be taken that way, and I apologize if it sounded like it was. It's only that, after 8 months, I've been able to successfully rule those out.

2. I also want to say that I read everything that gets posted on my threads. Referring to the first comment you posted on this thread with the link you included,

https://www.dell.com/community/Alienware-Desktops/Aurora-R6-hardware-error-Nvidia-GTX-1070/m-p/8279402#M63101

3. Let me ask you ... when YOU observe a possible hardware problem with a computer, where do YOU start looking for problems?"

4. Like @Vandual, I too found that the HWiNFO64 sensors are the first place to look, i.e., temperatures, voltages, and flags. Back around March and April, I found that many of the GPU PCIe +12 V input voltages and the GPU 6-pin input voltages were below the threshold of +11.72 V.

5. As for the errors reported by Reliability Monitor, aren't they pretty much just that? How much info do they give you compared to these types of sensor readings? Not really a lot by comparison, in my opinion.

6. I didn't think a shot with MSI Afterburner would be the fix-all solution to this, but I also didn't think it would hurt to try.


1. Thanks. Yeah, I'm pretty accustomed to it by now (I've been a Rockstar, helping here for about 10 years now).

2. Good. Too bad it didn't help. But to be clear ... my symptoms were a little different than yours. While they both include the screen going black for a few seconds, in the Reliability Monitor mine said:
- Hardware Error
While yours likely said something like:
- Display driver nvlddmkm stopped responding and has successfully recovered.

Pretty different. Mine apparently had a poor connection after 5 years.

3. That's what I get for asking rhetorical questions

4. Yes, I use programs like that also. All sensor readings were normal.

5. The OS (Windows) is in control. The Reliability Monitor is a simple way of viewing problems Windows encountered while trying to run the computer properly. 

If you like, you can ignore it ... and look at the sensors only.

6. Until you figure-out the actual cause, I can't think of another way to lock the Nvidia card's operating parameters ... can you?

10 Wizard

 • 

17.7K Posts

 • 

70.5K Points

October 22nd, 2022 22:00


@gobble-de-gook wrote:

 

So if this is not related to this post, kindly let me know and I will start a new one.

 


It might be "related" somehow, but you really should start your own thread.

Repeat what you said above, but also include your computer make and model, version of Windows, and some system specs (like exactly what make/model of video card you have).

 

12 Elder

 • 

274.2K Posts

October 23rd, 2022 07:00

@Tesla1856 

2. I'm just guessing you might've forgotten the time-stamped sequence of critical error occurrences from when you were investigating your black screens.

Do you also remember looking at the system logs that listed the critical errors in Event Viewer ? 

Critical error 4101, "Display Driver nvlddmkm stopped responding and successfully recovered", had to have been logged on your system. With your black screen occurrences, that error comes just before the Windows hardware error live kernel event in Reliability Monitor.

What's going on is that Windows gives the graphics card a set amount of time to complete an operation. The mechanism is called TDR ("Timeout Detection and Recovery"), and its timeout has a default value that can be overridden in the registry.

When the GPU stops keeping up, Windows waits for that timeout to give it a chance to recover. Once the time elapses, Windows tries to reset the display driver and recover. Because the timeout has elapsed, critical error 4101 is logged in Event Viewer.
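For anyone who wants to check which timeout their own system is using, here is a minimal Python sketch. It assumes the value is stored as `TdrDelay` under `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers` and that Windows falls back to 2 seconds when the value is absent. The function name `read_tdr_delay` is made up for illustration, and on a non-Windows machine it simply returns the assumed default.

```python
import sys

TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
DEFAULT_TDR_DELAY_S = 2  # assumed Windows default when TdrDelay is not set


def read_tdr_delay() -> int:
    """Return the TDR timeout in seconds, or the assumed default."""
    if sys.platform != "win32":
        # No registry to read; return the assumed default
        return DEFAULT_TDR_DELAY_S

    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return int(value)
    except FileNotFoundError:
        # Key or value is missing, so the default applies
        return DEFAULT_TDR_DELAY_S


print(f"TDR delay: {read_tdr_delay()} s")
```

This only reads the value. Changing it is a separate decision, since a longer timeout hides the symptom without addressing whatever makes the GPU stop responding.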

So, if you remember, the Windows hardware error in RM is timestamped after Windows had tried to reset and recover the display driver.

This event then sets off the Windows hardware error(s) with live kernel events 117, 141, etc. in RM. Basically, that tells me nothing other than that there is a hardware issue, which may or may not be the case, and it isn't the logical root cause of these errors. Obviously, the cause is something associated with the card, not the operating system.

I never mentioned this in this thread. I too receive those Windows - Hardware Errors and live kernel events in RM. So it's the same issue as yours, just a different root cause

https://learn.microsoft.com/en-us/windows-hardware/drivers/whea/introduction-to-the-windows-hardware-error-architecture

 

Take care

 

10 Wizard

 • 

17.7K Posts

 • 

70.5K Points

October 23rd, 2022 08:00


@Anonymous wrote:

 

Do you also remember looking at the system logs that listed the critical errors in Event Viewer ? 

Critical error 4101, "Display Driver nvlddmkm stopped responding and successfully recovered", had to have been logged on your system. With your black screen occurrences, that error comes just before the Windows hardware error live kernel event in Reliability Monitor.

 

 


Yes, I did ... and no, that was not what I saw (as I already posted).

I think maybe because I was just using Google Chrome, so only the Intel-IGP was being used. I think the Nvidia card was idle. 
