November 17th, 2022 06:00

Aurora R12, RTX 3080, black screen under load

Aurora R12 Specs:

  • CPU: i7-11700f (liquid cooled)
  • GPU: AW RTX 3080
  • RAM: HyperX DDR4 3200MHz 2x8GB
  • PSU: 1000W

Been working through this issue for probably the better part of 2 months. I've tried a whole long list of things:

  1. Fresh OS install
  2. DDU and reinstall from Nvidia, Dell, Device Manager
  3. msconfig startup only Windows apps
  4. Uninstall iCue, AWCC
  5. TDR Delay changes
  6. Disabling hardware acceleration
  7. Setting power to high performance
  8. Disabling Gsync
  9. Disabling OC
  10. Trying XMP1, XMP2, and Disabling it
  11. Disabling/enabling Intel Speedstep, Speedshift
  12. MSI Afterburner undervolting, unchecking IO boxes
  13. New PSU (Tried corsair 1000w from Best Buy)
  14. New Motherboard (Dell RMA)
  15. New GPU (Dell RMA)

The mobo and GPU swaps were both from Dell support. The new GPU was the "last resort" and it actually resolved the issue, but only for 3-4 days. Now I'm unable to run any game on the tower, and cannot complete any UNIGINE benchmarks either. So....what exactly happens?

  1. Game with high load kicks off
  2. GPU fans ramp up
  3. Screen starts flickering
  4. PC Freezes
  5. Screen goes black
  6. PC drops to idle state - all LEDS still on, keyboard light off, display off. Powers off with a simple press of the power button

It can take up to 20 tries to get the PC to reboot...which is why I'm wondering if it might be temp related. ANYWHO....WhoCrashed shows mainly VIDEO_TDR_ERROR and other GPU/Driver related errors. Event Viewer is mostly inconclusive but will sometimes show video issues too.

I spent some time debugging and monitoring with GPU-Z, and I noticed that PerfCap was showing Pwr/Vrel a ton when under load before the crashes. 
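
For anyone who wants to count these limiters from a GPU-Z sensor log instead of eyeballing the graph, here is a minimal Python sketch. It assumes the log is the comma-separated file GPU-Z writes when "Log to file" is enabled, that the relevant column header starts with "PerfCap Reason", and that the value is a bitmask where 1 = Pwr, 2 = Thrm, 4 = VRel, 8 = VOp, 16 = Idle. The column name and bit values are assumptions, and the sample rows are made up for illustration, so check both against a real log before trusting the counts.

```python
import csv
import io
from collections import Counter

# Assumed bit meanings for the PerfCap Reason value (verify against your own log).
PERFCAP_BITS = {1: "Pwr", 2: "Thrm", 4: "VRel", 8: "VOp", 16: "Idle"}


def decode_perfcap(value):
    """Return the names of all limiters set in a PerfCap bitmask."""
    return [name for bit, name in PERFCAP_BITS.items() if value & bit]


def count_perfcap(log_text):
    """Count how many log rows hit each limiter."""
    reader = csv.reader(io.StringIO(log_text))
    header = [h.strip() for h in next(reader)]
    col = next(i for i, h in enumerate(header) if h.startswith("PerfCap Reason"))
    counts = Counter()
    for row in reader:
        if len(row) <= col:
            continue  # skip short or blank rows
        cell = row[col].strip()
        if not cell.isdigit():
            continue  # skip rows with no numeric reading
        counts.update(decode_perfcap(int(cell)))
    return counts


# Made-up sample rows in the assumed layout, for illustration only.
sample = """Date , GPU Clock [MHz] , PerfCap Reason [] ,
row-1 , 1905.0 , 5 ,
row-2 , 1890.0 , 4 ,
row-3 , 210.0 , 16 ,
"""
print(count_perfcap(sample))
```

A log dominated by Pwr and VRel right before the crash would match what the GPU-Z graph showed here, but it only says which limit the card was hitting, not why.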

As far as temps, CPU is chilling at low 50's, GPU temps spike up to 75-80C under max load. I'm curious if the RMA replacement might have bad thermal paste/pads and some components are overheating. I might take it apart to check in the next few days.

To help confirm it might be the GPU I tried a friend's 3080 and the PC was running much better, but still felt somewhat unstable, at least in Fortnite (FPS drops every 2-3 seconds from 100Hz to 30Hz, some lag). With their GPU, GPU-Z was showing a ton of Vop/Vrel issues, more so than Pwr. FH5 ran fine though, no issues.

So I'm just curious what is causing this. I've tried literally every option I've found online and maybe half a dozen not listed above. I don't want to keep having Dell support send parts out but they don't seem to want to take time to debug my system before just shipping parts out. Would love some help!

Audio crackling/high latency with my old 3080 and RMA'd 3080 (since day 1), but not with my friend's 3080.

Below is a capture from GPU-Z right before the screen flickers and the PC crashes to black screen/idle.

[Image: IMG_2373.PNG, GPU-Z sensor capture]

 

15 Posts

November 30th, 2022 04:00

So after doing the case swap and adding a new motherboard my "R12" has been running seamlessly. I would honestly recommend anyone who has an Alienware to budget in a new motherboard/case. The mobo and case design are pretty poor for regulating thermals, especially with some of the bigger card pre-builts (3070 and up). Also, you will definitely not miss the Alienware BIOS and having only 3 fan headers.

I've included photos of my old build/new build just for reference on how much extra space and airflow there are in standard cases. My case swap was to a 4000d Airflow with a z590 Tomahawk motherboard. I was able to reuse the Dell PSU with these extension cables. All other components (CPU, GPU, RAM, and even fans if you wanted) required no extra parts and were integrated easily. I'll also eventually replace the noctua with a chromax...but that's for another day. 

[Image: IMG_2214.jpg, New Case (4000D Airflow)]

6 Professor

 • 

6K Posts

November 17th, 2022 17:00

Use Afterburner, set your GPU fans at manual 100% and try again.

15 Posts

November 21st, 2022 08:00

Tried it but no luck. Adjusted all the curves and was able to get temp down to low 70's, but now it will crash even @ 56C when kicking off the Superposition benchmark. Going to see if tech support will RMA it...Kinda sad I need to RMA 2 GPUs in less than a month. There's a whole forum here where multiple people are experiencing the same issue and it could point towards some design flaws in the 3080. Wouldn't be surprised if this is my case....

6 Professor

 • 

6K Posts

November 21st, 2022 08:00

I have an RTX3080 and had zero issues, and I use it every day in one form or another.

So I don't think it's a design flaw. It could be temperature related, but if it crashes even at 56 Celsius then temperature is not the issue.

Your 12 volt is still within spec, but it is on the low side. What is the 12 volt at idle?
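
To put a number on "within spec but on the low side": the ATX specification allows +/-5% on the 12 volt rail, which works out to 11.40 V through 12.60 V. Here is a small Python sketch of that check. The readings passed in below are hypothetical examples, not values from this system, and software sensor readings are only approximate compared to a multimeter.

```python
def rail_status(measured_v, nominal_v=12.0, tolerance=0.05):
    """Return (in_spec, deviation_percent) for a voltage rail reading.

    tolerance=0.05 is the +/-5% ATX allowance for the 12 V rail.
    """
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    in_spec = low <= measured_v <= high
    deviation_pct = (measured_v - nominal_v) / nominal_v * 100
    return in_spec, round(deviation_pct, 2)


# Hypothetical readings for illustration only.
for reading in (12.1, 11.6, 11.2):
    print(reading, rail_status(reading))
```

Comparing the idle reading against the reading under load shows how far the rail sags when the GPU pulls power.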

8 Wizard

 • 

17K Posts

November 21st, 2022 09:00


@Gallifrae wrote:
  1. Uninstall iCue, AWCC
  2. TDR Delay changes
  3. Disabling hardware acceleration
  4. Disabling Gsync
  5. Disabling OC
  6. Trying XMP1, XMP2, and Disabling it
  7. Disabling/enabling Intel Speedstep, Speedshift

 


OC, XMP1-2 should never have been enabled in the first place. Did you get a new Intel CPU on each motherboard swap?

And I understand you wanting to try everything, but these I listed above ... should not have been messed with.

It taking 20 tries to reboot (not sure exactly what that means, but it sounds bad).

Yes, I use AfterBurner on my RTX-3080 and it's rock-solid.

 Voltage remains locked (so, should be safe stock level).
Core and Memory clock stay at +0 (no OC, just stock clocks)
Power Limit reduced to 80%. Temp Limit reduced to 80c. Auto Fan Speed.
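
As a rough idea of what the 80% power limit means in watts, here is a one-function Python sketch. It assumes a 320 W default limit, which is Nvidia's reference board power for the 10 GB RTX 3080. The OEM Alienware card may ship with a different default, so treat the number as an example only.

```python
def power_cap_watts(default_limit_w, percent):
    """Convert an Afterburner-style percentage power limit to watts."""
    return default_limit_w * percent / 100


# 320 W is an assumed default limit; substitute your card's actual value.
print(power_cap_watts(320, 80))  # 256.0
```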

And some other settings:

In Windows-11, Game-Mode is ON. I set it to use the (Nvidia) "High Performance" dedicated video-card whenever possible (Steam, all games, etc.)
In Windows-11's Hardware Accelerated GPU Scheduling (HAGS) is Disabled.

In Nvidia Control Panel, I have "Fast Vertical Sync" Enabled (similar to having it off, but without the occasional tearing).

 

15 Posts

November 21st, 2022 09:00

I can take a look. What's curious is they did an RMA on my mobo the week before my GPU wet the bed because BIOS was corrupted.

Also curious that another 3080 runs fine, and I also tried a 3060ti running the same game and benchmark settings and it didn't show any sign of problems either.

thanks for the replies!

15 Posts

November 21st, 2022 09:00


@Tesla1856 wrote:

@Gallifrae wrote:
  1. Uninstall iCue, AWCC
  2. TDR Delay changes
  3. Disabling hardware acceleration
  4. Disabling Gsync
  5. Disabling OC
  6. Trying XMP1, XMP2, and Disabling it
  7. Disabling/enabling Intel Speedstep, Speedshift

 


OC, XMP1-2 should never have been enabled in the first place. Did you get a new Intel CPU on each motherboard swap?

And I understand you wanting to try everything, but these I listed above ... should not have been messed with.


OC is disabled and greyed out. As far as I know they kept the original CPU in the system. Makes sense the other items should never have been messed with, and it goes to show there's some instability (whether hardware or software). Just became somewhat desperate especially after already having 2 parts replaced - Anything I tried I found on another forum. Not ideal to try them all, but at least helps with process of elimination before trying to RMA.


@Tesla1856 wrote:

It taking 20 tries to reboot (not sure exactly what that means, but it sounds bad).


Basically after it black screens/artifacts...I'll hard reset the pc. Press power button - powers up/LEDs on does nothing for a very long time - no display, no bios, nothing. Press power button again - instant off. Rinse, repeat until it kicks in or wait 15min+. Easy to tell when it actually starts up because the fans don't just go to an auto-on speed.


@Tesla1856 wrote:

Yes, I use AfterBurner on my RTX-3080 and it's rock-solid.

 Voltage remains locked (so, should be safe stock level).
Core and Memory clock stay at +0 (no OC, just stock clocks)
Power Limit reduced to 80%. Temp Limit reduced to 80c. Auto Fan Speed.

And some other settings:

In Windows-11, Game-Mode is ON. I set it to use the (Nvidia) "High Performance" dedicated video-card whenever possible (Steam, all games, etc.)
In Windows-11's Hardware Accelerated GPU Scheduling (HAGS) is Disabled.

In Nvidia Control Panel, I have "Fast Vertical Sync" Enabled (similar to having it off, but without the occasional tearing).


I'll give these a shot and see if they help with stability.

6 Professor

 • 

6K Posts

November 21st, 2022 10:00

I would suggest once the new card is installed, to stress it with some benchmarks and watch the temperature. You likely will end up adjusting the fan curves for better thermals. I would suggest MSI Afterburner for that.
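
If you want a plain log of temperature, power draw, clock, and fan speed while the benchmark runs, one option is to poll nvidia-smi, which is installed with the Nvidia driver. The Python sketch below is only an illustration: the 80 C warning threshold, the one-second interval, and the function names are my own choices, and some fields can come back as "[N/A]" on certain cards, which the parser turns into None.

```python
import subprocess
import time

# Fields requested from nvidia-smi, in output order.
QUERY = "temperature.gpu,power.draw,clocks.sm,fan.speed"


def parse_sample(line):
    """Parse one CSV line from nvidia-smi into a dict (None where unavailable)."""
    names = QUERY.split(",")
    values = [v.strip() for v in line.split(",")]
    sample = {}
    for name, value in zip(names, values):
        try:
            sample[name] = float(value)
        except ValueError:
            sample[name] = None  # e.g. "[N/A]"
    return sample


def read_gpu():
    """Query the first GPU once and return the parsed sample."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


def monitor(seconds=60, interval=1.0, temp_warn_c=80.0):
    """Print one sample per interval and flag readings at or above temp_warn_c."""
    end = time.time() + seconds
    while time.time() < end:
        sample = read_gpu()
        temp = sample["temperature.gpu"]
        flag = " <-- hot" if temp is not None and temp >= temp_warn_c else ""
        print(f"{sample}{flag}")
        time.sleep(interval)
```

Start the benchmark, then call monitor() from a second window. The last lines printed before a black screen show what the card was doing right before it dropped out.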

15 Posts

November 21st, 2022 10:00

Just finished up with a followup from Dell Tech Support and they sent me a list of things to try. After none of them resolved the issue they have determined the card is faulty and they're scheduling a replacement. Guess we'll see if this is the correct fix, probably should have it early next week after the holidays.

15 Posts

November 21st, 2022 11:00

Sounds good - I have a good fan profile through MSI/AWCC that helps a ton when the card actually wants to run.

I'm planning on stressing it a good amount when I get it - my last card started acting up as soon as 1 day after I got it.

8 Wizard

 • 

17K Posts

November 21st, 2022 13:00


@Gallifrae wrote:

 

I'm planning on stressing it a good amount when I get it - my last card started acting up as soon as 1 day after I got it.


That is good news.

But I swear ... how does this happen? One hypothetical but possible scenario is ...

Some user Over-Clocks their nice Nvidia card constantly and permanently damages it ... so it's only 95% good now and tends to fail under stress (likely forever ... as I think they crush-them, not take the time to actually re-solder/replace parts and fix them).

Dell swaps it out for a new one.

The old Nvidia card comes back to Dell. The technician drops it into an appropriate system and stress-tests it for a few minutes and it passes. They have a bunch of high-dollar video-cards to test so they pull-it and they deem it "good", and go-on to the next one.

You get it as your replacement . Since you assumed your "refurbished" video-card is 100% working ... and you are still having problems ... you start looking elsewhere in the system for problems.

Someone might even say "your system is blowing video-cards".

It's just crazy. Just a couple of flakey video-cards in the "refurbished pool" could cause all sorts of ramifications across many different machines and customers.

And all the time, this counts toward failure trends, and the cost of the Extended Warranty Contract for this release of machine skyrockets (due to all the perceived failures).

6 Professor

 • 

6K Posts

November 21st, 2022 17:00

I have always wondered how they refurbish those cards. If they don't really push them hard for an hour or so, it will just end up being a merry go around...

15 Posts

November 22nd, 2022 12:00

Alright, update time!

 

Tech came out today for the RMA. Booted it up this morning just to check the issue is still repeatable, and it was. Went to boot up again one more time to check error reports and the PSU clicked on and off. Removed the CPU power connector and tried booting and the PSU stayed on...leading me to believe it's either CPU or mobo. With my luck so far I rolled the dice that it was the mobo and went to Best Buy and picked up an MSI Z590. I dropped it in a spare PC case I had (4000D Airflow) and did a "case swap" of sorts (parted over CPU, PSU, RAM, drives). System booted up, and I was able to successfully run some benchmarks on the old card.

Still RMA'd the GPU because it still seems somewhat unstable, but new one is in the spare PC case and all is well. No weird errors, DPC latency, or crashes under load. 

Been thinking about it the last few hours and I don't know if I'll ever RMA the mobo. The longer I've been into this build the more and more I don't like the restricted BIOS setup and lack of fan options (ML120 Pro's or Stock to avoid BIOS error), fan control, and settings. Plus the airflow in the R12 is HORRIBLE. My CPU idle (even with liquid cooling) dropped about 5C after the case swap. Under load I'm seeing 5-10C lower temps than in the R12 case. GPU is definitely benefiting from this too.

Later today I'm going to add some more fans and call it a night. Will test extensively over the coming days and report back. Hoping I'm finally in the clear....

8 Wizard

 • 

17K Posts

November 22nd, 2022 15:00


@Vanadiel wrote:

I have always wondered how they refurbish those cards.


Same here.

I could make more guesses (based on my experience in similar shops) but what we used to do isn't really relevant to what Dell does now ( in their shops).

 

8 Wizard

 • 

17K Posts

November 22nd, 2022 15:00


@Gallifrae wrote:

 

1. Still RMA'd the GPU because it still seems somewhat unstable, but new one is in the spare PC case and all is well. No weird errors, DPC latency, or crashes under load. 

2. picked up a MSI z590.  dropped it in a spare PC case I had (4000d airflow) and did a "case swap" of sorts (parted over CPU, PSU, RAM, drives). System booted up, was able to successfully


1. Good

2. Sounds similar to the MSI PRO Z690-A WIFI-DDR5 in my latest Intel i9-12900K / RTX-3080 custom-build. After 6 months, it's been rock-solid.

The Z590 should be a good board also. Certainly does not have a "limited BIOS".
