18 Posts

May 30th, 2013 23:00

Well, I ran Memtest well into Pass #7 (17.5 hours) without errors. Also attempted a procedure I found on the Guild Wars support page, running the game executable with the -repair argument added. This also triggered a lockup.

Opened a ticket with NCSoft regarding the problem and uploaded a long text document generated by their proprietary diagnostic program.  I don't know if they'll help as I do not own the game.  (This is not my computer.)

Also tried to test the theory that this is a Guild Wars-only problem by installing and playing another game. Skyrim ran beautifully on high settings at around 55 FPS for about an hour and a half, then crashed to the desktop and the system locked up. It's conceivable that's a coincidence, but I doubt it under the circumstances.

Any insight into this mysterious 85 degree temp reading I'm getting from multiple monitoring programs?

Any suggested next steps in troubleshooting?

8 Wizard

17.4K Posts

May 31st, 2013 00:00

You have two nVidia GTX 295 cards? That's quad-SLI, which is pushing things a bit.

Either pull one and re-test (trying each by itself) or pull both and drop in a known good card.

8 Wizard

17.4K Posts

May 31st, 2013 00:00

Try stress testing with OCCT (Power Supply test), Heaven Benchmark, or Futuremark to remove games from the equation.

That sensor is likely either:

1. Chipset. Check the chipset heatsink temperature with a laser thermometer, check its fan, feel it, etc. Maybe even clean it and re-apply thermal paste.

2. A sensor around or under the main processor. Clean out the dust and go ahead and re-apply fresh thermal paste. The machine is old and that stuff doesn't last forever.

Also, re-seat all cable connections, add-in cards, RAM DIMMs, etc.

2 Intern

406 Posts

May 31st, 2013 07:00

Looks like you're going to have to test each piece of hardware to find out what the issue is. Here is how to test each piece of hardware in your rig one by one to rule out problems... 
  
CPU Testing: Run IntelBurnTest (IBT) v2.54 with the stress level set to maximum and times to run set to 10 to rule out a faulty CPU.  
http://www.mediafire.com/download/azzprpvwkonowbv/IntelBurnTest+2.54%2811.0.1.005%29.zip 
   
RAM Testing: Run Memtest86+ v5.00 for ten passes to rule out faulty RAM. http://www.memtest.org/download/beta/500rc1/mt500rc1.usb.exe (You will need a USB Flash drive to install Memtest onto.)  
   
GPU Testing: Run 3dmark11 three times consecutively. (Use performance preset with display scaling mode set to stretched, test one card at a time with the other taken out of the system.) http://www.majorgeeks.com/files/details/3dmark_11.html 
  
HDD/SSD Testing: Check your HDD/SSD manufacturer's website for testing tools, e.g. Data Lifeguard Diagnostics for Windows for Western Digital HDDs, or Intel Toolbox for Intel SSDs. 
  
PSU Testing: http://forums.extremeoverclocking.com/t137886.html & http://www.hardwareheaven.com/guides/testingPSU/ (You will need a digital multimeter to test your PSU. You want all three rails to be within ATX spec: 12V: 11.40 to 12.60, 5V: 4.75 to 5.25, 3.3V: 3.135 to 3.465. If any of your rails hits the minimum spec, I would replace the PSU with a new one.)   
HOW TO: Properly load your system when checking PSU voltages.    
#1 Download & Install Prime95   
#2 Download & Install EVGA OC Scanner X   
#3 Open both Prime95 & EVGA OC Scanner X. (Run the Small FFTs test in Prime95 and the Furry E stress test at 1024x1024 with 8x AA in OC Scanner X. Run both tests at the same time while checking voltages.)
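The ATX ranges above are simply nominal ±5% on each rail (12V, 5V and 3.3V nominal). A minimal Python sketch of that check, assuming you have jotted down one multimeter reading per rail while the system is loaded as described; `NOMINAL`, `atx_window` and `check_rails` are hypothetical names for illustration, not part of any of the tools mentioned. It treats a reading exactly at a limit as in spec, even though Sajin would already replace a PSU reading that low.

```python
# Nominal voltages of the three main rails checked in this thread (hypothetical table).
NOMINAL = {"12V": 12.0, "5V": 5.0, "3.3V": 3.3}
TOLERANCE = 0.05  # ATX allows +/-5% around nominal


def atx_window(nominal):
    """Return the (min, max) ATX window for a nominal voltage, rounded to mV."""
    return round(nominal * (1 - TOLERANCE), 3), round(nominal * (1 + TOLERANCE), 3)


def check_rails(readings):
    """Classify each measured rail as 'OK', 'LOW' or 'HIGH' against its ATX window."""
    results = {}
    for rail, volts in readings.items():
        low, high = atx_window(NOMINAL[rail])
        if volts < low:
            results[rail] = "LOW"
        elif volts > high:
            results[rail] = "HIGH"
        else:
            results[rail] = "OK"
    return results


if __name__ == "__main__":
    for rail, nominal in NOMINAL.items():
        print(rail, atx_window(nominal))
    # The +12V value OCCT reported later in this thread:
    print(check_rails({"12V": 9.20}))
```

Running it prints the three windows quoted above and classifies the 9.20V reading as LOW.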

You can skip doing memtest since you've already done it. Good luck!

18 Posts

May 31st, 2013 13:00

No, there's only one 295 card, with its two GPUs set up in SLI. I didn't mean to imply otherwise.

OCCT is an interesting tool that I hadn't heard of before. Thank you for the tip. But I'm not sure how to interpret its output, and I'm searching now for more documentation. I ran it for one hour. The PSU voltages all fluctuated slightly but stayed within ±5% tolerance, except +12V, which never moved from 9.20V. I'm pretty far outside my experience here, but my understanding has always been that a deviation greater than ±5% means a bad PSU. I'll follow up in a bit with my handheld tester, but any input on that reading would be appreciated. How is the tool intended to be used? Is it intended to make a bad PSU simply fail altogether? What constitutes a pass or fail?
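For reference, here is the ±5% arithmetic applied to that +12V reading, taking the software value at face value (the replies below caution against trusting it without a multimeter):

```latex
\text{ATX window: } 12.00\,\text{V} \times (1 \pm 0.05) = 11.40 \text{ to } 12.60\,\text{V}
\frac{9.20\,\text{V} - 12.00\,\text{V}}{12.00\,\text{V}} \times 100\% \approx -23.3\%
```

A reading about 23% low is far outside the 5% window, so if it is real, the rail fails.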

2 Intern

406 Posts

May 31st, 2013 14:00

I wouldn't trust software to read your PSU voltages accurately; you need to use a digital multimeter while the system is under full load to get accurate readings. A 12V rail at 9.20V under full load would be a fail. 12V: 11.40 to 12.60, 5V: 4.75 to 5.25, 3.3V: 3.135 to 3.465. Anything lower than 11.40 for the 12V, 4.75 for the 5V, or 3.135 for the 3.3V rail is considered a fail. I would replace the PSU even if you were getting the lowest passable values.

8 Wizard

17.4K Posts

May 31st, 2013 14:00

Yes, I'm surprised how many people try to troubleshoot computers without a digital multimeter. They think that if a power supply turns the machine on, it's a good PS. That simply is not the case.

Also, a PS can become unstable under load or max load. Running at idle is not a good test.

8 Wizard

17.4K Posts

May 31st, 2013 14:00

Yes, OCCT is probably my favorite stress tester at the moment. It just runs everything at the same time. The only thing it doesn't do is load the disk system, but you can run a full-scan virus checker at the same time if you want.

If 12V is reading 9V when loaded down, that is a problem. Whether it's real or just a glitch, I can't say. A digital power supply tester might tell you, but it will be tricky (not impossible) to use that device while the machine is under max load.

It appears to have a copy of CPUID HWMonitor built in, meaning it should read the same if run stand-alone.

8 Wizard

17.4K Posts

May 31st, 2013 14:00

1. How is the tool intended to be used?
2. Is it intended to make a bad PSU simply fail altogether?
3. What constitutes a pass or fail?

1. The way you are using it.
2. No, it just max-loads the whole machine.
3. The machine doesn't shut down or reboot, performance numbers look good, voltages stay within tolerances, etc.
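The pass/fail criteria above (plus the two OCCT failure modes the OP describes further down) can be sketched in Python. This assumes you logged a list of voltage readings per rail during the run and noted two yes/no observations: did the machine shut down or reboot, and did the stress tool report an error. `StressRun` and `evaluate` are hypothetical names, performance numbers are not modeled, and the "reading never moved" warning is an illustrative heuristic of mine, not an OCCT feature.

```python
from dataclasses import dataclass, field

# ATX limits (min, max) in volts, i.e. nominal +/-5%, as listed in this thread.
ATX_LIMITS = {"12V": (11.40, 12.60), "5V": (4.75, 5.25), "3.3V": (3.135, 3.465)}


@dataclass
class StressRun:
    """Hypothetical record of one full-load stress-test session."""
    samples: dict = field(default_factory=dict)  # rail name -> list of voltage readings
    shut_down_or_rebooted: bool = False
    tool_reported_error: bool = False


def evaluate(run):
    """Return (passed, failure_reasons, warnings) for a stress run."""
    reasons, warnings = [], []
    if run.shut_down_or_rebooted:
        reasons.append("machine shut down or rebooted under load")
    if run.tool_reported_error:
        reasons.append("stress tool reported a computational error")
    for rail, readings in run.samples.items():
        if not readings:
            continue  # nothing logged for this rail
        low, high = ATX_LIMITS[rail]
        lowest, highest = min(readings), max(readings)
        if lowest < low or highest > high:
            reasons.append(f"{rail} left the ATX window ({lowest}-{highest} V)")
        if len(readings) > 1 and len(set(readings)) == 1:
            # Heuristic only: a reading that never moves may be a sensor glitch,
            # so confirm it with a digital multimeter before trusting it.
            warnings.append(f"{rail} never changed during the run")
    return not reasons, reasons, warnings


# Example: a +12V reading stuck at 9.20V for the whole run, as reported above.
run = StressRun(samples={"12V": [9.20, 9.20, 9.20]})
passed, reasons, warnings = evaluate(run)
print(passed, reasons, warnings)
```

Keeping the frozen-sensor case as a warning rather than a failure reflects the point made above: a software reading alone should send you to the multimeter, not straight to a verdict.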
 
Oh, and re-read all of our posts carefully. We can only advise you; you must do the real work. Listen to Sajin, because he knows his stuff.

18 Posts

May 31st, 2013 15:00

Thanks for the tools and the knowledge. I hadn't heard of IBT either before starting this project. I've always used Prime95 when I wanted to verify a CPU.

You obviously know what you're doing, so I'm not second-guessing you, just curious: I use Memtest 4.2 (stable), and you advocate 5.0 beta. Why?

I read further about OCCT after my previous post. It seems a failure is either a) the PSU shutting down because it can't handle the load, or b) a computational error occurring while the test crunches its numbers, which makes the software terminate the test.

I've been using an all-in-one suite of hardware diagnostics from Eurosoft called QA+Win32. Yes, I got it from where you think. No, I don't work there. I'm not sold on it being best in class for the purpose, but I do like that I can set it running and move on to the next machine. A few hours later, I check back and have a *reasonably* reliable report on the health of the various hardware components. I'd be interested to hear any knowledge or opinions you guys have on the product.

8 Wizard

17.4K Posts

May 31st, 2013 15:00

So I use a digital power supply tester to check them all at the same time (with some load on them).

It measures (among other things):

12v1 (to the 24-pin harness ... I'm guessing Rail 1)
and 12v2 (CPU harness and/or video card PCIe harnesses ... Rails 2-5)

I'm not sure what PS design these older AW machines used, but the Auroras and Area-51s, for example, have five 12V rails. So, to check them all, you eventually have to plug in all the spare PCIe cables. Right?

Even better, to check the rails under max load: when a PCIe harness splits into two (in parallel), connect one side to the video card and the other to the tester. I'll have to test that sometime. If the tester doesn't operate without the 20/24-pin attached, using the multimeter on that PCIe harness seems like the next best choice.

18 Posts

May 31st, 2013 15:00

Sajin does indeed know his stuff. You both do. I've learned a lot from this thread already.

18 Posts

May 31st, 2013 16:00

I've got a similar unit here, CoolMax brand but with the same features. I'm out of time for the project tonight, but I'll try it first thing in the morning, then get out the multimeter if it passes.

I usually load the PSU just by having it spin up an HDD, but your approach sounds a lot more thorough.

Oh, and incidentally, I did some more searching about the mysterious 85-degree reading. It seems such readings pop up when there is no sensor at all for that reading in the rig. The software tries to extrapolate a temperature from the resistance, frequency, voltage, and temperature of nearby components, etc., and can come up with something wildly inaccurate. The consensus seems to be to verify it with common-sense visual and touch tests: if the fans are spinning, you don't burn your finger, and there are no other signs of extreme heat, take the reading with a grain of salt.

2 Intern

406 Posts

May 31st, 2013 18:00

No problem. I always use the latest version available when testing hardware. I don't know anything about QA+Win32. I do know that if you pass all the tests I mentioned above without errors, you most likely have a faulty motherboard.

 

 
No, you don't have to plug in all the spare PCIe cables, because in most cases all those rails are actually a single +12V source split into multiple +12V outputs. A few units actually have two +12V sources, but these are typically very high-output power supplies. And in most cases these multiple +12V outputs are split up again to form a total of four, five, or six +12V rails for even better safety. REAL multiple +12V rail units are very rare.

8 Wizard

17.4K Posts

May 31st, 2013 21:00

I guess. Obviously I don't have the schematics for these Alienware SMPS; I'm just going by the main sticker with all the voltages and amps for each. It definitely describes five (A-E) +12VDC rails (IIRC at 18 amps each). I've got a pic here somewhere.
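Taking the label figures recalled here at face value (five +12V rails at 18 A each; both numbers are from memory and unverified), the per-rail limit works out as below. Note that the sum of the per-rail limits is not the PSU's total +12V capacity. Multi-rail labels typically also list a combined +12V rating, and when the rails share one +12V source, as Sajin describes, that combined figure is the real ceiling.

```latex
P_{\text{rail}} = V \times I = 12\,\text{V} \times 18\,\text{A} = 216\,\text{W}
5 \times 216\,\text{W} = 1080\,\text{W} \quad \text{(sum of per-rail limits, not total capacity)}
```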
 
And yes, I understand what you are saying.
