Unsolved
17 Posts
0
2480
February 19th, 2020 07:00
Precision T7600 Win10 Random freezes
Trying to get this machine stable.
I have run BIOS diagnostics a number of times with no issues (including leaving Memory Diagnostics running overnight).
I DID swap in a PERC H710 card for the original (H310). This card runs really hot. I have a supplemental fan blowing on it (best I can), and I still think it's running between 60-70C semi-regularly. That might be too hot, but I don't know how to monitor it within Win10.
System seems to randomly freeze. Cannot isolate any s/w doing it, time of day, etc. Can happen after 2-3hrs of use or after 2 weeks! When it freezes, you can't really get anything out of it; it doesn't even respond to pings, and KVM is entirely unresponsive.
Drivers are up-to-date for this old machine from what I can tell.
Any ideas, tips or tricks would be appreciated.


alejrive
6 Posts
0
July 10th, 2020 21:00
I have the exact same issue.
aweber1nj
17 Posts
0
July 13th, 2020 09:00
I am still not certain I found the issue, but I did catch some "major event" a couple of weeks ago and found that the OS was writing thousands of entries to the Event Log (like many per second). When I googled the error, it appeared to be an ECC/memory error that I guess it was trying to self-correct. (That took me a relatively long time to figure out, because the issue was consuming a ton of resources.) I did not let it get any further, and did a shutdown - so I don't know if it would have actually fully frozen.
FWIW: I can not get memtest86+ to run on the machine. It just crashes soon after starting - seems to be something with Dell BIOS that won't run it. I did run the BIOS memory test for a few hours, and it didn't report anything so very strange.
I ordered new memory and swapped it out completely. I have not had the same issue since, but that's only a few weeks so far.
alejrive
6 Posts
0
July 22nd, 2020 19:00
I also suspect this could be a RAM issue. The diagnostics show no warnings/alerts on Memory.
This T7600 has 128 GB of RAM (16*8). It's my personal labbing machine, so I can't replace all the RAM at once. I guess I'll have to remove the sticks one by one until I hit the faulty one.
aweber1nj
17 Posts
0
July 23rd, 2020 06:00
The problem with trying to replace one stick at a time is you'll almost certainly end up with mismatched RAM, and who knows if that'll cause different/additional issues. This is a sticky situation (troubleshooting RAM when there's no obvious culprit).
My recommendation would be to, instead, remove 1/2 your RAM for some time...I assume you can tolerate running with 64GB for your workload. See if you still encounter the issue. If you DO, switch to the other set of 64GB and try again. (Really, do the switch whether you do or don't get the error...)
You could start to isolate the stick that's going bad that way (you'll have to repeat the process in an organized and methodical way).
If you don't get any conclusive results...well, more mystery, unfortunately.
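To make the halving idea concrete, here is a minimal sketch of it in Python. It is only an illustration under loud assumptions that are not from this thread: exactly one stick is bad, and each trial period gives a reliable yes/no answer. With freezes that can take weeks to show up, a quiet trial is weak evidence, so in practice you would also re-test the other half as described above. The names `find_faulty_stick` and `freezes_with` and the DIMM labels are hypothetical.

```python
from typing import Callable, List, Sequence


def find_faulty_stick(
    sticks: Sequence[str],
    freezes_with: Callable[[List[str]], bool],
) -> str:
    """Narrow a set of RAM sticks down to one suspect by repeated halving.

    ASSUMPTIONS (not from the thread): exactly one stick is bad, and
    freezes_with(subset) reliably reports whether the machine froze
    during a trial period with only `subset` installed.
    """
    if not sticks:
        raise ValueError("need at least one stick")
    candidates = list(sticks)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # If the first half reproduces the freeze, the bad stick is in it;
        # otherwise (one-bad-stick assumption) it must be in the other half.
        candidates = first if freezes_with(first) else second
    return candidates[0]


# Hypothetical labels for eight sticks; "DIMM6" plays the bad one here.
sticks = [f"DIMM{i}" for i in range(1, 9)]
trials = []


def simulated_trial(subset):
    # Stand-in for "run the machine on this subset for a while and watch".
    trials.append(list(subset))
    return "DIMM6" in subset


suspect = find_faulty_stick(sticks, simulated_trial)
print(suspect, "found after", len(trials), "trial periods")
```

The point of halving is the trial count: eight sticks need three trial periods instead of up to seven one-by-one swaps. The sketch ignores slot and channel population rules, so check the system's documentation before pulling modules.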
Best of Luck.
alejrive
6 Posts
0
August 16th, 2020 18:00
bradthetechnut
9 Technologist
•
9.5K Posts
•
40.1K Points
0
August 29th, 2020 19:00
The 1,300 W figure is DC output. Check the label, either on the PSU or the tower, for the AC input and get a UPS that can exceed that and run the PC as long as you want. A UPS might be rated as 10 amps for 15 min., for example, and so on. Also remember you won't always be drawing full power.
I couldn't find any documentation to be any more specific, let alone immediately find an image of a label indicating input amps.
One thing I have run into in the not-so-distant past: make sure the UPS puts out a true sine wave, not just a simulated one.
alejrive
6 Posts
0
October 14th, 2020 11:00
So far the issue still persists.
I've changed the RAM and eventually got the same behavior; now it doesn't even last a day powered on.
Ran diagnostics and still no alarms.
What is the known behavior of a faulty PSU?
I'm about to throw this thing away.
honker_tube
1 Rookie
•
20 Posts
0
October 14th, 2020 11:00
Ahhh! This is the same problem I went through on my Dell T3610 for almost a month while trying to find a solution. I took it back to the shop vendor I bought this PC from and he tried almost all the RAM modules he had, but the error did not go away. I am certain that it is a RAM compatibility issue or maybe a RAM slot issue. When I moved my 4 RAM modules from 2 channels to all 4 channels (1 module on each), the error was gone. I still don't know the reason, but I just ignored it as it might be due to a bad RAM slot. I would suggest that, before changing the RAM slots, you try non-ECC RAM. I wish I could have checked that as well. I used Memtest86 to check the RAM.
honker_tube
1 Rookie
•
20 Posts
0
October 19th, 2020 06:00
Hey @aweber1nj, any update? Did you find the solution?
aweber1nj
17 Posts
1
October 19th, 2020 06:00
When I replaced my RAM, I swapped-out 8 sticks for 4 (but to the same capacity), so I can't say if it's a bad slot or some bad RAM. It virtually fixed the problem. I say that because I did have one re-occurrence after the swap that acted the same way. But it was only once.
I used to get the hang/freeze at random times between 1 day and 2-3 weeks after reboot. I have only had the one additional issue after months of the new RAM.
So is it solved? IDK. But it's much better.
Dryne
1 Rookie
•
26 Posts
0
October 21st, 2020 14:00
Just as a side note for this freeze issue, I also experience this now after Windows 10 updates under Windows 1909. I was not experiencing this at all last year. This is a newly introduced issue that happened after Microsoft Windows 10 updates. I just experienced it prior to workstation power off. What happens for me is that my system gets very slow at times and freezes. Either a reboot or a system shutdown followed by powering back on makes everything faster again. However, the slowness will eventually return after my machine has been on for a very long time.
I typically never power my workstation off and leave it running 24/7. Now after Windows 10 updates, I am facing this slowness issue after the workstation has been up for over 12 hours.
aweber1nj
17 Posts
0
October 22nd, 2020 05:00
I leave my workstation running nearly 24x7 as well. Thus, in almost all the cases I had, I caught it pretty much after the fact. I would wake the screen and that was about it; the tray clock would show a date/time well in the past, as it had not been updated.
The one time I did catch the issue happening, I was able to slowly work on the machine before it finally froze completely. I found the OS was writing thousands of events to the Windows (System, I think) log. I feel like that certainly contributed to the slow-down and eventual freeze of the machine.
You might try scanning through your Windows logs around the time of the event and see if you notice any huge influx of errors.
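If scrolling through the log by hand is too slow, a rough way to spot that kind of flood is to count events per minute and flag the minutes that stand out. The sketch below assumes you have already pulled the event timestamps out of the log somehow (for example from an export); the function name `find_event_bursts` and the threshold of 100 events per minute are illustrative choices of mine, not values from this thread.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def find_event_bursts(
    timestamps: Iterable[datetime],
    threshold: int = 100,  # ASSUMPTION: tune to what is normal on your machine
) -> List[Tuple[datetime, int]]:
    """Return (minute, count) for every minute with at least `threshold` events."""
    # Truncate each timestamp to its minute and count how many land in each.
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return sorted(
        (minute, count) for minute, count in per_minute.items() if count >= threshold
    )


# Synthetic sample data: a few scattered events plus one flooded minute.
quiet = [datetime(2020, 1, 1, hour, 0, 0) for hour in range(5)]
flood = [datetime(2020, 1, 1, 6, 30, i % 60) for i in range(300)]

for minute, count in find_event_bursts(quiet + flood):
    print(minute.isoformat(), count)
```

Any minute this flags is a good place to start reading the actual entries, since the event source and ID are what tell you whether it is memory-related.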
alejrive
6 Posts
0
November 13th, 2020 13:00
OK, swapped the RAM, and now it's working fine.
It's not power, it's something with the slot, since I replaced RAM twice, and the issue now hasn't come back. The documentation below also gave me some ideas on what could be happening as I was troubleshooting.
https://www.dell.com/support/article/en-us/sln265594/power-button-led-status-on-precision-t3600-t5600-t7600-systems?lang=en
Thanks everyone.