E521 lock up, freeze, hang, wait state circumvention

Re: symptoms of hanging halfway through POST, hanging during normal operation, USB mouse freezing, USB keyboard freezing, and mouse, keyboard, and video freezing together, i.e. the entire machine locking up under normal use in Windows, aka random lockups.
It appears the E521 is prone to entering a perpetual wait state for an as yet undetermined and unresolved reason. I have experienced many of the permutations of freeze-up you will find described in the forum. I'll apologize in advance for this being a lengthy entry.

Considering the symptoms, my hunch is that when the E521 locks up, either in POST or while running normally, it is in a wait state it will never come out of. I say this because if the problem were a tight loop in some code (a programming expression for code that spins continuously without waiting), the processor would warm up and the fan speed would increase. The fan does not speed up, which suggests the machine is in a perpetual wait state, waiting for an event that will never occur.

Our E521 has a single HDD and a single optical drive. The machine came with the HDD connected to SATA0 and the DVD connected to SATA1. This is contrary to the E521 Owner's Manual and Service Manual. I noted that the OHCI USB controller shares an IRQ with the first serial ATA controller. You know the symptoms with USB devices, so I won't go into them except to say that the symptoms and the IRQ sharing suggest the various wait states may be attributable to some IRQ-related anomaly, namely a lost or mishandled interrupt.

Note that the E521 has four (4) SATA connectors on the system board (see the manuals), labeled SATA0, SATA1, SATA2, and SATA3. I have moved the DVD (optical) drive from SATA1 to SATA3 and the HDD from SATA0 to SATA1 to change the (electrical) interrupt signal they use. Although there are unused IRQs you could assign manually, Windows will not permit resource reassignment, nor can it be done in the BIOS. The HDD on SATA1 and DVD on SATA3 setup is now undergoing the test of time as a circumvention of whatever the root cause of the E521's woes is.
A day later...
Whelp, unfortunately the E521 has locked up with the HDD on SATA1 and the DVD on SATA3. So, to get completely off the serial ATA controller that shares an IRQ with the OHCI USB controller, I have moved the HDD to SATA2, leaving the DVD on SATA3. This puts both drives on the second serial ATA controller, which does not share an IRQ in our sample of the E521. BTW, the IRQ sharing can be seen in Device Manager under View > "Resources by type", then the IRQ entry. It shows the first serial ATA controller assigned to IRQ 21 along with the OHCI USB controller.

I am also seeing signs that AMD's Cool'n'Quiet technology is fairly aggressive at throttling the CPU speed. The latency of ramping the CPU back up to full speed seems to have an audible effect on audio quality. When the Power Scheme is changed from the default "Minimal Power Management", which complements Cool'n'Quiet, to either Home/Office Desk or Always On, which essentially defeat CPU throttling, the quality of Windows sounds improves and the frequency of lockups diminishes. Don't you just love pervasive problems? More and more, the evidence seems to point to a timing-related condition. At any rate, the test of time resumes with the HDD on SATA2. I'll let you know if it freezes in this configuration.
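For anyone who wants to check their own machine for the same sharing, here is a minimal Python sketch, assuming you copy the IRQ list by hand from Device Manager's "Resources by type" view. The device labels and the short listing are hypothetical placeholders for what our E521 showed (first serial ATA controller and OHCI USB controller on IRQ 21, second serial ATA controller on IRQ 22); the script does not read anything from Windows itself.

```python
from collections import defaultdict

# Hypothetical hand-copied listing from Device Manager > View > Resources by type > IRQ.
# Labels are illustrative, not the exact strings Windows displays.
IRQ_ASSIGNMENTS = [
    (21, "Serial ATA controller #1"),
    (21, "OHCI USB controller"),
    (22, "Serial ATA controller #2"),
]


def shared_irqs(assignments):
    """Return {irq: [devices]} for every IRQ used by more than one device."""
    by_irq = defaultdict(list)
    for irq, device in assignments:
        by_irq[irq].append(device)
    # Keep only IRQs with two or more devices, in ascending IRQ order.
    return {irq: devs for irq, devs in sorted(by_irq.items()) if len(devs) > 1}


if __name__ == "__main__":
    for irq, devices in shared_irqs(IRQ_ASSIGNMENTS).items():
        print(f"IRQ {irq} is shared by: {', '.join(devices)}")
```

If your listing shows a serial ATA controller on the same IRQ as a USB controller, you have the case described in this thread.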

Here's an interesting observation. Since I had all drives removed from the first serial ATA controller, I thought I'd see whether it would be configured on a different IRQ if I removed it in Device Manager. Much to my surprise, before I even restarted after the remove, the controller had already been reconfigured on IRQ 20, just what you'd want, i.e. not sharing IRQ 21 with the OHCI USB controller. Unfortunately, the IRQ 20 assignment doesn't last: after a restart, the first serial ATA controller is right back on IRQ 21 with the OHCI USB controller. This scenario repeated every time I removed the first serial ATA controller. The difficulty in determining the culprit is that both the BIOS and Windows get in on the IRQ assignment act. The BIOS takes a shot at it first, but Windows can then override the assignments, and it is not readily apparent whether the IRQ assignments are the ones the BIOS made or whether Windows didn't honor the BIOS and reconfigured them. Seeing the serial ATA controller on IRQ 20 in Device Manager after the remove and before restarting suggests this is Windows' assignment. The controller coming back up on IRQ 21 after a restart, i.e. after going through the BIOS, suggests the BIOS is muffing the IRQ assignment. BTW, no hangs with the HDD on SATA2 and the DVD on SATA3 after a day of normal use. Might be getting warm!

Whelp, performance fans, it looks like our hands are tied with WinXP and (newer) ACPI machines. See Article ID 315278 in the Microsoft Knowledge Base. There isn't going to be any undoing of IRQ sharing by legacy machine and OS methods. Unfortunately, this seems to be another case where the "artificial intelligence" of automatic resource assignment reveals its deficits yet prevents someone with superior knowledge of the application from mitigating a degrading or problem-aggravating circumstance, i.e. IRQ sharing when there are unused IRQs available. I sure wish the hardware segment of the industry would realize the need for more robust PICs in this day and time. BTW, PIC = Programmable Interrupt Controller = the component that provides the IRQ lines and manages the electrical signals a device uses to raise an interrupt. As Tim the Tool Man might say: "More IRQs!" IRQ sharing is a days-are-numbered solution that only works until the sharing compounds to the point where it is no longer practical, like nowadays! There is a better way, but it means big change. It wasn't that long ago that we went from 16 to 24 interrupts, and that didn't get the PC platform far with all the new devices. Now we have hot-swap buses sharing IRQs with statically installed devices, such as your main hard drive, which demand zero contention for performance and for collision avoidance, so as not to open the door to timing problems that users and level 1 support, myself included, have difficulty diagnosing. Oops! Sorry, got on my soapbox! And last but not least for this entry: no lockups yet with the HDD and DVD on the second serial ATA controller, SATA2 and SATA3.

Several days later...

I'll qualify this entry to help identify the scenario the circumvention seems to work in. Our E521 is an AMD X2 4200 on the Dell OEM nVidia 430 chipset motherboard. The BIOS is 1.1.4, and the latest Processor and Chipset SMBus drivers are installed. The Power Scheme used is Home/Office Desk with all timers set to Never, no standby selected anywhere, and no hibernation. Using a screen saver seems to be safe. BTW, there are small power consumption consequences to running on Home/Office Desk, but you pick your evils vs. frustration. The important factor in this circumvention, I believe, given the prereqs just mentioned, is the sharing of IRQ 21 by the first serial ATA controller with the OHCI USB controller. Look in Device Manager to see whether you have this case and, therefore, whether this circumvention might work for you. BTW, the second serial ATA controller is on IRQ 22 by itself. Moving both the HDD and the DVD to the second serial ATA controller, via SATA2 and SATA3, seems to prevent or avoid the condition that provokes the lockups. However, having diverse drives on the same controller isn't the preferred setup for performance, especially if you rip optical media to the hard drive. So the final change I made was to move the DVD to SATA0, back on the first controller, leaving the HDD on SATA2, i.e. on the second serial ATA controller. The DVD is dormant most of the time in our family's use of the E521, so there is essentially no interrupt traffic from the DVD to contend with the USB devices. I have made an effort to use the E521 as much as possible over the relatively short period since moving both drives to the second serial ATA controller and then moving the DVD back to the first, and the E521 finally seems solid and is keeping pace with me with more consistent response times, both positive signs. It is ridiculous that Dell has not solved this.
Our E521 is 6 months old. We have experienced lockups since day one and have worked with Dell support periodically over that time, and all I have to show for it is a statement of good intent and a lot of my own time invested in trying to isolate the problem. BTW, the system board has been replaced once, to no avail. At any rate, I hope your shoe (E521) fits the situation described and, if you elect to try this, that it settles your E521 down. Good luck! I'll come back and let you know if it locks up in the future, i.e. if the circumvention doesn't endure the test of time. Note that the Home/Office Desk Power Scheme should be considered part of the circumvention, since it stabilizes the processor speed and is a change from the out-of-box setting. Thanks! Optie.
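The drive shuffle above comes down to which SATA port lands on which controller, and whether that controller's IRQ is the one the OHCI USB controller uses. Here is a minimal Python sketch of that bookkeeping, assuming the port-to-controller and IRQ numbers observed on our sample (SATA0/SATA1 on the first controller at IRQ 21 with OHCI USB, SATA2/SATA3 on the second at IRQ 22). The table and function names are hypothetical, and your machine's assignments may differ.

```python
# Hypothetical tables built from what was observed on one E521;
# check Device Manager on your own machine, since assignments can differ.
PORT_TO_CONTROLLER = {"SATA0": 1, "SATA1": 1, "SATA2": 2, "SATA3": 2}
CONTROLLER_IRQ = {1: 21, 2: 22}  # first SATA controller on IRQ 21, second on IRQ 22
USB_OHCI_IRQ = 21                # OHCI USB controller shares IRQ 21


def drives_sharing_usb_irq(placement):
    """Return, sorted, the drives whose SATA controller is on the OHCI USB IRQ."""
    return sorted(
        drive
        for drive, port in placement.items()
        if CONTROLLER_IRQ[PORT_TO_CONTROLLER[port]] == USB_OHCI_IRQ
    )


out_of_box = {"HDD": "SATA0", "DVD": "SATA1"}
final_setup = {"HDD": "SATA2", "DVD": "SATA0"}
print(drives_sharing_usb_irq(out_of_box))   # ['DVD', 'HDD']
print(drives_sharing_usb_irq(final_setup))  # ['DVD']
```

In the final setup only the DVD stays on the shared IRQ, which is acceptable here because the DVD sits idle most of the time.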

Whelp, here we are, Sept. 7th, 2007, and although the drive reconnection described above seemed to keep the hanging at bay for a good long while, in the past week we've experienced 2 lockups, one of them after the screen saver had engaged, no less. I'll mention that we received notification of a BIOS update in the last couple of days. I flashed the BIOS on what is now the third board in the box from 1.1.5 to the just-released 1.1.11. Much to our surprise, the flash was successful and did NOT leave the BIOS half-cocked! I thought for sure we'd be calling Dell for board #4, but nope, not this time! From here we'll start the clock over to see if 1.1.11 just happens to rid the E521 of the lockup ghost. I sure am curious what BIOS Dell is putting in the E521s sold at Walmart. I'm also curious how many Walmart-purchased E521s have had their BIOS clobbered by failed flashes.

Update Oct. 1, 2007

There is a known defect in the AMD Athlon X2 processor's memory controller: it fails to wait long enough after a frequency increase before accessing the RAM, which can result in a hang. I'll refer you to AMD technical document 33610, titled "Revision Guide for AMD NPT Family 0Fh Processors". Errata #152 describes the defect, which can occur when using any of the Power Schemes that operate the processor in 'adaptive' mode. Adaptive mode complements Cool'n'Quiet by letting the processor adapt its clock frequency to demand. AMD's recommended Power Scheme is Minimal Power Management, which puts the processor in adaptive mode. The Dell BIOS does not seem to implement the workaround AMD prescribes for Errata #152. There is, however, an alternative workaround: change the Power Scheme to one that does NOT operate the processor in adaptive mode. The two (2) Power Schemes that operate the processor constantly at its highest frequency are Always On and Home/Office Desk, the latter provided the machine is running on AC power as opposed to a battery backup UPS. These Power Schemes defeat Cool'n'Quiet, at the cost of slightly greater heat generation and power consumption. With the E521 set to the 'Minimal Power Management' Power Scheme, the machine reliably hangs within 24 hours running the slide show screen saver. After switching to the 'Always On' Power Scheme, the machine has yet to hang after several days of running the slide show screen saver continuously. I am trying to obtain confirmation from Dell that they do not, in fact, have the Errata #152 workaround implemented, but getting this sort of information out of Dell is like trying to move a mountain. Anybody who wants to join the fray, give Dell a call and ask whether they have the workaround for AMD X2 Errata #152 implemented in the latest 1.1.11 BIOS; maybe if enough of us call, they will eventually respond. Thanks for your indulgence in reading all this.
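The Power Scheme rule in this update can be written down as a small Python function. This is only a sketch of the observations in this post, assuming the Windows XP scheme names as displayed in the Power Options dialog; schemes the post does not discuss return None rather than a guess, and the function name is hypothetical.

```python
from typing import Optional


def holds_cpu_at_max(scheme: str, on_ac_power: bool) -> Optional[bool]:
    """Return True if the scheme keeps the CPU at its highest frequency
    (avoiding adaptive mode, the trigger for Errata #152 per this post),
    False if it does not, or None if the post doesn't cover the scheme."""
    if scheme == "Always On":
        return True
    if scheme == "Home/Office Desk":
        # Held at full speed only on AC power, not when running from a UPS battery.
        return on_ac_power
    if scheme == "Minimal Power Management":
        # AMD's recommended scheme; it runs the processor in adaptive mode.
        return False
    return None  # not covered by the observations in this post


for scheme in ("Minimal Power Management", "Always On", "Home/Office Desk"):
    print(scheme, holds_cpu_at_max(scheme, on_ac_power=True))
```

Returning None for unlisted schemes keeps the sketch from claiming more than the post actually tested.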
Hope it solves your hang ups!  Optie

Message Edited by Opt-e-tech on 06-23-2007 01:52 PM