Knowledge Base

How to troubleshoot "LINT1 motherboard interrupt" errors on PowerEdge Servers running VMware vSphere.


Article Summary: How to troubleshoot "LINT1 motherboard interrupt" errors on PowerEdge Servers running VMware vSphere.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1804

Details

An ESX/ESXi host shows one or more of these symptoms:
  • Instability, including purple diagnostic screens citing an NMI, a Non-Maskable Interrupt, or a LINT1 interrupt.
  • The console displays an entry similar to:

    LINT1 motherboard interrupt. This is a hardware problem: please contact your hardware vendor.

  • The VMkernel log file at /var/log/vmkernel or /var/log/messages contains entries similar to one of:

    ALERT: APIC: 1143: Lint1 interrupt on pcpu 0   (port x61 contains 0x5)
    ALERT: APIC: 1150: Lint1 interrupt on pcpu 0   (port x61 contains 0xb1)
    WARNING: NMI: 2550: Forwarding LINT1 motherboard interrupt to host (75188 forwarded so far)
    Fatal NMI: IO Parity Error (0xNN)
    Fatal NMI: RAM Parity Error (0xNN)


  • The messages log at /var/log/messages contains entries similar to:

    kernel: [1831046.301319] Uhhuh. NMI received for unknown reason 11.
    kernel: [1831046.323134] Dazed and confused, but trying to continue


  • A purple screen may occur when a device is passed through to a virtual machine, and the VMkernel core dump contains events similar to:


    WARNING: IOMMUIntel: 2211:  IOMMU Unit # 0: R/W  = 1, Device 007:00.0 Faulting PA = 0xdf63e000 Fault Reason =  6
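
The symptom entries above can be located in a saved copy of a log with a short script. The following is a minimal sketch in Python, assuming the log text resembles the sample entries shown in this article; the function name find_nmi_entries and the patterns are illustrative and are not part of any VMware or Dell tool. Real log lines may differ between ESX/ESXi versions.

```python
import re

# Patterns derived from the sample entries in this article (assumption:
# real logs follow the same wording; adjust as needed).
NMI_PATTERNS = [
    re.compile(r"Lint1 interrupt on pcpu (?P<pcpu>\d+)\s+"
               r"\(port x61 contains (?P<port61>0x[0-9a-fA-F]+)\)"),
    re.compile(r"Forwarding LINT1 motherboard interrupt to host "
               r"\((?P<count>\d+) forwarded so far\)"),
    re.compile(r"Fatal NMI: (?P<kind>IO|RAM) Parity Error"),
    re.compile(r"NMI received for unknown reason (?P<reason>[0-9a-fA-F]+)"),
]

def find_nmi_entries(log_text):
    """Return (line_number, matched_fields) for each NMI-related line."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for pattern in NMI_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((lineno, match.groupdict()))
                break  # one hit per line is enough
    return hits

sample = """\
ALERT: APIC: 1150: Lint1 interrupt on pcpu 0   (port x61 contains 0xb1)
some unrelated line
kernel: [1831046.301319] Uhhuh. NMI received for unknown reason 11.
"""

for lineno, fields in find_nmi_entries(sample):
    print(lineno, fields)
```

The script only flags candidate lines; the full log context should still be sent to the hardware vendor.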

Solution


The VMkernel log entries indicate that a Non-Maskable Interrupt (NMI) event occurred.
An NMI is a physical hardware event, not a software event. It is typically the result of a condition encountered by the system BIOS and/or management chipset that cannot be recovered from during the current boot cycle.
Depending on the version of ESX and the configuration, NMI log entries may appear in the /var/log/vmkernel or /var/log/messages log files, on the console, or in the VMkernel core dump file if the condition triggers a VMkernel purple diagnostic screen.
NMI events are routed by the CPU via the Advanced Programmable Interrupt Controller (APIC) to the operating system kernel (in this case, the VMkernel of the ESX host). NMI data is transmitted via port 0x61 (ISA-Compatible Register Address hex-61), which is an 8-bit register reserved for NMI data.
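
As an illustration of how the port 0x61 value in the log entries can be read, the sketch below decodes the two status bits that conventional PC/AT-compatible chipsets use to report NMI sources. The bit assignments (bit 7 for a system or RAM parity error, bit 6 for an I/O channel check) are an assumption based on that conventional layout and are not stated in this article; the function name decode_port61 is hypothetical. Treat the result as a hint only and confirm with the hardware vendor.

```python
# Assumed bit layout of port 0x61 on conventional PC/AT-compatible chipsets.
SERR_BIT = 0x80   # bit 7: system/RAM parity error (SERR#)
IOCHK_BIT = 0x40  # bit 6: I/O channel check (IOCHK#)

def decode_port61(value):
    """Return a list of NMI reasons suggested by the port 0x61 value."""
    reasons = []
    if value & SERR_BIT:
        reasons.append("RAM parity / SERR")
    if value & IOCHK_BIT:
        reasons.append("I/O channel check")
    return reasons or ["unknown (neither status bit set)"]

# Values taken from the sample log entries in this article.
for raw in ("0x5", "0xb1"):
    print(raw, decode_port61(int(raw, 16)))
```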

An NMI event may be caused by:
  • Physical hardware faults, such as a bad memory module or processor.
  • Severe thermal cycling of a critical component, often seen after a long period of downtime or a cooling component failure.
  • Components running out-of-specification, such as an over-voltage or under-voltage condition as a result of a hardware fault involving a voltage regulator module.
  • Non-approved or incompatible components, such as an active memory backplane whose design revision is too early for the chassis.
  • A firmware, BIOS or other component mismatch, such as an option card of revision "X" requiring a minimum option-card firmware revision "Y" and a minimum chassis BIOS revision "Z".
  • A fault in the hardware-level interaction during device passthrough. The CPU IOMMU feature, which maps DMA memory for a device from the host operating system to the guest operating system, encounters an error and cannot proceed, resulting in a purple screen. The PCI ID of the device appears in the IOMMU event shown in the symptoms section (Device 007:00.0). To identify the device, run lspci from the ESXi shell and match the PCI ID to the device. Note that the PCI device may not be the cause, only the trigger for an issue with another hardware component.
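
To pull the PCI ID out of an IOMMU fault entry before comparing it with the lspci output, a small parser can help. This is a minimal sketch, assuming the entry follows the format of the sample in the symptoms section; the function name parse_iommu_fault is hypothetical, and the way lspci formats the same ID may vary between ESXi versions, so the comparison itself is left to the reader.

```python
import re

# Pattern based on the single sample entry in this article (assumption:
# other IOMMU fault entries use the same field order).
IOMMU_FAULT = re.compile(
    r"IOMMU Unit #\s*(?P<unit>\d+):.*?"
    r"Device\s+(?P<device>[0-9a-fA-F]+:[0-9a-fA-F]+\.[0-9a-fA-F]+)\s+"
    r"Faulting PA\s*=\s*(?P<pa>0x[0-9a-fA-F]+)\s+"
    r"Fault Reason\s*=\s*(?P<reason>\d+)"
)

def parse_iommu_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = IOMMU_FAULT.search(line)
    return match.groupdict() if match else None

entry = ("WARNING: IOMMUIntel: 2211:  IOMMU Unit # 0: R/W  = 1, "
         "Device 007:00.0 Faulting PA = 0xdf63e000 Fault Reason =  6")
print(parse_iommu_fault(entry))
```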

If you experience an NMI event, take note of these details:
  • Which virtual machines (if any) were powered on at the time of the NMI event?
  • Does powering on a specific virtual machine trigger an NMI event?
  • Does moving a suspect memory module to a new slot (and therefore higher or lower in memory address space) alter the behavior?

    Note: Replacing or relocating hardware components does not necessarily help you determine the root cause of the NMI event and may result in unplanned downtime.

If you experience an NMI event, provide your hardware vendor with this data:
  • The timeframe in which the event happened.
  • At least 10 minutes of logs leading up to the event.
  • Chassis diagnostics log output and management chipset log output.
  • Chassis vital product data.
  • A copy of the vm-support output.
  • The relevant VMware Service Request number, if opened.
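
Extracting the 10 minutes of log entries that precede the event can be scripted. The sketch below assumes each log line begins with an ISO 8601 UTC timestamp; that layout, the sample lines, and the names parse_stamp and lines_before_event are all illustrative assumptions and should be adjusted to match the actual log files.

```python
from datetime import datetime, timedelta

# Assumed timestamp layouts at the start of each line; adjust as needed.
TIMESTAMP_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")

def parse_stamp(text):
    """Parse a leading timestamp, or return None if it is not recognized."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    return None

def lines_before_event(lines, event_time, window=timedelta(minutes=10)):
    """Keep lines stamped within `window` before (and up to) event_time."""
    start = event_time - window
    selected = []
    for line in lines:
        parts = line.split(None, 1)
        when = parse_stamp(parts[0]) if parts else None
        if when is not None and start <= when <= event_time:
            selected.append(line)
    return selected

# Hypothetical log lines for illustration only.
sample_lines = [
    "2013-01-01T11:45:00Z cpu0: unrelated earlier entry",
    "2013-01-01T11:52:30Z cpu0: entry inside the window",
    "2013-01-01T12:00:00Z cpu0: ALERT: APIC: 1150: Lint1 interrupt on pcpu 0",
]
event = datetime(2013, 1, 1, 12, 0, 0)
for line in lines_before_event(sample_lines, event):
    print(line)
```

Lines without a recognizable timestamp are skipped, so it is safer to widen the window than to rely on an exact cut-off.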
 


Quick Tips content is self-published by the Dell Support Professionals who resolve issues daily. In order to achieve a speedy publication, Quick Tips may represent only partial solutions or work-arounds that are still in development or pending further proof of successfully resolving an issue. As such Quick Tips have not been reviewed, validated or approved by Dell and should be used with appropriate caution. Dell shall not be liable for any loss, including but not limited to loss of data, loss of profit or loss of revenue, which customers may incur by following any procedure or advice set out in the Quick Tips.

Article ID: SLN155921

Last Date Modified: 06/18/2013 12:00 AM

