
How to troubleshoot memory or battery errors on the PERC controller on Dell PowerEdge servers

Summary: This article explains how to troubleshoot memory and battery issues on PowerEdge RAID Controllers (PERC), which are used in Dell servers.


Article Content


Symptoms

This article explains how to troubleshoot the "Memory/battery problems were detected. The adapter has recovered, but cached data was lost. Press any key to continue" error and other memory-related errors that may occur on the PERC controller in Dell PowerEdge servers.
 

Table of Contents:

  1. RAID Controller Error Message During POST
  2. Troubleshooting Conditions That Lead to Error Message
    1. Reboot to OS
    2. Clear Controller Cache
    3. Check the Physical PERC Controller
  3. Additional Information
    1. PERC Battery Maintenance
    2. Cache Use 
       

1. RAID Controller Error Message During POST

During POST, the RAID controller presents a message:

Memory/battery problems were detected. The adapter has recovered, but cached data was lost. Press any key to continue.

For errors that appear on the LCD or when running diagnostics, refer to the following article:

Interpreting LCD and Embedded Diagnostic (ePSA) event messages. 



2. Troubleshooting Conditions That Lead to Error Message

This message can appear as a normal consequence of any of the following conditions. Troubleshooting the associated event will likely also prevent the message from recurring.

  • OS indicates abnormal shutdown.
  • OS indicates that an error occurred (a blue screen in Windows).
  • Spontaneous power loss condition.

Common troubleshooting steps include:
 

1. Reboot to OS

If the OS boot is successful, rebooting again should result in no message being displayed.

2. Clear Controller Cache

  1. During POST, press CTRL-M to enter the controller BIOS on SCSI controllers (PERC 3, PERC 4).
  2. On SAS/SATA controllers (PERC 5, PERC 6, and newer controllers), press CTRL-R instead.
  3. Wait five minutes to allow the contents of the cache to purge.
  4. Reboot back into the controller BIOS.
    Note: If the error persists, a hardware fault is more likely. Contact Technical Support for further troubleshooting steps.
  5. If the error is eliminated, boot to the OS.
  6. If the OS still does not boot or the error persists, this may indicate a problem with the OS. If you have an active warranty, contact Technical Support for further troubleshooting steps.
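
The escalation logic in steps 4 to 6 can be summarized as a small decision function. This is only an illustrative sketch: the function name and its two boolean inputs are hypothetical and simply encode what an operator observes after clearing the cache.

```python
def next_step_after_cache_clear(error_in_controller_bios: bool, os_boots: bool) -> str:
    """Map the outcome of clearing the controller cache to the next action.

    Hypothetical helper, not a Dell tool. The inputs are observations made
    after rebooting back into the controller BIOS (steps 4 to 6).
    """
    if error_in_controller_bios:
        # Note under step 4: a persistent error makes a hardware fault more likely.
        return "Likely hardware fault: contact Technical Support"
    if not os_boots:
        # Step 6: the error cleared, but the OS still fails to boot.
        return "Possible OS problem: contact Technical Support"
    # Step 5: the error cleared and the OS boots.
    return "Resolved: no further action"
```

For example, `next_step_after_cache_clear(False, True)` returns the "Resolved" outcome.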


3. Check the Physical PERC Controller

 

  1. Inspect the DIMM and DIMM Socket for Damage.
    1. Power the system off and remove the power cable(s) from the system.
    2. Let the system sit for 30 seconds to allow any remaining flea power to drain.
    3. Remove the PERC controller. For information on removing and replacing parts in this system, refer to the user guide located at Dell Support.
    4. Remove the RAID memory battery. Remember to reinstall the memory battery after inserting the DIMM.
    5. Remove the memory DIMM from the controller (if applicable).
    6. Check DIMM socket for any bent pins or other damage. Check the edge connector of the memory DIMM for any damage.
  2. If the controller has embedded memory or the memory socket is damaged, the PERC Controller will need replacement.
  3. If the memory is damaged, the controller memory needs replacing.
  4. If there is no damage, reinsert the memory DIMM and reinstall the controller.
  5. Swap the controller memory with known good memory (if possible).
    1. No known good memory available: contact support.
    2. The error does not occur with the known good memory: replace the memory.
    3. The error remains with the known good memory: replace the PERC Controller.
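
The inspection and memory-swap outcomes listed above can be sketched as a decision function. The function and parameter names are hypothetical, and the branches are a literal reading of the list, not a Dell-provided procedure.

```python
from typing import Optional


def physical_check_action(
    embedded_memory: bool,
    socket_damaged: bool,
    dimm_damaged: bool,
    error_with_known_good: Optional[bool],
) -> str:
    """Return the recommended action after inspecting the PERC controller.

    error_with_known_good is None when no known good memory is available,
    otherwise True/False for whether the error still occurs with it.
    """
    if embedded_memory or socket_damaged:
        # Embedded memory or a damaged socket cannot be fixed by swapping a DIMM.
        return "Replace the PERC controller"
    if dimm_damaged:
        return "Replace the controller memory"
    if error_with_known_good is None:
        return "Contact support"
    if error_with_known_good:
        # The fault follows the controller, not the original memory.
        return "Replace the PERC controller"
    return "Replace the controller memory"
```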
 




3. Additional Information


This error message, displayed during POST, indicates that the controller's cache does not contain all of the expected information, or that it contains data destined for a hard drive that cannot be, or has not been, written to the drive. The most common reasons for this error are:
 

  • Server did not perform a normal shutdown process - Power loss or spontaneous restarts can leave incomplete or corrupted data in cache that cannot be written to a drive.
  • Cache memory is defective - Bad cache memory can cause data to become corrupted. This can cause OS-related issues and spontaneous reboots.
  • Loss of battery power while the server is shut down - Controllers that do not use NVCache (Non-Volatile Cache) memory use batteries that can retain the contents of cache for a limited time (24-72 hours) while the server is not powered on. Once the battery drains, the entire contents of the cache are lost, and the controller recognizes that the cache memory does not contain all of the expected information. Controllers that do use NVCache (some H700/H800 controllers and newer controllers such as the H710, H710P, and H810) are very unlikely to encounter this issue, since the battery only needs to maintain power for 30 seconds or less in most cases.
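
As a rough illustration of the battery-retention point above, the following sketch estimates whether cached data is at risk after a period without power. The function name is hypothetical, the 24-hour default is simply the lower bound of the 24-72 hour range quoted above, and treating NVCache controllers as never at risk is a simplification of "very unlikely".

```python
def cache_contents_at_risk(
    hours_powered_off: float,
    has_nvcache: bool,
    battery_retention_hours: float = 24.0,  # assumption: conservative end of 24-72 h
) -> bool:
    """Estimate whether battery-backed cache contents may have been lost."""
    if has_nvcache:
        # Simplification: NVCache copies DRAM to flash, so the battery is
        # only needed briefly and the powered-off duration barely matters.
        return False
    # Without NVCache, the battery must hold DRAM for the whole outage.
    return hours_powered_off > battery_retention_hours
```

With these assumptions, a non-NVCache controller left unpowered over a long weekend (about 96 hours) would be flagged, while one powered off overnight would not.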



1. PERC Battery Maintenance


If a PERC battery is suspected of having failed, or shows a warning symbol in OpenManage Server Administrator (OMSA), perform a manual Learn Cycle. A Learn Cycle discharges and recharges the battery, and should restore it to a fully functional condition. In some cases, several Learn Cycles may be required to restore the battery to an effectively charged state. To perform a manual Learn Cycle, select Start Learn Cycle from the Battery Tasks dropdown menu in OMSA.

Figure 1: OMSA Battery Tasks dropdown menu
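
The same task can, in principle, be scripted through the OMSA command-line interface. The sketch below assumes that the OMSA CLI tools are installed and that your OMSA version accepts the `omconfig storage battery action=startlearn controller=<id> battery=<id>` syntax; verify this against the OMSA CLI guide for your version before use. The helper names are hypothetical, and the IDs 0 and 0 are placeholders.

```python
import subprocess
from typing import List


def learn_cycle_command(controller_id: int, battery_id: int) -> List[str]:
    """Build the assumed OMSA CLI command that starts a battery Learn Cycle."""
    return [
        "omconfig", "storage", "battery", "action=startlearn",
        f"controller={controller_id}", f"battery={battery_id}",
    ]


def start_learn_cycle(controller_id: int = 0, battery_id: int = 0,
                      dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it on a host with OMSA installed."""
    cmd = learn_cycle_command(controller_id, battery_id)
    if dry_run:
        # Default: only show what would be run, so nothing is changed by accident.
        print(" ".join(cmd))
    else:
        # Raises CalledProcessError if omconfig exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `start_learn_cycle()` with the defaults only prints the command; pass `dry_run=False` to execute it.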


2. Cache Use

Hardware RAID controllers use cache (a temporary repository of information) for their normal operation. This cache consists of DRAM, which, like system memory, only retains data while powered.

Newer controllers use NVCache, which preserves the cache contents while the server is powered off. NVCache memory contains both DRAM (for normal operation) and flash memory (non-volatile). The controller's battery (if operational) powers the DRAM during a power loss so that its contents can be copied into the flash memory for indefinite storage.
 

The contents of cache can essentially be broken into three parts:
  1. RAID configuration and metadata - Information about the RAID arrays including configuration information, disk members, role of disks, etc.
  2. Controller logs - RAID controllers maintain several log files. Dell technicians rely on the TTY log as the primary log for troubleshooting various RAID and hard drive issues.
  3. RAID data - This is the actual data destined to be written to the individual hard drives. Data is written into the cache of the controller in both Write Through and Write Back cache policy modes.
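
To illustrate why the cache policy matters for the "RAID data" portion, here is a toy model of a controller cache. It is not controller firmware; the class and method names are hypothetical, and it assumes the usual meaning of the two policies: Write Through commits data to disk before acknowledging the write, while Write Back acknowledges once the data is in cache and flushes it later.

```python
from typing import List


class CacheModel:
    """Toy model of RAID data passing through a controller cache."""

    def __init__(self, write_back: bool) -> None:
        self.write_back = write_back
        self.dirty: List[str] = []  # blocks in cache, not yet on disk
        self.disk: List[str] = []   # blocks safely written to disk

    def write(self, block: str) -> None:
        self.dirty.append(block)    # data enters the cache in both modes
        if not self.write_back:
            self.flush()            # Write Through: commit immediately

    def flush(self) -> None:
        self.disk.extend(self.dirty)
        self.dirty.clear()

    def power_loss(self, cache_preserved: bool) -> List[str]:
        """Return the blocks lost if power fails now."""
        if cache_preserved:
            return []               # battery or NVCache kept the contents
        lost = list(self.dirty)
        self.dirty.clear()
        return lost
```

In this model, a Write Back cache that loses power without preservation loses its unflushed blocks, whereas a Write Through cache has nothing pending to lose.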




Cause

-

Resolution

-

Article Properties


Affected Products

PowerEdge, OEMR R720xd

Last Published Date

25 Mar 2022

Version

6

Article Type

Solution