
How to Troubleshoot Hard Drives and RAID Controller Errors on Dell PowerEdge 12G


This article provides information on how to troubleshoot hard drive and RAID Controller errors on Dell PowerEdge™ 12G Servers.

Message Meaning
There are X enclosures connected to connector Y, but only maximum of 4 enclosures can be connected to a single SAS connector. Please remove the extra enclosures then restart your system. When the BIOS detects more than 4 enclosures connected to a single SAS connector, it displays this message. You must remove all additional enclosures and restart your system.
Cache data was lost, but the controller has recovered. This could be due to the fact that your controller had protected cache after an unexpected power loss and your system was without power longer than the battery backup time. Press any key to continue or 'C' to load the configuration utility. This message displays under the following conditions:
  • The adapter detects data in the controller cache that has not yet been written to the disk subsystem.
  • The controller detects an Error-Correcting Code (ECC) error while performing its cache checking routine during initialization.
  • The controller discards the cache rather than sending it to the disk subsystem because the data integrity cannot be guaranteed. To resolve this problem, allow the battery to charge fully. If the problem persists, the battery or adapter DIMM might be faulty.
    The following virtual disks have missing disks: (x). If you proceed (or load the configuration utility), these virtual disks will be marked OFFLINE and will be inaccessible. Please check your cables and ensure all disks are present. Press any key to continue, or 'C' to load the configuration utility. The message indicates that some configured disks were removed. If the disks were not removed, they are no longer accessible. The SAS cables for your system might be improperly connected. Check the cable connections and fix any problems. Restart the system. If there are no cable problems, press any key or <C> to continue.
    All of the disks from your previous configuration are gone. If this is an unexpected message, then please power off your system and check your cables to ensure all disks are present. Press any key to continue, or 'C' to load the configuration utility. The message indicates that all configured disks were removed. If the disks were not removed, they are no longer accessible. The SAS cables for your system might be improperly connected. Check the cable connections and fix any problems. Restart the system. If there are no cable problems, press any key or <C> to continue.
    The following virtual disks are missing: (x) If you proceed (or load the configuration utility), these virtual disks will be removed from your configuration. If you wish to use them at a later time, they will have to be imported. If you believe these virtual disks should be present, please power off your system and check your cables to ensure all disks are present. Press any key to continue, or 'C' to load the configuration utility. The message indicates that some configured disks were removed. If the disks were not removed, they are no longer accessible. The SAS cables for your system might be improperly connected. Check the cable connections and fix any problems. Restart the system. If there are no cable problems, press any key or <C> to continue.
    The cache contains dirty data, but some virtual disks are missing or will go offline, so the cached data can not be written to disk. If this is an unexpected error, then please power off your system and check your cables to ensure all disks are present. If you continue, the data in cache will be permanently discarded. Press 'X' to acknowledge and permanently destroy the cached data. The controller preserves the dirty cache from a virtual disk if the disk becomes offline or is deleted because of missing physical disks. This message indicates that some configured disks were removed. If the disks were not removed, they are no longer accessible. The SAS cables for your system might be improperly connected. Check the cable connections and fix any problems. Restart the system. Use the Ctrl+R utility to import the virtual disk or discard the preserved cache.
    A discovery error has occurred, please power cycle the system and all the enclosures attached to this system. This message indicates that discovery did not complete within 120 seconds. The SAS cables for your system might be improperly connected. Check the cable connections and fix any problems. Restart the system.
    Entering the configuration utility in this state will result in drive configuration changes. Press 'Y' to continue loading the configuration utility or please power off your system and check your cables to ensure all disks are present and reboot. The message is displayed after another BIOS warning indicating there are problems with previously configured disks and you have chosen to accept any changes and continue. The SAS cables for your system might be improperly connected. Check the cable connections and fix any problems. Restart the system. If there are no cable problems, press any key or <Y> to continue.
    BIOS Disabled. No Logical Drives Handled by BIOS. This warning message displays after you disable the ROM option in the configuration utility. When the ROM option is disabled, the BIOS cannot boot to INT 13h and cannot provide the ability to boot from the virtual disk. Int 13h is an interrupt signal that supports numerous commands that are sent to the BIOS, then passed to the physical disk. The commands include actions you can perform with a physical disk, such as reading, writing, and formatting.
    Adapter at Baseport xxxx is not responding where xxxx is the baseport of the controller. Contact Dell Technical Support.
    There are offline or missing virtual drives with preserved cache. Please check the cables and ensure that all drives are present. Press any key to enter the configuration utility. The controller preserves the dirty cache from a virtual disk if the disk becomes offline or is deleted because of missing physical disks. This preserved dirty cache is called pinned cache, and is preserved until you import the virtual disk or discard the cache. Use the Ctrl+R utility to import the virtual disk or discard the preserved cache.
    x Virtual Disk(s) Offline where x is the number of virtual disks failed. When the BIOS detects virtual disks in the offline state, it displays this warning. You must check to determine why the virtual disks failed and correct the problem. The BIOS does not take any action.
    x Virtual Disk(s) Degraded where x is the number of virtual disks degraded. When the BIOS detects virtual disks in a degraded state, it displays this warning. Take corrective action(s) to make the virtual disks optimal. The BIOS does not take any action.
    x Virtual Disk(s) Partially Degraded. When the BIOS detects a single disk failure in a RAID 6 or RAID 60 configuration, it displays this warning. You must check why the member disk is not present to correct the problem. The BIOS does not take any action.
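The three warnings above (Offline, Degraded, Partially Degraded) follow from how many member disks a virtual disk has lost relative to what its RAID level tolerates. The following is a minimal Python sketch of that relationship, not firmware logic. It assumes single-span RAID 0, RAID 1 (two-disk mirror), RAID 5, and RAID 6 only; the function name and the tolerance table are illustrative and are not part of any Dell tool. Spanned levels (RAID 10, 50, 60) depend on which span each failed disk belongs to and are deliberately left out.

```python
# Illustrative only: maps a failed-disk count to the virtual disk states
# named in the BIOS warnings. Assumes single-span RAID 0/1/5/6.
FAULT_TOLERANCE = {0: 0, 1: 1, 5: 1, 6: 2}  # member disks that may fail without data loss


def virtual_disk_state(raid_level: int, failed_disks: int) -> str:
    """Return the expected state name for a virtual disk (hypothetical helper)."""
    if raid_level not in FAULT_TOLERANCE:
        raise ValueError(f"RAID level {raid_level} is not covered by this sketch")
    if failed_disks < 0:
        raise ValueError("failed_disks must not be negative")

    tolerance = FAULT_TOLERANCE[raid_level]
    if failed_disks == 0:
        return "Optimal"
    if failed_disks > tolerance:
        return "Offline"             # more failures than the level can absorb
    if failed_disks < tolerance:
        return "Partially Degraded"  # only reachable for RAID 6 with one failure
    return "Degraded"                # redundancy fully used up
```

For example, `virtual_disk_state(6, 1)` gives "Partially Degraded", which matches the RAID 6 case described above, while `virtual_disk_state(5, 1)` gives "Degraded".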
    Memory/Battery problems were detected. The adapter has recovered, but cached data was lost. Press any key to continue. This message occurs under the following conditions:
  • The adapter detects data in the controller cache that has not yet been written to the disk subsystem.
  • The controller detects an Error-Correcting Code (ECC) error while performing its cache checking routine during initialization.
  • The controller discards the cache rather than sending it to the disk subsystem because the data integrity cannot be guaranteed.
  • The battery may be undercharged. Allow the battery to charge fully to resolve this problem. If the problem persists, the battery or adapter DIMM might be faulty; contact Dell Technical Support.
    Foreign configuration(s) found on adapter. Press any key to continue, or 'C' to load the configuration utility or 'F' to import foreign configuration(s) and continue. When the controller firmware detects a physical disk with existing foreign metadata, it flags the physical disk as foreign and generates an alert indicating that a foreign disk was detected. Press <F> at this prompt to import the configuration (if all member drives of the virtual disk are present) without loading the BIOS configuration utility. Or, press <C> to enter the BIOS configuration utility and either import or clear the foreign configuration.
    The foreign configuration message is present during POST but no foreign configurations are present in the foreign view page in CTRL+R. All virtual disks are in an optimal state. Ensure all your PDs are present and all VDs are in optimal state. Clear the foreign configuration using CTRL+R or Dell OpenManage™ Server Administrator Storage Management.
    CAUTION:
    If you insert a physical disk that was previously a member of a virtual disk in the system, and that disk’s previous location has been taken by a replacement disk through a rebuild, you must manually remove the foreign configuration flag of the newly inserted disk.
    Previous configuration(s) cleared or missing. Importing configuration created on XX/XX XX.XX. Press any key to continue, or ’C’ to load the configuration utility. The message indicates that the controller and physical disks have different configurations. You can use the BIOS Configuration Utility to clear the foreign configuration.
    Invalid SAS topology detected. Please check your cable configurations, repair the problem, and restart your system. The SAS cables for your system are improperly connected. Check the cable connections and fix problems if any. Restart the system.
    Multibit ECC errors were detected on the RAID controller. If you continue, data corruption can occur. Contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot. This error is specific to the PERC H700 controller. Multi-bit ECC errors (MBE) occur in the memory and can corrupt cached data and discard it.
    CAUTION:
    MBE errors are serious, as they cause data corruption and data loss. In case of MBE errors, contact Dell Technical Support.
    NOTE:
    A similar message displays when multiple single-bit ECC errors are detected on the controller during boot up.
    Multibit ECC errors were detected on the RAID controller. The DIMM on the controller needs replacement. If you continue, data corruption can occur. Press 'X' to continue or else power off the system, replace the DIMM module, and reboot. If you have replaced the DIMM please press 'X' to continue. Multi-bit ECC errors (MBE) occur in the memory and can corrupt cached data and discard it.
    CAUTION:
    MBE errors are serious, as they cause data corruption and loss. In case of MBE errors, contact Dell Technical Support.
    Some configured disks have been removed from your system or are no longer accessible. Check your cables and ensure all disks are present. Press any key or ’C’ to continue. The message indicates that some configured disks were removed. If the disks were not removed, they are no longer accessible. The SAS cables for your system might be improperly connected. Check the cable connections and fix any problems. Restart the system. If there are no cable problems, press any key or <C> to continue.
    The battery is currently discharged or disconnected. Verify the connection and allow 30 minutes for charging. If the battery is properly connected and it has not returned to operational state after 30 minutes of charging then contact technical support for additional assistance.
    NOTE:
    This message may appear for controllers without a battery, depending on the virtual disks' policies.
  • The controller battery is missing or currently inaccessible. Contact Dell Technical Support if the problem persists after 30 minutes.
  • The controller battery is completely discharged and needs to be charged for it to become active. You must first charge the battery, then wait for a few minutes for the system to detect it.
    Problem Suggested Solution
    The controller displays in Device Manager but has a yellow bang (exclamation point). Reinstall the driver. For more information on reinstalling drivers, see "Driver Installation".
    No Hard Drives Found message displays during a media-based installation of Microsoft® Windows Server® 2003 or Microsoft Windows® XP. The possible causes are:
    1. The driver is not native in the operating system.
    2. The virtual disks are not configured properly.
    3. The controller BIOS is disabled.
    The corresponding solutions are:
    1. Press <F6> to install the RAID device driver during installation.
    2. Enter the BIOS Configuration Utility to configure the virtual disks.
    3. Enter the BIOS Configuration Utility to enable the BIOS.
    Problem Suggested Solution
    One of the physical disks in the disk array is in the failed state. Perform the following actions to resolve this problem:
    1. Check the backplane for damage.
    2. Check the SAS cables.
    3. Reseat the physical disk.
    4. Contact Dell Technical Support if the problem persists.
    Cannot rebuild a fault tolerant virtual disk. The replacement disk is too small or not compatible with the virtual disk. Replace the failed disk with a compatible good physical disk with equal or greater capacity.
    One or more physical disks are displayed as Blocked and cannot be configured. PERC H700 and PERC H800 controllers support only Dell-certified SAS and SATA hard disk drives (HDD) and solid-state drives (SSD). If you are using a Dell-certified drive but are still experiencing this problem, perform the following actions:
    1. Check the backplane for damage.
    2. Check the SAS cables.
    3. Reseat the physical disk.
    Problem Meaning / Solution
    An error occurred while reading non-volatile settings. An error reading any one of a number of settings from the firmware. Reseat the controller and reboot.
    An error occurred while reading current controller settings. Controller setup and initialization has failed. Reboot the system.
    Advanced Device Properties settings not found. Failed to read vital configuration page from firmware. Re-flash the firmware and reboot.
    Error obtaining PHY properties configuration information. Failed to read vital configuration page from firmware. Re-flash the firmware and reboot.
    Configuration Utility Options Image checksum error. Failed to properly read Configuration Utility options from flash. Restart and retry. If the issue persists, re-flash the firmware on the controller.
    Can't load default Configuration Utility options. Failed to allocate memory for Configuration Utility options structure.
    An error occurred while writing non-volatile settings. An error occurred while writing one or more settings to the firmware.
    Issue Suggested Solution
    Rebuilding the physical disks after multiple disks become simultaneously inaccessible. Multiple physical disk errors in a single array typically indicate a failure in cabling or connection and could involve the loss of data. You can recover the virtual disk after multiple physical disks become simultaneously inaccessible. Perform the following steps to recover the virtual disk.
    CAUTION:
    Follow the safety precautions to prevent electrostatic discharge.
    1. Turn off the system, check cable connections, and reseat physical disks.
    2. Ensure that all the drives are present in the enclosure.
    3. Turn on the system, enter the CTRL+R utility, and import the foreign configuration. Press <F> at the prompt to import the configuration, or press <C> to enter the BIOS configuration utility and either import or clear the foreign configuration. If the virtual disk is redundant and transitioned to Degraded state before going Offline, a rebuild operation starts automatically after the configuration is imported. If the virtual disk has gone directly to the Offline state due to a cable pull or power loss situation, the virtual disk is imported in its Optimal state without a rebuild occurring. You can use the BIOS Configuration Utility or Dell OpenManage storage management application to perform a manual rebuild of multiple physical disks.
    Rebuilding a physical disk after one of them is in a failed state. If you have configured hot spares, the PERC H700 or PERC H800 controller automatically tries to use one of them to rebuild a physical disk that is in a failed state. Manual rebuild is necessary if no hot spares with enough capacity to rebuild the failed physical disks are available. You must insert a physical disk with enough storage in the subsystem before rebuilding the physical disk. You can use the BIOS Configuration Utility or Dell OpenManage storage management application to perform a manual rebuild of an individual physical disk.
    A virtual disk fails during rebuild while using a global hot spare. The global hot spare goes back to Hotspare state and the virtual disk goes to Failed state.
    A virtual disk fails during rebuild while using a dedicated hot spare. The dedicated hot spare goes to Ready state and the virtual disk goes to Failed state.
    A physical disk fails during a reconstruction process on a redundant virtual disk that has a hot spare. The rebuild operation for the inaccessible physical disk starts automatically after the reconstruction is completed.
    A physical disk is taking longer than expected to rebuild. A physical disk takes longer to rebuild when under high stress. For example, there is one rebuild I/O operation for every five host I/O operations.
    You cannot add a second virtual disk to a disk group while the virtual disk in that disk group is undergoing a rebuild. The firmware does not allow you to create a virtual disk using the free space available in a disk group if a physical disk in a virtual disk group is undergoing a rebuild operation.
    Problem Suggested Solution
    A SMART error is detected on a physical disk in a redundant virtual disk. Perform the following steps:
    1. Force the physical disk offline.
    NOTE:
    If a hot spare is present, the rebuild starts with the hot spare after the drive is forced offline.

    2. Replace it with a new physical disk of equal or higher capacity.
    3. Perform the Replace Member operation. The Replace Member operation allows you to copy data from a source physical disk of a virtual disk to a target physical disk that is not a part of the virtual disk.
    A SMART error is detected on a physical disk in a non-redundant virtual disk. Perform the following steps:
    1. Back up your data.
    2. Use Replace Member or set up a global hot spare to replace the disk automatically.
    3. Replace the affected physical disk with a new physical disk of equal or higher capacity.
    4. Restore from the backup.
    A SMART error occurs during a Consistency Check (CC). Specify how the CC operation must perform when a SMART error is encountered.
    There are two settings, Yes and No. No is the default setting and allows CC to continue when the first error is encountered. The Yes setting halts CC when the first error is encountered. Events are generated in the Event Log when errors are encountered during CC.

    Note: How to force a physical disk offline using OMSA is explained in this article.

    Problem Suggested Solution
    The source drive fails during the Replace Member operation. If the source data is available from other drives in the virtual disk, the rebuild begins automatically on the target drive, using the data from the other drives.
    Target drive fails. If the target drive fails, the Replace Member operation aborts.
    Other drives fail. If the target drive fails and the Replace Member operation aborts but the source data is still available, then the Replace Member operation continues as Replace Member.
    Error Message Suggested Solution
    <Date:Time> <HostName> kernel: sdb: asking for cache data failed
    <Date:Time> <HostName> kernel: sdb: assuming drive cache: write through
    This error message displays when the Linux Small Computer System Interface (SCSI) mid-layer asks for physical disk cache settings. The controller firmware manages the virtual disk cache settings on a per controller and a per virtual disk basis, so the firmware does not respond to this command. The Linux SCSI midlayer assumes that the virtual disk's cache policy is Write-Through. SDB is the device node for a virtual disk. This value changes for each virtual disk.

    Apart from producing this message, this behavior has no effect on normal operation. The cache policy of the virtual disk and the I/O throughput are not affected by this message. The cache policy settings for the PERC H700 and PERC H800 SAS RAID system remain unchanged.
    Driver does not auto-build into new kernel after customer updates. This error is a generic problem for Dynamic Kernel Module Support (DKMS) and applies to all DKMS-enabled driver packages. This issue occurs when you perform the following steps:
    1. Install a DKMS-enabled driver package.
    2. Run up2date or a similar tool to upgrade the kernel to the latest version.
    3. Reboot to the new kernel.
    The driver running in the new kernel is the native driver of the new kernel. The driver package you installed previously does not take effect in the new kernel. Perform the following procedure to make the driver auto-build into the new kernel:
    1. Type: dkms build -m <module_name> -v <module version> -k <kernel version>
    2. Type: dkms install -m <module_name> -v <module version> -k <kernel version>
    Type the following to check whether the driver is successfully installed in the new kernel: dkms status
    The following details appear: <driver name>, <driver version>, <new kernel version>: installed
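The status check at the end of this procedure can be scripted. The sketch below is one possible way to do it in Python, assuming the status listing has been captured as text lines in the "<driver name>, <driver version>, <new kernel version>: installed" form shown above (for example, from `dkms status`). The function names and the sample module name are hypothetical. Some DKMS versions print an extra architecture field or a different separator, so treat the parser as a starting point rather than a complete one.

```python
# Hypothetical helpers for checking a captured DKMS status listing.
# Expected line shape: "<name>, <version>, <kernel>[, <arch>]: <state>"

def parse_status_line(line: str) -> dict:
    """Split one status line into module, version, kernel, and state."""
    fields, sep, state = line.rpartition(":")
    if not sep:
        raise ValueError(f"no ':' separator in status line: {line!r}")
    parts = [part.strip() for part in fields.split(",")]
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 comma-separated fields: {line!r}")
    return {
        "module": parts[0],
        "version": parts[1],
        "kernel": parts[2],
        "state": state.strip(),
    }


def is_installed_for(lines, module: str, kernel: str) -> bool:
    """True if any line reports `module` as installed for `kernel`."""
    for line in lines:
        try:
            entry = parse_status_line(line)
        except ValueError:
            continue  # skip lines that are not status entries
        if (entry["module"] == module
                and entry["kernel"] == kernel
                and entry["state"].startswith("installed")):
            return True
    return False
```

A caller would pass the captured lines together with the module name and the new kernel version; a False result suggests that the build and install steps above still need to be run for that kernel.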
    smartd[2338] Device: /dev/sda, Bad IEC (SMART) mode page, err=-5, skip device This is a known issue. An unsupported command is entered through the user application. User applications attempt to direct Command Descriptor Blocks to RAID volumes. The error message does not affect the feature functionality.
    smartd[2338] Unable to register SCSI device /dev/sda at line 1 of file /etc/smartd.conf The Mode Sense/Select command is supported by firmware on the controller. However, the Linux kernel daemon issues the command to the virtual disk instead of to the driver IOCTL node. This action is not supported.
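When reviewing system logs, it can help to separate the documented messages in this table from entries that still need investigation. The Python sketch below shows one way to do that. The regular expressions are written against the wording quoted in this article and are an assumption about how the lines look in a given log; kernel and smartd versions may format them differently. The function names are hypothetical.

```python
import re

# Patterns for the kernel and smartd messages documented in this table.
# Device names (sdb, /dev/sda) and the smartd PID vary per system.
DOCUMENTED_PATTERNS = [
    re.compile(r"kernel: sd[a-z]+: asking for cache data failed", re.IGNORECASE),
    re.compile(r"kernel: sd[a-z]+: assuming drive cache: write through", re.IGNORECASE),
    re.compile(r"smartd\[\d+\]:? Device: /dev/sd[a-z]+, Bad IEC \(SMART\) mode page", re.IGNORECASE),
    re.compile(r"smartd\[\d+\]:? Unable to register SCSI device /dev/sd[a-z]+", re.IGNORECASE),
]


def is_documented_message(line: str) -> bool:
    """True if the log line matches one of the messages described above."""
    return any(pattern.search(line) for pattern in DOCUMENTED_PATTERNS)


def undocumented_lines(lines):
    """Return only the lines that do not match a documented message."""
    return [line for line in lines if not is_documented_message(line)]
```

Running `undocumented_lines` over a log excerpt leaves the entries that are not explained by this table, which are the ones worth a closer look.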
    The LED on the physical disk carrier indicates the state of each physical disk. Each drive carrier in your enclosure has two LEDs: an activity LED (green) and a status LED (bicolor, green/amber). The activity LED flashes whenever the drive is accessed.
    LED Description
    Off Slot is empty, drive is not yet discovered by the system.
    Steady green Drive is online.
    Green flashing (250 milliseconds [ms]) Drive is being identified or is being prepared for removal.
    Green flashing (On 400 ms, Off 100 ms) Drive is rebuilding or undergoing a Replace Member operation.
    Amber flashing (125 ms) Drive has failed.
    Green/amber flashing (Green On 500 ms / Amber On 500 ms, Off 1000 ms) Predicted failure reported by drive.
    Green/amber flashing (Green On 3000 ms, Off 3000 ms, Amber On 3000 ms, Off 3000 ms) Drive being spun down by user request or other non-failure condition.
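For scripts or runbooks that record what a technician sees on the drive carrier, the LED table above can be kept as a small lookup. The Python sketch below transcribes the table. The short pattern keys are hypothetical labels invented for this example, and marking only the failed and predicted-failure patterns as needing attention is an assumption of this sketch, not a Dell classification.

```python
# Status LED patterns and their meanings, transcribed from the table above.
# The dictionary keys are made-up labels for this sketch.
STATUS_LED_MEANINGS = {
    "off": "Slot is empty, drive is not yet discovered by the system",
    "steady_green": "Drive is online",
    "green_flash_250ms": "Drive is being identified or is being prepared for removal",
    "green_flash_400on_100off": "Drive is rebuilding or undergoing a Replace Member operation",
    "amber_flash_125ms": "Drive has failed",
    "green_amber_500ms": "Predicted failure reported by drive",
    "green_amber_3000ms": "Drive being spun down by user request or other non-failure condition",
}

# Assumption: these two patterns are the ones that call for replacing a drive.
NEEDS_ATTENTION = {"amber_flash_125ms", "green_amber_500ms"}


def _normalize(pattern: str) -> str:
    return pattern.strip().lower().replace(" ", "_")


def describe_status_led(pattern: str) -> str:
    """Return the documented meaning for a pattern label."""
    key = _normalize(pattern)
    if key not in STATUS_LED_MEANINGS:
        raise ValueError(f"unknown status LED pattern: {pattern!r}")
    return STATUS_LED_MEANINGS[key]


def needs_attention(pattern: str) -> bool:
    """True for the failed and predicted-failure patterns."""
    describe_status_led(pattern)  # validates the label
    return _normalize(pattern) in NEEDS_ATTENTION
```

For example, `describe_status_led("Steady Green")` returns "Drive is online", and `needs_attention("amber_flash_125ms")` returns True.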




    Article ID: SLN129432

    Last modified: 10/11/2016 03:05 AM

