Remote Systems Management Using the Dell Remote Access Card
By Donnie Bell, Lance Osborne, and Jon McGary (May 2002)
The Dell™ Remote Assistant Card II (DRAC II) and Dell Remote Access Card III (DRAC III) provide users with the tools and functionality needed to monitor, troubleshoot, and repair servers, whether they are around the corner or around the world. This article discusses DRAC features and functionality and explores how they can reduce the time needed to manage servers, enable faster recovery of remote servers, and lower the overall cost of network ownership.
Managing distributed servers from a remote location is often mandatory in today's business environment. IT administrators must easily and effectively manage servers in secure data centers or in locations that have no administrative IT staff. Such scenarios require remotely performing all server management operations and responding to server-down situations.
Remote-access capabilities help to improve system administrator productivity and overall server availability by reducing administrator visits to the system and by allowing some operations on groups of systems instead of individual devices.
The Dell™ Remote Assistant Card II (DRAC II) and Dell Remote Access Card III (DRAC III) provide IT administrators with continuous access to servers. Administrators also gain full control of the server hardware and operating system from any client system running a Web browser, even if the server is down or hung.
The Dell remote-access architecture consists of hardware and software components that allow administrators to do the following:
- Access a server after a server failure, power outage, or loss of a network connection (using a network interface card (NIC) or modem)
- Remotely view a server's internal event logs and power-on self test (POST) codes for diagnostic purposes
- Manage servers in multiple locations from a remote console
- Manage servers by redirecting the console output to a remote console (graphic and text)
- Perform an orderly shutdown of a server for maintenance tasks
- Diagnose a server failure and restart the server
- Alert the administrator using alphanumeric page, numeric page, e-mail, or Simple Network Management Protocol (SNMP) trap when a server detects an error
Hardware to enable remote access
DRACs are peripheral component interconnect (PCI) cards that work with the Embedded Server Management (ESM) chip on the server motherboard. Figure 1 illustrates a typical system architecture using DRACs.
Figure 1. Typical system architecture using DRACs
DRAC II occupies a single, full-length PCI slot. In addition to the processor, the card includes 16 MB of memory, flash RAM/nonvolatile random access memory (NV-RAM), an onboard NIC for 10 Mbps Ethernet, a PC Card interface, a PCI controller, a battery, a real-time clock, and an ESM2 connector.
DRAC II is compatible with Dell PowerEdge™ x3xx, x4xx, and x5xx servers (2300, 4300, 6300, 2350, 4350, 6350, 2400, 4400, 6400, 2450, 6450, 2500, and 2550). The software necessary to use DRAC II is incorporated into the Dell OpenManage™ IT Assistant that ships with every Dell server.
DRAC III is a half-length PCI card that requires one 33 MHz, 32-bit PCI slot. It provides 16 MB of memory, 8 MB flash RAM/NV-RAM, onboard NIC for 10/100 Mbps Ethernet, one serial interface, battery, real-time clock, and ESM3 connector. The card may optionally include a PCMCIA modem and AC power adapter. Figure 2 illustrates the DRAC III components.
Figure 2. DRAC III components
DRAC III is compatible with PowerEdge 1650, 4600, and 7150 servers. These systems are based on a standard hardware management specification called the Intelligent Platform Management Interface (IPMI), which allows Dell to bring remote management capabilities to market at a lower cost.
DRAC III allows up to 16 administrators to access the same card simultaneously; four administrators may use console redirection. This improvement in access allows administrators to work together from different locations to isolate problems more quickly.
DRACs are not cross-compatible: DRAC III cannot be used in DRAC II systems and vice versa.
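The compatibility rules above amount to a simple lookup. The following is a minimal sketch in Python, assuming only the model numbers listed in this article; the function name `compatible_card` is hypothetical and is not part of any Dell tool.

```python
# Model lists are taken from this article; they are not an authoritative
# Dell compatibility matrix.
DRAC_II_SERVERS = {
    "2300", "4300", "6300", "2350", "4350", "6350",
    "2400", "4400", "6400", "2450", "6450", "2500", "2550",
}
DRAC_III_SERVERS = {"1650", "4600", "7150"}


def compatible_card(model):
    """Return the DRAC generation listed for a PowerEdge model number.

    Returns "DRAC II", "DRAC III", or None when the model is not
    listed in the article. The two sets do not overlap, which reflects
    the fact that the cards are not cross-compatible.
    """
    model = str(model).strip()
    if model in DRAC_II_SERVERS:
        return "DRAC II"
    if model in DRAC_III_SERVERS:
        return "DRAC III"
    return None


if __name__ == "__main__":
    for m in ("2450", "1650", "9999"):
        print(m, "->", compatible_card(m))
```

Because each model appears in at most one set, the order of the two checks does not affect the result.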
Software to enable remote access
Dell OpenManage Server Administrator installs the driver and supplies both a graphical user interface (GUI) and command-line interface (CLI) to set up and use DRAC III. Server Administrator is not used to manage DRAC II.
Server Administrator installs and manages one server at a time. The Remote Access function (provided in Server Administrator 1.1) lets administrators install and update the remote-access software, configure the remote-access architecture, and remotely access the server while it is operational.
Dell OpenManage IT Assistant can configure and launch access to DRAC II and DRAC III. OpenManage IT Assistant lets administrators remotely access an operating or non-operating server in context.
Figure 3 highlights the hardware and software required for DRAC II and DRAC III.
Figure 3. Requirements for DRAC II and DRAC III
Capabilities of DRAC II and DRAC III
DRAC II and DRAC III enable several monitoring and notification capabilities, facilitate diagnostics, provide mechanisms for remote operations, and offer various connection and backup alternatives. Unless specified otherwise, both DRAC II and DRAC III include the following capabilities.
SNMP support. SNMP provides a standard message format to notify the administrator of server problems. DRAC can send SNMP traps to most of the industry's leading management consoles, even if the server is down.
Ability to monitor server health. DRAC monitors the health of the hardware to identify any failures and provide the administrator with information to isolate components with problems. This capability enables faster troubleshooting of operational and non-operational servers and potentially yields higher availability.
Alphanumeric and numeric paging. Wireless devices immediately notify administrators of server problems or failure.
E-mail support. Mail systems provide immediate notification of server problems or failure.
Review of hardware logs. DRAC provides access to data showing the state of the hardware and any errors that may have been logged. These logs are the best source of hardware data and help to troubleshoot problems, potentially yielding higher availability.
Access to hardware sensors. DRAC provides readouts on all sensors including power, fans, disks, temperature, and voltage. Administrators can ascertain the server's condition regardless of the state of the operating system.
Boot path analysis. DRAC III lets administrators determine whether any failures occurred during the boot sequence using boot path analysis, which displays the success or failure of each POST step as the server comes up (see Figure 4). This capability helps administrators to identify the status of the components before the operating system is operational. An error in the boot path may be the reason a server fails to boot. By identifying the failing component early and isolating problems quickly, administrators may be able to reduce downtime.
Figure 4. Boot path analysis
Ability to start the server, even from powered-down state. DRAC allows remote startup of the server so administrators can power up systems that are powered off at night or over a weekend. This capability reduces expenses and potentially makes administrators more productive.
Shutdown and graceful shutdown procedures. Administrators can increase their productivity by powering down the server quickly or in a controlled fashion without visiting the system.
Automatic server recovery. This feature watches the system to identify periods of inactivity that may indicate a hung condition. When enabled, the server recovery timer, or watchdog, counts down from a user-specified value to 0. The timer tries to communicate with the operating system every second or so to ensure that the system is not hung. If the timer reaches 0 without a response from the operating system, it initiates a recovery action, ranging from "do nothing" to "reboot the system automatically." This proactive approach could yield higher availability.
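The countdown behavior described above can be sketched as a small simulation. This is an illustrative model only, not DRAC firmware: the class and function names are hypothetical, the one-second tick is an assumption based on the "every second or so" wording, and only the two recovery actions named in the article are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class RecoveryAction(Enum):
    # The article names only the two ends of the range of actions.
    NONE = "do nothing"
    REBOOT = "reboot the system automatically"


@dataclass
class WatchdogTimer:
    """Hypothetical model of the recovery (watchdog) timer."""
    timeout_seconds: int
    action: RecoveryAction = RecoveryAction.REBOOT
    enabled: bool = True

    def __post_init__(self):
        self.remaining = self.timeout_seconds

    def tick(self, os_responded):
        """Advance one second; return the action if the timer expires."""
        if not self.enabled:
            return None
        if os_responded:
            # A response from the operating system resets the countdown.
            self.remaining = self.timeout_seconds
            return None
        self.remaining -= 1
        if self.remaining <= 0:
            self.remaining = self.timeout_seconds
            return self.action
        return None


def simulate(timer, responses):
    """Feed one response per second; return (second, action) per expiry."""
    events = []
    for second, ok in enumerate(responses, start=1):
        action = timer.tick(ok)
        if action is not None:
            events.append((second, action))
    return events


if __name__ == "__main__":
    timer = WatchdogTimer(timeout_seconds=3)
    print(simulate(timer, [True, False, False, True, False, False, False]))
```

In this run the operating system answers at second 4, which resets the countdown, so the recovery action fires only after three consecutive silent seconds, at second 7.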
Remote console redirection. DRAC provides full console redirection of Microsoft® Windows®, Red Hat® Linux®, and Novell® NetWare® operating systems. This capability allows administrators to perform many maintenance functions from the remote console, resulting in potentially higher application availability and administrator productivity. DRAC III offers faster graphic console redirection and simplifies keyboard functions, such as Ctrl+Alt+Del, making them easier to use.
Remote floppy boot. DRAC offers remote media access, allowing the server to boot from remote media. DRAC II uses floppy redirection. Administrators can insert a bootable DOS diskette into the diskette drive of the desktop machine and boot a remote server from that diskette. Administrators can then run operations from the diskette, such as flashing the BIOS to recover servers with BIOS problems.
DRAC III uses Trivial File Transfer Protocol (TFTP) to transfer an image to the card and lets administrators enhance remote floppy performance by downloading floppy images to the memory on the card (see Figure 5). Functions on the "diskette" are executed in a DOS environment for 32-bit systems.
Figure 5. Remote floppy boot
Operating system down console access. DRAC can enable a remote session when the operating system is down.
Ethernet access. DRAC II uses a 10 Mbps NIC, and DRAC III uses a 10/100 Mbps NIC.
PCMCIA modem. When the network interface is not available or desirable for remote management, the PCMCIA modem provides wide area network (WAN) access to DRAC II and DRAC III.
Optional wall adapter. This adapter provides a backup power source and plugs into a separate power grid or uninterruptible power supply (UPS) to gain server access, even when the server power grid is not available.
Battery backup. This capability provides 30 minutes of server access when all external power sources are non-functional.
Last screen capture. DRAC uses the watchdog timer to communicate with the system. When the system is not responding, DRAC begins taking screen shots. If the system remains unresponsive for a user-specified number of seconds, it is determined to be hung and the last data on the screen is captured for review by the administrator. This screen capture of the system console can help the administrator to understand what might have been happening at the time of the failure.
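One way to picture the capture logic is as a counter of consecutive unresponsive seconds that freezes the most recent screen once a threshold is reached. The sketch below is a hypothetical illustration under that assumption; the class name, the one-second polling interval, and the use of a string to stand in for screen contents are not taken from the DRAC implementation.

```python
from typing import Optional


class LastScreenCapture:
    """Hypothetical model: keep the last screen seen before a hang is declared."""

    def __init__(self, hang_threshold_seconds: int):
        self.hang_threshold_seconds = hang_threshold_seconds
        self.silent_seconds = 0
        self.latest_frame: Optional[str] = None
        self.captured: Optional[str] = None

    def tick(self, os_responded: bool, screen: str) -> Optional[str]:
        """Process one second; return the frozen capture once a hang is declared."""
        if self.captured is not None:
            # A hang was already declared; keep the capture for review.
            return self.captured
        if os_responded:
            # The system is alive, so discard any pending screen shots.
            self.silent_seconds = 0
            self.latest_frame = None
            return None
        # The system is silent: take a screen shot and count the second.
        self.silent_seconds += 1
        self.latest_frame = screen
        if self.silent_seconds >= self.hang_threshold_seconds:
            self.captured = self.latest_frame
        return self.captured


if __name__ == "__main__":
    cap = LastScreenCapture(hang_threshold_seconds=2)
    for ok, screen in [(True, "login prompt"), (False, "error A"), (False, "error B")]:
        print(cap.tick(ok, screen))
```

Keeping the capture after the hang is declared reflects the purpose described above: the administrator reviews the screen later, so it should not be overwritten by subsequent activity.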
Remote access to improve availability and productivity
As managing servers from remote locations becomes a requirement in IT environments, components such as the Dell Remote Assistant Card II and Dell Remote Access Card III enable the monitoring, troubleshooting, and repair of servers remotely. These capabilities allow administrators to manage servers more quickly and to address minor issues proactively before they multiply, thereby increasing productivity and system availability.
Donnie Bell (firstname.lastname@example.org) is a senior marketing consultant of Dell OpenManage Strategy. Prior to joining Dell, Donnie was employed by IBM for 16 years in technical marketing support and technical education. His areas of expertise are distributed systems management, network communications, UNIX® administration, UNIX to Windows NT® interoperability, electronic commerce, and software channel relationships. He has a B.B.A. from West Texas State University.
Lance Osborne (email@example.com) is a product marketing manager for Dell OpenManage for enterprise systems. Before joining Dell, Lance provided product marketing for BMC Software in the U.S., Europe, the Middle East, and Asia. Lance has an M.A. from the University of Texas.
Jon McGary (firstname.lastname@example.org) is a senior software developer in Dell OpenManage Remote Management. Prior to joining Dell, Jon was employed by Tandem Computers and specialized in remote management of fault-tolerant computers. He has a B.S. from Texas A&M University.
For more information
OpenManage white papers: http://www.dell.com/us/en/biz/topics/openmanage_server_mgt_white_papers.htm