OpenManage IT Assistant: A Scalability Study

Dell Power Solutions

By Roger Foreman and Sandra Woodcock (Issue 4 2001)

Dell OpenManage IT Assistant performs systems management for Dell PowerEdge and PowerApp products. This article describes tests performed by the Dell team to measure the performance and scalability of IT Assistant version 6.1 when managing 500, 1,000, 2,000, and 3,000 nodes. The results of the study show that IT Assistant can manage at least 3,000 nodes.

Dell® OpenManage™ IT Assistant is a full-featured systems management console for monitoring and managing Dell PowerEdge® servers, PowerApp™ appliances, OptiPlex® desktops, Precision® workstations, and Latitude® notebooks. It performs the standard functions of discovery, event management, inventory, reporting, storage, and remote management.

IT Assistant supports the industry-standard management protocols: Simple Network Management Protocol (SNMP), Desktop Management Interface (DMI), and Common Information Model (CIM). It allows system administrators to monitor for events such as system overheating, chassis intrusion, fan or power supply failure, RAID controller issues, and SMART disk alerts.

Dell wanted to test the scalability of the latest version of IT Assistant. Previous releases were certified up to 500 nodes.¹ The goal was to measure IT Assistant's performance and scalability when managing 500, 1,000, 2,000, and 3,000 nodes. The Dell team developed a scalability study to show that OpenManage IT Assistant could successfully manage a high volume of end nodes.

The test objectives included:

  • Measuring the performance of IT Assistant Server as the number of managed end nodes increases
  • Measuring the amount of server resources, such as CPU, required by IT Assistant to monitor a set of nodes
  • Measuring the overall network throughput based on a variety of workload conditions

Setting up the test environment

The Dell team ran the tests on a Dell PowerEdge 1550, a rack-mountable server in the smallest (1U) form factor. Software included Microsoft® Windows® 2000 Server SP2, IT Assistant version 6.1, and Microsoft SQL Server version 7.0.

Note that the original configuration included dual 866 MHz processors (one CPU was disabled to establish a baseline configuration), 1 GB RAM, two Seagate ST318404LC SCSI disks, an Intel® 8255x-based Peripheral Component Interconnect (PCI) Ethernet adapter, an ATI RAGE XL PCI SVGA adapter, and a Samsung SN-124 CD-ROM drive.

Managed nodes configuration
Managed nodes used in the study were Dell PowerEdge servers running in Dell's production environment. They ran a broad range of operating systems, including Novell® NetWare® versions 4.1x through 5.1, Microsoft Windows 2000 SP1 and SP2, Windows NT® 4.0 SP6a, and Red Hat® Linux® 7.0. The nodes had various versions of the Dell OpenManage Server Instrumentation, ranging from Hardware Instrumentation Package (HIP) 3.5.2 through Server Agent 4.22.

Network configuration
The IT Assistant Server used in this scalability study was located in a 10/100 Mbps laboratory environment on a non-optimized network (fortunately, the effective throughput of IT Assistant was never greater than 10 Mbps at any one time). The managed client nodes, dispersed across several campuses, resided on a fully redundant Metropolitan Area Network (MAN). Figure 1 shows the network configuration.

Figure 1. Network configuration

IT Assistant configuration
After IT Assistant is installed, it prompts users to set up default discovery. In the Dell testing, the polling interval was shortened to 10 minutes.

The second panel that appears defines the subnets to be probed for systems. The team canceled out of this panel because of the large number of nodes to be entered. Instead, the team built the test database from subsets of an exported listing of all servers discovered by a previously run copy of IT Assistant. Hosts (subnets) were entered either by using Microsoft Access to copy rows to the Discovery Configuration Table or by using the Database Management program (dcdbmng.exe) provided with IT Assistant to back up and restore tables within the database. In a typical installation, dcdbmng.exe is found in C:\Program Files\Dell\OpenManage\IT Assistant\bin. The team also used this program to back up the database at each test point.²

To ensure that only the default settings of the Desktop Management Interface and Simple Network Management Protocol polling were enabled (see Figure 2), the Dell team ran ConfigServices.exe. This executable is found in C:\Program Files\Dell\OpenManage\IT Assistant\bin.

Figure 2. Configuring IT Assistant

The team did not perform any further custom configuration of IT Assistant, such as event filters or actions. Both Microsoft Data Engine (MSDE) and SQL Server 7 were potential database engines, but the team chose SQL Server 7 for IT Assistant because of its potential for scalability.³ For environments with fewer than 500 nodes, MSDE is a satisfactory solution, but it was not tested as part of this study.

Test measurements

One objective of the test was to determine the number of systems that IT Assistant can manage. Since the predominant activity is polling systems for status, the logical measurement was the time required for all the nodes within the database to be updated.

In practical terms, this time represents the minimum time that could be set for the status-polling interval and the maximum time that would elapse before a server is detected as not responding. This would be a worst-case situation because most errors are detected and immediately reported to IT Assistant as either SNMP traps or DMI indications.

The expectation is that polling more systems requires more time; therefore, the Dell team measured IT Assistant configurations of 500, 1,000, 2,000, and 3,000 nodes. They chose a status-polling interval of 10 minutes because even the 3,000-node configuration easily completed within that time.

Once the initial discovery of systems occurs, inventory information is relatively slow to change. This allows off-loading those tables and the associated queries to another server running SQL Server, Microsoft Access, or even Microsoft Excel if the load becomes too great. Off-loading could be as easy as downloading all the records into an Excel spreadsheet or restoring the tables and records to a different machine. Since queries can be handled in this manner, they were not characterized in this study.

The effects of user interaction via the Web interface also were not measured because the interface is typically used in response to a problem, which justifies the resources that such use requires of the machine.

The second objective of the test was to determine the resources required to manage the various numbers of systems. Information provided below should help decision makers determine whether a separate machine is required or if an existing machine can be used with additional disks and RAM.

The perfmon tool, included with Windows 2000 at no additional cost, captured performance data every five seconds in a binary Counter Log. The captured data included CPU, memory, disk, and network activity.⁴

Test results for IT Assistant

The CPU utilization and network traffic over 60 minutes, shown in Figure 3, indicate a clear pattern in which polling activity starts every 10 minutes and completes in less than five minutes. Because these tests used production systems, IT Assistant received traps and indications and stored them in the database while measurements were being taken. This represented additional activity of 1,800 to 8,000 events per day, or three to four per server.

Figure 3. Processor utilization over 60 minutes

Figures 4 and 5 show the peak and average CPU utilization, respectively.

Figure 4. Peak CPU utilization

Figure 5. Average CPU utilization while polling

Even with this additional event load, the result was impressive: a complete status poll of 500 to 3,000 servers took only two to five minutes (see Figure 6). Results should be even better in an environment where both desktops and servers are polled because desktops carry a lighter user load and have fewer instrumentation points.

Figure 6. Polling duration time

Figure 7 shows that each node sent 20 KB of data during a poll. Closer inspection with the Network Monitor in Windows 2000 Server showed that most nodes used the DMI protocol for communication. However, when the nodes used SNMP, the amount of data received was much smaller: 1 KB to 2 KB.

Figure 7. Kilobytes per node

For this study, the predominance of DMI resulted from the large number of systems with older DMI-only agents on the managed nodes. Figure 8 shows the total number of kilobytes received. For future reference, note that network traffic will be reduced because the next release of the Dell OpenManage Server Instrumentation will use SNMP and Common Information Model (CIM) only.
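The per-node figure can be turned into a rough estimate of the average network load on the IT Assistant server during a poll. The following is a minimal sketch, not part of the original study: it assumes the traffic is spread evenly across a five-minute poll and that 1 KB is 1,024 bytes, and the function name average_mbps is hypothetical.

```python
# Rough average network load during one status poll.
# The 20 KB, 3,000-node, and five-minute inputs come from the article;
# the even-spread assumption and the function name are illustrative only.

KB_PER_NODE = 20        # approximate data received per node per poll
NODES = 3000            # largest configuration tested
POLL_SECONDS = 5 * 60   # the article reports completion in under five minutes


def average_mbps(nodes: int, kb_per_node: float, poll_seconds: float) -> float:
    """Average receive rate in megabits per second (1 KB = 1,024 bytes)."""
    total_bits = nodes * kb_per_node * 1024 * 8
    return total_bits / poll_seconds / 1_000_000


rate = average_mbps(NODES, KB_PER_NODE, POLL_SECONDS)
print(f"{NODES} nodes: about {rate:.2f} Mbps averaged over the poll")
```

Under these assumptions the average works out to about 1.6 Mbps for 3,000 nodes. Because the poll actually finished in less than five minutes and traffic is bursty, real peaks would be higher, but the estimate is consistent with the observation that throughput never exceeded 10 Mbps.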

Figure 8. Average kilobytes received

Storing events significantly increases the amount of disk space needed for the IT Assistant database. For example, with 2,841 nodes and 5,914 events in the database, SQL Server Enterprise Manager showed that the IT Assistant database used 27 MB of the allocated 39 MB. After deleting the events and compressing the data, only 17 MB were used, so the events had accounted for about 10 MB. Some quick calculations provide an estimate of the space used:

10 MB / 5,914 events ≈ 1.7 KB per event

17 MB / 2,841 nodes ≈ 6 KB per node

Even 3,000 nodes in the database did not stretch either the memory or disk subsystems. Memory use ranged from 250 MB for 500 nodes to 400 MB for 3,000 nodes. Disk I/O varied from six to eight transfers per second at peak usage with the corresponding average transfers of one to two per second.

Since polling the 3,000 nodes required less than five minutes, adding more nodes should only require increasing the polling interval to allow enough additional time. Clearly, the number of systems that can be managed will be determined more by the people and processes using the tool than by the performance of the tool itself.
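As a rough planning aid, the relationship between node count and polling interval can be sketched as follows. This is an illustration only: it assumes poll duration grows linearly between and beyond the approximate endpoints reported above (about two minutes for 500 nodes and about five minutes for 3,000 nodes), which the study did not verify for larger configurations. The function names are hypothetical.

```python
# Approximate endpoints taken from the article's reported polling times.
MEASURED = ((500, 2.0), (3000, 5.0))   # (nodes, minutes)


def estimated_poll_minutes(nodes: int) -> float:
    """Linearly interpolate or extrapolate poll duration (an assumption, not a measurement)."""
    (n0, t0), (n1, t1) = MEASURED
    slope = (t1 - t0) / (n1 - n0)
    return t0 + slope * (nodes - n0)


def fits_interval(nodes: int, interval_minutes: float) -> bool:
    """True if the estimated poll would finish within the polling interval."""
    return estimated_poll_minutes(nodes) <= interval_minutes


for n in (3000, 5000, 8000):
    print(n, round(estimated_poll_minutes(n), 1), fits_interval(n, 10))
```

Any estimate beyond 3,000 nodes is an extrapolation and should be confirmed by measurement before a polling interval is chosen.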

The server resources necessary to manage these systems can be summed up in a few rules of thumb:

  • 20 KB+ of data are received per node per polling interval
  • 20-30 percent average CPU utilization during polling with 2-3x peaks
  • 512 MB total RAM adequately served the monitoring of 3,000 nodes
  • Database size = Base install + 6 KB per node + 2 KB per event
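The database rule of thumb above can be expressed as a small calculation. The sketch below is a hedged illustration, not a Dell-provided tool: it estimates only the space beyond the base install, treats 1 MB as 1,024 KB, and uses a hypothetical function name.

```python
# Per-node and per-event figures are the article's rules of thumb.
KB_PER_NODE_DB = 6     # database space per managed node
KB_PER_EVENT_DB = 2    # database space per stored event


def database_growth_mb(nodes: int, events: int) -> float:
    """Estimated database space beyond the base install, in MB (1 MB = 1,024 KB)."""
    return (nodes * KB_PER_NODE_DB + events * KB_PER_EVENT_DB) / 1024


# Configuration reported in the study: 2,841 nodes and 5,914 events.
print(f"{database_growth_mb(2841, 5914):.1f} MB")
```

For the configuration measured in the study, the estimate comes to roughly 28 MB, which is in line with the 27 MB that SQL Server Enterprise Manager reported.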

IT Assistant makes managing PowerEdge servers easy

Managing Dell servers is easy and affordable because every PowerEdge server includes Dell OpenManage IT Assistant. Since IT Assistant can successfully manage and monitor the hardware of thousands of end nodes, companies can rely on IT Assistant as they continue to grow.

Roger Foreman (roger_foreman@dell.com) is a senior consultant on Dell's Solution Enablement Labs and Technology Showcase team. Roger holds a B.S. and M.S. in Electrical Engineering from Iowa State University, as well as an M.S. in Computer Science from the University of Arizona. He is a Microsoft Certified Systems Engineer (MCSE).

Sandra D. Woodcock (Sandra_woodcock@dell.com) is a systems engineer in the IT Infrastructure Systems Engineering Tools and Architecture Group at Dell. She has worked in the IT industry for 15 years. Prior to joining Dell, she served as an IT consultant for the Deloitte & Touche International Consulting Group in Houston, Texas. Sandra is a Certified NetWare Engineer and a Microsoft Certified Professional.
