PERC 3/QC Array Storage Controller
By Mike Molloy, Ph.D. and Serdar Acir (Issue 3 2001)
PERC 3/QC is the newest member of the Dell PowerEdge line of RAID controllers. This article describes the results of several tests exploring and comparing the performance of the PERC 3/QC array controller and the PERC 2 controller. These tests measured PERC 3/QC performance with different workloads, RAID levels, disk speeds, and array sizes, but were limited to configurations that PERC 2 could also support.
The PowerEdge® Expandable RAID Controller, version 3, Quad Channel (PERC 3/QC) controller is a Peripheral Component Interconnect (PCI)-to-SCSI host adapter with RAID capabilities. PERC 3/QC provides reliable, high-performance, and fault-tolerant disk subsystem management. Designed for the internal storage of Dell® workgroup, departmental, and enterprise servers, PERC 3/QC offers a cost-effective method of implementing a RAID solution.
The PERC 3/QC card provides four SCSI channels, each capable of up to 160 MB/sec of throughput. PERC 3/QC supports both a Low Voltage Differential (LVD) SCSI bus and a single-ended SCSI bus. The LVD feature supports cables up to 12 meters long.
PERC 3/QC features include:
- Ultra3 LVD SCSI providing 160 MB/sec maximum transfer rate per channel (burst mode)
- 64 MB to 128 MB removable and battery-backed error-correction coding (ECC) synchronous dynamic RAM (SDRAM) cache with Dual Inline Memory Module (DIMM)
- 64-bit PCI host interface
- Onboard Intel® i960® processor to improve controller performance and off-load host CPU
- Two internal and four external connectors
- Support for RAID levels 0 (striping), 1 (mirroring), 5 (distributed parity), 10 (combination of striping and mirroring), and 50 (combination of striping and distributed parity)
- Advanced array configuration and management utilities
- Battery backup for up to 72 hours
- Support for up to 12 SCSI drives per channel using the Dell PowerVault® 210S or 211S storage systems
Although the disk subsystem is rated for high throughput, deployed configurations rarely achieve maximum throughput rates. Because the typical server workload is random, disk subsystem performance is usually limited not by the bandwidth of the drive interface but by the seek time of the drives themselves. Since servers usually process requests from many clients at once, the drives spend most of their time seeking the next piece of information rather than transferring large blocks of contiguous data.
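As a rough illustration of why random workloads seldom approach the interface bandwidth, the following Python sketch converts a per-I/O service time into throughput. The 8 ms service time is a hypothetical value chosen only for illustration, not a measured figure from these tests, and the sketch assumes 1 MB = 1,024 KB.

```python
def random_throughput_mb_per_sec(service_time_ms, block_kb):
    """Throughput of one drive that completes one random I/O per service time."""
    iops = 1000.0 / service_time_ms       # I/Os completed per second
    return iops * block_kb / 1024.0       # KB/sec converted to MB/sec

CHANNEL_MB_PER_SEC = 160.0                # Ultra3 burst rate per channel

# Hypothetical drive: 8 ms to position the heads and transfer one 8 KB block
per_disk = random_throughput_mb_per_sec(service_time_ms=8.0, block_kb=8.0)
twelve_disks = 12 * per_disk              # a fully populated channel

print(f"One disk:     {per_disk:.2f} MB/sec")
print(f"Twelve disks: {twelve_disks:.2f} MB/sec of {CHANNEL_MB_PER_SEC:.0f} MB/sec available")
```

Under these assumptions, even twelve drives on one channel use only a small fraction of the channel bandwidth, so positioning time, not the interface, sets the limit.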
Test configuration and benchmark tool
We conducted several performance tests on a PowerEdge 6350 with four Pentium® II Xeon™ 500 MHz processors with 1 MB L2 cache, 512 MB of memory, and two 64-bit/33 MHz PCI buses. The server ran Windows® 2000 Server.
The PERC 3/QC device driver was not specially optimized or tweaked for performance. Although PERC 3/QC supports a 66 MHz PCI bus, it ran on a 33 MHz PCI bus in these tests. However, because the random workloads tested involved low throughput, we expected the PCI bandwidth limitation to have minimal impact.
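The PCI headroom can be estimated with simple arithmetic. The sketch below computes the theoretical peak bandwidth of a PCI bus from its width and clock, then compares it with the data rate of a small-block random workload. The 4,000 IOPS figure is a hypothetical load chosen for illustration and is not a result from these tests.

```python
def pci_peak_mb_per_sec(bus_width_bits, clock_mhz):
    """Theoretical peak PCI bandwidth: bytes per transfer times million transfers/sec."""
    return bus_width_bits / 8 * clock_mhz

def workload_mb_per_sec(iops, block_kb):
    """Data rate of a workload in decimal MB/sec (block size given in KB of 1,024 bytes)."""
    return iops * block_kb * 1024 / 1_000_000

bus_33 = pci_peak_mb_per_sec(64, 33)      # bus used in the tests
bus_66 = pci_peak_mb_per_sec(64, 66)      # bus speed the controller supports
oltp = workload_mb_per_sec(iops=4000, block_kb=8)   # hypothetical 8 KB random load

print(f"64-bit/33 MHz PCI peak: {bus_33:.0f} MB/sec")
print(f"64-bit/66 MHz PCI peak: {bus_66:.0f} MB/sec")
print(f"Hypothetical OLTP load: {oltp:.1f} MB/sec")
```

Real buses deliver less than the theoretical peak because of arbitration and protocol overhead, but the gap between the two numbers suggests why the slower bus was not expected to matter for random I/O.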
We also used up to four PowerVault 210S external SCSI storage units, which support up to 12 one-inch, hot-pluggable SCSI drives on each channel. With the external storage module (ESM), the 210S supports Ultra3 SCSI. Figure 1 illustrates the test configuration.
Figure 1. Test configuration
We used the Intel® Iometer version 98.10.8 benchmark tool for these tests. Iometer includes both an I/O workload generator and a performance analysis tool, and can be used on either single or clustered servers. The workload generator accurately and reliably reproduces a predefined workload, while the analysis tool gathers I/O performance data such as throughput, latency, and CPU utilization. Iometer can be configured to simulate the workload of any type of application or benchmark, or used to create a completely artificial workload that stresses the system in specific ways.
Iometer can be used to measure and characterize:
- Performance of disk and network controllers
- Bandwidth and latency capabilities of buses
- Network throughput to attached drives
- Shared bus performance
- System-level hard drive performance
- System-level network performance
Running the same workload on multiple system configurations can determine the strengths and weaknesses of each system and help users select the best configuration for particular needs. We used the standard suite of workload simulations included with this version of Iometer. This includes workloads emulating online transaction processing (OLTP), file servers, Web servers, and media-streaming servers.
The multithreaded workload generator simulates multiple client programs simultaneously. The individual threads are known as workers. Because the test server has four processors, Iometer was configured to run four workers per target array for all of the tests. We chose this configuration to most accurately reflect the way multiprocessor servers access the disk subsystem.
Testing RAID levels 0, 5, and 10
We tested the throughput of PERC 3/QC with several different RAID configurations and up to 32 disks. The RAID configurations tested were RAID-0, RAID-5, and RAID-10. The workload was an OLTP environment with an I/O block size of 8 KB. The Iometer 8 KB OLTP workload consists of 8 KB random transfers in a read to write ratio of 2:1. Figure 2 shows the results, given in I/Os per second (IOPS).
Figure 2. I/O throughput with several RAID levels in an OLTP environment
RAID-0. Although RAID-0 is less reliable than the other RAID levels, we noted that it performed better than RAID-10 in a random 8 KB OLTP environment.
RAID-5. While the performance of RAID-5 in an OLTP environment was much lower than that of RAID-10, RAID-5 is less expensive to implement. RAID-5 tends to perform better in read-intensive environments that involve fewer write penalties.
RAID-10. RAID-10 seems to be less frequently used than RAID-5, but in many cases RAID-10 is a better choice. We observed that OLTP performance was much better for RAID-10 than for RAID-5. With a small number of disks in the volume, RAID-10 provides nearly twice the OLTP disk performance of RAID-5.
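The relative ordering of the three RAID levels can be approximated with the common textbook write-penalty model, sketched below in Python. This is a simplification and not the method used in these tests: it ignores controller cache, queuing, and stripe effects, so it will not reproduce the measured values in Figure 2. The write penalties (1, 2, and 4 back-end I/Os per host write) are the usual rule-of-thumb values, and the disk count and per-disk IOPS are hypothetical.

```python
# Rule-of-thumb back-end disk I/Os generated by one host write
WRITE_PENALTY = {"RAID-0": 1, "RAID-10": 2, "RAID-5": 4}

def host_iops(raid_level, disks, per_disk_iops, read_fraction):
    """Estimate host-visible IOPS for a random workload on a RAID array."""
    write_fraction = 1.0 - read_fraction
    # Average number of back-end disk I/Os needed per host I/O
    backend_per_host_io = read_fraction + write_fraction * WRITE_PENALTY[raid_level]
    return disks * per_disk_iops / backend_per_host_io

READ_FRACTION = 2 / 3      # the 2:1 read-to-write ratio of the OLTP workload

# Hypothetical array: 8 disks at 125 random IOPS each
estimates = {
    level: host_iops(level, disks=8, per_disk_iops=125, read_fraction=READ_FRACTION)
    for level in WRITE_PENALTY
}
for level, value in estimates.items():
    print(f"{level}: about {value:.0f} IOPS")
```

The model ranks the levels in the same order as the tests (RAID-0, then RAID-10, then RAID-5); the size of the measured gaps depends on factors the model leaves out.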
Comparing PERC 3/QC and PERC 2/QC
Next, we compared the performance of PERC 3/QC with the performance of PERC 2/QC running both a sequential workload and a random OLTP workload. PERC 3/QC performance outpaced PERC 2/QC performance running a sequential workload due to the higher bandwidth SCSI technology (see Figure 3). With a RAID-0 array consisting of eight disks on a single channel, both controllers reached roughly half of the maximum theoretical throughput running four worker threads. Both controllers might have reached higher throughput rates if they had run a single worker thread because there would have been no disk contention between threads, thus making the physical disk access more sequential.
Figure 3. Controller throughput for 64 KB sequential reads
Figure 4 compares PERC 3/QC and PERC 2/QC throughput in a random 8 KB OLTP environment with RAID-0. While PERC 3/QC and PERC 2/QC performed similarly in configurations with a small number of disks, PERC 3/QC performed much better with larger numbers of disks.
Figure 4. Controller throughput for random 8 KB OLTP environment
The types of workloads tested explain these performance characteristics. A sequential workload accesses information that is physically located on the disk in sequential order. Because the drive spends less time seeking the next location to access, sequential workloads run faster.
Random disk access, typical of many server applications, involves many users accessing information located in different places on the disk at the same time. A typical OLTP environment is an application built around a database where data is added, updated, or deleted in real time in small I/O blocks. In the 8 KB OLTP scenario, the drive must spend much of its time moving the heads and waiting for the platters to rotate to the correct part of the disk. This positioning overhead accounts for much of the difference between the performance of the sequential workload shown in Figure 3 and the random workload shown in Figure 4.
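The difference between the two access patterns can be illustrated with a toy model that sums the distance between consecutive block addresses as a stand-in for head movement. This is only a sketch: the disk size and request count are hypothetical, and real positioning time is not proportional to address distance.

```python
import random

def total_head_travel(block_addresses):
    """Sum of absolute jumps between consecutive block addresses."""
    return sum(abs(b - a) for a, b in zip(block_addresses, block_addresses[1:]))

DISK_BLOCKS = 1_000_000    # hypothetical disk size in blocks
REQUESTS = 1_000           # hypothetical number of requests

# Sequential workload: each request follows the previous block
sequential = list(range(REQUESTS))

# Random workload: each request lands anywhere on the disk (fixed seed for repeatability)
rng = random.Random(0)
random_access = [rng.randrange(DISK_BLOCKS) for _ in range(REQUESTS)]

sequential_travel = total_head_travel(sequential)
random_travel = total_head_travel(random_access)

print(f"Sequential travel: {sequential_travel} blocks")
print(f"Random travel:     {random_travel} blocks")
```

In this model the random pattern moves across far more of the disk to serve the same number of requests, which is the overhead the random workload pays.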
Comparing disk speeds
The next test compared the performance of an array with 10,000-rpm Ultra2 drives with an array of 15,000-rpm Ultra3 drives. The array tested had 32 drives across four channels, configured for RAID-0. Again, we used the workloads included with Iometer. Figure 5 shows the results.
Figure 5. Disk comparison results
For the random workloads, the performance gains of the Ultra3 disks ranged between 38 percent and 44 percent. These gains result primarily from the shorter access times associated with the Ultra3 disks' higher rotational speed. For the more sequential, 64 KB streaming read workload, the rotational speed also boosted the transfer rate. In these tests, the transfer rates still fell far below the PERC 2/QC limitation of 80 MB/sec.
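One component of this gain can be worked out directly from the spindle speeds. On average a drive waits half a revolution for the requested sector to arrive under the head, so the average rotational latency follows from the rpm alone. The sketch below covers only that component; it makes no assumption about the head seek times of the drives tested.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution

latency_10k = avg_rotational_latency_ms(10_000)   # 3.0 ms
latency_15k = avg_rotational_latency_ms(15_000)   # 2.0 ms
reduction = (latency_10k - latency_15k) / latency_10k

print(f"10,000 rpm: {latency_10k:.1f} ms average rotational latency")
print(f"15,000 rpm: {latency_15k:.1f} ms average rotational latency")
print(f"Reduction:  {reduction:.0%}")
```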
Exploring disk subsystem recovery and rebuilding
In capacity planning, administrators must ensure that the system can sustain the maximum load under the most adverse conditions. One condition that adversely affects performance is a disk failure in a redundant array, resulting in a degraded array. In a RAID-5 array, the system uses distributed parity information to dynamically re-create the original data as it is requested by the processor subsystem.
The PERC 3/QC controller allows dynamic rebuilding of a degraded array to restore it to a healthy state. Rebuilding a failed drive can be initiated automatically with a hot spare in the system or by replacing the failed drive. Because rebuilding is resource intensive, a rebuild rate is specified to balance the speed of the rebuild with the performance of the system during the rebuild. The default rebuild rate is 30 percent, which favors application performance over rebuilding.
In our tests, we examined the disk subsystem performance of a 12-disk, RAID-5 array under healthy, degraded, and rebuilding conditions. Running the 8 KB OLTP workload, we found that the healthy volume delivered 120 IOPS, the degraded volume delivered 100 IOPS, and the rebuilding volume dropped to only 20 IOPS. We recorded a rebuild time of 2.5 hours under this workload with the default rebuild rate. Rebuild time varies with the amount of data, rebuild rate, size of the array, RAID level, and disk speed, but usually takes several hours. See Figure 6.
Figure 6. Degraded and rebuilding array performance results
Automatic rebuilding can cause problems if administrators do not anticipate performance degradation. Rebuilding a degraded array during off-peak hours or otherwise off-loading server traffic during rebuilding usually helps maintain acceptable performance.
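For capacity planning, the measured values above can be expressed as a fraction of healthy performance and checked against an expected peak load. The sketch below uses the IOPS figures reported in this article; the `required_peak_iops` value and the helper names are hypothetical and serve only to show the calculation.

```python
# IOPS measured in this article for the 12-disk RAID-5 array (8 KB OLTP workload)
measured_iops = {"healthy": 120, "degraded": 100, "rebuilding": 20}

def fraction_of_healthy(state):
    """Performance in a given state relative to the healthy array."""
    return measured_iops[state] / measured_iops["healthy"]

def sustains_load(state, required_iops):
    """True if the array in this state can still deliver the required IOPS."""
    return measured_iops[state] >= required_iops

required_peak_iops = 90    # hypothetical peak application demand

for state in measured_iops:
    pct = fraction_of_healthy(state)
    ok = "yes" if sustains_load(state, required_peak_iops) else "no"
    print(f"{state:<11} {pct:>4.0%} of healthy, sustains peak load: {ok}")
```

With this hypothetical demand, the array would cope while degraded but not while rebuilding, which is the case for scheduling rebuilds during off-peak hours.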
Next-generation controller improves performance
The PERC 3/QC controller, Dell's four-channel Ultra3 SCSI RAID controller, takes the PERC family of controllers to a new level. PERC 3/QC performs up to 50 percent better than PERC 2/QC with sequential workloads that take advantage of its high-performance Ultra3 SCSI technology. For random workloads common in file and print, database, and messaging environments, PERC 3/QC performance offers a significant improvement over PERC 2/QC.
Mike Molloy, Ph.D. (firstname.lastname@example.org) is a senior manager on the System Performance Team at Dell and has held professorships at the University of Texas and Carnegie Mellon University. He has been active in computer performance for over 20 years and has served as chairman of the Association for Computing Machinery (ACM) special interest group on computer performance. Mike has published many papers and most recently a book, Fundamentals of Computer Performance Modeling. He received a Ph.D. in Computer Science from the University of California at Los Angeles.
Serdar Acir (serdar_acir@dell.com) is a development engineer and advisor on the System Performance Team at Dell. His area of specialization is RAID and disk controller technologies. Serdar holds an M.S. in Computer Engineering with a specialization in Distributed Computing from Duke University.
For more information
PERC 3/QC: www.dell.com
Lower level interfaces: www.t10.org