Understanding RAID-5 and I/O Processors
By Paul Luse (May 2003)
Intelligent RAID is becoming popular for general-purpose file and Web servers because it improves data protection as well as performance. This article discusses how a RAID-5 I/O processor subsystem can off-load interrupts from the host CPU and significantly improve the performance of specific applications using exclusive OR (XOR) operations.
Effective data storage is a critical concern in enterprise computing environments, and many organizations are employing RAID technology in server-attached, networked, and Internet storage applications to enhance data availability. Understanding how intelligent RAID technology works can enable IT managers to take advantage of the key performance and operating characteristics that RAID-5 controllers and arrays provide—especially the I/O processor subsystem, which frees the host CPU from interim read-modify-write interrupts. In addition, intelligent RAID boosts performance using exclusive OR (XOR) operations that are not available in RAID-0 and RAID-1.
The most common RAID implementations are host-based, hardware-assisted, and intelligent RAID. Host-based RAID, sometimes called software RAID, does not require special hardware. It runs on the host CPU and uses native drive interconnect technology. The disadvantage of host-based RAID is the reduction in the server's application-processing bandwidth, because the host CPU must devote cycles to RAID operations—including XOR calculations, data mapping, and interrupt processing.
Hardware-assisted RAID combines a drive interconnect protocol chip with a hardware application-specific integrated circuit (ASIC), which typically performs XOR operations. Hardware-assisted RAID is essentially an accelerated host-based solution, because the actual RAID application still executes on the host CPU, which can limit overall server performance.
Intelligent RAID creates a RAID subsystem that is separate from the host CPU. The RAID application and XOR calculations execute on a separate I/O processor. Intelligent RAID-5 implementations cause fewer host interrupts because they off-load RAID processing from the host CPU.
Defining RAID levels
Several types, or levels, of RAID exist. Each offers a unique set of performance and data-protection characteristics. One key concept in RAID is abstraction, which is the practice of hiding the details of an implementation to provide simplicity at a higher layer. For example, the RAID controller combines multiple disk drives into a disk array. In turn, the disk array appears to the host as a single logical disk. Two RAID levels provide the foundation for understanding how RAID works:
- RAID-0: This RAID level uses disk striping, in which the RAID subsystem lays out the data across a number of disks in stripes. This arrangement takes advantage of parallel processing over all the disks. A RAID-0 array is much faster for both reads and writes than a single disk because all disks work at the same time. RAID-0 is typically used for applications in which performance requirements outweigh data-protection requirements.
- RAID-1: Commonly referred to as mirroring, RAID-1 essentially duplicates the data over two or more disks. RAID-1 is typically used for applications in which data protection is more important than performance. A RAID-1 array provides faster reads than a single disk, but write performance is slower. RAID-1 is often used to mirror the operating system boot volume of a system server, where protecting the operating system drive is critical.
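To make the striping idea concrete, here is a minimal sketch of how a logical block number might be mapped onto a RAID-0 array. The function name `locate_block`, the 0-based disk numbering, and the `blocks_per_strip` parameter are assumptions made for illustration; real controllers use their own mapping schemes.

```python
from typing import Tuple


def locate_block(logical_block: int, n_disks: int, blocks_per_strip: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Hypothetical round-robin layout: strip 0 goes to disk 0, strip 1 to
    disk 1, and so on, wrapping around to start the next stripe.
    """
    strip_index, offset_in_strip = divmod(logical_block, blocks_per_strip)
    stripe_index, disk_index = divmod(strip_index, n_disks)
    return disk_index, stripe_index * blocks_per_strip + offset_in_strip


# Ten consecutive logical blocks on a four-disk array, two blocks per strip.
layout = [locate_block(b, n_disks=4, blocks_per_strip=2) for b in range(10)]
print(layout)
```

Consecutive strips land on different disks, which is why a large request can keep all disks busy at once. A mirrored (RAID-1) array would instead send every write to the same offset on each member disk.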
Explaining how RAID-5 works
RAID-5 protects the data on a number of disks using the capacity of a single disk that is the same size as the smallest disk in the array. For example, suppose a Web server uses five disks and the failure of one disk must not cause server downtime. Assuming that each disk is 72 GB, the total usable capacity for a five-disk RAID-5 array would be 288 GB. The usable capacity of a RAID-5 array equals s(n - 1), where s is the capacity of the smallest disk in the array and n is the total number of disks in the array.
In the preceding example, a single 72 GB disk can protect all the data in the 288 GB array should one of the other disks fail. As another example, in a 15-disk array of 72 GB disks, a single 72 GB disk can protect the entire 1008 GB array.
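The capacity formula s(n - 1) can be checked with a few lines of code. This is a small sketch; the function name `raid5_usable_capacity` is an assumption, and the three-disk minimum reflects the usual lower bound for RAID-5.

```python
def raid5_usable_capacity(disk_sizes_gb):
    """Usable capacity s * (n - 1), where s is the smallest disk size."""
    n = len(disk_sizes_gb)
    if n < 3:
        raise ValueError("RAID-5 needs at least three disks")
    return min(disk_sizes_gb) * (n - 1)


print(raid5_usable_capacity([72] * 5))   # 288
print(raid5_usable_capacity([72] * 15))  # 1008
```

Because s is the smallest disk, any extra capacity on larger disks in a mixed array goes unused.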
RAID-5 provides an efficient way to protect data and achieves read performance similar to RAID-0. Meanwhile, write performance of the RAID-5 array is nearly as fast as a single disk. Because it protects data and boosts performance, RAID-5 is becoming popular for general-purpose servers such as file and Web servers.
Calculating XOR functions
A single disk can protect the data on any number of other disks by performing the simple Boolean XOR operation. XOR is both an associative and commutative operation, which means that neither the order nor the grouping of the operands affects the results. XOR is also a binary operation and has only four possible combinations of two operands. Two operands have a "true" XOR result when one and only one operand has a value of 1.
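The properties described above are easy to verify. The following sketch prints the four-row truth table and then confirms that every ordering of an arbitrary (made-up) set of operands produces the same result.

```python
from functools import reduce
from itertools import permutations
from operator import xor

# The four possible combinations of two one-bit operands.
truth_table = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
for a, b, result in truth_table:
    print(a, "XOR", b, "=", result)

# Order and grouping do not matter: fold the operands left to right
# for every permutation and collect the distinct results.
operands = [1, 0, 1, 1]  # arbitrary illustrative values
results = {reduce(xor, ordering) for ordering in permutations(operands)}
print(results)  # {1}
```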
Implementing the XOR function in dedicated hardware, which can be an XOR ASIC or an I/O processor with integrated XOR functionality, greatly increases the throughput of data requiring this operation. Every byte of data stored to a RAID-5 volume requires XOR calculations. Understanding how an XOR operation works is critical to understanding how RAID-5 can protect so much data with so little additional disk capacity.
In Figure 1, each Dn represents a chunk of data, often referred to as a strip. All of the strips across a row are referred to as a stripe. In RAID-5, parity data is located in a different stripe on each disk, a concept called parity rotation. Implemented for performance reasons, parity rotation introduces a data element that represents the parity data: Pn, where n is the stripe number for which the parity data is stored. Parity data is simply the result of an XOR operation on all other data elements within the same stripe. Because XOR is an associative and commutative operation, administrators can find the XOR result of multiple operands by first performing the XOR operation on any two operands, then performing an XOR operation on the result with the next operand, and continuing to perform the XOR operation on all the operands until the final result is determined.
Figure 1. Data map of a typical four-disk RAID-5 array
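As a sketch of the parity calculation, the snippet below XORs three data strips byte by byte to produce the parity strip for one stripe. The helper name `xor_strips` and the two-byte strip contents are invented for illustration; real strips are typically tens of kilobytes.

```python
from functools import reduce


def xor_strips(*strips: bytes) -> bytes:
    """XOR equal-length strips together, one byte column at a time."""
    if len({len(s) for s in strips}) != 1:
        raise ValueError("all strips must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*strips))


# Hypothetical contents for the data strips of the first stripe.
d1, d2, d3 = b"\x0f\xa5", b"\xf0\x5a", b"\x33\x33"

p1 = xor_strips(d1, d2, d3)
print(p1.hex())  # cccc

# Grouping does not matter: XOR two strips first, then fold in the third.
assert xor_strips(xor_strips(d1, d2), d3) == p1
```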
A RAID-5 volume can tolerate the failure of any one disk without losing data. Typically, when a physical disk fails, such as physical disk 3 in Figure 2, the disk array is considered degraded. The missing data for any stripe is easily determined by performing an XOR operation on all the remaining data elements for that stripe. In live implementations, each data element would represent the total amount of data in a strip. Typical values range from 32 KB to 128 KB. Figure 2 shows the array with arbitrary data values, assuming that each element represents a single bit. Parity for the first stripe is P1 = D1 XOR D2 XOR D3. The XOR result of D1 and D2 is 1, and the XOR result of 1 and D3 is 0. Thus P1 is 0.
Figure 2. Data map of a four-disk RAID-5 array with arbitrary data values
If the host requests a RAID controller to retrieve data from a disk array that is in a degraded state, the RAID controller must first read all the other data elements on the stripe, including the parity data element. It then performs all the XOR calculations before it returns the data that would have resided on the failed disk. The host is not aware that a disk has failed, and array access continues. However, if a second disk fails, the entire logical array will fail and the host will no longer have access to the data.
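A degraded read can be sketched as follows. The function name `degraded_read`, the use of `None` to mark the failed disk, and the strip contents are assumptions made for this example; they are not taken from any particular controller.

```python
from functools import reduce
from typing import List, Optional


def xor_strips(strips) -> bytes:
    """XOR equal-length strips together, one byte column at a time."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*strips))


def degraded_read(stripe: List[Optional[bytes]]) -> bytes:
    """Rebuild the one missing strip (marked None) from the survivors.

    The stripe list holds every strip in the row, parity included.
    """
    missing = [i for i, strip in enumerate(stripe) if strip is None]
    if len(missing) != 1:
        raise ValueError("exactly one strip must be missing to rebuild")
    return xor_strips([strip for strip in stripe if strip is not None])


# Hypothetical stripe: three data strips and their parity.
d1, d2, d3 = b"\x0f\xa5", b"\xf0\x5a", b"\x33\x33"
parity = xor_strips([d1, d2, d3])

# The third disk has failed; its strip is recovered from the other three.
rebuilt = degraded_read([d1, d2, None, parity])
print(rebuilt == d3)  # True
```

With two strips missing the function raises an error, mirroring the point that a second disk failure makes the data unrecoverable.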
Most RAID controllers will rebuild the array automatically if a spare disk is available, returning the array to normal. In addition, most RAID applications include applets or system management hooks that notify system administrators when such a failure occurs. This notification allows administrators to rectify the problem before another disk fails and the entire array goes down.
Executing read-modify-write operations
The RAID-5 write operation is responsible for generating parity data. This function is typically referred to as a read-modify-write operation. Consider a stripe composed of four strips of data and one strip of parity. Suppose the host wants to change just a small amount of data that takes up the space on only one strip within the stripe. The RAID controller cannot simply write that small portion of data and consider the request complete. It also must update the parity data, which is calculated by performing XOR operations on every strip within the stripe. So parity must be recalculated when one or more strips change.
Figure 3 shows a typical read-modify-write operation in which the data that the host is writing to disk is contained within just one strip, in position D5. The read-modify-write operation consists of the following steps:
Figure 3. Step by step: read-modify-write operation to a four-disk RAID-5 array
1. Read new data from host: The host operating system requests that the RAID subsystem write a piece of data to location D5 on disk 2.
2. Read old data from target disk for new data: Reading only the data in the location that is about to be written to eliminates the need to read all the other disks. The number of steps involved in the read-modify-write operation is the same regardless of the number of disks in the array.
3. Read old parity from target stripe for new data: A read operation retrieves the old parity. This function is independent of the number of physical disks in the array.
4. Calculate new parity by performing an XOR operation on the data from steps 1, 2, and 3: The XOR calculation of steps 2 and 3 provides the resultant parity of the stripe, minus the contribution of the data that is about to be overwritten. To determine new parity for the D5 stripe that contains the new data, an XOR calculation is performed on the new data read from the host in step 1 with the result of the XOR procedure performed in steps 2 and 3.
5. Handle coherency: This process is not detailed in Figure 3 because its implementation varies greatly from vendor to vendor. Ensuring coherency involves monitoring the write operations that occur from the start of step 6 to the end of step 7. For the disk array to be considered coherent, or "clean," the subsystem must ensure that the parity data block is always current for the data on the stripe. Because it is not possible to guarantee that the new target data and the new parity will be written to separate disks at exactly the same instant, the RAID subsystem must identify the stripe being processed as inconsistent, or "dirty," in RAID vernacular.
6. Write new data to target location: The new data was received from the host in step 1; now the RAID mappings determine on which physical disk, and where on the disk, the data will be written.
7. Write new parity: The new parity was calculated in step 4; now the RAID subsystem writes it to disk.
8. Handle coherency: Once the RAID subsystem verifies that steps 6 and 7 have been completed successfully, and the data and parity are both on disk, the stripe is considered coherent.
In the Figure 3 example, assume that the new D5 = 0, the old D5 = 1, and the old P2 = 0. Processing step 4 on this data yields 0 XOR 1 XOR 0 = 1. This is the resultant parity element, the new P2. After the read-modify-write procedure, the second row in Figure 3 will be D4 = 1, D5 = 0, P2 = 1, and D6 = 0.
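The parity update in step 4 can be sketched in a few lines. The function name `rmw_parity` is an assumption. The values are those of the worked example, read as new data 0, old data 1, and old parity 0, with the two untouched data elements in the row being 1 and 0.

```python
def rmw_parity(new_data: int, old_data: int, old_parity: int) -> int:
    """Step 4: new parity = new data XOR old data XOR old parity."""
    return new_data ^ old_data ^ old_parity


new_data, old_data, old_parity = 0, 1, 0
new_parity = rmw_parity(new_data, old_data, old_parity)
print(new_parity)  # 1

# Cross-check: recomputing parity across the whole stripe gives the same
# answer, but would require reading every other data strip in the row.
other_data = [1, 0]  # the two data elements the write does not touch
full_recalc = new_data
for value in other_data:
    full_recalc ^= value
assert full_recalc == new_parity
```

Because Python integers XOR bitwise, the same function works unchanged on whole words rather than single bits.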
This optimized method is fully scalable. The number of read, write, and XOR operations is independent of the number of disks in the array. Because the parity disk is involved in every write operation (steps 6 and 7), parity is rotated to a different disk with each stripe. If all the parity were stored on the same disk all the time, that disk could become a performance bottleneck.
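Parity rotation can be illustrated with a small layout generator. The rotation direction used here (parity on the last disk for the first stripe, then moving one disk to the left per stripe) is an assumption chosen to match the first two rows described in this article; vendors use different rotation schemes, and the function names are hypothetical.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """0-based disk holding parity for a 0-based stripe (assumed rotation)."""
    return (n_disks - 1 - stripe) % n_disks


def stripe_layout(stripe: int, n_disks: int):
    """Labels for one row, such as ['D4', 'D5', 'P2', 'D6']."""
    p = parity_disk(stripe, n_disks)
    next_data = stripe * (n_disks - 1) + 1  # first data element in this row
    labels = []
    for disk in range(n_disks):
        if disk == p:
            labels.append(f"P{stripe + 1}")
        else:
            labels.append(f"D{next_data}")
            next_data += 1
    return labels


for s in range(4):
    print(stripe_layout(s, n_disks=4))
```

Over any four consecutive stripes, each disk holds parity exactly once, so parity writes are spread evenly across the array.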
Off-loading host interrupts
An interrupt is a request from a system component for CPU time. I/O subsystems generate a host CPU interrupt when they complete an I/O transaction. The following is a comparison of how the different RAID implementations generate interrupts for a simple one-bit write to a four-disk RAID-5 array:
- Host-based RAID: The host is responsible for mapping the data to various disks, so the host must generate each read and write required to perform the read-modify-write operation. As a result, the host CPU should receive four completion interrupts from the subsystem, consisting of two reads and two writes (steps 2, 3, 6, and 7 in the Figure 3 example).
- Hardware-assisted RAID: This approach generates the same four completion interrupts as host-based RAID because it is associated with only an XOR ASIC.
- Intelligent RAID: The I/O processor in an intelligent RAID subsystem typically has the ability to hide the interim read and write operations from the host by using various integrated peripherals. In an I/O processor-based subsystem, only one completion interrupt is sent to the host. The I/O processor handles all the other interrupts, freeing the host CPU to perform non-RAID-related tasks.
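The comparison above can be expressed as a simple counting model. This is only a sketch of the interrupt counts described in the list, not a measurement; the names `RMW_DISK_IO` and `host_interrupts` are invented for the example.

```python
# Disk operations in one read-modify-write (steps 2, 3, 6, and 7).
RMW_DISK_IO = [
    "read old data",
    "read old parity",
    "write new data",
    "write new parity",
]


def host_interrupts(implementation: str, small_writes: int = 1) -> int:
    """Completion interrupts seen by the host CPU under this simple model."""
    if implementation in ("host-based", "hardware-assisted"):
        # The host issues every interim read and write itself.
        return len(RMW_DISK_IO) * small_writes
    if implementation == "intelligent":
        # The I/O processor absorbs the interim completions.
        return small_writes
    raise ValueError(f"unknown implementation: {implementation}")


for kind in ("host-based", "hardware-assisted", "intelligent"):
    print(kind, host_interrupts(kind, small_writes=10_000))
```

Under this model, a workload of 10,000 small writes produces 40,000 host interrupts with host-based or hardware-assisted RAID and 10,000 with intelligent RAID.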
Implementing intelligent RAID features
Different vendors implement various types of RAID, which provide many different functions. Because RAID-5 combines both data protection and performance, it is becoming popular for general-purpose servers such as file and Web servers. IT managers who develop a working understanding of RAID-5 can take advantage of the key features that intelligent RAID controllers and arrays provide. Hardware-based XOR operations offer a significant performance boost in RAID-5 writes and degraded RAID-5 reads over implementations that rely on software to perform the XOR operation.
A detailed understanding of RAID-5 operations can help IT managers evaluate the importance of off-loading interrupts in specific applications. In situations where the host CPU must be freed of the interim interrupts that read-modify-write operations generate, such as general-purpose application servers, intelligent RAID is essential.
Paul Luse (firstname.lastname@example.org) is a senior software architect specializing in RAID development at the Intel Communications Group Storage Components Division. He has eight RAID-related U.S. patents pending.
FOR MORE INFORMATION
Intel I/O processors: http://www.intel.com/design/iio
Intel IOP321 processor: http://www.intel.com/design/iio/docs/iop321.htm
INTELLIGENT RAID-5 CONTROLLERS FROM DELL
The Intel® IOP321 I/O processor, based on Intel XScale™ technology, provides intelligent RAID functionality in the Dell™ PowerEdge™ Expandable RAID Controller Dual Channel Integrated (PERC 4/Di). The IOP321 processor performs RAID-5 operations on PowerEdge 2600 servers, freeing the host CPU to focus on more application-centric, non-RAID-related tasks. The PERC 4/Di and PERC 3/Di also feature the Intel 80303 intelligent RAID-5 I/O processor.
Designed for I/O-intensive applications, the Intel IOP321 subsystem incorporates an application accelerator that performs hardware-based XOR operations, as well as a 1 KB queue to speed up RAID-related parity calculations. The application accelerator also expedites the transfer of read and write data to the memory controller, and computes data parity across local memory blocks. The IOP321 processor works on server platforms based on the Intel Pentium®, Intel Xeon™, and Intel Itanium® processors.