Dell Power Solutions

Comparing Disk-based and Tape-based Backup Methods

By Claude Bouffard (May 2003)

As disk-based backup and restore technology becomes more affordable, businesses can benefit from combining the speed and functionality of disk with their existing tape systems. This article describes disk and tape backup methods and compares performance test results of both.

Many enterprise applications require frequent retrievals of recently captured data. Business-critical data must be backed up, preserved, and retrieved quickly and efficiently. For many years, tape has been the standard data backup medium because of its cost-effectiveness. However, storage backup solutions that incorporate disk-based backups yield significant benefits compared to traditional backup to tape.

Disk-based storage systems offer quicker data backup and restore and a more reliable backup medium. In most situations, backups to disk are faster than backups to tape when comparing raw throughput performance. Faster backup shortens backup windows, helping businesses meet their availability commitments. Faster restore times help businesses resume operations more quickly.

The Advanced Technology Attachment (ATA) specification builds controller electronics into the disk drive itself. ATA-based backup narrows the price gap between tape and disk, and offers the performance benefits of disk. Backup arrays or disk-based backups will not replace tape completely, but their increasing use has started to shift tape into a chiefly archival role.

Tracing the evolution of tape backup systems

Tape's primary advantages are its cost and its removability. A tape cartridge, which is inexpensive relative to the tape drive itself, can be inserted, filled with data, and replaced with another cartridge, allowing a single tape drive to write a practically unlimited amount of data. However, moving cartridges in and out of the drive requires costly manual intervention. Tape libraries automate this mechanical process.

In first-generation tape libraries, all devices (the robot and all tape drives) were connected to one system through which all data written to the tape drives needed to pass. In this model, one large system (the backup server) handles heavy I/O and the CPU load of de-packetizing the backup data, which arrives inside network packets.

As backup models matured, backup software products began allowing tape library devices housed in one library to be connected to different systems. In a common configuration, the autochanger and one or more tape drives are connected to the backup server, and one or more of the remaining tape drives are connected to one or more other systems. These systems—either database or application servers—have significant amounts of data that must be backed up.

The systems and the backup server can also perform backup tape writing for other systems that send their data through an IP network. The chief limitation in this model is that the tape drives are statically assigned to the systems. Using the tape drive to back up another system's data requires that the data be transferred over an IP network, which places a substantial burden on both the sending and receiving systems.

Storage area networks (SANs) allow many host systems to access the same tape and disk devices. Multiple host systems can write to the same tape drive. Backup software products have become more sophisticated; the backup server functions as a traffic director, ensuring that only one system at a time writes to a tape drive. Tape drives can be virtually moved from host system to host system as needed to perform backups or restores.
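
The traffic-director role described above can be pictured as a small reservation table. The following Python sketch is only an illustration under assumed names: the `DriveArbiter` class, its `acquire` and `release` methods, and the host and drive names are hypothetical and do not represent any particular backup product.

```python
class DriveArbiter:
    """Hypothetical sketch: grants each shared tape drive to one host at a time."""

    def __init__(self, drives):
        # Map each drive name to the host currently writing to it (None = free).
        self._owner = {drive: None for drive in drives}

    def acquire(self, host):
        """Reserve the first free drive for `host`; return its name, or None if all are busy."""
        for drive, owner in self._owner.items():
            if owner is None:
                self._owner[drive] = host
                return drive
        return None

    def release(self, host, drive):
        """Free `drive`, but only if `host` is the one holding it."""
        if self._owner.get(drive) != host:
            raise ValueError(f"{host} does not hold {drive}")
        self._owner[drive] = None


# Example with two shared drives and three hosts (all names are made up).
arbiter = DriveArbiter(["drive0", "drive1"])
print(arbiter.acquire("db01"))    # db01 gets a drive
print(arbiter.acquire("app01"))   # app01 gets the other drive
print(arbiter.acquire("mail01"))  # no drive is free, so mail01 must wait
```

When a host releases its drive, the next `acquire` call can hand that same drive to a different host, which is the "virtual move" of a tape drive from one system to another.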

Introducing the advantages of disk backup systems

Disk-based storage systems, such as the Dell|EMC storage systems, are faster than tape-based systems that use the latest technology, such as SuperDLT (SDLT) and Linear Tape-Open (LTO). Some tape technologies respond to a slow data stream by shoe-shining, that is, repeatedly stopping, rewinding, and repositioning the tape. This behavior can damage the tape and can significantly reduce a tape drive's performance. Because disks employ random access techniques, they do not experience this excessive positioning.

A tape device that uses hardware compression reaches its maximum performance only when the data compresses well; poorly compressible data lowers throughput. Also, if the tape drive's transfer rate exceeds the rate at which the host supplies data, the tape must stop and reposition frequently, degrading performance.
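
The relationship between host data rate, drive speed, and compression can be approximated with a simple rule of thumb: a compressing drive consumes host data at roughly its native rate multiplied by the compression ratio. The Python sketch below is a simplification that ignores drive buffering and speed matching; the function names and the sample rates are illustrative assumptions, not measured values.

```python
def required_host_rate(native_rate_mb_s, compression_ratio=1.0):
    """Host data rate (MB/s) needed to keep a compressing tape drive streaming.

    Simplified model: the drive writes compressed data at its native rate,
    so it consumes uncompressed host data `compression_ratio` times faster.
    """
    return native_rate_mb_s * compression_ratio


def keeps_streaming(host_rate_mb_s, native_rate_mb_s, compression_ratio=1.0):
    """Return True if the host supplies data fast enough to avoid stop/reposition cycles."""
    return host_rate_mb_s >= required_host_rate(native_rate_mb_s, compression_ratio)


# Illustrative numbers only: a drive with a 16 MB/s native rate and a host
# that can supply 20 MB/s.
print(keeps_streaming(20, 16, compression_ratio=1.0))  # host outpaces the drive
print(keeps_streaming(20, 16, compression_ratio=2.0))  # drive now needs 32 MB/s
```

In this model, better compression raises the data rate the host must sustain, so a host that keeps the drive streaming with incompressible data may fall behind once the data compresses well.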

Faster recovery. Disk drives have faster recovery times than tape drives: seconds or minutes versus hours. Also, disks support both random and sequential access, whereas tapes support only sequential access. Random access enables faster access to individual data files on disk, improving overall performance.

Restoring data from several tape cartridges requires many time-consuming steps. First, the library must mount each tape, which takes up to one minute per tape. Then the tape must load, consuming another 30 seconds to a few minutes. Next, the tape must be positioned to the desired data; the average access time is a few minutes. After the data is read, the tape must be rewound and unloaded, which takes 30 seconds to a few minutes. After the tape is unloaded, the cycle repeats for the next tape. Time to first byte (TTFB) is seconds to minutes for tape, but only milliseconds for disk.
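
The per-tape steps listed above add up quickly. As a rough sketch, the Python function below sums the mechanical overhead for each cartridge and adds the data transfer time. The function name is hypothetical, and the default step times are assumptions chosen from within the ranges quoted in the text, not measurements.

```python
def tape_restore_seconds(num_tapes, data_mb, throughput_mb_s,
                         mount_s=60, load_s=30, position_s=120, rewind_unload_s=30):
    """Estimate total restore time when data is spread over several tapes.

    Each tape pays the full mount, load, position, and rewind/unload cycle;
    the data transfer time is added once for the whole data set.
    All default step times are illustrative assumptions.
    """
    per_tape_overhead_s = mount_s + load_s + position_s + rewind_unload_s
    transfer_s = data_mb / throughput_mb_s
    return num_tapes * per_tape_overhead_s + transfer_s


# Illustrative case: 1,500 MB spread over three tapes, read at an assumed 25 MB/s.
total_s = tape_restore_seconds(num_tapes=3, data_mb=1500, throughput_mb_s=25)
print(f"Estimated restore time: {total_s / 60:.1f} minutes")
```

With these assumed values, the mechanical steps account for 720 of the 780 estimated seconds, so adding tapes to a restore increases the elapsed time far more than adding data does.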

Media reliability and data availability. Disk system RAID protection enhances data availability and prevents data loss during disk drive failures, whereas tape-specific media errors can be common. Maintaining the set of tapes from a tape library requires properly trained personnel. By using disk-based storage, IT organizations can reduce or eliminate tape handling.

Overall IT efficiency. Because RAID protection makes disks inherently more reliable, fewer full backups must be performed when using disks, saving network and CPU load. Tape technology and tape data formats typically change every few years, which forces IT organizations to convert to new media. Disk technology does not undergo such transitions because the data format does not change. Additionally, new, larger capacity disk drives can reduce floor space requirements compared with tape libraries of equivalent capacity.

Combining the best of disk and tape backups

Initially, backup software programs' implementation of backup to disk was not as complete as backup to tape, primarily because of the price of disk storage. The relatively high cost of disk compared to tape made backups to disk unaffordable in most situations. Some backup software products, however, used disk as an intermediary medium: the initial backup performed during the backup window occurred from disk to disk; then, at some later time, the backed-up data was moved to tape.

This method has several advantages, particularly when incremental or differential backups are performed. Because such backups capture only new and changed data, the backup application can spend a considerable amount of time looking through the data before finding something that needs to be backed up. Meanwhile, the backup device is idle and not available for other purposes. Many tape drives do not perform well when subjected to this alternating idle-busy-idle activity. Disk drives, on the other hand, were designed for exactly this kind of use and perform much better as receivers of incremental or differential data.
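
To see why the backup device can sit idle during an incremental backup, consider how an application might find changed files. The sketch below is a minimal illustration that assumes changes are detected by file modification time; real backup products may rely on catalogs, archive bits, or other mechanisms, and the function name `changed_since` is hypothetical.

```python
import os


def changed_since(root, last_backup_time):
    """Yield paths under `root` modified after `last_backup_time` (epoch seconds).

    Every file must be examined, even though only a few may qualify,
    so the output device receives data in irregular bursts.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                yield path
```

A caller would pass the directory to scan and the timestamp of the previous backup, then send each yielded file to the backup device. The time spent walking unchanged files is time during which the device receives nothing.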

ATA technology
Combining the benefits of disk storage with the cost-effectiveness of tape, ATA disk technology enables users to keep more data online for longer periods of time. Dell|EMC storage systems can include high-capacity ATA drives and high-performance Fibre Channel drives in the same storage array under common management. This single-array implementation provides significant deployment flexibility.

Comparing disk-based and tape-based configurations

In both backup-to-tape and backup-to-disk environments, restoring the data involves a bulk movement of data from the backup medium to the destination disk. Although all system components have increased in speed tremendously over the years, so too has the size of data sets. Performing a bulk restore of data can still consume several hours.

Greater capabilities through RAID
Using Dell|EMC enterprise-class storage arrays as backup destinations can overcome many disadvantages of tape and provide more capabilities than a group of hard disks not set up in a RAID configuration. RAID helps to protect data from the failure of a single disk drive. The snapshots, mirrors, and clones available through array configurations can provide rapid and near-instant backups and restores. Storage array disks also can be used in replica-based backups.

Concurrent backup streams through multiplexing
The scenarios discussed so far have assumed that a single system can write a single backup stream to a single device. This procedure is often undesirable, especially in cases where a particular backup stream (for whatever reason) runs slowly. To maximize the investment in the backup device, most enterprise-class backup software products can write multiple backup streams to the same device concurrently, a process called multiplexing. This process enables a host system with four data disks to back up all four disks simultaneously to the same output device.
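
Multiplexing can be illustrated as round-robin interleaving of blocks from several streams, with each block tagged by its source. The Python sketch below is a simplified model: real backup formats are product-specific, and the function names, stream names, and block contents here are invented for the example.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several streams round-robin, tagging each with its stream ID.

    `streams` maps a stream ID to a list of data blocks (blocks must not be None).
    """
    device = []
    for row in zip_longest(*streams.values()):
        for stream_id, block in zip(streams, row):
            if block is not None:  # this stream has run out of blocks
                device.append((stream_id, block))
    return device


def demultiplex(device, stream_id):
    """Recover one stream by scanning the device and keeping only its blocks."""
    return [block for sid, block in device if sid == stream_id]


# Three data disks of different sizes backed up to one device.
streams = {
    "disk1": [b"a1", b"a2", b"a3"],
    "disk2": [b"b1"],
    "disk3": [b"c1", b"c2"],
}
device = multiplex(streams)
print(device)
print(demultiplex(device, "disk3"))
```

Note that `demultiplex` scans every block on the device to recover a single stream. On a sequential medium, restoring one stream therefore means passing over the blocks that belong to all the other streams.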

Multiplexing is particularly valuable when backups from relatively slow host systems travel over a relatively slow network to a relatively high-speed tape device. Now, with the advent of SANs and faster host systems, the output device can become the bottleneck. Backup to disk has the system throughput to eliminate this bottleneck, improving overall system performance.

Although multiplexing allows faster writing to tape or disk, restoring multiplexed data from tape can be much slower than from disk. Because tape is a sequential-access medium, performing both a backup and a restore using the same tape drive at the same time is impossible. So, if a restore must use a tape that is already in use for a backup, either the restore must wait for the backup to complete, or the backup must be aborted. Because disk is a random-access medium, backups and restores can use the same disk-based backup device simultaneously.

Testing backup and restore times

A team of EMC engineers conducted performance testing to compare backup-to-disk and backup-to-tape implementations. Testing measured both the throughput performance and the overall time to complete a backup or restore (see Figure 1).

Figure 1. Total elapsed time: disk-to-disk restore

The testing team used a typical scenario in which a subset of data (1.5 GB) is requested for restoration. The data resided on three separate tapes. The disk-to-disk restore took approximately 45 seconds, and the tape-to-disk restore took approximately 12 minutes, 45 seconds. In the latter case, only 8 percent of the overall time was spent actually transferring data; the remaining 92 percent of the time was spent performing tape mechanical movement, file accessing, and loading and unloading cartridges.
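
The figures reported above can be broken down with a few lines of arithmetic. The Python snippet below uses only the numbers stated in the text; the one added assumption is that 1 GB equals 1,000 MB, and the effective rates it prints are derived values, not results reported by the testing team.

```python
DATA_MB = 1.5 * 1000            # 1.5 GB restored; assumes 1 GB = 1000 MB
DISK_RESTORE_S = 45             # disk-to-disk restore time from the test
TAPE_RESTORE_S = 12 * 60 + 45   # tape-to-disk restore time: 765 seconds
TRANSFER_SHARE = 0.08           # share of tape restore time spent moving data

tape_transfer_s = TAPE_RESTORE_S * TRANSFER_SHARE    # about 61 seconds
tape_overhead_s = TAPE_RESTORE_S - tape_transfer_s   # about 704 seconds
speedup = TAPE_RESTORE_S / DISK_RESTORE_S            # 17.0

print(f"Tape time moving data:    {tape_transfer_s:.0f} s")
print(f"Tape mechanical overhead: {tape_overhead_s:.0f} s")
print(f"Disk restore is {speedup:.0f}x faster end to end")
print(f"Effective disk rate: {DATA_MB / DISK_RESTORE_S:.1f} MB/s")
print(f"Effective tape rate: {DATA_MB / TAPE_RESTORE_S:.1f} MB/s")
```

In other words, the tape restore spent roughly one minute moving data and almost twelve minutes on mounting, positioning, and unloading cartridges.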

Comparing Dell|EMC storage array to native tape drive performance
The testing team also compared throughput results for disk and tape devices using both Fibre Channel and serial ATA disks while backing up and restoring. The team measured throughput performance of the Dell|EMC CX200, CX400, and CX600 networked storage systems, and tape drives using LTO, SDLT 220, and SDLT 320 technology. Figure 2 compares backup results among the devices using data that compresses two-to-one.

Figure 2. Throughput for Dell|EMC storage arrays versus tape drives

Performance summary. Dell|EMC storage arrays with ATA can accomplish backups in one-third less time and restore in 80 percent less time than tape in typical environments. Major advantages of backup to disk include the following:

  • Faster backup performance
  • Faster restore performance
  • Enhanced media reliability and data availability
  • Improved IT efficiency
  • Elimination of tape positioning, tape errors, and other mechanical issues
  • Improved backup reliability

Backing up and restoring data quickly and cost-effectively

Backup-to-disk is emerging as a technology that offers significant benefits over the traditional tape backup process. With the changing economics of disk technology, backup-to-disk solutions are now affordable. Early adopters are implementing backup-to-disk solutions as improvements to their existing tape implementations, and many backup software applications support backup-to-disk functionality. By capitalizing on the speed, reliability, and flexibility of disk-based storage systems, businesses can help minimize the downtime related to backups and restores.

Dell|EMC products provide a highly available architecture, unique data integrity, and disk scrubbing capabilities that, together with ATA disk-based backup and restore, offer greater levels of reliability compared to offline media. As a result of this new drive technology, disk-based backup and recovery is now affordable for most organizations. Leveraging the reliability and performance of Dell|EMC storage systems and ATA disk-based backup technology, several companies have integrated this functionality within their applications.

Claude Bouffard (Bouffard_Claude@emc.com) is a principal software engineer in the CLARiiON Application Solution Integration Group at EMC Corporation. He has been the lead backup technologist for the Dell|EMC partnership for the last two years. Claude also has over seven years of experience with backup applications, tape drives (DLT), and libraries.

FOR MORE INFORMATION

Dell|EMC: http://www.dell.com/emc

EMC: http://www.emc.com
