Backup and Restore Strategies with SQL Server 2000
By Rudy Lee Martinez (Issue 4 2000)
Information technology (IT) professionals fall into one of two categories: those who have already experienced a significant data loss, and those who will. The survivors will be clearly distinguished from the casualties by the planning and implementation of their backup and restore strategies. This article discusses the costs incurred from data loss and the leading contributors to data loss. It also reviews how to plan a backup and restore process and describes data protection strategies using SQL Server 2000 Enterprise to back up SQL Server 2000 databases.
Information technology (IT) departments today must seriously engage in a working and targeted data protection plan. Without a clear commitment to protecting corporate data, organizations will inevitably experience a compromise of data integrity, possibly resulting in permanent loss of data. However, proper planning and implementation can mitigate much of the risk of data loss.
Investment in a backup and restore strategy and implementation plan is not inexpensive, nor is it optional. If an organization considers data important, then it must focus on data protection and be willing to bear the commensurate expenses. The elements of cost for such a strategy include:
- Planning time
- Trained personnel
- Backup and restore hardware
- Backup and restore software
- Backup and restore removable media
|
Given the substantial costs, why should a company invest in a backup and restore strategy? The answer is that few, if any, companies can afford not to make the investment. If, as is generally acknowledged, people are a company's number one asset, then company data—its corporate knowledge—is a close second. Without a carefully planned and executed backup and restore strategy, this data is at risk. Although most companies have backed up their data for years, examples of permanent data loss attributed to insufficient planning and execution still abound.
Figure 1 shows the average hourly impact of downtime—the inability to access corporate data and systems. These stark numbers make a compelling case for creating and adhering to a data protection plan. The case becomes especially strong when coupled with the fact that corporations endure an average of two hours of significant downtime per week.
Figure 1. The Costs of Data Loss
Causes of Temporary Data Loss
Threats to a company's data come in many forms. Figure 2 lists the causes of temporary data loss and the relative probability of each risk. Temporary data losses need not become permanent. If the damage does become permanent, it is because either the company knowingly and willingly tolerates a degree of data loss, or the company has failed to implement an effective data protection plan that includes a comprehensive backup and restore strategy.
Figure 2. Causes of Data Loss
A data protection plan that includes a backup and restore strategy can minimize the negative impact when damage occurs. Preventative measures can also effectively reduce the frequency of occurrence and impact of each threat.
Hardware Malfunction. Hardware and system malfunctions are the primary reason for data loss. These include failures such as disk head crashes, disk controller failures, network connectivity or component failures, off-line power supply failures, and cooling fan failures.
Fortunately, technological advances in recent years have made component redundancy affordable for even the smallest data centers. For instance, Redundant Array of Independent Disks (RAID) solutions dramatically reduce the likelihood of data loss because of disk-related failures. Redundant power supplies and cooling fans provide fault-tolerant protection if the primary component fails.
Human Error. Another substantial threat to data integrity is the risk posed by people. This type of damage occurs in two forms: intentional and unintentional. Individuals or groups can inflict deliberate harm, with malice as the end goal. The best defense against this type of threat is a very aggressive security program that includes appropriate segregation of data with access granted on an as-needed basis. Accidental data casualties also can be dramatically reduced by a rigorous policy of appropriate access management. For personnel who can access and modify sensitive data, training is often the best preventative measure against unintentional loss.
Software Corruption. Hardware can and will fail, and the complex software systems produced today will contain defects, thereby instigating some forms of data corruption. The most powerful defense against data loss caused by software defects is a suitable backup schedule. The IT organization should also carefully monitor the isolation of data stores from exposure to under tested and unsupported releases of internal and third-party software.
Computer Viruses. In recent years, worms and viruses have increased in number and become more sophisticated. Even simple programs written with destructive intent can inflict very costly damage on unprotected systems. Fortunately, most enterprise-quality antivirus packages provide excellent protection against the majority of viruses and worms. Data protection plans should include comprehensive information about viruses and clear user instructions using and updating the antiviral agents on their client machines.
Natural Disasters. Temporary data loss has many causes, but the least common—natural disasters—gets much of the attention. Hurricanes, tornadoes, fires, and floods can wipe out entire data centers. Two ways to protect against such catastrophic loss are to store backups in safe, off-site locations, and to maintain some degree —hot, warm, or cold—of recovery-site readiness.
Hot sites are secondary or tertiary data centers that can, in short order, resume processing for the primary data center. Warm sites are data centers with infrastructure such as wiring and cooling in place, but machines are procured and set up only on an as-needed basis. Cold sites are identified physical locations that are nothing more than empty shells. In both warm and cold sites, companies usually have contractual agreements with hardware vendors to supply the required equipment if an emergency occurs. Details of these types of agreements and other pertinent processes and procedures are contained in the disaster recovery plan, which is a part of an encompassing data protection plan.
Even the best efforts and most sophisticated measures to protect company data are never fail-safe. Intruders, disasters, and catastrophic hardware failure are statistically certain to afflict some businesses almost daily. However, permanent data loss is not a foregone conclusion. Data is volatile and vulnerable by nature, but the right plan can offer a degree of protection.
The Data Protection Plan
The first step toward developing a data protection plan is to understand the requirements and identify the key participants within the organization. The following list provides a sample of the information that should be compiled:
- Roles and responsibilities
- Location of data to be backed up
- Size of data to be backed up
- Data growth projections
- Backup window availability
- Currency requirements for each data package
- Business expectations for time to restore each data package
|
The Data Protection Team
Figure 3 identifies common roles and responsibilities of the data protection team. Some roles can be combined based on the size of the organization and technician availability.
Figure 3. Backup and Restore Roles and Responsibilities
Location Affects Strategy
The location of the company data and the degree of its physical separation will impact the backup and restore strategy. For example, because of severe relative bandwidth limitations, backing up large data packages across a wide area network (WAN) should be avoided. T1 lines with a throughput of 1.54 Mbps may provide inadequate pipe width compared with the Fast Ethernet (100 Mbps) or Gigabit Ethernet (1,000 Mbps) connections in modern data centers.
Also, depending on the size of the company, the backup and restore architect may choose to organize the backup responsibilities by corporate subnet or logical group. For example, a group of backup and restore technicians may be responsible for the sales organization while another group of technicians may control backup and restore activities for the manufacturing organization.
The architect determines and documents data locations, then assigns segment responsibilities. Together, the architect and the segment lead can then perform analysis of the data under the lead's supervision. The segment lead should determine the size of the data packages that require backup, and ensure that existing throughput and storage capacities are sufficient to support current requirements and accommodate a comfortable margin for future growth. Because of bandwidth and throughput limitations, the segment lead should give special consideration to the backup windows.
For example, suppose a new manufacturing operation will be up and producing in three months. The segment lead, working with business partners, determines that when the plant begins production, a total of approximately 100 GB of data must be backed up between 2:00 a.m. and 3:00 a.m. every Sunday morning. The source of the data for backup is eight SQL Server 2000 database servers, each with 12 GB that must be included in the full backup. The manufacturing managers expect data volume to increase by 10 percent per quarter. Fortunately, all servers will coexist in the same data center and will be connected via Gigabit Ethernet (1,000 Mbps). With fast connections, the bandwidth required to back up each server will be available.
The next issue to consider is throughput and capacity. The segment lead chooses enterprise-caliber backup equipment to accommodate current and future requirements of the manufacturing organization. A common selection is tape media because of its combination of high capacity and portability; the business requires the ability to store backup copies off-site for protection against catastrophic loss. The latest tape technology is well suited to meet the backup and restore requirements of the manufacturing organization. Figure 4 briefly describes backup hardware from Dell for illustrative purposes.
Figure 4. Tape Backup Device Comparison
How Much to Lose and How Quickly to Recover?
The segment lead now faces the more complex questions of how much data loss is tolerable and how quickly the data must be restored if a failure occurs. With only the full database backups performed weekly, up to a week of data could be lost if a database server suffered an irrecoverable hardware failure. However, the corporate and business partner requirements state that no more than one hour of data can be compromised, and it must be restored within two hours after failure notification. If the business cannot afford to lose any data, then it must implement a data protection plan that includes replication or distributed transactions. (See Dell Power Solutions, Issue 3 2000 for more information about these topics.)
The segment lead must formulate a plan that allows less than one hour of data loss. This plan should include nightly differential backups, which will capture all changes since the last full backup. The plan should also use transaction log (T log) backups, which act as incremental backups of database changes. The T log is a repository of database changes that are applied to the database on a periodic basis after transactions reach the committed (successful) state.
Best practice states that the T log should be stored on a separate disk from the database for two reasons: performance and protection. On a dedicated disk, access to the T log would be faster by not having to contend with database reads/writes. Placing the T log contents on a dedicated disk also enhances the data protection plan. If a database disk fails, the T log backups can be used with differential and full backups to restore the database. To focus on backup and restoration, this example has the T log stored on one disk and the database on another. It does not consider the more powerful and appropriate RAID solutions.
The following scenario shows how full, differential, and T log (incremental) backups can be used together. (Figure 5 shows the backup schedule.) The database suffers corruption on Wednesday at 10:33 a.m., and the following actions are taken to recover the database:
Figure 5. Backup Type and Schedule
- The database administrator (DBA) performs immediate T log backup of the corrupted database
- The DBA attempts to repair the database using consistency checks and other tools; the repair attempt is unsuccessful; DBA notifies backup and restore technician, if not already engaged
- Backup and restore technician applies last full backup from Sunday morning
- Backup and restore technician applies differential backup from Tuesday night
- Backup and restore technician applies all T log backups in succession, from the most recent—after the Tuesday night differential backup—up to and including the T log backup obtained by the DBA immediately after the database failure
|
If steps 3 through 5 are successful, the database will be restored to its condition just before the corruption occurred.
Backups with SQL Server 2000 Enterprise
Using SQL Server 2000 Enterprise, the DBA can back up and restore databases and T log files. The DBA can schedule backups to occur at off hours, direct backups on remote disks to occur across the network, and configure the monitoring agent to send e-mail and page alerts if the backups are unsuccessful.
More sophisticated and expensive third-party backup tools provide enhanced functionality, such as full autoloader and tape library support. However, Enterprise Manager in SQL Server 2000 Enterprise provides capabilities that should satisfy many database requirements in small-to-medium-complexity data centers.
Before creating backups in SQL Server 2000 Enterprise, the DBA must create a backup device. A backup device is a disk, tape, or named pipe used in a backup or restore operation. (A named pipe is an interprocess communication construct that allows programmatic communication between a client and a server.) Figure 6 shows the device creation step.
Figure 6. SQL Server Dialog Box to Create Backup Device
To access this dialog box, open Enterprise Manager in SQL Server 2000 Enterprise and expand the server in the tree view. Expand the Management folder and right-click on the Backup icon. Select New Backup Device from the pop-up menu. Notice that the tape option will be disabled unless a tape drive is directly attached to the SQL Server 2000 Enterprise server.
After designating a device, the DBA is ready to create backup operations for the database and the T log. Right-click on the Backup icon and select Back up a Database from the pop-up menu. This step also can be accomplished by right-clicking on a particular database—DellStar in this example—and selecting Back up a Database from the pop-up menu.
Figure 7 shows the SQL Server Backup dialog box. The DellStar database is selected and the backup is named DellStar backup. The DBA can choose between complete, differential, and T log backups. Database File and filegroups also can be backed up, but these are rarely used since RAID technology replaces their added value. Append and overwrite options are also available.
Figure 7. SQL Server Backup Dialog Box
The next step in creating the backup task is to set the schedule. Figure 8 shows the scheduling options. The complete backup selected, referenced above, is now scheduled to recur every Sunday at 2:00 a.m. The Options tab in Figure 9 presents more configurable choices for the DBA.
Figure 8. SQL Server Dialog Box to Set Schedule
Figure 9. Options Tab Selected in SQL Server Backup Dialog Box
"Verify" specifies that the media integrity of the backup is verified upon completion. Time permitting, this step should be performed because it provides additional assurance of the backup quality. The Check Media Set option ensures that the proper tape will be overwritten before the overwrite action actually occurs. Now that the DBA has addressed the complete backup, the DBA should create and schedule differential and T log backups, as appropriate. After configuring and scheduling all backup types, the DBA is now prepared for a restoration, if required.
If a restore is necessary, enact the following steps: Right-click on the database to be restored, select All Tasks , then choose Restore Database from the pop-up menus. Figure 10 shows the Restore Database options. If Restore Database is selected, backups of the chosen database will be displayed and the DBA should choose which backup to apply first.
Figure 10. SQL Server Dialog Box to Select Restore Database Options
Figure 11 shows the options available to the DBA after selecting the Options tab at the top of the dialog box. It is usually a good idea to select Prompt before restoring each backup.
Figure 11. Options Tab Selected in the SQL Server Restore Database Dialog Box
The Recovery completion state section of the dialog box has three choices to carefully consider. The first option, Leave Database Operational , should only be used when applying the last restoration in a series. Subsequent backups cannot be applied after the database becomes operational if the Leave Database Operational option is chosen. Under most circumstances, the DBA should select the second choice, Leave Database Nonoperational , until all backups have been applied-unless it is necessary for the database to be in a read-only mode as soon as possible.
Quality Assurance: Test the Backups
The DBA or backup and restore technician should test the integrity of the backups by performing periodic test restorations. The restorations should be random and occur frequently enough to detect faulty media, hardware, or processes.
The primary benefits of implementing a rigorous test restoration strategy are the assurance that backups are performed correctly, the hardware and medium functions properly, and personnel can perform actual restoration when required.
Rudy Lee Martinez (rudy_lee_martinez@dell.com) is the program manager for the DellStar Enterprise project and also manages a team of software developers in Product Group IT. Rudy holds an M.S. in Computer Science from Wake Forest University and is a Microsoft Certified Systems Engineer (MCSE), Solutions Developer (MCSD), Database Administrator (MCDBA), and Trainer (MCT).