DR Testing – A Real-World Example

I am a frequent traveler and have visited numerous airports. In my travels, I have always been struck by how airport design has changed. Older airports like Boston's Logan have multiple discrete, disconnected terminals that require passengers to exit security and take a bus or a long walkway to travel between them. In these airports you hope, no, pray, that your trip does not require an inter-terminal transfer.

Contrast that with newer airports like Atlanta (ATL) or Denver (DEN), where the terminals all sit inside the same security perimeter and automated trains shuttle passengers between them. It is certainly a more efficient design, but are there benefits beyond efficiency? Does it create a more disaster-resilient environment?

Now, in the traditional airport case, the terminals are disconnected, so a major outage in one terminal, such as a power failure, won't necessarily affect the others because they are logically separate. Sure, the situation will likely disrupt airport operations, but in theory the unaffected terminals should keep operating normally.

Compare that with an airport like DEN. It has a single train system connecting terminals A, B and C, and that train is the only way to reach terminals B and C.

How do you think DEN would handle an unexpected outage?

Let me tell you from experience, not well.

I was flying out of DEN about a year ago when the train system unexpectedly shut down. I have no idea whether it was a hardware or software glitch, but like any disaster, it was unexpected. I had already made it to terminal C and reached my gate, but the problem was that no one else could get there… literally no one – think no crew, no flight attendants, no gate staff! Yes, it was a significant outage that effectively shut down terminals B and C.

You would think that the airport would have a recovery plan, right?

Well, they didn't. Instead, they had a poorly thought-out strategy that relied on buses housed at some distant location. By the time the buses arrived, the trains were partially working again. It was an exercise in frustration for travelers and resulted in multi-hour delays.

Contrast the above experience with ATL.

ATL is larger than DEN, and most people find it confusing to navigate given its sheer size. However, unlike DEN, ATL offers three ways to travel between terminals – train, moving walkways and walking. This matters because it insulates ATL from unexpected train failures. Walking would clearly be inconvenient, but it is far better than waiting for a bus to arrive from some far-off depot. The design gives ATL an alternative recovery strategy when the trains fail unexpectedly.

As an IT practitioner, you can learn an important lesson from these examples.

The DEN designers did not appear to think through the impact of a train outage, and the result was chaos and a massive disruption in service. ATL, in contrast, offers multiple transport options to ensure access when the trains fail. To avoid a DEN-like outage, IT practitioners must think through the applications they support and create a recovery strategy for each one. As at ATL, leveraging multiple technologies can help a business keep operating through an outage. IT practitioners also have the benefit of many available technologies, including backup, replication, high availability and even cloud, which together offer a range of recovery time, recovery point and cost options.

What about you? Does your protection strategy look more like DEN or ATL?

Please share your questions or observations in the comments section about how your organization addresses these challenges. Next week I will publish a follow-up featuring your comments, as well as some best practices and tips on how to move your organization toward an environment that looks more like ATL than DEN.

About the Author: Jay Livens