Rick Adams explores recent air traffic control glitches and asks what lessons have been learned about mitigating future system anomalies
Business Continuity

Recent air traffic control (ATC) technical issues have revealed some important business continuity lessons for the industry.

The first lesson is the need for speedy communication. The UK National Air Traffic Services (NATS) posted periodic updates on Twitter when a technical issue at Prestwick caused flight disruptions in late October 2015.

It wasn’t such a pretty picture less than 12 months earlier, however. In December 2014, the computer system that supplies information to air traffic controllers managing high-level traffic over England and Wales failed. All departures from London airports were stopped, as were flights from European airports planned to route through the affected UK airspace. It took about five hours to restore normal computer service, although disruptions continued into the following day.

The independent enquiry panel into the December 2014 NATS system failure, chaired by Sir Robert Walmsley, noted the following exchange in its report: At Gatwick, the controller managing take-offs was called by the pilot of the leading aircraft along the lines of ‘my passengers are telling me that they’re hearing on Sky News that there’s an air traffic problem. Can you tell me something?’

The enquiry panel report highlighted a second important lesson in resolving ATC problems: the need for collaboration. It notes that during the December 2014 incident, calls were predominantly structured around “push” communications informing customers of actions taken, or planned, by NATS. It states: “There does not appear to have been a formal process to receive, triage and prioritize customer information and requests. It may therefore be concluded that the NATS-led recovery was largely generic in nature.”

Enhanced resilience
The report concluded that: “It is considered necessary to set out contingency, resilience, and business continuity performance requirements in a clear and unambiguous way … in consultation with other stakeholders and ideally aligned within Europe to avoid driving different requirements and costs across the network.”

According to NATS, since the December 2014 incident, the organization has “reviewed and updated” its approach, including the ability of key stakeholders to receive and respond to information.

NATS is endeavouring to ensure that the new strategy is better understood by all stakeholders, with better lines of communication and greater awareness. This should improve decision-making and speed up recovery, which can significantly reduce disruption in the event of system failures.

Some airlines have yet to be convinced, however. Speaking about the Prestwick outage in October 2015, Ronan O’Keefe of Ryanair notes: “We voiced our concerns at the time that it was unacceptable that the NATS ATC system dropped for the second time in 12 months, particularly on a busy Friday in the run-up to Christmas.”

Technical interruption
Earlier this year, South Africa had problems of its own following control center issues at O.R. Tambo airport in Johannesburg. Ironically, the failure occurred while a regional air traffic management workshop was taking place in the city. “Ours is a very technical and technology-intensive environment,” explains Thabani Mthiyane, CEO of the Air Traffic Navigation Services (ATNS) company. “The communication failure experienced was precipitated by an off-peak period reconfiguration of the control center technical layout, meant to accommodate the installation and commissioning of new equipment. This, regrettably, resulted in a temporary communication intermission.”

The malfunction interrupted communication between the controllers and aircrews on flights operating in the nearby airspace. The lesson learned in this case was the need for better planning. During a review of the causes of the failure by ATNS, “it became evident that further risk mitigations are required to be implemented.”

An unbelievable event
The question is how far those mitigation efforts should go. Terry Biggio, Vice President, Safety and Technical Training for the FAA’s Air Traffic Organization, has described as “unbelievable” the fire deliberately set by a disgruntled contract employee at the Air Route Traffic Control Center (ARTCC) facility in the Chicago suburb of Aurora, Illinois, in September 2014.

The fire substantially damaged the Federal Telecommunications System, which allows Chicago Center to digitally share flight data throughout the system. It also required all of the controllers to evacuate the building while firefighters, police, and bomb-sniffing dogs took over what had become a crime scene.

The ARTCC typically manages more than 6,000 daily flights, which ended up being transferred to Terminal Radar Approach Control (TRACON) facilities in the region, such as South Bend, Minneapolis, and Moline—increasing their workload by as much as 400%.

Airlines had to scramble to manage the subsequent flight disruptions and unhappy passengers. According to the trade group Airlines for America, the fire caused approximately 6,600 flight cancellations and affected nearly 500,000 passengers from 26 September to 13 October 2014, including 4,500 flights in September alone.

The FAA says it worked as quickly as possible to share information. “All along the way, we had people working together at a level that I had never experienced,” remarks Bill Cound, Chicago Center Air Traffic Manager. “Teamwork, cooperation, collaboration. People coming to the table with solutions at hand. No turf protection. No selfishness about which facility is going to do what. The story is about all the facilities coming together—approach control facilities that were working traffic they were never trained to do, they had never worked before, all the centers reaching out and finding creative, innovative ways to work traffic.”

While the response at the time relied partly on ad hoc innovation, the FAA is now putting the lessons learned on a more formal footing, and it has enhanced security and contingency plans across the United States’ air traffic system following the Chicago arson.

“New practices and policies have strengthened our security posture while also enabling us to quickly share surveillance, communications, and weather information with other facilities in the event of an outage or disruption of air traffic services,” the FAA notes in a statement. “Meanwhile, the FAA continues to deploy new NextGen technologies that will enable an even more seamless transfer of data and services in the future.”

Tactical reaction
From the airline point of view, the kind of collaboration seen after the Chicago outage appears to be the key to a speedy and accurate response to any ATC problem.

Melissa Ford, Senior Manager, Communication & Outreach – Operational Communication at Southwest Airlines, notes that “the airline must have the capability to tactically react and then strategically plan and adjust their operations based on the limitations imposed by an ATC outage. Close communication and collaboration between the airline operators and the responsible air navigation service provider is critical to determine what capabilities are available in the affected airspace. This is in order for the airline to develop and execute a positive plan of action.”

Marc Gross, Managing Director of American Airlines’ Integrated Operations Center, concurs. “We are partnering with the FAA on business continuity,” he reveals. “We have shared our business continuity plans and strategy with the FAA in the hope they can leverage some of the information in their organization. We are also participating in an FAA business contingency drill that will simulate this same outage to help them verify revised contingency plans are effective.”

Rob Eagles, IATA’s Director for Air Traffic Management and Infrastructure, says that “contingency and resilience of the ATM system must be a high priority. The consequences of outages on the airlines and the interconnected global system are severe. Technology today allows for better redundancy between ATC centres and States.”