It was by all accounts a small problem, a little overheating last Monday in the electronic jungle that is the Technology Command Center for Delta Air Lines at its Atlanta headquarters. This minor overheating event (okay, “fire” if you insist) caused a nearby voltage-control module to spasm and allow a surge to hit a transformer, which immediately shut down the power supply. No worries, there’s an app for that. It’s called a switchgear, and its job is to sense a power failure and immediately switch the circuit to a backup power source.
Instantly, much of the computer network with which the world’s largest airline tracks and controls its planes, employees, and ticketed passengers worldwide crashed. Airplanes on the ground were stopped in place. Aircraft in the air landed at their destination, and parked. A thousand flights had to be cancelled, and tens of thousands of passengers were stranded in parked airplanes and airports. Another 500 flights were cancelled on Tuesday, and five days later the airline was still struggling toward normalcy.
This is hardly an unprecedented event:
- In July, Southwest Airlines lost its network for 12 hours and had to cancel 2,300 flights over four days. The failure of a single router brought the system down, and it took 12 hours to reboot it.
- In September of 2015, an American Airlines system glitch stopped its flights to and from its hubs at Chicago, Dallas and Miami.
- In April of 2013, a national computer outage at American Airlines wiped out a third of its scheduled flights.
- In August of 2012, United Airlines experienced a two-hour crash of its computer systems that affected 10 per cent of its flights.
Note that the frequency of these events has gone from fewer than one a year (remember that each event costs the airline tens of millions of dollars and the passengers — well, who knows what it costs the passengers?) to two so far this year. Does anybody know why? Of course. Everybody involved knows why:
- Each airline’s system was built in the 1990s. One of the basic assumptions was that it could be shut down at night for repairs and maintenance. Now the systems are global and it’s always daytime somewhere. The system is like an airplane that can’t land, and has to be fixed and maintained in flight.
- Since then, the numbers of passengers and aircraft have grown exponentially. Patches and add-ons have been required so that the software can contain, search, sort and otherwise manipulate ever larger masses of data.
- There have been numerous mergers, requiring that individual, proprietary systems be modified so that they communicate and work with each other.
- Outside systems and networks (Internet travel agents and ticket sellers, for example) have demanded access to the airlines’ systems. Adding them to an oversized network while maintaining security has not been easy.
After 20 years of slapdash growth, Frankenstein grafts, temporary fixes, plug-ins, add-ons and extension cords, each airline has a system that is too big to fix, and increasingly prone to fail. You can’t fix it because it has to keep going, and it can’t keep going unless you fix it. You can’t afford to fix it, and you can’t afford not to.
This is Stage Four Technology Cancer, and it’s not affecting just the computer reservation and scheduling systems. A recent survey of maintenance personnel for the South American airline LATAM revealed increasing worry about the effects of cost-saving cuts in the numbers and qualifications of maintenance personnel. Giving credence to those worries, another study shows that since 2010, fewer than two commercial airliners per year have crashed worldwide. That was a worse record than that of the previous several years. So far in 2016, three airliners have crashed. The numbers are small so far, but the trend is in the wrong direction.
This is what technology cancer does; it grows without restraint until it threatens the survival of its host. If its host is the sort who refuses to have surgery because it’s too inconvenient, death ensues.
Hello, Tom. I haven’t commented here (or practically anywhere) for a couple years now, but I had to respond to this article. Coincidence: I woke up way too early this morning, and was musing about something we could perhaps call “software infrastructure.” We all know about the condition of physical infrastructure here in the US and many other parts of the world. My thoughts on software arose from experiences last night in trying to read and reply to an important email. I’ve used AOL so long (even though it in many ways stinks) it’s difficult to change. Here’s what happened: the attachments I was supposed to review were all 0 bytes. They looked “attached” but nothing was there. I tried to reply to the sender and ask for another try. My email wouldn’t send. I couldn’t save a draft, either. And so on. About 15 or 20 tries before I could send.
This sort of thing is common on about a jillion websites. Forget monitoring health insurance claims, prescription orders, etc. online. I always end up calling overworked tech support folks, and they almost always have to open trouble tickets to try to get more savvy IT people involved, and a resolution often takes a week or more. Online banking I trust not at all, given the glitches I’ve seen. And so on.
Occasionally we will hear a few mutterings of worry about such things as the electrical grid… Consider military applications, nuclear power plants, essential hard infrastructure we don’t even think about, all managed by computer systems in basically the same condition as the airline software you describe above, and be afraid. Very afraid.
Thanks for the great work you do researching these issues, bearing witness to what’s happening, and giving warning to those willing to hear the message.
Life in the technology-oncology ward…your observations are spot on.
I appreciate your good thoughts; it can be hard to stay motivated, writing among the ruins….
This piece reminds me of the verbal slugfest concerning the fate of the internet. A couple years ago, anyone who dared to envision a limited life span for this computer wonderland was taken to the woodshed, yet today, people view terrible service, outages, and ballooning fees as the norm.
I know several people who have cancelled their internet, simply because of cost and corporate government spying. They now rely upon friends, and free services, or go without.
Certainly the sales pitch that led the charge for a people’s internet never materialized, and much of the information we were promised lies locked behind pay windows, or has vanished without an explanation. Instead of the wonder of an online library, we have a vast wasteland of low rent ads, and soundbites tailored to truncated attention spans.
Sometimes I wonder, as I stagger out of that woodshed, if the internet will die due to external collapse, or due to internal irrelevance. Sites like this one are certainly doing their part to ensure the former, not the latter, and for that I’m grateful. It makes the observation of collapse somewhat palatable.
Forgive me for simply inserting a link to my own lame and stale little riff on the subject rather than scraping together something original, but this story, in one version or another, has been playing out for a while:
http://www.limpinggazelles.com/2015/04/deferred-maintenance.html
What is frightening to ponder is the thought of many of these systems in a perfect storm of cascading failure, whether through “natural causes” or at the hand of malicious players. The odds for either option to play out get better and better.
Thanks, Mr. Lewis, this is a great example of the complexity that we are collapsing under. If any have not yet seen it, I encourage them to read “The Collapse of Complex Societies” by Joseph Tainter. He describes how complexity increases while the return on it diminishes, yet leaders keep responding to each problem arising from it by, tada! more complexity. Very much as you have described in the airline story. The airlines in question are no doubt considering what geegaws to add to their systems to ‘fix’ them.
As for airlines and air travel in general, if it is the first aspect of our civilization to collapse under its own complexity, that can only be a good thing for the planet. Among those who continue to fly, it is probably the easiest large chunk of their energy footprint to eliminate.