UK Air-Traffic Software Misread Spots on Map To Cause Outage (bloomberg.com)
The UK's worst air-traffic outage in a decade was caused by an anomaly in the airspace manager's software system, which confused two geographical checkpoints separated by some 4,000 nautical miles. From a report: The UK's Civil Aviation Authority said Wednesday it will conduct an independent review of the incident, which forced hundreds of flights to be canceled or delayed last week after an error in processing an airline's flight plan. The glitch triggered a shutdown of the software system run by NATS for safety reasons, according to a preliminary report from the public-private partnership formerly called National Air Traffic Services. This forced air-traffic staff to input flight plans manually, drastically reducing the amount of air traffic that could be processed.
The event sent airlines and airports in the UK into turmoil on Aug. 28, leaving planes out of position and passengers stranded. Nearly 800 flights leaving UK airports were canceled, with a similar number of arrivals scrapped, according to analytics firm Cirium. The report by NATS showed that on the day of the incident, an airline entered a plan into the system which led through UK airspace. NATS Chief Executive Officer Martin Rolfe declined to discuss details of the flight, such as its route or the airline involved, saying the specifics weren't pertinent to the outage. While the flight plan wasn't faulty, it threw off the system because the software used by NATS received duplicate identities for two different points on the map. There are an infinite number of flight-plan waypoints in the world, and duplicates remain despite work to remove them, according to Rolfe.
the NOTAM system seemed to fail in the same way (Score:2)
the NOTAM system seemed to fail in the same way.
bad data takes down main and live DR site.
Software to detect duplicates ??? (Score:2)
Is there some reason software would not have error detection built in before allowing it to "ship?"
Aviation stuff is on older systems that may not ha (Score:2)
Aviation stuff is on older systems that may not have good error checking.
Re: (Score:2)
Frankly, given what I've read on other topics here, I'm not sure the -age of the system- would have much significance. I've often been appalled at the excuses some posters come up with for why their software broke...
(I don't remember the exact procedure we used on CAATS for 'onboarding' data, that was almost 30 years ago, but I'd bet that detecting duplicate identifiers was a key part of that process. In part because that was the 'standard of practice' on that project.)
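For illustration only, here is a minimal sketch of what a duplicate-identifier check at data onboarding could look like. It is not the CAATS procedure (which the poster says they don't recall); the `Waypoint` type, the `find_duplicate_idents` helper, and the sample identifiers and coordinates are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Waypoint(NamedTuple):
    ident: str   # published identifier (hypothetical sample values below)
    lat: float   # degrees, north positive
    lon: float   # degrees, east positive


def find_duplicate_idents(waypoints):
    """Map each identifier that names more than one distinct position
    to the sorted list of those positions."""
    positions_by_ident = defaultdict(set)
    for wp in waypoints:
        positions_by_ident[wp.ident].add((wp.lat, wp.lon))
    return {
        ident: sorted(positions)
        for ident, positions in positions_by_ident.items()
        if len(positions) > 1
    }


# Made-up data: "ABCDE" names two far-apart points; "FGHIJ" is merely
# listed twice at the same position, so it is not flagged.
incoming = [
    Waypoint("ABCDE", 50.0, 0.0),
    Waypoint("FGHIJ", 48.5, 2.3),
    Waypoint("ABCDE", 30.0, -80.0),
    Waypoint("FGHIJ", 48.5, 2.3),
]
duplicates = find_duplicate_idents(incoming)
```

A check like this only reports the clash. Whether to reject the data, rename one entry, or carry both and disambiguate later is a requirements decision.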
Re: (Score:2)
Is there some reason software would not have error detection built in before allowing it to "ship?"
Yes, the reason is that it wasn't a design consideration. Every problem has an obvious solution in retrospect, but predicting all problems in advance never works. Whoever wrote the software probably didn't receive an instruction of "must be able to handle specific scenario {insert scenario no one predicted would happen}"
Re: (Score:2)
This was on the news last night. The dupe is because there are two waypoints with the same code - as set by international agreement (and apparently they are changing, but it's slow going).
The problem here was an airline (quite correctly) filed their flight plan, but unlike every other flight, this particular one (quite correctly) passed over two waypoints with the same code. The receiving system shit the bed, and it took 4 hours to recover.
This is a case of the air industry having some royally shitty legacy
An infinitely large database (Score:3, Funny)
There are an infinite number of flight-plan waypoints in the world
I'd like to see the system that hosts their database. Must be pretty big! They left out the cardinality, too.
Like Passwords. (Score:2)
it's like passwords. You're not supposed to reuse them.
"an infinite number of flight-plan waypoints" (Score:3)
If this guy actually said "an infinite number," he has demonstrated he's not competent to run a technology-based enterprise. The number of named waypoints is finite, and I'm sure each one is representable within a single 32-bit unique identifier. See for example https://www.icao.int/EURNAT/Ot... [icao.int]
(And yes, I actually worked on Canada's Air Traffic Control System software back in the mid '90s.)
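A rough back-of-the-envelope check of the 32-bit claim, assuming (purely for illustration; the post doesn't state the format) that every identifier is exactly five letters drawn from A-Z. The `encode` helper is hypothetical.

```python
# Assumption: identifiers are exactly five uppercase letters, A-Z.
ALPHABET_SIZE = 26
IDENT_LENGTH = 5

# Number of distinct identifiers under that assumption: 26**5.
possible_idents = ALPHABET_SIZE ** IDENT_LENGTH

# Bits needed to give each possible identifier its own integer.
bits_needed = (possible_idents - 1).bit_length()


def encode(ident: str) -> int:
    """Pack a five-letter identifier into an integer, base 26."""
    if len(ident) != IDENT_LENGTH or not all("A" <= ch <= "Z" for ch in ident):
        raise ValueError(f"expected {IDENT_LENGTH} letters A-Z, got {ident!r}")
    value = 0
    for ch in ident:
        value = value * ALPHABET_SIZE + (ord(ch) - ord("A"))
    return value
```

Under that assumption there are 11,881,376 possible names, which fit in 24 bits with room to spare in a 32-bit integer. Note that this only bounds the name space; it does nothing to stop two physical points from being published under the same name.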
Re: (Score:2)
Ideally, I'd have thought you'd want nearby waypoints to have a related identifier. 128 bits would definitely be sufficient to do this and would not be an unreasonable sized value to work with.
Having neighbours with a similar address means that if there's a mistype or other error, you're within the margin of error for navigation, as opposed to being 4,000 miles away.
Re: (Score:2)
I would think NOT. "Close" is no cigar here. Better to have the system crash than to mis-navigate an aircraft into a mountain that happened to be between you and the incorrect waypoint. (Of course, that's not a software design decision, but rather a user requirements decision. On CAATS, we had air traffic controllers embedded in the program to address these kinds of questions. However, that did cause 'requirements thrashing' when the designated user rep changed and the new guy had different opinions th
Eager to blame others (Score:3)
UK: "This is the fault of the French!" [slashdot.org]
Also UK: "Oh wait, turns out it's just our software that sucks."
Re: (Score:3)
The Independent does not speak for the UK nor for National Air Traffic Services. While the quality of the stories is generally good, they have a firm anti-EU bias and will take any opportunity to heap shit on the French.
The UK government didn't blame anyone other than themselves. UK NATS specifically never mentioned the origin and declined to comment on the source at the time. The Independent went to great effort to dig out the information and all they got was "French" and then that one piece of complete
Sanitize Your Inputs (Score:2)
Re: (Score:2)
in this case, the input was perfectly valid.
it seems the system did not handle duplicates.
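One way a system could cope with a valid plan that contains a duplicated identifier is to resolve the ambiguous name to whichever candidate lies nearest the previous point on the route. The sketch below assumes that approach; it is not a description of what the NATS software does, and the `resolve` helper, the database layout, and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_NM = 3440.065  # approximate mean Earth radius, nautical miles


def distance_nm(a, b):
    """Great-circle distance between two (lat, lon) pairs in degrees,
    using the haversine formula."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def resolve(ident, previous_position, database):
    """Pick the position for `ident` closest to the previous route point.

    `database` maps an identifier to a list of (lat, lon) candidates.
    """
    candidates = database.get(ident, [])
    if not candidates:
        raise KeyError(f"unknown waypoint {ident!r}")
    return min(candidates, key=lambda pos: distance_nm(previous_position, pos))


# Made-up data: one identifier, two positions thousands of miles apart.
database = {"ABCDE": [(50.0, 0.0), (30.0, -80.0)]}
chosen = resolve("ABCDE", (51.0, 1.0), database)
```

Nearest-candidate is only a heuristic. A safety-critical system might instead reject the single plan and flag it for manual handling, rather than guess or shut down entirely.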
but but what did the french have to do with it (Score:2)
Rule No.1: Always Blame Somebody Else. (Score:2)
Rule No.1: Always Blame Somebody Else.
I've heard (Score:2)
More explanations for the outage than there are airlines. This plethora of rationalisations tells me they've no idea why the crash occurred and are making it up as they go along.
Mis-read map (Score:2)
That's not a compass rose. That's just where I put down my coffee cup.
The map is not the territory (Score:2)
I think the bug is understandable but not excusable. I got heavily into the weeds of geolocation for the US... it turns out that many rural areas have a street called Rural Road 14 or Country Road 14. Worse, two neighboring counties or zips in, say, Texas could each have their own RR14. They might be the same road continuing on, or two independent roads. Worse still, the Census Bureau assigns county codes and the US Postal Service assigns ZIP codes. Mapping from one to the other is inexact. 3rd party ve