Air Traffic Control "Telephone Glitch" Delays Hundreds of UK Flights
First time accepted submitter biodata writes "The BBC is reporting that hundreds of UK commercial air flights have been delayed for most of Saturday due to an internal telephone systems problem in the National Air Traffic Control Service, and delays are likely to continue into the evening. A spokesperson said that it was a different software bug from the one which grounded flights in the summer."
oh, editorial control (Score:5, Insightful)
"difficult software bug from the one which grounded flights"
well, spellchecker did not complain, so we're all set...
Re: (Score:2)
it also seems that somebody had to make this change... article has this quote:
Re: (Score:2)
Sure. It was really not a simple bug to put in, but the programmer who wrote it had already grounded flights in the summer, and thanks to that experience he also managed to put this bug in, despite all its difficulty.
This was definitely not intentional. (Score:5, Informative)
It's just an unfortunate incident.
British Telecom has had an issue (which has happened a number of times) which led to a minor timing glitch in one of their systems. When this happens, the data reliability on the FARICE line to Iceland drops and you start getting corrupted flight messages. Shanwick was alerted to the problem, and both sides consulted and decided that the best solution in the interim would be something that had been done previously: disconnecting FARICE and thus forcing all connections through the backup line, DANICE, which appeared to be operating normally.
Unfortunately, the problem was even worse on DANICE. What appeared to be normal operation was only normal up to the data logger. Once it actually got to the flight tracking software, the messages were being refused, and corrupted messages were being sent in the other direction. So while BT was working on getting their system fixed, flight control managers were being forced to basically manually dig up ATC messages and copy-paste them off to the air traffic controllers (as much as possible was handled through voice as well).
But it got even worse. A totally unrelated communications network, Datalink, decided to misbehave during all of this, which may or may not have been due to the Shanwick problems. On the Iceland side, the general solution is to force a switchover to the backup system. Which was done... except a critical component on the backup system immediately crashed. Repeated attempts to switch and ultimately switch back caused even more problems for the air traffic controllers.
Eventually the fixed FARICE line was brought back up and Datalink came back online (with the switchover-crash problem postponed, to be investigated during a low-traffic time period).
It's terrible that there were so many delays, but these are extremely complicated systems with a challenging task, built up over decades with tons of computer components, protocols, lines, routers, radar systems, transmitters, and on and on, scattered all over the world. On a weekend. Everyone was scrambling and doing their damnedest to fix it as soon as possible. It should also be noted that it was never a safety issue - even in the absolute worst case, air traffic control could go all the way back to the old paper-and-pencil method. What the systems give is, primarily, speed, and thus when there's big problems, there's delays.
And that was my weekend, how was yours? ;)
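The DANICE part of the story (a backup that looked healthy at the data logger but failed at the flight tracking software) suggests probing a link end to end before failing over to it. Below is a minimal sketch of that idea, not anything the real systems are known to do. Everything in it is an assumption made for illustration: the `Link` type, its `transmit` callable (standing in for "send bytes and see what the far-end application receives"), and the CRC32 framing.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Link:
    """Hypothetical link; transmit returns the bytes the far-end application sees."""
    name: str
    transmit: Callable[[bytes], bytes]


def frame(payload: bytes) -> bytes:
    """Append a CRC32 so the receiver can detect corruption in transit."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes) -> Optional[bytes]:
    """Return the payload if the CRC32 matches, otherwise None."""
    if len(data) < 4:
        return None
    payload, crc = data[:-4], data[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        return None
    return payload


def end_to_end_ok(link: Link, probe: bytes = b"PROBE") -> bool:
    """Check the link at the application level, not just at the line level."""
    return unframe(link.transmit(frame(probe))) == probe


def choose_link(links: Sequence[Link]) -> Optional[Link]:
    """Pick the first link that passes the end-to-end probe, in priority order."""
    for link in links:
        if end_to_end_ok(link):
            return link
    return None  # nothing usable: fall back to manual procedures


if __name__ == "__main__":
    # Simulated links: the primary flips bits in the first byte, the backup is clean.
    primary = Link("primary", lambda b: bytes([b[0] ^ 0xFF]) + b[1:])
    backup = Link("backup", lambda b: b)
    chosen = choose_link([primary, backup])
    print(chosen.name if chosen else "no usable link")
```

The design point is only that "the line is up" and "the application accepts the messages" are separate checks, and a failover decision probably wants the second one.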
Re: (Score:3)
Oh, and I forgot to mention the voice communication systems problem. That one didn't affect me directly but I did get a memo about it.
Re: (Score:3)
Of course, I might be entirely off base here, but below is the first impression I got.
Wouldn't a "fix" be as simple as routing all that junk encapsulated over a point-to-point ssh connection between two routers? Doesn't almost any router let you pack up all of the disparate kinds of traffic and push it over a "safe" pipe that doesn't give a flying fuck about datagram corruption? Wouldn't a solution here be, quite literally, two router boxes from any major vendor? Yeah, it may not perform all that great when
Re: (Score:2)
Re: (Score:2)
The issue is, you deal with the system you're with, not the situation you wish you had.
We can't change a transmission protocol or route data over arbitrary connections. This is a collection of everything from very old hardware to brand new, protocols from very old to brand new, in every country in the world, and you can't just arbitrarily rework them. It's the same in the air, too. And when new protocols are made, they're generally in addition to existing ones, not replacing them. I'm not aware of any with
Re: (Score:2)
Ultimately, there are routers or modems involved, and they push some legacy protocols, and there's a lot of providers out there who offer modules for modern routing hardware that take those old protocols and push them quite transparently over modern data pipes. It's a reasonably well understood problem. It would not require reworking the whole thing, that's the whole point - you take what you have and push the data around using modern hardware that can ensure that the data is safe.
Even if all you have is a
Re: (Score:1)
I'll assume that it was only because you were overworked that you missed the humour in my comment. What I did was to give a possible interpretation which would have made the erroneous sentence correct. Of course I didn't mean to imply that someone really added bugs intentionally. At least one person understood it and gave me a "Funny" mod.
But anyway, your comment was full of interesting information, so it was the rare case of a productive Whoosh. Thank you for sharing that information.
How my weekend was? We
Re: (Score:2)
This happens enough that I often wonder whether the editors are really that careless, or whether they intentionally insert errors like that in order to provide fodder for those who so enjoy writing posts correcting the article and complaining about the lack of editing. Thorough proofreading would kill one of the memes that makes slashdot what it is.
I choose to believe... (Score:2)
Re: (Score:1)
Yeah I can understand that - after seeing goatse you cannot concentrate on your programming and create all sorts of nasty bugs ...
Time to Scale back on Computerisation (Score:2, Interesting)
This wouldn't -- no, couldn't have happened in the days before computers.
Eventually, I think centralised computer control is going to go the way of semaphore. It's too easy for a centralised computer system to glitch, break, be shut down, and then screw up the lives and functions of millions.
What we should see is decentralised systems run using independent computer systems.
Re: (Score:1)
Yeah no. Probably not.
The efficiency gains from centrally controlled, fully integrated computer systems simply dwarf any benefits you might get from time to time with a distributed system.
A central computer with occasional downtime is acceptable when the alternative is a stupid, slow clerical system every day, all the time. "Clerical" is what disparate, independent systems always break down to because of the amount of human effort required to keep them working together.
Re: (Score:2)
What efficiency gains? Airlines would be far more efficient if they could fly direct from A to B, rather than being funneled into narrow corridors. Pretty much since the advent of GPS, people have been trying to get rid of 'air traffic control' and replace it by direct communication between aircraft which know where they're going and where they want to go.
Re: (Score:2)
This wouldn't -- no, couldn't have happened in the days before computers.
And we wouldn't have all these Tesla fires holding back the adoption of the electric car if we'd just stuck with horses and carts.
Also, without the internet paedophiles wouldn't have easy access to kiddy porn. Won't someone (else) puhlease think of the children?!
What we should see is decentralised systems run using independent computer systems.
Got much experience of ATC systems?
Re: (Score:2)
Without the internet, how the hell would anyone know what they had access to? There was something called privacy before the Internet came along.
Re: (Score:2)
What we should see is decentralised systems run using independent computer systems.
How about some updates? These are old-ass systems developed incrementally. It's time to spend some money modernising and unifying them.
RTFA says it's not telephones (Score:1)
but a day/night switchover.
which means they have back-assward management in the first place, for not operating a life-safety system as a 24/7 operation.
carbon-based computation should not be part of the core logic on which the air control system rests.
Re:RTFA says it's not telephones (Score:5, Informative)
They do operate it as a 24/7 operation. However, at night time there are fewer planes in the sky, so each traffic controller is given a bigger area to work on and there are fewer of them on duty. During day time, these areas are subdivided into smaller areas and more controllers are brought on-line to work on the larger number of areas. It was this switch-over that failed.
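To make the day/night switch-over concrete, here is a rough sketch of the bookkeeping involved: a few large night areas get split into smaller day areas, and each one needs its own controller. The sector names, the data layout, and the functions are all invented for illustration and say nothing about how NATS actually implements it.

```python
from typing import Dict, List

# Hypothetical night configuration: each large area lists the sub-areas it contains.
NIGHT_SECTORS: Dict[str, List[str]] = {
    "NORTH": ["N1", "N2"],
    "SOUTH": ["S1", "S2", "S3"],
}


def split_for_day(night: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Day configuration: every sub-area becomes its own controller position."""
    return {sub: [sub] for subs in night.values() for sub in subs}


def covered(config: Dict[str, List[str]]) -> List[str]:
    """All sub-areas that some position in this configuration is responsible for."""
    return sorted(sub for subs in config.values() for sub in subs)


def assign(config: Dict[str, List[str]], controllers: List[str]) -> Dict[str, str]:
    """Give each position one controller; refuse if there are too few on duty."""
    if len(controllers) < len(config):
        raise ValueError("not enough controllers for this configuration")
    return dict(zip(sorted(config), controllers))


def switch_to_day(night: Dict[str, List[str]], controllers: List[str]) -> Dict[str, str]:
    """Build and validate the day assignment before anything is committed."""
    day = split_for_day(night)
    if covered(day) != covered(night):
        raise RuntimeError("day configuration would leave airspace uncovered")
    return assign(day, controllers)


if __name__ == "__main__":
    print(assign(NIGHT_SECTORS, ["alice", "bob"]))
    print(switch_to_day(NIGHT_SECTORS, ["alice", "bob", "carol", "dave", "erin"]))
```

If `switch_to_day` raises, the caller still holds the night assignment, which loosely mirrors what a failed switch-over means in practice: fewer, larger areas than daytime traffic needs, so capacity drops and flights are delayed.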
Internal telephone systems problem? (Score:2)
"There is a quirk over whether it flashes or not," says Chisholm. "We want it to work in 100% of case
Re: (Score:2)
And this is how government organizations perform in the 21st century. Facepalm...
Re: (Score:1)
NATS (http://en.wikipedia.org/wiki/National_Air_Traffic_Services) is 51% in private ownership, and 42% is actually owned by large airlines
I caught the beginning of this... (Score:2)
I was traveling from Heathrow to Beijing via Helsinki (5.5 hour lay-over) that was supposed to leave LHR at 7:30 but was delayed until 9:00...the estimated departure moved again backwards and forwards once (after we got on the plane), but it seemed to be a minor delay from my point of view.
The most annoying thing was that the online systems weren't showing the disruption. I was looking at the departure board at LHR and it was showing the delay (though it took a while), but the online web page and the 'Heath