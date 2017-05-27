IT Crash Causes British Airways To Cancel All Flights (cnbc.com) 39
An anonymous reader quotes CNBC: British Airways canceled all flights from London's Heathrow and Gatwick airports on Saturday as a global IT failure upended the travel plans of tens of thousands of people on a busy U.K. holiday weekend. The airline said it was suffering a "major IT systems failure" around the world. Chief executive Alex Cruz said "we believe the root cause was a power-supply issue and we have no evidence of any cyberattack." He said the crash had affected "all of our check-in and operational systems." BA operates hundreds of flights from the two London airports on a typical day -- and both are major hubs for worldwide travel. Several hours after problems began cropping up Saturday morning, BA suspended flights up to 6 p.m. because the two airports had become severely congested. The airline later scrapped flights from Heathrow and Gatwick for the rest of the day.
I didn't realize that the British celebrated U.S. Memorial Day weekend.
Ramadan starts today, and Monday is the Spring Bank Holiday, when many schools and businesses close.
It is actually "(Late) Spring bank holiday". The UK has depoliticized and dereligionized most of their holidays (notable exceptions are Christmas and Easter), so there is a bunch of "bank holidays" around the year that fall on Mondays (to provide extended weekends). This particular holiday seems to have replaced "Whit Monday" (day after Pentecost), which was a moveable Christian holiday. So, as you should expect, it is not related to the US Memorial Day.
Somewhere, there is probably an IT guy who has been begging for the budget to upgrade some old machines, or move the services onto a cloud provider and was ignored.
He's crying today, because this huge revenue loss could probably have been avoided with a small budget for newer hardware or more redundancy.
And despite that, s/he knows who will take the blame for it.
move the services onto a cloud provider
"Cloud" service providers have no place in mission-critical roles, because "Cloud" is a faster way of saying "abdicating responsibility". If you make millions of dollars a day on the back of your IT infrastructure, then the last thing you do is outsource the responsibility of said infrastructure to a 3rd party company which has different priorities than you do.
Any IT manager making such a recommendation is a) lazy, b) useless and c) should be fired.
"Cloud" is used to encourage cloudy thinking. (Score:2)
The word "cloud" is used by cloud providers to encourage cloudy thinking: Dilbert cartoon. [pinimg.com]
This Dilbert cartoon [cloudave.com] shows where cloudy thinking is leading.
Other sources: IT outsourcing (Score:2)
Here you have the BBC report on the matter: http://www.bbc.com/news/uk-400... [bbc.com]
It looks like BAE has recently replaced most of its IT workforce with south Asian contractors.
OT: it's BA, not BAE. The latter is a different company concerned mainly with blowing up flying objects, along with people in them. Easy mistake to make though.
"Power supply failure" does not take down a well-designed and well-maintained infrastructure. This is just a smokescreen to hide incompetence.
Idiots in charge! (Score:1)
The Mythical Man-Month [wikipedia.org] was written in 1975. In a very detailed way, it described how common business-planning strategies fail when applied to information technology projects. But did anyone listen? We've known how to avoid these sorts of problems for over 40 years!
"We" (as in people that actually have a clue what they are doing) have indeed known that. But the decision-makers have no such understanding. While it is really tacky, I have had to explain catastrophe scenarios to customers that would have killed their company, and all that was needed was a failed software functionality update (which they wanted to do without a possibility to roll-back and no working plan for keeping business going any other way). The people making the decisions these days are bean-counter
Backup plan.. (Score:2)
and the backup plan for when the IT systems fail is: water and food vouchers..
Is anyone tracking causes for Airline outages? (Score:2)
It's my vague recollection that at least one other airline had a power-related IT outage within the last year or so.
I would have thought "reliable power at scale" was a solved problem.
There are no "power related" IT outages. There are some where the IT infrastructure could not handle one specific system going down, and that is not a technical issue, but something else which usually is called "gross negligence". The seeming technological root-causes are just transparent lies by misdirection that serve to obscure the fact that management caused this by incompetence, arrogance, greed and general stupidity.
There are some where the IT infrastructure could not handle one specific system going down, and that is not a technical issue, but something else which usually is called "gross negligence".
Technically, that's known as a single point of failure.
https://en.wikipedia.org/wiki/Single_point_of_failure [wikipedia.org]
The term "gross negligence" doesn't come into play until a lawsuit is filed. Since no one died or was injured in this outage, a gross inconvenience doesn't rise to gross negligence.
I think last time it was a data center failure. This time it seems like a power supply issue. The real problem is lack of redundancy and planning.
The major issue is... (Score:1)
...the outsourced IT guys from TCS in India need to fly to the UK to fix the 'power supply' issue but currently they are unable to book a flight on British Airways.....
Piling up technical debt is utterly stupid (Score:3)
Of course, it requires more than the myopic 3-month planning that most MBAs are capable of at maximum. It also requires a real understanding of risk management and staying away from all short-term optimization. Otherwise, you end up at "save a million, lose a billion", as this seems to be a fine example of.
Claiming this was a "power supply issue" is just lying by misdirection. The root cause is lack of redundancy, lack of resilience and lack of effective business continuity management. All things that cost money and that do not generate profit _unless_ something like this happens. In a healthy infrastructure, one (or even several) power supplies blowing up will not kill your ability to do business.
Events like that are almost universally due to gross mismanagement and should not only result in termination but also prosecution of the "leadership" that allowed this to happen by not being prepared.
