US FAA Adopts New Safeguards After Computer Outage Halted Flights (reuters.com) 25
The Federal Aviation Administration (FAA) told lawmakers Monday it had made a series of changes to prevent a repeat of a key computer system outage that forced a nationwide Jan. 11 ground stop disrupting more than 11,000 flights. From a report: The FAA said it has implemented "a one-hour synchronization delay for one of the backup databases. This action will prevent data errors from immediately reaching that backup database." The FAA also said it "now requires at least two individuals to be present during the maintenance of the (messaging) system, including one federal manager."
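The "one-hour synchronization delay" the FAA describes is essentially a delayed-apply replica: the backup buffers incoming changes and only applies them after a fixed delay, giving operators a window to catch a bad write before it corrupts the standby copy. A minimal sketch of that idea (illustrative only, not the FAA's actual design; all names are hypothetical):

```python
import time
from collections import deque

class DelayedReplica:
    """Illustrative sketch of a delayed-apply backup: changes are
    buffered and applied only after `delay_seconds`, so a bad change
    spotted on the primary can be discarded before it reaches the copy."""

    def __init__(self, delay_seconds=3600, clock=time.monotonic):
        self.delay = delay_seconds
        self.clock = clock          # injectable clock, for testing
        self.pending = deque()      # (apply_at, key, value)
        self.data = {}

    def receive(self, key, value):
        # Buffer the change instead of applying it immediately.
        self.pending.append((self.clock() + self.delay, key, value))

    def discard_pending(self):
        # Operators spotted a bad change on the primary: drop the
        # not-yet-applied buffer instead of replicating the corruption.
        self.pending.clear()

    def tick(self):
        # Apply every buffered change whose delay has elapsed.
        now = self.clock()
        while self.pending and self.pending[0][0] <= now:
            _, key, value = self.pending.popleft()
            self.data[key] = value
```

Real databases offer this natively (e.g. delayed standby replicas); the trade-off is that the backup is always an hour stale, so failover loses up to an hour of changes.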
Turn keys... (Score:2, Funny)
Turn keys on my mark.
MARK - system goes down....
Re: Turn keys... (Score:2)
Cmon guys mod up
So things now break one hour later? (Score:2)
Yeah, great fix. And I have seen "managers" making sure things were done right. Most of the time they were not even looking; they were on their phones. One even told me directly that he had no clue how things worked on the tech side.
Re: (Score:2)
Re: (Score:2)
You think? I do not. When something goes this badly wrong, the rot sits deep on all levels.
Re: (Score:2)
Yeah, I would think having a peer check your work as you do the deployment would be way more effective at preventing errors.
Re: (Score:2)
Not only you. Anybody who does this competently uses two _experts_ for anything like this, because otherwise it is pointless.
Re: (Score:2)
And almost any major upgrade must have a "rollback" plan in place before it is approved. The Change Request checklist at our location included a rollback requirement where we had to explain the plan to reverse any changes we made, the procedure to do it, who could do it, and the timeline to accomplish it.
If the risk assessment said the rollback could be completed in two hours by saving the data before it was upgraded and restoring it if a problem occurred, it was probably approved. If the procedure was to back
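The snapshot-before-change, restore-on-failure pattern described above could be sketched like this (a minimal sketch; `apply_change` and `verify` are hypothetical placeholders for site-specific steps):

```python
import shutil
import tempfile
from pathlib import Path

def upgrade_with_rollback(target: Path, apply_change, verify):
    """Snapshot a file before a change, apply the change, verify it,
    and restore the snapshot if anything goes wrong."""
    backup = Path(tempfile.mkdtemp()) / target.name
    shutil.copy2(target, backup)        # save known-good state first
    try:
        apply_change(target)
        if not verify(target):
            raise RuntimeError("post-change verification failed")
    except Exception:
        shutil.copy2(backup, target)    # rollback to the snapshot
        raise
```

The point of writing the plan down before approval is exactly the `verify` and restore steps: someone has to state in advance how you know the change worked and how long the restore takes.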
Re: (Score:2)
In an environment like this? Most definitely.
what about fixing input to cut down on data errors (Score:2)
what about fixing input to cut down on data errors?
Was it a maintenance error? Or was it a data entry error that crashed the system?
Re: (Score:2, Interesting)
It was a maintenance error. They inadvertently synchronized a bad parameter to the backup environment before pushing a code update. Ergo, when the system 'restarted' after the code update it imploded - and the hot-spare backup environment was already corrupt.
I imagine restoration was - rollback to known good point, move the transactions forward (skipping the errant one), and restarting things... hence the rather lengthy outage window.
Also... the system is rather old. Adding to the issues.
How do you take
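The recovery described above (restore a known-good point, then roll the transaction log forward while skipping the errant entry) could be sketched as follows. This is a guess at the general technique, not the FAA's actual procedure; all names are illustrative:

```python
def restore_and_replay(snapshot, log, bad_txn_ids):
    """Start from a known-good snapshot of the database state, then
    replay the transaction log forward, skipping the transactions
    identified as corrupt."""
    state = dict(snapshot)              # known-good restore point
    for txn in log:
        if txn["id"] in bad_txn_ids:
            continue                    # skip the errant transaction
        state[txn["key"]] = txn["value"]
    return state
```

Identifying *which* transaction was errant, and validating the result afterward, is usually what makes the outage window long, not the replay itself.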
Oldness has nothing to do with it (Score:2, Interesting)
This system would've been outperformed by a fleet of teletypes producing hardcopy. Central goes down, you still have the hardcopies and you just hope that nothing really vital needs to go out as a NOTAM in the few hours you need to crank the thing back on. If it does, well, you still have radio. And you'll have a much smaller and more localised problem.
There used to be a German weather subscription service for (general) aviation that used ISDN and built a Fido-style network on top of that. Should the centr
Highly Available (Score:2)
Even modern systems are Highly Available, not "Always Available" - nothing is 100%, but there are procedures and designs that protect and minimize the impact of mistakes. Change Requests and "manager over the shoulder" aren't really going to improve anything except creating paperwork and job angst.
I'm not generically a fan of remaking working systems, but there does come a time when the requirements or outcomes have shifted far enough that throwing a system away and restarting is the better option. In my
Re: (Score:1)
I'm not generically a fan of remaking working systems
Yep, that's the reason COBOL is still in use.
Re: (Score:2)
Change Requests and "manager over the shoulder" aren't really going to improve anything except creating paperwork and job angst
I would say this is just how it goes in US Government jobs, but at the same time I've seen what happens when someone tries to explain what you just said to a Congressional sub-committee. The folks in Congress are the ones that make the "eyes over your shoulder" happen way more often than not. Tell you the truth, I think it's just projection.
It's time to make ATC private, similar to Canada (Score:2)
https://transportationtodaynew... [transporta...aynews.com]
Re: (Score:2)
Privatization destroyed the UK's railways, gutted France's EDF, and took Prague's (CZ) water utilities to the brink of collapse...
Privatization can go both ways.
Re: (Score:2)
System is not "mission critical" (Score:2)
Re: System is not "mission critical" (Score:2)
You make a good point, but NOTAMs still have their place. Flights between two uncontrolled airports come to mind, although of course if you're paranoid like me you'll call your destination beforehand in that case. The wider questions are, of course: 1. How can you fuck up something so simple? and 2. How easy and cheap would it be to replace the current system with a web-based simple text solution?
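To the commenter's second question: a bare-bones text lookup is genuinely tiny. A back-of-the-envelope sketch (toy in-memory store, hypothetical data; a real replacement would need authentication, persistence, distribution, and auditing, which is where the actual cost lives):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy in-memory store keyed by airport path; contents are made up.
NOTAMS = {"/KJFK": "RWY 04L/22R CLSD", "/KBOS": "TWY A LGTS U/S"}

class NotamHandler(BaseHTTPRequestHandler):
    """Serves NOTAMs as plain text: GET /KJFK returns the stored string."""

    def do_GET(self):
        body = NOTAMS.get(self.path)
        if body is None:
            self.send_response(404)
            self.end_headers()
            return
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet
```

The hard part of the real system was never serving text; it's the feeds, formats, and legacy consumers hanging off it.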
Upgrading equipment. (Score:2)
8-inch floppy drives to 5.25in.
They revoked access (Score:2)
As they should have done long ago, and as should every dev shop that deploys software critical to its company's operation. It's way too common for developers to have free rein on production systems. If the *only* way to make changes to a production system is to create a deployment and test that deployment on a test system first, these kinds of issues will happen far less frequently.
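The test-first gating policy described above can be expressed very simply: production refuses any artifact version that has not already passed in the test environment. A minimal sketch (names are hypothetical; real pipelines enforce this in CI/CD tooling rather than application code):

```python
class DeployGate:
    """Refuses a production deploy unless the same artifact version
    was already deployed and verified in the test environment."""

    def __init__(self):
        self.verified_in_test = set()

    def record_test_pass(self, version):
        # Called by the pipeline after the test deployment succeeds.
        self.verified_in_test.add(version)

    def deploy_to_prod(self, version):
        if version not in self.verified_in_test:
            raise PermissionError(f"{version} has not passed in test")
        return f"deployed {version} to production"
```

The value is that the gate is mechanical: nobody, developer or manager, can push an unverified build by being in a hurry.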