After Crowdstrike Outage, FSF Argues There's a Better Way Forward (fsf.org) 139

"As free software activists, we ought to take the opportunity to look at the situation and see how things could have gone differently," writes FSF campaigns manager Greg Farough: Let's be clear: in principle, there is nothing ethically wrong with automatic updates so long as the user has made an informed choice to receive them... Although we can understand how the situation developed, one wonders how wise it is for so many critical services around the world to hedge their bets on a single distribution of a single operating system made by a single stupefyingly predatory monopoly in Redmond, Washington. Instead, we can imagine a more horizontal structure, where this airline and this public library are using different versions of GNU/Linux, each with their own security teams and on different versions of the Linux(-libre) kernel...

As of our writing, we've been unable to ascertain just how much access to the Windows kernel source code Microsoft granted to CrowdStrike engineers. (For another thing, the root cause of the problem appears to have been an error in a configuration file.) But this being the free software movement, we could guarantee that all security engineers and all stakeholders could have equal access to the source code, proving the old adage that "with enough eyes, all bugs are shallow." There is no good reason to withhold code from the public, especially code so integral to the daily functioning of so many public institutions and businesses. In a cunning PR spin, it appears that Microsoft has started blaming the incident on third-party firms' access to kernel source and documentation. Translated out of Redmond-ese, the point they are trying to make amounts to "if only we'd been allowed to be more secretive, this wouldn't have happened...!"

We also need to see that calling for a diversity of providers of nonfree software that are mere front ends for "cloud" software doesn't solve the problem. Correcting it fully requires switching to free software that runs on the user's own computer. The Free Software Foundation is often accused of being utopian, but we are well aware that moving airlines, libraries, and every other institution affected by the CrowdStrike outage to free software is a tremendous undertaking. Given free software's distinct ethical advantage, not to mention the embarrassing damage control underway from both Microsoft and CrowdStrike, we think the move is a necessary one. The more public an institution, the more vitally it needs to be running free software.

For what it's worth, it's also vital to check the syntax of your configuration files. CrowdStrike engineers would do well to remember that one, next time.


  • This is stupid (Score:3, Interesting)

    by Anonymous Coward on Sunday July 28, 2024 @07:00PM (#64662504)
    The answer is not to Balkanize operating systems and software to try to obtain security through obscurity. It would make things needlessly complicated and create a whole new set of operational problems related to interoperability, training, and manpower. There's no guarantee that cutting the pie into smaller pieces to distribute to anyone would do anything but reduce the amount of time spent ensuring each individual combination of software selected by a particular entity is truly secure. The answer here is to hold Crowdstrike and anyone else that does something similar financially culpable even if it bankrupts them, as an example to other companies to make god damn sure their security products are doing what they are supposed to do, nothing more and nothing less.
    • Re:This is stupid (Score:5, Insightful)

      by Spazmania ( 174582 ) on Sunday July 28, 2024 @07:36PM (#64662554) Homepage

      It's not about open or closed source. This was a process error. Allowing components of high availability systems to all update on the same day is a wrong design choice. It would be just as wrong under Linux.

      • Re:This is stupid (Score:4, Interesting)

        by Known Nutter ( 988758 ) on Sunday July 28, 2024 @07:44PM (#64662582)
        The way this is being presented -- or at least the way I am understanding it -- is that this particular definition update was one of several which are pushed over the course of a single day, the point being that threats are identified and pushed to clients in near real time. This was not a typical "patch Tuesday" type update. Pushing these updates to all clients in near-real time is the point of the service. Am I misunderstanding this?
        • Re:This is stupid (Score:5, Informative)

          by Martin Blank ( 154261 ) on Sunday July 28, 2024 @07:59PM (#64662618) Homepage Journal

          Microsoft pushes a Defender update most days, and may push more than one update in a day. Regardless of how many, they're bucketed into three groups: a small one that acts as a trial group, a large one that encompasses most users, and another small one that is used for systems where stability is a more important factor. Within those, there is some additional randomness for when each system gets it. All this happens after internal test rollouts that look for immediate crashes. In doing so, they avoid exactly what Crowdstrike did.

          And Microsoft isn't the first to do this. This has been standard procedure for decades within the AV community. This was an enormous failure on the part of Crowdstrike.
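          A minimal sketch of that kind of ring-based rollout, in Python. The ring names, percentages, delays, and jitter window below are illustrative assumptions, not Microsoft's or CrowdStrike's actual values:

            import hashlib
            import random

            # Illustrative rings: a small trial group, the broad majority, and a
            # small "stability-first" group that receives updates last.
            RINGS = [("trial", 0.02), ("broad", 0.93), ("stable", 0.05)]

            def assign_ring(device_id: str) -> str:
                """Deterministically bucket a device into a ring by hashing its ID."""
                h = int(hashlib.sha256(device_id.encode()).hexdigest(), 16) % 10_000
                cutoff = 0
                for name, share in RINGS:
                    cutoff += share * 10_000
                    if h < cutoff:
                        return name
                return RINGS[-1][0]

            def rollout_delay_minutes(ring: str) -> int:
                """Each ring starts later than the previous one, plus per-device jitter."""
                base = {"trial": 0, "broad": 240, "stable": 1440}[ring]
                return base + random.randint(0, 120)  # smear devices within the ring

            # Example: decide when a given machine should fetch today's definitions.
            ring = assign_ring("workstation-0042")
            print(ring, rollout_delay_minutes(ring))

          A push only moves from one ring to the next after the earlier ring has been observed for immediate crashes, which is the step the CrowdStrike channel-file update evidently skipped.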

          • Microsoft isn't the first to do this. This has been standard procedure for decades within the AV community.

            There's a line from the movie Tron: "The standard, substandard training which will result in your eventual elimination."

          • >> This has been standard procedure for decades within the AV community.
            correction:
            This has been standard procedure for decades within the Snakeoil Community.

          • Another factor is that a Microsoft update can be handled by the built-in startup recovery, because the Windows OS knows that a Defender or driver update was just applied that day and knows how to roll it back. CrowdStrike does its own thing without going through a standardized-to-all update mechanism, so the computer is unable to recover from the boot failure automatically.

            We need a standardized-to-all mechanism for software installation / update when such installation / update is capable of ruining a
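            A rough sketch of the kind of standardized recovery the parent describes: the boot path counts consecutive failed boots and, past a threshold, rolls back the most recently applied content update before trying again. The paths, threshold, and layout below are hypothetical, not how Windows or the CrowdStrike agent actually store state:

              import json
              import shutil
              import subprocess
              from pathlib import Path

              STATE = Path("/var/lib/update-guard/state.json")    # hypothetical state file
              CHANNELS = Path("/opt/endpoint-agent/channels")     # hypothetical content dir
              MAX_FAILED_BOOTS = 2

              def read_state() -> dict:
                  return json.loads(STATE.read_text()) if STATE.exists() else {"failed_boots": 0}

              def record_boot_attempt() -> None:
                  """Called early in boot; a later 'boot succeeded' hook resets the counter."""
                  state = read_state()
                  state["failed_boots"] += 1
                  if state["failed_boots"] > MAX_FAILED_BOOTS:
                      rollback_last_content_update()
                      state["failed_boots"] = 0
                  STATE.parent.mkdir(parents=True, exist_ok=True)
                  STATE.write_text(json.dumps(state))

              def rollback_last_content_update() -> None:
                  """Restore the previous content snapshot so the machine can boot unattended."""
                  backup = CHANNELS.with_suffix(".previous")
                  if backup.exists():
                      shutil.rmtree(CHANNELS, ignore_errors=True)
                      shutil.copytree(backup, CHANNELS)
                      subprocess.run(["logger", "update-guard: rolled back content update"], check=False)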

        • Re:This is stupid (Score:4, Insightful)

          by gweihir ( 88907 ) on Sunday July 28, 2024 @10:25PM (#64662870)

          You are. And because these updates are pushed in real time and can lead to boot failure if broken, that is a very high-risk operation model. At the very least, testing should have caught what was obviously a really stupid error. It did not, which means it was completely inadequate. But worse, pushing such updates which can crash the machine (and there is zero need to do it that way) and prevent reboot is an extremely bad idea in the first place. Just think what happens if somebody compromises the CrowdStrike supply chain to its customers.

          This is an abysmal failure on many levels, both on the side of CrowdStrike and on the side of Microsoft:
          1. Updates to configuration that cannot be blocked or delayed by customers, yet that can lead to the observed problems.
          2. A kernel-level interface that can crash the machine. Microsoft provided nothing more adequate.
          3. A software architecture by CrowdStrike that, after a crash, just boots the same thing and crashes again. Proper risk analysis would have identified that risk. Proper risk management would have eliminated it.
          4. Ridiculously inadequate testing by CrowdStrike.
          5. A complete failure to do proper risk assessment and risk management, both by CrowdStrike and by Microsoft.

          The conclusion here is that CrowdStrike does not do professional work, but neither does Microsoft, and neither should be relied on for anything critical.

        • I don't think there's anything inherently wrong with it being "a single day" -- there are legitimate tradeoffs with security here that want unusually fast rollouts.

          Even still, if this thing had been rolled out over even a couple of hours, even with the crudest of telemetry, it would have presumably been stopped when it only crashlooped a few percent of machines. That still would have been a huge annoyance but it wouldn't have been stop-the-business bad for the large majority of affected businesses.

      • It's not about open or closed source. This was a process error. Allowing components of high availability systems to all update on the same day is a wrong design choice. It would be just as wrong under Linux.

        I deal with troubleshooting systems. The present system is cloud storage of critical elements, and Microsoft allowing third parties to determine whether their ecosystem works.

        Analysis - the bad guys have found a great way to bring the cloud down. If Crowdstrike has access to the OS in a manner that one problem can bring down the house of cards, you can bet the bad guys can have the same. Fatal security problem.

        The cloud, once promoted as the path forward for modern computing. Completely secure, and s

        • You may be conflating two different issues. The Microsoft cloud event and the Crowdstrike event happened within hours of each other, but were unrelated.

        • by gweihir ( 88907 )

          Indeed. Also think what a nice juicy attack target CrowdStrike makes.

      • by gweihir ( 88907 )

        Yes. The argument the FSF is making here is also that Linux is a lot less of a monoculture than Windows is. And that is a fair point.

        • Re:This is stupid (Score:4, Insightful)

          by Spazmania ( 174582 ) on Sunday July 28, 2024 @10:18PM (#64662862) Homepage

          The FSF makes a lot of arguments why open source can be a better choice. The ones here are not particularly on-point for the Crowdstrike outage. The same architectural errors in system design apply to both open and closed source. As past failures at Facebook and Amazon have shown us.

    • Re: (Score:3, Insightful)

      by dbialac ( 320955 )
      Not only that, but this statement is blatantly false:

      a single operating system made by a single stupefyingly predatory monopoly in Redmond, Washington

      This was an issue caused by CrowdStrike, not Microsoft. Many Windows systems did not use CrowdStrike. A similar product for Linux, FOSS or not, would have created a similar outage.

    • you're being stupid, and trying to be a distraction, you hush now and let the adults talk
    • It doesn't help to hold companies accountable, only actual persons. Companies are not self aware and don't have a preservation instinct.
  • Linux too (Score:5, Interesting)

    by ArchieBunker ( 132337 ) on Sunday July 28, 2024 @07:05PM (#64662516)

    Crowdstrike had already caused crashes in Linux too. https://www.theregister.com/20... [theregister.com]

    Why would their software require kernel level access on Linux?

  • by VampireByte ( 447578 ) on Sunday July 28, 2024 @07:12PM (#64662522) Homepage

    I can't help imagining an airport gate where the software is acting up and two admins get into a vi vs. emacs argument while trying to fix it.

    • I can't help imagining an airport gate where the software is acting up and two admins get into a vi vs. emacs argument while trying to fix it.

      I had one of Her Majesty's finest border drones insist to me that I could not move gate and had to try rescanning my passport while the screen was showing the BIOS, boot sequence then Windows XP desktop.

      Eventually he conceded that "my passport didn't work" after 15 minutes and allowed me to try a different gate. He made sure the next person put in their fruitless 15 m

    • I can't imagine this. An airline paying for two IT people? That sounds like a cost cut waiting to happen.

    • I can't help imagining an airport gate where the software is acting up and two admins get into a vi vs. emacs argument while trying to fix it.

      It would all be a foolish argument anyhow. We all know that Pico is better.

  • by SubmergedInTech ( 7710960 ) on Sunday July 28, 2024 @07:23PM (#64662540)

    Instead, we can imagine a more horizontal structure, where this airline and this public library are using different versions of GNU/Linux, each with their own security teams and on different versions of the Linux(-libre) kernel...

    ...because that works SO well for mobile phones.

    And it's going to work even worse for education. Realistically, a public library is lucky to have *one* person who *kinda* understands their computers. The odds they have a security *team* are essentially zero. Ditto public schools.

    An airline or a major university? Sure, maybe. But a library?

  • by oldgraybeard ( 2939809 ) on Sunday July 28, 2024 @07:35PM (#64662552)
    CrowdStrike (everything) needs to be stripped out of every business's IT infrastructure. This company has no reason to exist. This company is dead.
    • No way. I've seen what happens when you get rid of Crowdstrike in this documentary: https://m.youtube.com/watch?v=... [youtube.com] Crowdstrike has a button on a tablet that can stop breaches. I'm pretty sure they're integrating with ChatGPT.
    • by AmiMoJo ( 196126 )

      They may literally be dead sooner or later, due to the lawsuits coming out of this event. They can't escape liability with an EULA if the cause of the outage was negligence.

      • Their liability is limited even if it was negligence. Remember, they're responsible for the BSOD, not for how the business responds to it. The lack of a business continuity strategy is not in their control, meaning a large portion of the damages are out of their hands.

        It'll hurt them a bit (I hope), but let's not kid ourselves: they won't be dead as a result of this.

        • by AmiMoJo ( 196126 )

          If they did have business continuity insurance then the insurers will be looking to recover their losses.

          Check Lawful Masses on YouTube, he did a video about this. He's an actual lawyer, unlike me who just plays one on Slashdot.

    • This company is dead.

      A company carrying a valuation of USD60Bn is "dead".

      OK, then....

      • When you’re more or less known for the work you do in one particular area, and you not only failed at your basic task of keeping computers up, accessible, and secure, you failed so badly that you caused the largest IT outage in history, then yeah, it isn’t unreasonable for people to declare them dead.

        That market cap you’re citing? It’s down roughly 35% from where it was before the event. Not exactly the mark of a healthy company.

        This event is now shining light on past, similar events

        • by unrtst ( 777550 )

          When you’re more or less known for the work you do in one particular area, and you not only failed at your basic task of keeping computers up, accessible, and secure, you failed so badly that you caused the largest IT outage in history, then yeah, it isn’t unreasonable for people to declare them dead.

          Oh! I guess that's why Equifax is completely dead after their 2017 breach! Go check their stock - momentary drop, and back to normal within 1-2 years.

          • The breach affected none of their customers, so why would their business tank? That was an unsurprising result.

            • by unrtst ( 777550 )

              How many people impacted by Crowdstrike were paying customers? Wanna make a friendly wager on whether or not Crowdstrike is dead / will be around in a similar capacity in 5 years? :-)

  • Closed source is more likely to hide the problem from a future victim of an exploit than from the author of such an exploit.

    Closed source just means that you can't check what's under the hood and have to trust that megacorp that's been in court a bunch of times for unethical behavior to be ethical.

    This does not seem like a great bet to me.

  • by sconeu ( 64226 ) on Sunday July 28, 2024 @07:43PM (#64662576) Homepage Journal

    Let's be fair. This isn't on MS. It's on Crowdstrike. And don't forget, Crowdstrike wound up kernel panicking Linux a few months back.

    • by gweihir ( 88907 )

      MS set the tone and culture, provided the opportunity, and set things up. CrowdStrike merely triggered the event after that.

      Also note that MS Windows is so insecure that you need things like CrowdStrike.

      • I second that. There is infinitely more than enough blame to go around, but both entities share the blame for this one.
        • by gweihir ( 88907 )

          Indeed. That is why Microsoft is almost panicking in its attempts to blame the EU. They are deeply afraid that too many people will notice that they share a lot of the blame here and that their products are actually pretty bad.

    • "Let's be fair. This isn't on MS. It's on Crowdstrike. And don't forget, Crowdstrike wound up kernel panicking Linux a few months back."

      True, and frankly it's stupid to allow software like this on your system but often we're forced to by 'policy'. It's frustrating running Linux when policy makers insist on treating it like Windows and making us install all this anti-malware code which requires deep access to the system.

      However, although might be unpopular here, Apple is unaffected because they won't let too

  • one wonders how wise it is for so many critical services around the world to hedge their bets on a single distribution of a single operating system made by a single stupefyingly predatory monopoly in Redmond, Washington.

    But enough about the Nintendo Switch system software; let's talk about what it'd take to switch to self-hosted free software.

    calling for a diversity of providers of nonfree software that are mere front ends for "cloud" software doesn't solve the problem. Correcting it fully requires switching to free software that runs on the user's own computer.

    And this in turn requires more investment in IPv6 rollout among Internet access providers. It's 2024, and Frontier fiber service is still IPv4-only. Some other ISPs provide "dual-stack lite" service with full IPv6 alongside limited IPv4 that allows only outgoing connections. Because there aren't enough IPv4 addresses to go around, a whole neighborhood gets put behind one IPv4 address using carrier-grade network address translation (CGNAT). This situation makes it impractical for a customer of an ISP that uses dual-stack lite to run an on-premises server, as Frontier subscribers have no way to connect to it.
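    As a side note, a rough way to tell whether a connection is behind carrier-grade NAT is to compare the address the ISP hands the router against 100.64.0.0/10, the shared address space reserved for CGNAT by RFC 6598. The sketch below assumes you can obtain that WAN address some other way (for example, from the router's status page):

      import ipaddress

      CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space

      def likely_behind_cgnat(wan_address: str) -> bool:
          """True if the ISP-assigned address is in the CGNAT pool, in which case
          inbound connections to an on-premises server will not reach it."""
          addr = ipaddress.ip_address(wan_address)
          return addr.version == 4 and addr in CGNAT_RANGE

      print(likely_behind_cgnat("100.72.13.5"))   # True: behind CGNAT
      print(likely_behind_cgnat("203.0.113.7"))   # False: a public IPv4 address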

  • The correct answer is to not allow third-party kernel extensions on your cloud infrastructure. Insist on the OS vendor providing user-space hooks for whatever you need to do, and if they won't, choose another OS vendor. This wasn't caused by "the cloud" or "closed-source software". It was caused by an architectural decision when they wrote the Crowdstrike Falcon kernel extension for Windows.

    I've been told that had this happened on the Linux version of CrowdStrike, the OS as a whole would have stayed up, because it is based on eBPF instead of being a kernel extension.

    • I've been told that had this happened on the Linux version of CrowdStrike, the OS as a whole would have stayed up, because it is based on eBPF instead of being a kernel extension. And I think that the Mac version is also not running in the kernel. So out of the three architectures, only the Windows port had this design problem.

      I don't know much about the "Debian Event" but it actually seems to be pretty much identical from the reports:

      "In April, a CrowdStrike update caused all Debian Linux servers in a civic tech lab to crash simultaneously and refuse to boot. The update proved incompatible with the latest stable version of Debian, despite the specific Linux configuration being supposedly supported. The lab's IT team discovered that removing CrowdStrike allowed the machines to boot and reported the incident." - https://www.neowin [neowin.net]

      • I can't say exactly what would happen on macOS, but "the system as a whole stays up" may be less useful than you think. Since this is antivirus software and you don't want to run without antivirus software (I mean, that's why you pay CrowdStrike big money), there could easily be some code on Macs with CrowdStrike that prevents them from running without antivirus.

        So the OS is up and running, but the end user can't get any work done. No difference.
        • I'm not certain why you would be replying to me about MacOS, but since we're here, I'm unaware of anything magical in the various BSDs that would prevent a module running in the context of the kernel from borking the whole works there the same as any other popular platform.

          • by dgatwood ( 11270 )

            I'm not certain why you would be replying to me about MacOS, but since we're here, I'm unaware of anything magical in the various BSDs that would prevent a module running in the context of the kernel from borking the whole works there the same as any other popular platform.

            Apple's rules about not allowing non-driver kexts for pretty much any reason would be the "anything magical". Anything like Crowdstrike Falcon on macOS would almost certainly have to run in user space. So if it crashes, it crashes, and the rest of the OS should just roll its eyes, assuming nothing in the kernel ends up blocked waiting for some kind of permission ack from Falcon (which it shouldn't if their daemon isn't running).

  • "one wonders how wise it is for so many critical services around the world to hedge their bets on a single distribution of a single operating system made by a single stupefyingly predatory monopoly in Redmond, Washington"

    Greg is conflating CS with Windows...

    Greg is arguing that thousands of divergent systems will be more secure, in the face of all the evidence to the contrary.

    Greg should not be talking about anything technical. Ever. Greg should be fetching coffee.

  • The software does serve a purpose, and updates to software that protects against malware are something you want installed instantly. The auto-install is not the problem.

    The problem is that the update took millions of machines out that had to be fixed manually. There are several reasons for this:

    1. They used a parser that can crash with the wrong input, and I bet they still use it. This needs a manual review plus fixes to make sure that whatever the input is, the parser will survive parsing it, and accept it or reject it. From e
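    A minimal sketch of that "survive any input" idea: validate before acting, and treat every parse failure as a rejection of the incoming file rather than a crash. The file format and field names here are invented for illustration; CrowdStrike's actual channel-file format is proprietary:

      import json
      import logging

      REQUIRED_FIELDS = {"channel_id", "version", "signatures"}  # invented schema

      def load_channel_file(raw: bytes):
          """Parse an update file defensively: malformed input is logged and ignored,
          and the previously known-good configuration stays active."""
          try:
              data = json.loads(raw.decode("utf-8"))
          except (UnicodeDecodeError, json.JSONDecodeError) as exc:
              logging.warning("rejecting unparsable channel file: %s", exc)
              return None
          if not isinstance(data, dict) or not REQUIRED_FIELDS <= data.keys():
              logging.warning("rejecting channel file with missing fields")
              return None
          return data

      # Garbage input (e.g. a file full of zero bytes) is rejected instead of crashing.
      assert load_channel_file(b"\x00" * 1024) is None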
  • by Tom ( 822 ) on Monday July 29, 2024 @03:30AM (#64663166) Homepage Journal

    This isn't about free vs. proprietary software.

    It's about us security dudes telling people to patch everything immediately, people doing the logical thing - automate it - and then someone fucking it up and look there everyone is down.

    Under all of that, it's about trust. Do we trust companies like Crowdstrike with kernel-level access to our systems? Do we trust them with updates? Do we trust them enough to make fully automatic updates?

    • Do we trust companies like Crowdstrike with kernel-level access to our systems?

      Wrong question.

      Do we trust them more than the end user who is desperately trying to let the threat actors in?

      Yes. The answer is yes.

      This type of approach to securing endpoints exists not because of the threat actor. It exists because end users are so horrendously bad at behaving responsibly. All the time. With carefree abandon.

      The one truism about improving safety, as applied to automobiles: the safer the driver feels, the more risks they take.

      What we need is an AGI that stabs the end user with a pointy st

      • Safety, security and reliability are three separate problems. People do tend to muddle them together. I find myself always bashing management with this issue.

        - Security is about equipment protecting itself against people. The Internet being the cesspool that it is has a lot of this.
        - Safety is about equipment protecting people against itself. This always has highest priority. But is also only of concern in a limited set of environments. Life threatening ones. Usually because of risky mechanical or lo

      • by Tom ( 822 )

        Do we trust them more than the end user who is desperately trying to let the threat actors in?

        That's a bit of hyperbole and part of the problem. Users are that - users. They want to USE computers to do their actual jobs. Their actual jobs are not computers. They behave what you call irresponsibly when things get in their way, because guess what, on the other side of security is not user stupidity, it's the boss pressing his people to deliver faster and better. We've been through so many decades of "optimizing performance" that now the small hurdles of security are major obstacles.

        re "optimizing perf

        • They behave what you call irresponsibly when things get in their way, because guess what, on the other side of security is not user stupidity, it's the boss pressing his people to deliver faster and better.

          Yeah, no. The machine is not clicking on every link they received in unsolicited mail. The boss is not forcing them to download infectious garbage based on web pop-ups, boredom and general idiocy. Productivity is not driving them to browse garbage, infected, sites.

          Phishing remains so prevalent because it works.

          End users are the primary entry point and will remain so until there are consequences for their actions. Sadly, HR will not engage at this level because - let's be blunt - HR are some of the worst o

          • by Tom ( 822 )

            Phishing remains so prevalent because it works.

            We agree on that.

            The boss is not forcing them to download infectious garbage based on web pop-ups, boredom and general idiocy.

            No, but turning them into constantly under pressure cogs in a machine makes two things certain: One, they don't give a shit about the company and two, they don't have time to lean back and reflect on what they're doing.

            People will find opportunities to rest. It is well studied that the max most people can focus on a task is about 90 minutes. After that, you need a break or your concentration nose-dives. People take those breaks. If they can't do it officially, they'll have long restroom vis

            • Ah, the usual. Putting more responsibility on the end user. With awareness campaigns and punishments. Because it has done fuck all for the past 30 years, so it'll certainly start working... any day now... any day...

              Can we stop pretending the mouth-breathing farkwits have no personal agency here? They are *entirely* responsible for their actions.

              "Oh, shame. The poor dullards are too tired and dumb to be held accountable."

              WTH argument is this?

      • Personally, as mainly a cyclist and motorcyclist, I *really* like the idea of the "Tullock Spike":

        "This is from legendary economist Gordon Tullock’s famous comment: “If the government wanted people to drive safely, they’d mandate a spike in the middle of each steering wheel".

        The Tullock Spike, or, to sound more gearhead-friendly, the Tullock Steering Column was something first thought of by the noted economist Gordon Tullock. Tullock came up with the idea around the time seatbelts in cars w

  • I am not saying that the ClownStrike debacle was malicious, but they are in a great position for $EvilActor to get his stooge installed. This stooge "accidentally" breaks any QA procedures and a "buggy" update is sent out worldwide, causing mayhem at a time of $EvilActor's choosing. This time the effects were relatively benign -- although a pain and costly for those affected. But what if this had happened when $EvilActor was invading a neighbouring country or doing a large drug run or ... ?

    ClownStrike update

  • Their "way forward" seems to be more of a childish rant against their perceived enemies than anything constructive. Greg Farough ranting about Microsoft is not news.
  • The problem had nothing to do with open vs closed source, it was that they didn't sufficiently test the update, compounded by them releasing it globally at the same time. They should have of course tested more thoroughly, and then released in a "canary release" model, so when it crashed 0.1% of their customers they could have detected that and stopped the update before it crashed all their Windows customers.
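    A sketch of that canary gate, assuming a telemetry feed that reports whether each canary host came back healthy after the update; the cohort size and crash-rate threshold are illustrative:

      CANARY_FRACTION = 0.001   # push to roughly 0.1% of hosts first
      MAX_CRASH_RATE = 0.01     # halt the rollout if more than 1% of canaries crash

      def continue_rollout(canary_results: list[bool]) -> bool:
          """canary_results holds one flag per canary host:
          True = reported healthy after the update, False = crashed or went silent."""
          if not canary_results:
              return False  # no telemetry yet: do not proceed
          crash_rate = canary_results.count(False) / len(canary_results)
          return crash_rate <= MAX_CRASH_RATE

      # Example: 3 crashes out of 1,000 canaries is tolerable noise; 300 is not.
      print(continue_rollout([True] * 997 + [False] * 3))    # True
      print(continue_rollout([True] * 700 + [False] * 300))  # False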
