After Crowdstrike Outage, FSF Argues There's a Better Way Forward (fsf.org) 139

"As free software activists, we ought to take the opportunity to look at the situation and see how things could have gone differently," writes FSF campaigns manager Greg Farough: Let's be clear: in principle, there is nothing ethically wrong with automatic updates so long as the user has made an informed choice to receive them... Although we can understand how the situation developed, one wonders how wise it is for so many critical services around the world to hedge their bets on a single distribution of a single operating system made by a single stupefyingly predatory monopoly in Redmond, Washington. Instead, we can imagine a more horizontal structure, where this airline and this public library are using different versions of GNU/Linux, each with their own security teams and on different versions of the Linux(-libre) kernel...

As of our writing, we've been unable to ascertain just how much access to the Windows kernel source code Microsoft granted to CrowdStrike engineers. (For another thing, the root cause of the problem appears to have been an error in a configuration file.) But this being the free software movement, we could guarantee that all security engineers and all stakeholders could have equal access to the source code, proving the old adage that "with enough eyes, all bugs are shallow." There is no good reason to withhold code from the public, especially code so integral to the daily functioning of so many public institutions and businesses. In a cunning PR spin, it appears that Microsoft has started blaming the incident on third-party firms' access to kernel source and documentation. Translated out of Redmond-ese, the point they are trying to make amounts to "if only we'd been allowed to be more secretive, this wouldn't have happened...!"

We also need to see that calling for a diversity of providers of nonfree software that are mere front ends for "cloud" software doesn't solve the problem. Correcting it fully requires switching to free software that runs on the user's own computer. The Free Software Foundation is often accused of being utopian, but we are well aware that moving airlines, libraries, and every other institution affected by the CrowdStrike outage to free software is a tremendous undertaking. Given free software's distinct ethical advantage, not to mention the embarrassing damage control underway from both Microsoft and CrowdStrike, we think the move is a necessary one. The more public an institution, the more vitally it needs to be running free software.

For what it's worth, it's also vital to check the syntax of your configuration files. CrowdStrike engineers would do well to remember that one, next time.


  • This is stupid (Score:3, Interesting)

    by Anonymous Coward on Sunday July 28, 2024 @07:00PM (#64662504)
    The answer is not to Balkanize operating systems and software to try to obtain security through obscurity. It would make things needlessly complicated and create a whole new set of operational problems related to interoperability, training, and manpower. There's no guarantee that cutting the pie into smaller pieces to distribute to anyone would do anything but reduce the amount of time spent ensuring each individual combination of software selected by a particular entity is truly secure. The answer here is to hold Crowdstrike and anyone else that does something similar financially culpable even if it bankrupts them, as an example to other companies to make god damn sure their security products are doing what they are supposed to do, nothing more and nothing less.
    • Re:This is stupid (Score:5, Insightful)

      by Spazmania ( 174582 ) on Sunday July 28, 2024 @07:36PM (#64662554) Homepage

      It's not about open or closed source. This was a process error. Allowing components of high availability systems to all update on the same day is a wrong design choice. It would be just as wrong under Linux.

      • Re:This is stupid (Score:4, Interesting)

        by Known Nutter ( 988758 ) on Sunday July 28, 2024 @07:44PM (#64662582)
        The way this is being presented -- or at least the way I am understanding it -- is that this particular definition update was one of several which are pushed over the course of a single day, the point being that threats are identified and pushed to clients in near real time. This was not a typical "patch Tuesday" type update. Pushing these updates to all clients in near-real time is the point of the service. Am I misunderstanding this?
        • Re:This is stupid (Score:5, Informative)

          by Martin Blank ( 154261 ) on Sunday July 28, 2024 @07:59PM (#64662618) Homepage Journal

          Microsoft pushes a Defender update most days, and may push more than one update in a day. Regardless of how many, they're bucketed into three groups: a small one that acts as a trial group, a large one that encompasses most users, and another small one that is used for systems where stability is a more important factor. Within those, there is some additional randomness for when each system gets it. All this happens after internal test rollouts that look for immediate crashes. In doing so, they avoid exactly what Crowdstrike did.

          And Microsoft isn't the first to do this. This has been standard procedure for decades within the AV community. This was an enormous failure on the part of Crowdstrike.
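          A minimal sketch of that kind of ring-based rollout, in Python. The ring names, percentages, delays, and jitter window below are illustrative assumptions, not Microsoft's or CrowdStrike's actual values:

            import hashlib
            import random

            # Illustrative rings: a small trial group, the broad majority, and a
            # small "stability-first" group that receives updates last.
            RINGS = [("trial", 0.02), ("broad", 0.93), ("stable", 0.05)]

            def assign_ring(device_id: str) -> str:
                """Deterministically bucket a device into a ring by hashing its ID."""
                h = int(hashlib.sha256(device_id.encode()).hexdigest(), 16) % 10_000
                cutoff = 0
                for name, share in RINGS:
                    cutoff += share * 10_000
                    if h < cutoff:
                        return name
                return RINGS[-1][0]

            def rollout_delay_minutes(ring: str) -> int:
                """Each ring starts later than the previous one, plus per-device jitter."""
                base = {"trial": 0, "broad": 240, "stable": 1440}[ring]
                return base + random.randint(0, 120)  # smear devices within the ring

            # Example: decide when a given machine should fetch today's definitions.
            ring = assign_ring("workstation-0042")
            print(ring, rollout_delay_minutes(ring))

          A push only moves from one ring to the next after the earlier ring has been observed for immediate crashes, which is the step the CrowdStrike channel-file update evidently skipped.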

          • Microsoft isn't the first to do this. This has been standard procedure for decades within the AV community.

            There's a line from the movie Tron: "The standard, substandard training which will result in your eventual elimination."

          • >> This has been standard procedure for decades within the AV community.
            correction:
            This has been standard procedure for decades within the Snakeoil Community.

          • Another factor is that a Microsoft update can be handled by the built-in startup recovery, because the Windows OS knows that a Defender or driver update was just applied that day and knows how to roll it back. CrowdStrike does its own thing without going through a standardized-to-all update mechanism, so the computer is unable to recover from the boot failure automatically.

            We need a standardized-to-all mechanism for software installation / update when such installation / update is capable of ruining a
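            A rough sketch of the kind of standardized recovery the parent describes: the boot path counts consecutive failed boots and, past a threshold, rolls back the most recently applied content update before trying again. The paths, threshold, and layout below are hypothetical, not how Windows or the CrowdStrike agent actually store state:

              import json
              import shutil
              import subprocess
              from pathlib import Path

              STATE = Path("/var/lib/update-guard/state.json")    # hypothetical state file
              CHANNELS = Path("/opt/endpoint-agent/channels")     # hypothetical content dir
              MAX_FAILED_BOOTS = 2

              def read_state() -> dict:
                  return json.loads(STATE.read_text()) if STATE.exists() else {"failed_boots": 0}

              def record_boot_attempt() -> None:
                  """Called early in boot; a later 'boot succeeded' hook resets the counter."""
                  state = read_state()
                  state["failed_boots"] += 1
                  if state["failed_boots"] > MAX_FAILED_BOOTS:
                      rollback_last_content_update()
                      state["failed_boots"] = 0
                  STATE.parent.mkdir(parents=True, exist_ok=True)
                  STATE.write_text(json.dumps(state))

              def rollback_last_content_update() -> None:
                  """Restore the previous content snapshot so the machine can boot unattended."""
                  backup = CHANNELS.with_suffix(".previous")
                  if backup.exists():
                      shutil.rmtree(CHANNELS, ignore_errors=True)
                      shutil.copytree(backup, CHANNELS)
                      subprocess.run(["logger", "update-guard: rolled back content update"], check=False)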

        • Re:This is stupid (Score:4, Insightful)

          by gweihir ( 88907 ) on Sunday July 28, 2024 @10:25PM (#64662870)

          You are. And because these updates are pushed in real time and can lead to boot failure if broken, that is a very high-risk operation model. At the very least, testing should have caught what was obviously a really stupid error. It did not, which means it was completely inadequate. But worse, pushing such updates which can crash the machine (and there is zero need to do it that way) and prevent reboot is an extremely bad idea in the first place. Just think what happens if somebody compromises the CrowdStrike supply chain to its customers.

          This is an abysmal failure on many levels, both on the side of CrowdStrike and on the side of Microsoft:
          1. Updates to configuration that cannot be blocked or delayed by customers, yet that can lead to the observed problems.
          2. A kernel-level interface that can crash the machine. Microsoft provided nothing more adequate.
          3. A software architecture by CrowdStrike that, after a crash, just boots the same thing and crashes again. Proper risk analysis would have identified that risk. Proper risk management would have eliminated it.
          4. Ridiculously inadequate testing by CrowdStrike.
          5. A complete failure to do proper risk assessment and risk management, both by CrowdStrike and by Microsoft.

          The conclusion here is that CrowdStrike does not do professional work, but neither does Microsoft, and neither should be relied on for anything critical.

        • I don't think there's anything inherently wrong with it being "a single day" -- there are legitimate tradeoffs with security here that want unusually fast rollouts.

          Even still, if this thing had been rolled out over even a couple of hours, even with the crudest of telemetry, it would have presumably been stopped when it only crashlooped a few percent of machines. That still would have been a huge annoyance but it wouldn't have been stop-the-business bad for the large majority of affected businesses.

      • It's not about open or closed source. This was a process error. Allowing components of high availability systems to all update on the same day is a wrong design choice. It would be just as wrong under Linux.

        I deal with troubleshooting systems. The present system is cloud storage of critical elements, and Microsoft allowing third parties to determine whether their ecosystem works.

        Analysis - the bad guys have found a great way to bring the cloud down. If Crowdstrike has access to the OS in a manner that one problem can bring down the house of cards, you can bet the bad guys can have the same. Fatal security problem.

        The cloud, once promoted as the path forward for modern computing. Completely secure, and s

        • You may be conflating two different issues. The Microsoft cloud event and the Crowdstrike event happened within hours of each other, but were unrelated.

        • by gweihir ( 88907 )

          Indeed. Also think what a nice juicy attack target CrowdStrike makes.

      • by gweihir ( 88907 )

        Yes. The argument the FSF is making here is also that Linux is a lot less of a monoculture than Windows is. And that is a fair point.

        • Re:This is stupid (Score:4, Insightful)

          by Spazmania ( 174582 ) on Sunday July 28, 2024 @10:18PM (#64662862) Homepage

          The FSF makes a lot of arguments why open source can be a better choice. The ones here are not particularly on-point for the Crowdstrike outage. The same architectural errors in system design apply to both open and closed source. As past failures at Facebook and Amazon have shown us.

    • Re: (Score:3, Insightful)

      by dbialac ( 320955 )
      Not only that, but this statement is blatantly false:

      a single operating system made by a single stupefyingly predatory monopoly in Redmond, Washington

      This was an issue caused by CrowdStrike, not Microsoft. Many Windows systems did not use CrowdStrike. A similar product for Linux, FOSS or not, would have created a similar outage.

    • you're being stupid, and trying to be a distraction, you hush now and let the adults talk
    • It doesn't help to hold companies accountable, only actual persons. Companies are not self aware and don't have a preservation instinct.
  • Linux too (Score:5, Interesting)

    by ArchieBunker ( 132337 ) on Sunday July 28, 2024 @07:05PM (#64662516)

    Crowdstrike had already caused crashes in Linux too. https://www.theregister.com/20... [theregister.com]

    Why would their software require kernel level access on Linux?

  • by VampireByte ( 447578 ) on Sunday July 28, 2024 @07:12PM (#64662522) Homepage

    I can't help imagining an airport gate where the software is acting up and two admins get into a vi vs. emacs argument while trying to fix it.

    • I can't help imagining an airport gate where the software is acting up and two admins get into a vi vs. emacs argument while trying to fix it.

      I had one of Her Majesty's finest border drones insist to me that I could not move gate and had to try rescanning my passport while the screen was showing the BIOS, boot sequence then Windows XP desktop.

      Eventually he conceded that "my passport didn't work" after 15 minutes and allowed me to try a different gate. He made sure the next person put in their fruitless 15 m

    • I can't imagine this. An airline paying for two IT people? That sounds like a cost cut waiting to happen.

    • I can't help imagining an airport gate where the software is acting up and two admins get into a vi vs. emacs argument while trying to fix it.

      It would all be a foolish argument anyhow. We all know that Pico is better.

  • by SubmergedInTech ( 7710960 ) on Sunday July 28, 2024 @07:23PM (#64662540)

    Instead, we can imagine a more horizontal structure, where this airline and this public library are using different versions of GNU/Linux, each with their own security teams and on different versions of the Linux(-libre) kernel...

    ...because that works SO well for mobile phones.

    And it's going to work even worse for education. Realistically, a public library is lucky to have *one* person who *kinda* understands their computers. The odds they have a security *team* are essentially zero. Ditto public schools.

    An airline or a major university? Sure, maybe. But a library?

  • by oldgraybeard ( 2939809 ) on Sunday July 28, 2024 @07:35PM (#64662552)
    CrowdStrike (everything) needs to be stripped out of every business's IT infrastructure. This company has no reason to exist. This company is dead.
    • No way. I've seen what happens when you get rid of Crowdstrike in this documentary: https://m.youtube.com/watch?v=... [youtube.com] Crowdstrike has a button on a tablet that can stop breaches. I'm pretty sure they're integrating with ChatGPT.
    • by AmiMoJo ( 196126 )

      They may literally be dead sooner or later, due to the lawsuits coming out of this event. They can't escape liability with an EULA if the cause of the outage was negligence.

      • Their liability is limited even if it was negligence. Remember, they're responsible for the BSOD, not for how the business responds to it. The lack of a business continuity strategy is not in their control, meaning a large portion of the damages are out of their hands.

        It'll hurt them a bit (I hope), but let's not kid ourselves: they won't be dead as a result of this.

        • by AmiMoJo ( 196126 )

          If they did have business continuity insurance then the insurers will be looking to recover their losses.

          Check Lawful Masses on YouTube, he did a video about this. He's an actual lawyer, unlike me who just plays one on Slashdot.

    • This company is dead.

      A company carrying a valuation of USD60Bn is "dead".

      OK, then....

      • When you’re more or less known for the work you do in one particular area, and you not only failed at your basic task of keeping computers up, accessible, and secure, you failed so badly that you caused the largest IT outage in history, then yeah, it isn’t unreasonable for people to declare them dead.

        That market cap you’re citing? It’s down roughly 35% from where it was before the event. Not exactly the mark of a healthy company.

        This event is now shining light on past, similar events

        • by unrtst ( 777550 )

          When you’re more or less known for the work you do in one particular area, and you not only failed at your basic task of keeping computers up, accessible, and secure, you failed so badly that you caused the largest IT outage in history, then yeah, it isn’t unreasonable for people to declare them dead.

          Oh! I guess that's why Equifax is completely dead after their 2017 breach! Go check their stock - momentary drop, and back to normal within 1-2 years.

          • The breach affected none of their customers, so why would their business tank? That was an unsurprising result.

            • by unrtst ( 777550 )

              How many people impacted by Crowdstrike were paying customers? Wanna make a friendly wager on whether or not Crowdstrike is dead / will be around in a similar capacity in 5 years? :-)

  • Closed source is more likely to hide the problem from a future victim of an exploit than from the author of such an exploit.

    Closed source just means that you can't check what's under the hood and have to trust that megacorp that's been in court a bunch of times for unethical behavior to be ethical.

    This does not seem like a great bet to me.

  • by sconeu ( 64226 ) on Sunday July 28, 2024 @07:43PM (#64662576) Homepage Journal

    Let's be fair. This isn't on MS. It's on Crowdstrike. And don't forget, Crowdstrike wound up kernel panicking Linux a few months back.

    • by gweihir ( 88907 )

      MS set the tone and culture, provided the opportunity, and set things up. CrowdStrike merely triggered the event after that.

      Also note that MS Windows is so insecure that you need things like CrowdStrike.

      • I second that. There is infinitely more than enough blame to go around, but both entities share the blame for this one.
        • by gweihir ( 88907 )

          Indeed. That is why Microsoft is almost panicking in its attempts to blame the EU. They are deeply afraid that too many people will notice that they share a lot of the blame here and that their products are actually pretty bad.

    • "Let's be fair. This isn't on MS. It's on Crowdstrike. And don't forget, Crowdstrike wound up kernel panicking Linux a few months back."

      True, and frankly it's stupid to allow software like this on your system but often we're forced to by 'policy'. It's frustrating running Linux when policy makers insist on treating it like Windows and making us install all this anti-malware code which requires deep access to the system.

      However, although might be unpopular here, Apple is unaffected because they won't let too

  • one wonders how wise it is for so many critical services around the world to hedge their bets on a single distribution of a single operating system made by a single stupefyingly predatory monopoly in Redmond, Washington.

    But enough about the Nintendo Switch system software; let's talk about what it'd take to switch to self-hosted free software.

    calling for a diversity of providers of nonfree software that are mere front ends for "cloud" software doesn't solve the problem. Correcting it fully requires switching to free software that runs on the user's own computer.

    And this in turn requires more investment in IPv6 rollout among Internet access providers. It's 2024, and Frontier fiber service is still IPv4-only. Some other ISPs provide "dual-stack lite" service with full IPv6 alongside limited IPv4 that allows only outgoing connections. Because there aren't enough IPv4 addresses to go around, a whole neighborhood gets put behind one IPv4 address using carrier-grade network address translation (CGNAT). This situation makes it impractical for a customer of an ISP that uses dual-stack lite to run an on-premises server, as Frontier subscribers have no way to connect to it.
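    As a side note, a rough way to tell whether a connection is behind carrier-grade NAT is to compare the address the ISP hands the router against 100.64.0.0/10, the shared address space reserved for CGNAT by RFC 6598. The sketch below assumes you can obtain that WAN address some other way (for example, from the router's status page):

      import ipaddress

      CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space

      def likely_behind_cgnat(wan_address: str) -> bool:
          """True if the ISP-assigned address is in the CGNAT pool, in which case
          inbound connections to an on-premises server will not reach it."""
          addr = ipaddress.ip_address(wan_address)
          return addr.version == 4 and addr in CGNAT_RANGE

      print(likely_behind_cgnat("100.72.13.5"))   # True: behind CGNAT
      print(likely_behind_cgnat("203.0.113.7"))   # False: a public IPv4 address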

  • The correct answer is to not allow third-party kernel extensions on your cloud infrastructure. Insist on the OS vendor providing user-space hooks for whatever you need to do, and if they won't, choose another OS vendor. This wasn't caused by "the cloud" or "closed-source software". It was caused by an architectural decision when they wrote the Crowdstrike Falcon kernel extension for Windows.

    I've been told that had this happened on the Linux version of CrowdStrike, the OS as a whole would have stayed up, because it is based on eBPF instead of being a kernel extension.

    • I've been told that had this happened on the Linux version of CrowdStrike, the OS as a whole would have stayed up, because it is based on eBPF instead of being a kernel extension. And I think that the Mac version is also not running in the kernel. So out of the three architectures, only the Windows port had this design problem.

      I don't know much about the "Debian Event" but it actually seems to be pretty much identical from the reports:

      "In April, a CrowdStrike update caused all Debian Linux servers in a civic tech lab to crash simultaneously and refuse to boot. The update proved incompatible with the latest stable version of Debian, despite the specific Linux configuration being supposedly supported. The lab's IT team discovered that removing CrowdStrike allowed the machines to boot and reported the incident." - https://www.neowin [neowin.net]

      • I can't say exactly what would happen on macOS, but "the system as a whole stays up" may be less useful than you think. Since this is antivirus software and you don't want to run without antivirus software (I mean, that's why you pay CrowdStrike big money), there could easily be some code on Macs with CrowdStrike that prevents them from running without antivirus.

        So the OS is up and running, but the end user can't get any work done. No difference.
        • I'm not certain why you would be replying to me about MacOS, but since we're here, I'm unaware of anything magical in the various BSDs that would prevent a module running in the context of the kernel from borking the whole works there the same as any other popular platform.

          • by dgatwood ( 11270 )

            I'm not certain why you would be replying to me about MacOS, but since we're here, I'm unaware of anything magical in the various BSDs that would prevent a module running in the context of the kernel from borking the whole works there the same as any other popular platform.

            Apple's rules about not allowing non-driver kexts for pretty much any reason would be the "anything magical". Anything like Crowdstrike Falcon on macOS would almost certainly have to run in user space. So if it crashes, it crashes, and the rest of the OS should just roll its eyes, assuming nothing in the kernel ends up blocked waiting for some kind of permission ack from Falcon (which it shouldn't if their daemon isn't running).

  • "one wonders how wise it is for so many critical services around the world to hedge their bets on a single distribution of a single operating system made by a single stupefyingly predatory monopoly in Redmond, Washington"

    Greg is conflating CS with Windows...

    Greg is arguing that thousands of divergent systems will be more secure, in the face of all the evidence to the contrary.

    Greg should not be talking about anything technical. Ever. Greg should be fetching coffee.

  • The software does serve a purpose, and updates to software that protects against malware are something you want installed instantly. The auto-install is not the problem.

    The problem is that the update took millions of machines out that had to be fixed manually. There are several reasons for this:

    1. They used a parser that can crash with the wrong input, and I bet they still use it. This needs a manual review plus fixes to make sure that whatever the input is, the parser will survive parsing it, and accept it or reject it. From e
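    A minimal sketch of that "survive any input" idea: validate before acting, and treat every parse failure as a rejection of the incoming file rather than a crash. The file format and field names here are invented for illustration; CrowdStrike's actual channel-file format is proprietary:

      import json
      import logging

      REQUIRED_FIELDS = {"channel_id", "version", "signatures"}  # invented schema

      def load_channel_file(raw: bytes):
          """Parse an update file defensively: malformed input is logged and ignored,
          and the previously known-good configuration stays active."""
          try:
              data = json.loads(raw.decode("utf-8"))
          except (UnicodeDecodeError, json.JSONDecodeError) as exc:
              logging.warning("rejecting unparsable channel file: %s", exc)
              return None
          if not isinstance(data, dict) or not REQUIRED_FIELDS <= data.keys():
              logging.warning("rejecting channel file with missing fields")
              return None
          return data

      # Garbage input (e.g. a file full of zero bytes) is rejected instead of crashing.
      assert load_channel_file(b"\x00" * 1024) is None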
  • by Tom ( 822 ) on Monday July 29, 2024 @03:30AM (#64663166) Homepage Journal

    This isn't about free vs. proprietary software.

    It's about us security dudes telling people to patch everything immediately, people doing the logical thing - automate it - and then someone fucking it up and look there everyone is down.

    Under all of that, it's about trust. Do we trust companies like Crowdstrike with kernel-level access to our systems? Do we trust them with updates? Do we trust them enough to make fully automatic updates?

    • Do we trust companies like Crowdstrike with kernel-level access to our systems?

      Wrong question.

      Do we trust them more than the end user who is desperately trying to let the threat actors in?

      Yes. The answer is yes.

      This type of approach to securing endpoints exists not because of the threat actor. It exists because end users are so horrendously bad at behaving responsibly. All the time. With carefree abandon.

      The one truism about improving safety, as applied to automobiles: the safer the driver feels, the more risks they take.

      What we need is an AGI that stabs the end user with a pointy st

      • Safety, security and reliability are three separate problems. People do tend to muddle them together. I find myself always bashing management with this issue.

        - Security is about equipment protecting itself against people. The Internet being the cesspool that it is has a lot of this.
        - Safety is about equipment protecting people against itself. This always has highest priority. But is also only of concern in a limited set of environments. Life threatening ones. Usually because of risky mechanical or lo

      • by Tom ( 822 )

        Do we trust them more than the end user who is desperately trying to let the threat actors in?

        That's a bit of hyperbole and part of the problem. Users are that - users. They want to USE computers to do their actual jobs. Their actual jobs are not computers. They behave what you call irresponsibly when things get in their way, because guess what, on the other side of security is not user stupidity, it's the boss pressing his people to deliver faster and better. We've been through so many decades of "optimizing performance" that now the small hurdles of security are major obstacles.

        re "optimizing perf

        • They behave what you call irresponsibly when things get in their way, because guess what, on the other side of security is not user stupidity, it's the boss pressing his people to deliver faster and better.

          Yeah, no. The machine is not clicking on every link they received in unsolicited mail. The boss is not forcing them to download infectious garbage based on web pop-ups, boredom and general idiocy. Productivity is not driving them to browse garbage, infected, sites.

          Phishing remains so prevalent because it works.

          End users are the primary entry point and will remain so until there are consequences for their actions. Sadly, HR will not engage at this level because - let's be blunt - HR are some of the worst o

          • by Tom ( 822 )

            Phishing remains so prevalent because it works.

            We agree on that.

            The boss is not forcing them to download infectious garbage based on web pop-ups, boredom and general idiocy.

            No, but turning them into constantly under pressure cogs in a machine makes two things certain: One, they don't give a shit about the company and two, they don't have time to lean back and reflect on what they're doing.

            People will find opportunities to rest. It is well studied that the max most people can focus on a task is about 90 minutes. After that, you need a break or your concentration nose-dives. People take those breaks. If they can't do it officially, they'll have long restroom vis

            • Ah, the usual. Putting more responsibility on the end user. With awareness campaigns and punishments. Because it has done fuck all for the past 30 years, so it'll certainly start working... any day now... any day...

              Can we stop pretending the mouth-breathing farkwits have no personal agency here? They are *entirely* responsible for their actions.

              "Oh, shame. The poor dullards are too tired and dumb to be held accountable."

              WTH argument is this?

      • Personally, as mainly a cyclist and motorcyclist, I *really* like the idea of the "Tullock Spike":

        "This is from legendary economist Gordon Tullock’s famous comment: “If the government wanted people to drive safely, they’d mandate a spike in the middle of each steering wheel".

        The Tullock Spike, or, to sound more gearhead-friendly, the Tullock Steering Column was something first thought of by the noted economist Gordon Tullock. Tullock came up with the idea around the time seatbelts in cars w

  • I am not saying that the ClownStrike debacle was malicious, but they are in a great position for $EvilActor to get his stooge installed. This stooge "accidentally" breaks any QA procedures and a "buggy" update is sent out worldwide, causing mayhem at a time of $EvilActor's choosing. This time the effects were relatively benign -- although a pain and costly for those affected. But what if this had happened when $EvilActor was invading a neighbouring country or doing a large drug run or ... ?

    ClownStrike update

  • Their "way forward" seems to be more of a childish rant against their perceived enemies than anything constructive. Greg Farough ranting about Microsoft is not news.
  • The problem had nothing to do with open vs closed source, it was that they didn't sufficiently test the update, compounded by them releasing it globally at the same time. They should have of course tested more thoroughly, and then released in a "canary release" model, so when it crashed 0.1% of their customers they could have detected that and stopped the update before it crashed all their Windows customers.
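    A sketch of that canary gate, assuming a telemetry feed that reports whether each canary host came back healthy after the update; the cohort size and crash-rate threshold are illustrative:

      CANARY_FRACTION = 0.001   # push to roughly 0.1% of hosts first
      MAX_CRASH_RATE = 0.01     # halt the rollout if more than 1% of canaries crash

      def continue_rollout(canary_results: list[bool]) -> bool:
          """canary_results holds one flag per canary host:
          True = reported healthy after the update, False = crashed or went silent."""
          if not canary_results:
              return False  # no telemetry yet: do not proceed
          crash_rate = canary_results.count(False) / len(canary_results)
          return crash_rate <= MAX_CRASH_RATE

      # Example: 3 crashes out of 1,000 canaries is tolerable noise; 300 is not.
      print(continue_rollout([True] * 997 + [False] * 3))    # True
      print(continue_rollout([True] * 700 + [False] * 300))  # False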
