Open Source

Software Freedom Conservancy Quits GitHub (theregister.com) 45

An anonymous reader quotes a report from The Register: The Software Freedom Conservancy (SFC), a non-profit focused on free and open source software (FOSS), said it has stopped using Microsoft's GitHub for project hosting -- and is urging other software developers to do the same. In a blog post on Thursday, Denver Gingerich, SFC FOSS license compliance engineer, and Bradley M. Kuhn, SFC policy fellow, said GitHub has over the past decade come to play a dominant role in FOSS development by building an interface and social features around Git, the widely used open source version control software. In so doing, they claim, the company has convinced FOSS developers to contribute to the development of a proprietary service that exploits FOSS. "We are ending all our own uses of GitHub, and announcing a long-term plan to assist FOSS projects to migrate away from GitHub," said Gingerich and Kuhn.

The SFC mostly uses self-hosted Git repositories, they say, but the organization did use GitHub to mirror its repos. The SFC has added a Give Up on GitHub section to its website and is asking FOSS developers to voluntarily switch to a different code hosting service. "While we will not mandate our existing member projects to move at this time, we will no longer accept new member projects that do not have a long-term plan to migrate away from GitHub," said Gingerich and Kuhn. "We will provide resources to support any of our member projects that choose to migrate, and help them however we can."

For the SFC, the break with GitHub was precipitated by the general availability of GitHub Copilot, an AI coding assistant tool. GitHub's decision to release a for-profit product derived from FOSS code, the SFC said, is "too much to bear." Copilot, based on OpenAI's Codex, suggests code and functions to developers as they're working. It's able to do so because it was trained "on natural language text and source code from publicly available sources, including code in public repositories on GitHub," according to GitHub. Gingerich and Kuhn see that as a problem because Microsoft and GitHub have failed to provide answers about the copyright ramifications of training its AI system on public code, about why Copilot was trained on FOSS code but not copyrighted Windows code, and whether the company can specify all the software licenses and copyright holders attached to code used in the training data set.
"We don't believe Amazon, Atlassian, GitLab, or any other for-profit hoster are perfect actors," said Gingerich and Kuhn. "However, a relative comparison of GitHub's behavior to those of its peers shows that GitHub's behavior is much worse. GitHub also has a record of ignoring, dismissing and/or belittling community complaints on so many issues, that we must urge all FOSS developers to leave GitHub as soon as they can."

  • by williamyf ( 227051 ) on Friday July 01, 2022 @07:20PM (#62667108)

    Sourceforge (a slashdot sister company) is waiting for you with open arms.

    After the Dice fiasco, the new owners have tried to make amends.

    https://sourceforge.net/p/forg... [sourceforge.net]

    Disclaimer: I am in NO WAY affiliated to Slashdot, Github or any related company

    • Nah.

    • I've been around here long enough to remember the /. articles about free software organizations urging projects to move away from Sourceforge.

    • by quall ( 1441799 ) on Friday July 01, 2022 @11:19PM (#62667426)

      Isn't that the site with bundled garbage in every installer?

      Yeah no thanks. I don't know what it's like now since I just stay away from that site, but there is no reason to even chance choosing it.

      • Is this a troll? It was never in every project's installer. There was a huge misstep when management decided to repack the installers for a few abandoned projects, and they offered an option for project owners to include an advertising feature in their installer in a poor attempt to enable monetizing their projects. This option was never taken up by managers of any of the software which I used back then, and I still use some of those tools today. My own project hosted on Sourceforge has never been comprom
    • Re: (Score:2, Troll)

      by drinkypoo ( 153816 )

      After the Dice fiasco, the new owners have tried to make amends.

      If they've done as good a job with Sourceforge as they've done with Slashdot, it'll fuck your mom in front of you.

    • I've thought about using sourceforge just because they are happy to host project mailing lists, which could be very convenient.

    • Got my first trojan from sf. Never been back.

  • by Catvid-22 ( 9314307 ) on Friday July 01, 2022 @07:23PM (#62667112)
    I'm not sure what to make of this, since this is practically in the same category as the data mining being done by other Big Data companies, only the target in this case isn't social media (Facebook) or trivia (Google) but code. Microsoft might be on firmer legal ground here as their system most likely doesn't involve any side issues such as privacy. So, yes, geeks may be up in arms. But don't expect any executive or legal action on this, unless it's in the whoops category covered by the DMCA and other copyright regimes. But who knows, maybe this would become the focus of some future GPL v4?
    • Just don't use it. It's not like there aren't other alternatives out there. The FOSS community have a right to be concerned and while the geek in me would say "hey, this sounds like a good feature" looking at it from another perspective it could be considered a land grab.

      • I'm sorry but that is such bs. One can successfully make the argument that today, it is impossible to write code that doesn't violate copyright or some license somewhere.

        We flatter ourselves by saying that we create original code every time we try something. We don't. We create things based on what we've seen before, examples we've studied, code we've debugged, and logical descriptions. That means we are copying somebody else's work, either intentionally or unintentionally.

        I argue that an AI tool
        • I think the problem for free software advocacy groups is that Copilot, to prevent legal headaches for owner Microsoft, only trawls through publicly accessible code, which is mostly FOSS.
    • by Misagon ( 1135 ) on Friday July 01, 2022 @10:42PM (#62667386)

      There's a big difference. Facebook and Google do data-mining on data that they own. Github does not own the repositories uploaded to it -- the uploaders do, and often they just mirror other people's code on their accounts.

      You are legally allowed to train an AI model on public data, but Open Source is not Public Domain.
      Copilot is trained on code that is under copyright, under various licenses where the weakest require at least attribution of the original authors. But copilot has stripped away any copyright or attribution from the training set.

      First, the very nature of the service, and second, special code within Copilot, is making identification of the original code hard: When nobody can prove that they are the plaintiffs in any specific case, then they can't sue, and Microsoft is taking advantage of that.

      • You've conflated public code and Public Domain. The code doesn't need to be public domain, it just needs to be publicly accessible. Which it is, if you're using Github to host a public repository. Can you point out the exact part of the GPL, MIT, Apache or other popular open source licences that forbid performing statistical analysis (including training AI) on public source repos in aggregate?
    • by mysidia ( 191772 ) on Saturday July 02, 2022 @11:22AM (#62668152)

      maybe this would become the focus of some future GPL v4?

      There is probably No term they are capable of adding that would address this, because the GPL cannot restrict usage that is not reliant on having a license. Unless the FSF is willing to add a "Non-Disclosure Agreement (NDA)" to the GPL prohibiting the public redistribution of GPL-covered code to recipients who have not agreed to Bind themselves to the GPL with respect to the covered software.

      The problem for the OSS developer is they are Publishing their code - it is possible for Github people to download the code they can find Anywhere on the internet and decline to agree to the GPL. (Not putting your code on GitHub does Not protect you. If your code is publicly available, they can download it and use it internally.)

      Training their AI model with your code does not require a license. It is fine that they legally possess a copy of your code with no license; they can train their model with it. They have not had to agree to the GPL in order to legally do so; therefore, no term in the GPL can prevent this use.

      Finally, individual code Output of their tool is likely, most of the time, to be legally Fair use and Not a derivative work of the code their system was trained with - mainly because the CoPilot tool is producing very small sections of code focused on a task set by the user. Assuming their training set is large enough, it is likely CoPilot can produce autocomplete snippets which are small and unimportant enough, relative to the original works as a whole, to not reach the threshold necessary to be considered an appropriation of another work. And furthermore, if they do reach it, it is likely the output will have a different purpose and be sufficiently transformative that it ends up being fair use.

  • by RUs1729 ( 10049396 ) on Friday July 01, 2022 @09:15PM (#62667290)
    The day Microsoft took it over was the day that projects ought to have left. No sense in complaining now: you knew what was coming.
    • Absolutely correct. Very sad that you've been modded down for understanding the situation better than most Slashdolts. I left GitHub for GitLab the moment the acquisition was announced. No regrets.

  • by SuperKendall ( 25149 ) on Friday July 01, 2022 @09:29PM (#62667310)

    Microsoft and GitHub have failed to provide answers about the copyright ramifications of training its AI system on public code

    Read that statement again. If there are ramifications about learning to code from PUBLIC code, do any "legal" programmers even exist? I've looked at repos, I've looked at code snippets. How is it any different, legally or otherwise, if an AI I'm training is reading the same code I could have some developer from India reference to figure out how to do something?

    • by raymorris ( 2726007 ) on Friday July 01, 2022 @10:12PM (#62667352) Journal

      If you copy-paste functions from open source licensed code, you need to follow the license. That's true whether you use a mouse to do it or you use a decision tree.

      The decision tree method is called "AI". It's a common type of AI, which consists of essentially building a bunch of "if-else" statements. Like this:

      If programmer types "Call-Api" {
          paste-from("Apache")
      }

      You can certainly gain understanding of programming best practices by reading other people's code, and then use that understanding to write your own original software. A computer program can NOT gain an understanding of programming best practices. It can only do more or less complex versions of copy-paste.

      • by Mrs. Grundy ( 680212 ) on Friday July 01, 2022 @10:25PM (#62667372) Homepage

        This makes claims that sound scientific-y, but how exactly can you test the statement that "A computer program can NOT gain an understanding of programming best practices." You would need to make a distinction between the statistical process a ML algorithm performs and what a brain does, but we don't really understand what a brain does. ML algorithms are not "complex versions of copy-paste" any more than our own intelligence is. To say such suggests you don't have a good intuition about what ML algorithms are actually doing.

        You can't copyright ideas. You can copyright tangible forms of expression. If I write an algorithm that synthesizes a bunch of different ideas to create something new, well, that's what creative people do and we value it.

        • Now all you need to do is convince the UK of that. [bbc.com]

        • by mysidia ( 191772 )

          You can't copyright ideas. You can copyright tangible forms of expression.

          Also it's... You can copyright ORIGINAL tangible forms of expression.

          Software programs contain a lot of code, And a lot of code within a software project is not an original form of expression.

          Take for example, the factorial function.. there are Only so many common ways to calculate factorial, and it is very likely that for any code you can think of to do it: exactly that code has almost certainly been already written before by so
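As a rough illustration of the factorial point above, here is a minimal sketch of the two most common formulations. Python is an assumption (the comment names no language), and the function names are hypothetical. Almost any independent author would arrive at something very close to one of these.

```python
def factorial_iterative(n: int) -> int:
    """Multiply 1 * 2 * ... * n in a loop (hypothetical name)."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def factorial_recursive(n: int) -> int:
    """Use the recurrence n! = n * (n - 1)!, with 0! = 1! = 1 (hypothetical name)."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    return 1 if n <= 1 else n * factorial_recursive(n - 1)
```

Both functions return the same value for every non-negative integer, which is part of why such code is hard to treat as anyone's original expression.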

        • > can NOT gain an understanding of programming best practices." You would need to make a distinction between the statistical process a ML algorithm performs and what a brain does, but we don't really understand what a brain does.

          We know enough about the brain to know it sure as hell isn't friggin linear regression! Heck, we're nearly as likely to use the *least* significant factor. Every day someone posts right here on Slashdot positing that A was caused by B, where A happened BEFORE B.

          Maybe it's Bayes

      • by mysidia ( 191772 )

        If you copy-paste functions from open source licensed code, you need to follow the license. That's true whether you use a mouse to do it or you use a decision tree.

        This is not necessarily true legally; it depends upon whether whatever you actually pasted is important enough and large enough to be considered an appropriation of others' work. The best advice of course is to Always attribute your copy+paste with its source, Because these issues are not very fun to deal with for programmers - it is better

        • > main() { printf("Hello, slashdot\n"); }
          > If I post a larger program containing this main function

          If you post a larger program containing that main function, you wrote mostly dead code, code that will never run.

          • by mysidia ( 191772 )

            If you post a larger program containing that main function, you wrote mostly dead code, code that will never run ..

            Not necessarily true - you don't have enough information to determine that it will be dead code.

            A compiled object can of course export multiple functions, and we can link with -nostartfiles and set the program entrypoint to a _start function that begins execution somewhere else, so the main() function is not the entrypoint.

            In fact, we can have a compilation process that creates multiple diffe

      • If you copy-paste functions from open source licensed code, you need to follow the license.

        And what if you are not copy-pasting code, but learning approaches and making use of that knowledge later?

        Which is exactly what AI is doing. It's looking at a whole bunch of different code, and seeing common approaches that lots of people use. It's not copying any one codebase.

        • > but learning approaches and making use of that knowledge

          Hard drives have bits, copied from here to there. They do not have knowledge and do not make use of their insights. They are not humans or animals. They are magnets, or transistors (tiny relays).

          It's literally precisely the same as connecting a bunch of household light switches together. A switch doesn't gain understanding. It simply gets flipped on or off. The connected wire copies that in or off state to the other side of the room. It do

          • Hard drives have bits, copied from here to there.

            Human brains are no different, storing information they learned and retrieving it later, combining many items from past memories.

            Humans are just machines also, just more complex and general purpose. There lies your fundamental misunderstanding.

            It's literally precisely the same as connecting a bunch of household light switches together.

            You mean like... neurons.

    • I'm not entirely sure what the complaint is, but GPL isn't public domain. If the AI only knows GPL code, and what it suggests is effectively copy/paste from GPL code, then the user's code should correctly attribute the origins.

    • Microsoft and GitHub have failed to provide answers about the copyright ramifications of training its AI system on public code

      Read that statement again. If there are ramifications about learning to code from PUBLIC code, do any "legal" programmers even exist? I've looked at repos, I've looked at code snippets. How is it any different, legally or otherwise, if an AI I'm training is reading the same code I could have some developer from India reference to figure out how to do something?

      Freedom to study the code and learn from it is one of the four essential freedoms of free software. Nowhere does it say that the "studying" can only be done by humans, not AI.

    • by dknj ( 441802 )

      Did you know if you copy and paste stackoverflow code then you are committing license violations? Stackoverflow uses Creative Commons licensing and requires attribution. But does anyone do it? No. Copyright is dead unless you have tons of money and can set up traps to catch people dead to rights. This is why copilot is taking off. Within Microsoft internally, folks move from the AKS team to Windows development teams and no one blinks an eye. Gone are the days when you had to separate your dev teams accor

    • by Talchas ( 954795 )
      Except that a relevant amount of the time it is definitely just copypasting some open-source code in from some appropriate function with the same name or such which it has essentially "memorized". And that's before you start asking how much it is understanding vs copypasting for smaller snippets. Even with humans you have people being very careful about "clean room" reimplementations of proprietary code when there are lawyer-happy companies involved.
  • ... we will no longer accept new member projects that do not have a long-term plan to migrate away from GitHub.

    but ... i don't have such a plan, because i am not using GitHub ... and never have.

  • FOSS has all the signs of a religion including disputes among heretics and apostates. Happily FOSS believers (mostly) don't burn people at the stake unlike the unfortunate Jeanne d'Arc.
  • What are the alternatives for Free git hosting? The benefit of GitHub is that it's centralized -- obviously ironic given Git. Would a non-profit hub for Git be burdensome because of bandwidth? Storage? Who is working on a decentralized hub for Git repos?

    • We need federation and a search engine. Gitlab does some federation. Not sure how easy it is to tag or reference a resource on another server.

      #1334@example.com/foo/bar would be OK. @johngalt@example.com is good enough too.
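To make the two reference shapes above concrete, here is a minimal Python sketch of how a federated host might parse them. Everything in it is an assumption for illustration: the grammar (`#<number>@<host>/<project path>` and `@<user>@<host>`) is inferred only from the two examples in the comment, and `IssueRef`, `UserRef`, and `parse_reference` are hypothetical names, not part of GitLab or any existing forge.

```python
import re
from typing import NamedTuple, Optional, Union


class IssueRef(NamedTuple):
    """Hypothetical cross-server issue reference, e.g. #1334@example.com/foo/bar."""
    number: int
    host: str
    project: str


class UserRef(NamedTuple):
    """Hypothetical cross-server user reference, e.g. @johngalt@example.com."""
    user: str
    host: str


# Assumed grammar, inferred only from the two examples in the comment.
_ISSUE = re.compile(r"#(\d+)@([A-Za-z0-9.-]+)/([\w.-]+(?:/[\w.-]+)*)")
_USER = re.compile(r"@([\w.-]+)@([A-Za-z0-9.-]+)")


def parse_reference(text: str) -> Optional[Union[IssueRef, UserRef]]:
    """Return a parsed reference, or None if the text matches neither shape."""
    match = _ISSUE.fullmatch(text)
    if match:
        return IssueRef(int(match.group(1)), match.group(2), match.group(3))
    match = _USER.fullmatch(text)
    if match:
        return UserRef(match.group(1), match.group(2))
    return None
```

Under these assumptions, `parse_reference("#1334@example.com/foo/bar")` yields an `IssueRef` for issue 1334 of project `foo/bar` on `example.com`. A real federation scheme would also need to decide how to resolve and authenticate the remote host, which this sketch leaves out.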

  • I would love to see Linux move away from github (they are on it right?). But due to its slow, ongoing corporate takeover, I doubt it will move.

    But for my things, I have been considering moving somewhere and will do that on a rainy weekend.
