
Richard Stallman Shares His Concerns About GitHub's Copilot -- and About GitHub (gnu.org)

destinyland writes: A newly released video at GNU.org shows an hour-long talk given by free software advocate Richard Stallman for the BigBlueButton open source conference (which was held online last July). After a 14-minute clip from an earlier speech, Stallman answers questions from the audience — and the first question asked Stallman for his opinion about the AI Copilot [automated pair programming tool] developed for Microsoft's GitHub in collaboration with AI research and deployment company OpenAI.

Stallman's response?

There are many legal questions about Copilot whose answers I don't know, and maybe nobody knows. And it's likely some of them depend on the country you're in [because of the copyright laws in those countries.] In the U.S. we won't be able to have reliable answers until there are court cases about it, and who knows how many years it'll take for those court cases to arise and be finally decided. So basically what we have is a gigantic amount of uncertainty.

Now the next thing is, what about morally? What can I say morally about Copilot? Well the basic idea seems okay. Why shouldn't a program be able to give you hints like that?

But there is one pitfall, which is that if you follow those hints, you might end up putting a substantial block of code copied from a GPL-covered program, written by someone else, or one hint after another after another after another — it adds up to a substantial amount of code, perhaps, with very little change, perhaps. And then you've infringed the GPL by releasing that code, unless your program is covered by the same versions — plural — of the GPL, in which case it would be permitted. But you might not even know that. Copilot might not tell you — it doesn't endeavor to inform you. So you're likely not to know. Which means Copilot is leading users — some of its users — into a pitfall. Well, they should fix it so it doesn't do that.

But basically, what can you expect from GitHub? GitHub gives people inadequate advice about what it means to choose a license. They tell you you can choose GPL version 2 or GPL version 3. I think they don't tell you that really you could choose GPL version 2 only, or GPL version 2 or later, or GPL version 3 only, or GPL version 3 or later — and those are four different choices. They give users different permissions over the future. So it's important to make each program say clearly which choice covers it. And GitHub doesn't tell you how to do that.

It doesn't tell you that you need to do that. Because the way you do that is with a license notice that is supposed to be in every source file. It's unreliable to put just one statement in a free program and say "This program is covered by such-and-such license." What happens if somebody copies one of the files into some other program which says it's covered by a different license? Now that program has been inaccurately mis-licensed, which is illegal and is going to mislead users. So any self-respecting — any repository that wants to be honest has to explain these things, not just tell people to make the licensing of each piece of code clear, but help users do so — make it easy.

So GitHub has had this enormous problem for all of its existence, and Copilot has the similar — a basically, vaguely similar sort of problem, in the same area. It's not exactly the same problem. I don't think that copying a snippet of a few lines of code infringes any license. I think it's de minimis. But I'm not a lawyer.
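Stallman's point that each source file should state which of the four GPL choices covers it can be made concrete. The sketch below assumes the SPDX-License-Identifier header convention and its identifiers (GPL-2.0-only, GPL-2.0-or-later, GPL-3.0-only, GPL-3.0-or-later); the talk does not mention SPDX, and the function names `license_of` and `audit` and the ten-line header cutoff are hypothetical. It scans the top of each file for such a tag and flags files that carry no per-file notice.

```python
import re
from typing import Dict, Optional

# SPDX identifiers for the four GPL choices listed in the talk.
# Using SPDX tags at all is an assumption of this sketch.
GPL_CHOICES = {
    "GPL-2.0-only": "GPL version 2 only",
    "GPL-2.0-or-later": "GPL version 2 or later",
    "GPL-3.0-only": "GPL version 3 only",
    "GPL-3.0-or-later": "GPL version 3 or later",
}

# Matches e.g. "/* SPDX-License-Identifier: GPL-3.0-or-later */"
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S+)")

# Hypothetical cutoff: a per-file notice is expected near the top of the file.
HEADER_LINES = 10


def license_of(source: str) -> Optional[str]:
    """Return the SPDX identifier declared in a file's header, or None."""
    for line in source.splitlines()[:HEADER_LINES]:
        match = SPDX_RE.search(line)
        if match:
            return match.group(1)
    return None


def audit(files: Dict[str, str]) -> Dict[str, str]:
    """Map each file name to a readable description of its declared license."""
    report = {}
    for name, source in files.items():
        tag = license_of(source)
        if tag is None:
            report[name] = "no per-file license notice"
        else:
            report[name] = GPL_CHOICES.get(tag, "other license: " + tag)
    return report


if __name__ == "__main__":
    # Hypothetical example project with one file missing its notice.
    project = {
        "main.c": "/* SPDX-License-Identifier: GPL-3.0-or-later */\nint main(void) { return 0; }\n",
        "util.c": "/* helper functions */\nint add(int a, int b) { return a + b; }\n",
        "copied.c": "// SPDX-License-Identifier: GPL-2.0-only\nvoid f(void) {}\n",
    }
    for name, verdict in audit(project).items():
        print(f"{name}: {verdict}")
```

Because the tag sits inside the file, it travels with the file if someone copies it into another program, which is exactly the mis-licensing case Stallman describes for a single project-wide statement.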

  • ...so Stallman is saying that an application is going to recommend significant, "license violating" blocks of code to a programmer to solve a particular problem, but to do so the application will intentionally violate license agreements for existing code that it "steals" in order to make the recommendation. Sure, RMS, we believe you...

    The only thing less interesting than pair programming is pair programming with a synthesized buffoon as a partner. There are already programming tools that make recommendations, yet somehow we don't have these GPL violations already. Wonder why?

    • by godrik ( 1287354 ) on Saturday September 18, 2021 @12:29PM (#61808235)

      Copilot will lead to code licensing violations, and not only for GPL code. These licensing questions are very old in the software industry; we remember the "if you have seen copyrighted code that does X, you should not write ANY code to do X" rule. The theory was that your mind was tainted with code X and whatever you come up with is a derivative. I think it is bonkers, but it gets lawyers worried. That was with proprietary code.

      We have seen demos of Copilot reproducing code as is, with comments included.

      GitHub is full of code licensed under a wide range of open source licenses. Most open source licenses have a clause that boils down to "if you use my code in a derivative, you should credit me". This clause would be violated in many uses of Copilot.

      Copilot is an interesting idea, but the legal consequences are annoying.

      Of course most modern tech companies don't care about intellectual properties unless it is their own and will shit on the open source movement at any chance they have.

      • Re: (Score:3, Insightful)

        by ptaff ( 165113 )

        "if you have seen copyrighted code that does X, you should not write ANY code to do X"

        Don't confuse software copyright with software patents. Software copyright does not prevent you from re-implementation.

        modern tech companies don't care about intellectual properties

        Please avoid the term "intellectual property", it clouds the mind and creates confusion about copyrights, patents and trademarks, which are distinct entities that exist for different purposes.

        • The only ones clouded are people who don't understand legal terms and concepts. [cornell.edu]

          Just because slashdot abuses words doesn't mean others don't understand it.

        • by Junta ( 36770 ) on Saturday September 18, 2021 @12:53PM (#61808315)

          Don't confuse software copyright with software patents. Software copyright does not prevent you from re-implementation.

          The problem is when you go to 're-implement' after reading the original implementation, you are very likely to reproduce it in an obvious way, even if you aren't copy pasting, you may 'derive' what at least appears to be a verbatim copy of the code. This is why clean room reverse engineering places such an emphasis on never actually seeing the code of the thing being reverse engineered. Once you see the code, you can't really guarantee you are just reimplementing the same function rather than copying. Clean-room design is critical for avoiding copyright infringement, but is useless with respect to patents.

    • by Junta ( 36770 ) on Saturday September 18, 2021 @12:39PM (#61808261)

      The difference between historical recommendation systems and copilot is that copilot was seemingly indiscriminately released on all sorts of code without regard for licensing. The hypothesis being that using code as 'training' input constitutes fair use, though I can't fathom why people make this assertion so easily. With this along with other machine learning solutions, you clearly see verbatim chunks of some piece of training fodder in the result. Historically developer aids were doing very specific things and thus were limited. Copilot does things ranging from trivial (like translate a comment like 'ignore lines starting with #' to 'if line.startswith('#'): continue') to straightforward but more helpful (fill in boilerplate formatting of data in verbose languages like javascript) to potentially just dumping a whole function that was probably lifted verbatim from another project based on the function name, without attribution or any way of knowing because ostensibly the AI 'learned' it, not copied it.

      Say, RMS, why don't you create an open source license that is easy to follow and doesn't create all these hidden "illegal" traps.

      Well, the whole point of a license is to 'create traps' and by definition they are not illegal and further they aren't hidden. They are easy to follow, but people get upset because they don't like the copyleft terms. Such people are free to ignore the hell out of GPL code if they are so bothered by it (I know in my commercial development role, I steer well clear of GPL code, even as a learning reference), but copilot is the component making things 'hidden' by producing code with unclear provenance until some human comes along who recognizes a function and then there's a mess.

      RMS may have issues, but from the perspective of intellectual property, he hasn't really done anything unreasonable. He doesn't demand everyone do it the way he thinks best, but if someone chooses to offer their work to the world under those terms, he wants those respected, and is right to highlight people using 'training' as an ill-defined loophole to potentially 'launder' license infringement. If you don't like the GPL, you can release into the public domain, use a BSD-style license if you at least want attribution, or just make it proprietary; neither RMS nor anyone else is stopping you.

    • by MtHuurne ( 602934 ) on Saturday September 18, 2021 @12:49PM (#61808291) Homepage

      ...so Stallman is saying that an application is going to recommend significant, "license violating" blocks of code to a programmer to solve a particular problem, but to do so the application will intentionally violate license agreements for existing code that it "steals" in order to make the recommendation. Sure, RMS, we believe you...

      Copilot is capable of suggesting pieces of code verbatim [twitter.com], so it's not just a theoretical problem. I don't know how often this would occur when the user isn't trying to make it happen, but the fact that it can happen at all means that I'm very hesitant to use copilot on any corporate or open source code.

      The only thing less interesting than pair programming is pair programming with a synthesized buffoon as a partner. There are already programming tools that make recommendations, yet somehow we don't have these GPL violations already. Wonder why?

      Most other tools look for particular bad patterns like common bugs and deprecated API and then recommend ways to fix them. They are not AIs trained on large amounts of open source code. In some situations an AI can store its training input instead of generalizing it. There are no restrictions on what you learn from reading someone else's code, but if you're going to copy-paste it, you need to follow its license. This is not limited to GPL either: many of the more permissive licenses still require attribution.

      In my opinion, using Copilot is a risk, regardless of what you think about RMS and the GPL.

    • There are already programming tools that make recommendations, yet somehow we don't have these GPL violations already. Wonder why?

      Because you are ignorant. Copilot is different than other programming recommendation tools.

    • Re: (Score:2, Flamebait)

      by zieroh ( 307208 )

      I would spin it differently. Given the opportunity to speak on an interesting subject, Stallman essentially made it all about himself.

      Which is, you know, pretty typical.

    • ...so Stallman is saying that an application is going to recommend significant, "license violating" blocks of code to a programmer to solve a particular problem, but to do so the application will intentionally violate license agreements for existing code that it "steals" in order to make the recommendation. Sure, RMS, we believe you...

      Modern AI are barely more than semantic matching algorithms. Whatever you use to train them on is what they copy from.

  • But basically, what can you expect from GitHub? GitHub gives people inadequate advice about what it means to choose a license. They tell you you can choose GPL version 2 or GPL version 3. I think they don't tell you that really you could choose GPL version 2 only, or GPL version 2 or later, or GPL version 3 only, or GPL version 3 or later — and those are four different choices. They give users different permissions over the future. So it's important to make each program say clearly which choice covers it. And GitHub doesn't tell you how to do that.

    Well, licensing is a human problem, so the only thing technology can do is, as he says, tell you what is covered by which license. No technology can make your decisions for you as far as what license to use.

    • Because Stallman is a dogmatic $schmuck, he'll be ignored, but he does have a point about the use and defense of different licensing.

      Lifting open code based on one license that is toxic for another is a really slippery slope, but no one wants to take the time/effort/education to figure out what's ethical, so it's cut-paste-commit.

      If you start down the slippery slope, momentum will build, and it'll become a larger mess than the one it already is. Down the road, there's a hungry law firm that will follow this

  • RMS criticizes Microsoft ? I am shocked, shocked I tell you.
  • by Cassini2 ( 956052 ) on Saturday September 18, 2021 @12:51PM (#61808303)

    Google and Oracle [wikipedia.org] have been in court over copyright issues since 2010, and the case isn't over yet. Many believe that the few lines of code under discussion are either (a) APIs and not copyrightable or (b) some trivial code that no one really cares about (rangeCheck). Nevertheless, the lawsuit continues.

    This is untried case law. If you happen to use the CoPilot and have deep pockets, then expect to be taken to court. A lawyer will put it to the test in the hope of a huge payday.

    • by Anonymous Coward

      CoPilot is legally toxic for significant projects, at least until some serious test cases are decided.

      If you use it you have *no idea* where the code might have come from or what the consequences of using it might be. If it's for your own private use, fine; if you're going to distribute it but you're sure it's a niche non-profit thing, you're probably fine, no-one's going to bother with you.

      Otherwise, you'd better have plenty of money and a good lawyer.

      RMS is by *many* accounts pretty crappy as a person**,

      • **As in behaviour that would get most of us fired etc.

        But also shouldn't get us fired.

      • Style... It's about style. The application of license to a block of code in one project to add to another project, cannot account for the style of the new developer. How likely is it that a block of code is not relevant to the surrounding blocks of code and the style of the original over all? An automated helper will develop its own style; therefore, the developer using the tool will adapt this style. Copying and pasting is the least of your problems if an entire project can be systematically re-coded in the
    • by storkus ( 179708 )

      ...and patent trolls target everyone else. You really think some lawyer doesn't have enough time to start targeting everyone? Unless M$ puts up and announces some kind of legal shield against a CoPilot result getting you sued, this is a huge problem.

      NEVER underestimate lawyers, especially in the U$A!

  • by NicknamesAreStupid ( 1040118 ) on Saturday September 18, 2021 @03:08PM (#61808637)
    Stallman's comments point out a few of the growing weaknesses of increasingly fossilized copyright and patent laws. Soon, AI tools will create IP issues so quickly that the judicial system will be unable to process them fast enough, much less rule upon them.
    • The lead issue in the courts is whether an AI can own its intellectual property, so I have to assume they can create new code that "someone" deems satisfactory enough to use. Maybe an AI can gain an ISO standard? I think IP will be the next issue to settle once and for all, after that. But the language it's written in will more likely than not have to be an ISO standard. And we know how much people hate that show...
    • In this same vein, his comments point out the weakness of GPL, which is completely reliant on these legal systems. The fact is that GPL code isn't "free"—it comes with stipulations that significantly limit what the code can be used for and it uses copyright law to enforce these limits.

      Truly free code is code that can be used for anything—whether it's a Stallman approved application or not. GPL is designed to hold a monopoly on open source software licenses because if you want to use GPL code you

      • Re: (Score:3, Insightful)

        by Dog-Cow ( 21281 )

        You don't understand the GPL. Code under the GPL can be used for any purpose you desire. The terms only come into effect if you distribute the product.

      • > In this same vein, his comments point out the weakness of GPL, which is completely reliant on these legal systems.

        What is a license supposed to rely on? The goodness of people's hearts? Sharp-dressed gentlemen with full-body tattoos?

        > Stallman believes in free—as in things don't cost any money.

        You haven't read anything by Stallman, have you?

    • by tlhIngan ( 30335 ) <slashdot&worf,net> on Sunday September 19, 2021 @04:24AM (#61809817)

      There's nothing fossilized about IP law; the status of what AI generates is pretty clear. GitHub's copilot has learned from everything in GitHub, which means under the law, what it generates is a derivative work of its input.

      Which means it's currently going to be a huge mess of licenses and lawsuits.

      The same goes if you use an AI to generate the next song, the next book, etc.: it's going to create a work derived from its input, so you'd better credit it all and pay your royalties.

      The only way around it is feeding your AI public domain works which will generate unencumbered output.

      One could make the argument that because of this, a strong public commons in the form of public domain is required, and that it can provably be shown that copyright extensions are harmful to progress because they literally limit output.

  • Would it not be possible to issue a new open-source license that explicitly forbids using the source code as part of the "training set" for a statistical model?

    Are there any issues that would make that difficult?

    • by spth ( 5126797 )

      It would not be possible: Forbidding the use of the source code as a training set means the license would not uphold freedom 0 (The freedom to run the program as you wish, for any purpose), and thus the license would not be free / open source.

      Many people have demanded or proposed such licenses (i.e. ones that take away freedom 0), usually for political causes. A few examples can be found at the Organization for Ethical Source [ethicalsource.dev], which advocates for taking away freedom 0.

      So far, they have not found widespread use.

  • "But basically, what can you expect from GitHub? GitHub gives people inadequate advice about what it means to choose a license. They tell you you can choose GPL version 2 or GPL version 3. I think they don't tell you that really you could choose GPL version 2 only, or GPL version 2 or later, or GPL version 3 only, or GPL version 3 or later — and those are four different choices. They give users different permissions over the future. So it's important to make each program say clearly which choice cover

  • Old man shakes fist at cloud? All of us read whatever code's available to us (often open source, under whatever license) that informs our problem solving on any given task, and we all reuse whatever solutions we figure out on subsequent tasks. A machine learning process that complements our own natural learning isn't worth any fuss.

    • And also, decades later, Slashdot hasn't figured out character encodings. ¯\_(ツ)_/¯

    • All fine until a project you depend on, or even your own project, gets shut down because of copyright claims based on that machine learning system, and you yourself may even end up being sued.
      The problem here is that it's automatically transposing code between projects, with no real awareness of the legal issues around doing that (which it can't really have in the first place, since law is a tangled, subjective nightmare). There are multiple possible copyright claimants involved and absolutely no clear idea
