Richard Stallman Shares His Concerns About GitHub's Copilot -- and About GitHub (gnu.org) 45
destinyland writes: A newly-released video at GNU.org shows an hour-long talk given by free software advocate Richard Stallman for the BigBlueButton open source conference (which was held online last July). After a 14-minute clip from an earlier speech, Stallman answers questions from the audience — and the first question asked Stallman for his opinion about the AI Copilot [automated pair programming tool] developed for Microsoft's GitHub in collaboration with AI research and deployment company OpenAI.
Stallman's response?
There are many legal questions about Copilot whose answers I don't know, and maybe nobody knows. And it's likely some of them depend on the country you're in [because of the copyright laws in those countries.] In the U.S. we won't be able to have reliable answers until there are court cases about it, and who knows how many years it'll take for those court cases to arise and be finally decided. So basically what we have is a gigantic amount of uncertainty.
Now the next thing is, what about morally? What can I say morally about Copilot? Well the basic idea seems okay. Why shouldn't a program be able to give you hints like that?
But there is one pitfall, which is that if you follow those hints, you might end up putting a substantial block of code copied from a GPL-covered program, written by someone else, or one hint after another after another after another — it adds up to a substantial amount of code, perhaps, with very little change, perhaps. And then you've infringed the GPL by releasing that code, unless your program is covered by the same versions — plural — of the GPL, in which case it would be permitted. But you might not even know that. Copilot might not tell you — it doesn't endeavor to inform you. So you're likely not to know. Which means Copilot is leading users — some of its users — into a pitfall. Well, they should fix it so it doesn't do that.
But basically, what can you expect from GitHub? GitHub gives people inadequate advice about what it means to choose a license. They tell you you can choose GPL version 2 or GPL version 3. I think they don't tell you that really you could choose GPL version 2 only, or GPL version 2 or later, or GPL version 3 only, or GPL version 3 or later — and those are four different choices. They give users different permissions over the future. So it's important to make each program say clearly which choice covers it. And GitHub doesn't tell you how to do that.
It doesn't tell you that you need to do that. Because the way you do that is with a license notice that is supposed to be in every source file. It's unreliable to put just one statement in a free program and say "This program is covered by such-and-such license." What happens if somebody copies one of the files into some other program which says it's covered by a different license? Now that program has been inaccurately mis-licensed, which is illegal and is going to mislead users. So any self-respecting — any repository that wants to be honest has to explain these things, not just tell people to make the licensing of each piece of code clear, but help users do so — make it easy.
So GitHub has had this enormous problem for all of its existence, and Copilot has the similar — a basically, vaguely similar sort of problem, in the same area. It's not exactly the same problem. I don't think that copying a snippet of a few lines of code infringes any license. I think it's de minimis. But I'm not a lawyer.
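The four GPL choices and the per-file license notice described in the talk can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses SPDX-License-Identifier tags as a stand-in for the per-file notice. Stallman does not mention SPDX, and the notice the GNU project itself recommends is a longer paragraph of text. The helper names (find_license_id, describe) and the 20-line scan limit are hypothetical.

```python
import re

# The four GPL choices listed in the talk, written as SPDX license identifiers.
# Assumption: SPDX tags stand in for the per-file notice; the talk does not name them.
GPL_CHOICES = {
    "GPL-2.0-only": "GPL version 2 only",
    "GPL-2.0-or-later": "GPL version 2 or later",
    "GPL-3.0-only": "GPL version 3 only",
    "GPL-3.0-or-later": "GPL version 3 or later",
}

# Matches a tag such as "SPDX-License-Identifier: GPL-3.0-or-later".
SPDX_LINE = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def find_license_id(source_text, max_lines=20):
    """Return the SPDX identifier found near the top of a file, or None."""
    for line in source_text.splitlines()[:max_lines]:
        match = SPDX_LINE.search(line)
        if match:
            return match.group(1)
    return None


def describe(source_text):
    """Say which of the four GPL choices (if any) a source file declares."""
    license_id = find_license_id(source_text)
    if license_id is None:
        return "no per-file license notice found"
    return GPL_CHOICES.get(license_id, "other license: " + license_id)
```

Run over each file in a project, a check like this would show which files carry no notice of their own. Those are the files that lose their licensing information when copied into another program.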
doesn't even pass the sniff test (Score:1, Flamebait)
...so Stallman is saying that an application is going to recommend significant, "license violating" blocks of code to a programmer to solve a particular problem, but to do so the application will intentionally violate license agreements for existing code that it "steals" in order to make the recommendation. Sure, RMS, we believe you...
The only thing less interesting than pair programming is pair programming with a synthesized buffoon as a partner. There are already programming tools that make recommendati
Re: (Score:1)
The vast majority of open source developers are merely posers to you then.
When people create their works, they have some interest that they think is best served by their choice of license. Companies love it when open source devs go public domain or BSD style, since they can bake that code right into whatever they want and at worst have to worry about attribution. GPL frustrates because they are forced to develop out in the open if they want to partake.
I suspect the Linux kernel would not be where it is to
Re: (Score:2, Informative)
> When did github become copyright police/enforcement?
"Police enforcement" is a misrepresentation of how copyright disputes are resolved. Police rarely get involved except to enforce judgements, like any other civil case. If you use copyrighted works, you are responsible for that use in various ways.
Github accesses copyrighted code then presents it to other users for use. If you present work to someone without disclosure, you're liable for damages resulting from your (Github Copilot's) action. To present s
Re:Much ado about nothing. (Score:5, Informative)
When they started offering up code that sometimes is a verbatim copy of code from another project without attribution or associated license for the source material. They aren't a passive repository of code at that point, they are actively influencing the code written and doing so by sourcing code that may not be compatible from a license perspective.
Hard for coders to know the license of the software that is in their project if they use copilot which doesn't explain licensing terms.
Also, at the same time as launching copilot, Github put out a lot of press in the media to declare as certain fact that machine learning of a codebase doesn't count as being subject to copyright infringement claims. There was no public debate, simply poof, there it was and github had preloaded a lot of tech coverage to be favorable to try to have at least the public consider the matter settled before there could be a debate. Presumably giving some of the lawyers involved the nicer looking code generation, where it looks pretty inoffensive and avoiding the examples some people have given of more blatantly obvious lifting of intact code segments from projects.
Re: (Score:1)
When did github become copyright police/enforcement?
When did k0d3rz stop being responsible for providing the license for their wares?
When did you become a stupid fuck?
Re: (Score:2)
OTOH, a BSD style license is best for reference software. Would the internet have taken off if instead of a BSD licensed stack, it was GPL? Would gzip be everywhere? Things like flac also depend on a BSD style license for the reference encoder/decoder. Basically software that needs to inter-operate so closed modifications that don't change the fundamentals are fine. I guess there is still the danger of the gorilla subverting things.
Re: (Score:2)
If you *really* believe in free software (free meaning "freedom") then your license will consist of exactly one sentence: "You are free to do whatever you want with this software".
^^^ The headspace of libertarians
Re: (Score:2)
If you *really* believe in free software (free meaning "freedom") then your license will consist of exactly one sentence: "You are free to do whatever you want with this software".
That is moronically foolish and results only in a capture of the commons scenario, which has been played out in many contexts for hundreds of years (maybe thousands). If you are pig-ignorant then maybe you shouldn't make posts about these things.
Re:doesn't even pass the sniff test (Score:5, Insightful)
Copilot will lead to code licensing violations, and not only for GPL code. These licensing questions are very old in the software industry; we remember the "if you have seen copyrighted code that does X, you should not write ANY code to do X". The theory was that your mind was tainted with code X and whatever you come up with is a derivative. I think it is bonkers, but it gets lawyers worried. That was with proprietary code.
We have seen demos of copilot reproducing code as is, with comments included.
github is full of code licensed under a wide range of open source licenses. Most of those licenses have a clause that boils down to "if you use my code in a derivative, you should credit me". This clause would be violated in many uses of copilot.
copilot is an interesting idea. But the legal consequences are annoying.
Of course most modern tech companies don't care about intellectual properties unless it is their own and will shit on the open source movement at any chance they have.
Re: (Score:3, Insightful)
Don't confuse software copyright with software patents. Software copyright does not prevent you from re-implementation.
Please avoid the term "intellectual property", it clouds the mind and creates confusion about copyrights, patents and trademarks, which are distinct entities that exist for different purposes.
Re: (Score:3)
The only ones clouded are people who don't understand legal terms and concepts. [cornell.edu]
Just because slashdot abuses words doesn't mean others don't understand it.
Re:doesn't even pass the sniff test (Score:4, Informative)
Don't confuse software copyright with software patents. Software copyright does not prevent you from re-implementation.
The problem is when you go to 're-implement' after reading the original implementation, you are very likely to reproduce it in an obvious way, even if you aren't copy pasting, you may 'derive' what at least appears to be a verbatim copy of the code. This is why clean room reverse engineering places such an emphasis on never actually seeing the code of the thing being reverse engineered. Once you see the code, you can't really guarantee you are just reimplementing the same function rather than copying. Clean-room design is critical for avoiding copyright infringement, but is useless with respect to patents.
Re:doesn't even pass the sniff test (Score:5, Insightful)
The difference between historical recommendation systems and copilot is that copilot was seemingly indiscriminately released on all sorts of code without regard for licensing. The hypothesis being that using code as 'training' input constitutes fair use, though I can't fathom why people make this assertion so easily. With this along with other machine learning solutions, you clearly see verbatim chunks of some piece of training fodder in the result. Historically developer aids were doing very specific things and thus were limited. Copilot does things ranging from trivial (like translate a comment like 'ignore lines starting with #' to 'if line.startswith('#'): continue') to straightforward but more helpful (fill in boilerplate formatting of data in verbose languages like javascript) to potentially just dumping a whole function that was probably lifted verbatim from another project based on the function name, without attribution or any way of knowing because ostensibly the AI 'learned' it, not copied it.
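For a sense of scale, the trivial end of that range looks something like the sketch below: the comment is what a programmer would type, and the two lines under it are the completion quoted above. The surrounding function and its name are hypothetical, added only to make the fragment runnable.

```python
def strip_comment_lines(lines):
    """Return only the lines that do not start with '#'."""
    kept = []
    for line in lines:
        # ignore lines starting with #
        if line.startswith('#'):
            continue
        kept.append(line)
    return kept
```

A completion this small is arguably too generic to trace to any one project; the concern raised here is about the other end of the range, where a whole function arrives intact.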
Say, RMS, why don't you create an open source license that is easy to follow and doesn't create all these hidden "illegal" traps.
Well, the whole point of a license is to 'create traps' and by definition they are not illegal and further they aren't hidden. They are easy to follow, but people get upset because they don't like the copyleft terms. Such people are free to ignore the hell out of GPL code if they are so bothered by it (I know in my commercial development role, I steer well clear of GPL code, even as a learning reference), but copilot is the component making things 'hidden' by producing code with unclear provenance until some human comes along who recognizes a function and then there's a mess.
RMS may have issues, but from the perspective of intellectual property, he hasn't really done anything unreasonable. He doesn't demand everyone do it the way he thinks best, but if someone chooses to offer their work to the world under those terms, he wants those respected, and is right to highlight people using 'training' as an ill-defined loophole to potentially 'launder' license infringement. If you don't like GPL, you can public domain or use a BSD style if you at least want attribution or just make it a proprietary license; neither RMS nor anyone else is stopping you.
Re:doesn't even pass the sniff test (Score:4, Informative)
...so Stallman is saying that an application is going to recommend significant, "license violating" blocks of code to a programmer to solve a particular problem, but to do so the application will intentionally violate license agreements for existing code that it "steals" in order to make the recommendation. Sure, RMS, we believe you...
Copilot is capable of suggesting pieces of code verbatim [twitter.com], so it's not just a theoretical problem. I don't know how often this would occur when the user isn't trying to make it happen, but the fact that it can happen at all means that I'm very hesitant to use copilot on any corporate or open source code.
The only thing less interesting than pair programming is pair programming with a synthesized buffoon as a partner. There are already programming tools that make recommendations, yet somehow we don't have these GPL violations already. Wonder why?
Most other tools look for particular bad patterns like common bugs and deprecated API and then recommend ways to fix them. They are not AIs trained on large amounts of open source code. In some situations an AI can store its training input instead of generalizing it. There are no restrictions on what you learn from reading someone else's code, but if you're going to copy-paste it, you need to follow its license. This is not limited to GPL either: many of the more permissive licenses still require attribution.
In my opinion using Copilot is a risk, regardless of what you think about RMS and the GPL.
Re: (Score:2)
There are already programming tools that make recommendations, yet somehow we don't have these GPL violations already. Wonder why?
Because you are ignorant. Copilot is different than other programming recommendation tools.
Re: (Score:2, Flamebait)
I would spin it differently. Given the opportunity to speak on an interesting subject, Stallman essentially made it all about himself.
Which is, you know, pretty typical.
Re: (Score:2)
...so Stallman is saying that an application is going to recommend significant, "license violating" blocks of code to a programmer to solve a particular problem, but to do so the application will intentionally violate license agreements for existing code that it "steals" in order to make the recommendation. Sure, RMS, we believe you...
Modern AI are barely more than semantic matching algorithms. Whatever you use to train them on is what they copy from.
Human rules. (Score:2)
But basically, what can you expect from GitHub? GitHub gives people inadequate advice about what it means to choose a license. They tell you you can choose GPL version 2 or GPL version 3. I think they don't tell you that really you could choose GPL version 2 only, or GPL version 2 or later, or GPL version 3 only, or GPL version 3 or later — and those are four different choices. They give users different permissions over the future. So it's important to make each program say clearly which choice covers it. And GitHub doesn't tell you how to do that.
Well, licensing is a human problem, so the only thing technology can do is, as he says, tell you what is covered by which. No technology can make your decisions for you as far as which license to use.
Re: (Score:2)
Because Stallman is a dogmatic $schmuck, he'll be ignored, but he does have a point about the use and defense of different licensing.
Lifting open code based on one license that is toxic for another is a really slippery slope, but no one wants to take the time/effort/education to figure out what's ethical, so it's cut-paste-commit.
If you start down the slippery slope, momentum will build, and it'll become a larger mess than the one it already is. Down the road, there's a hungry law firm that will follow this
Surprise not! (Score:1)
Lawyers target deep pockets (Score:5, Interesting)
Google and Oracle [wikipedia.org] have been in court over copyright issues since 2010, and the case isn't over yet. Many believe that the few lines of code under discussion are either (a) APIs and not copyrightable or (b) some trivial code that no one really cares about (rangeCheck.) Nevertheless, the lawsuit continues.
This is untried case law. If you happen to use the CoPilot and have deep pockets, then expect to be taken to court. A lawyer will put it to the test in the hope of a huge payday.
Re: (Score:1)
CoPilot is legally toxic for significant projects, at least until some serious test cases are decided.
If you use it you have *no idea* where the code might have come from or what the consequences of using it might be. If it's for your own private use, fine; if you're going to distribute it but you're sure it's a niche non-profit thing, you're probably fine, no-one's going to bother with you.
Otherwise, you'd better have plenty of money and a good lawyer.
RMS is by *many* accounts pretty crappy as a person**,
Re: (Score:2)
**As in behaviour that would get most of us fired etc.
But also shouldn't get us fired.
Re: (Score:1)
Re: (Score:2)
...and patent trolls target everyone else. You really think some lawyer doesn't have enough time to start targeting everyone? Unless M$ puts up and announces some kind of legal shield against having a CoPilot result get you sued, this is a huge problem.
NEVER underestimate lawyers, especially in the U$A!
Re: (Score:1)
The GPL has been upheld in court many times. Possibly hundreds by now. What are you talking about?
Intellectual property dilemma (Score:4, Interesting)
Re: (Score:1)
Re: (Score:2)
In this same vein, his comments point out the weakness of GPL, which is completely reliant on these legal systems. The fact is that GPL code isn't "free"—it comes with stipulations that significantly limit what the code can be used for and it uses copyright law to enforce these limits.
Truly free code is code that can be used for anything—whether it's a Stallman approved application or not. GPL is designed to hold a monopoly on open source software licenses because if you want to use GPL code you
Re: (Score:3, Insightful)
You don't understand the GPL. Code under the GPL can be used for any purpose you desire. The terms only come into effect if you distribute the product.
Re: (Score:3)
> In this same vein, his comments point out the weakness of GPL, which is completely reliant on these legal systems.
What is a license supposed to rely on? The goodness of people's hearts? Sharp-dressed gentlemen with full-body tattoos?
> Stallman believes in free, as in things don't cost any money.
You haven't read anything by Stallman, have you?
Re:Intellectual property dilemma (Score:4, Interesting)
There's nothing fossilized about IP law. As in, the status of what AI generates is pretty clear. GitHub's copilot has learned from everything in GitHub, which means under the law, what it generates is a derivative work of its input.
Which means, it's currently going to be a huge mess of licenses and lawsuits.
Same goes for if you use an AI to generate the next song, next book, etc - it's going to create a derived work of its input for your song, book, etc., so you better credit it all and pay your royalties.
The only way around it is feeding your AI public domain works which will generate unencumbered output.
One could make the argument that because of this, a strong public commons in the form of public domain is required, and that it can provably be shown that copyright extensions are harmful to progress because they're literally limiting output.
Can't be fixed with a new license? (Score:2)
Would it not be possible by issuing a new open-source license that explicitly forbids using the source code as part of the "training set" for a statistical model?
Are there any issues that would make that difficult?
Re: (Score:2)
It would not be possible: Forbidding the use of the source code as a training set means the license would not uphold freedom 0 (The freedom to run the program as you wish, for any purpose), and thus the license would not be free / open source.
Many people have demanded or proposed such (i.e. take away freedom 0) licenses, usually for political causes. A few examples can be found at the Organization for Ethical Source [ethicalsource.dev], which advocates for taking away freedom 0.
So far, they have not found widespread use.
Really, that's it? (Score:2)
"But basically, what can you expect from GitHub? GitHub gives people inadequate advice about what it means to choose a license. They tell you you can choose GPL version 2 or GPL version 3. I think they don't tell you that really you could choose GPL version 2 only, or GPL version 2 or later, or GPL version 3 only, or GPL version 3 or later — and those are four different choices. They give users different permissions over the future. So it's important to make each program say clearly which choice cover
Nonissue. (Score:2)
Old man shakes fist at cloud? All of us read whatever code's available to us (often open source, under whatever license) that informs our problem solving on any given task, and we all reuse whatever solutions we figure out on subsequent tasks. A machine learning process that complements our own natural learning isn't worth any fuss.
Re: Nonissue. (Score:2)
And also, decades later, Slashdot hasn't figured out character encodings. ¯\_(ツ)_/¯
Re: (Score:2)
The problem here is that it's automatically transposing code between projects, with no real awareness of the legal issues around doing that (which it can't really have in the first place, since law is a tangled, subjective nightmare). There's multiple possible copyright claimants involved and absolutely no clear idea