Open Source AI Software

AI Can Clone Open-Source Software In Minutes

ZipNada writes: Two software researchers recently demonstrated how modern AI tools can reproduce entire open-source projects, creating proprietary versions that appear both functional and legally distinct. The partly-satirical demonstration shows how quickly artificial intelligence can blur long-standing boundaries between coding innovation, copyright law, and the open-source principles that underpin much of the modern internet.

In their presentation, Dylan Ayrey, founder of Truffle Security, and Mike Nolan, a software architect with the UN Development Program, introduced a tool they call malus.sh. For a small fee, the service can "recreate any open-source project," generating what its website describes as "legally distinct code with corporate-friendly licensing. No attribution. No copyleft. No problems." It's a test case in how intellectual property law -- still rooted in 19th-century precedent -- collides with 21st-century automation. Since the US Supreme Court's Baker v. Selden ruling, copyright has been understood to guard expression, not ideas.

That boundary gave rise to clean-room design, a method by which engineers reverse-engineer systems without accessing the original source code. Phoenix Technologies famously used the technique to build its version of the PC BIOS during the 1980s. Ayrey and Nolan's experiment shows how AI can perform a clean-room process in minutes rather than months. But faster doesn't necessarily mean fair. Traditional clean-room efforts required human teams to document and replicate functionality -- a process that demanded both legal oversight and significant labor. By contrast, an AI-mediated "clean room" can be invoked through a few prompts, raising questions about whether such replication still counts as fair use or independent creation.


Comments Filter:
  • by haruchai ( 17472 ) on Wednesday April 01, 2026 @05:09PM (#66072850)

    because it seems there's going to be a lot more IP infringement and it won't be limited to open source

    • Re: (Score:3, Interesting)

      by Anonymous Coward
      From their own website... They offer "Full legal indemnification* *Through our offshore subsidiary in a jurisdiction that doesn't recognize software copyright" So, apparently, problem solved ;-D
      • ChatGPT says that qualifies as sound legal advice, so we're good. Ship it!
        • by drnb ( 2434720 )

          ChatGPT says that qualifies as sound legal advice, so we're good. Ship it!

          I know you're joking, but I really hope that the first court where someone tries that has public cameras. The judge's reaction and response will probably be awesome.

      • by homerbrew ( 10094532 ) on Wednesday April 01, 2026 @06:43PM (#66073034)
        Of course, I doubt I would call it a clean room design, especially if the AI was trained with that open source project. Once it has seen that original code in its training, it's quite difficult to convince me that it didn't rely on that code in any way.
        • by CAIMLAS ( 41445 )

          If you can substantially distinguish how tokenized abstraction is any different from natural learning, I'd buy it. But as it's not, I don't think that's a meaningful argument.

          • Generally if the implementors have seen the original then it's not clean room.

            • Precisely. Even if one session is fed the explicit code and documents it, then the second session generates code ostensibly based on the documentation generated by the first without having been fed the original code explicitly, the AI underlying both sessions was itself trained on the original code, even if a previous version of it, and holds large chunks of it lossy-compressed within its internal weights, to the point that, with the proper prompting in an entirely unrelated third session, we can get it to

              • I'm skeptical this company is doing it properly (or even has their own models), but I think you could do this with two models.

                The documenter is trained on all available data.

                The coder is trained, but without any copyleft code.

                Clean room reverse engineering actually seems like a place where AI will be extremely capable.

                • by flink ( 18449 )

                  The coder is trained, but without any copyleft code.

                  It costs tens of millions of dollars to train a big, competent LLM. GPT-4 cost ~$74M to train, for example. You can hire a team of human devs who have never looked at the source to do a clean-room rewrite of the project for a fraction of what it would cost to develop a "clean" model.

                  That said, I could see a use for a model that was only trained on MIT-licensed or public domain code.

            • So have one AI map out how something works, and have another AI recreate it?

              Of course, if the AI was trained on the original.......

          Of course, I doubt I would call it a clean room design, especially if the AI was trained with that open source project. Once it has seen that original code in its training, it's quite difficult to convince me that it didn't rely on that code in any way.

          An AI session does not necessarily train the AI. Two different AI sessions can be like two different people. The second session won't have any memory of the first. So it is effectively like one person writing a clean spec and a second implementing that spec.

          • by quenda ( 644621 )

            An AI session does not necessarily train the AI. ... The second session won't have any memory of the first. So it is effectively like one person writing a clean spec

            I think he is talking about the actual training of the model, not the inference sessions. LLMs are trained on a LOT of open-source code.
            The big models have read countless terabytes of copyrighted code (including GPL) in training. So if you ever ask AI to write a clone of GNU software, it might be legal, but it can never be truly clean-room.

        • by gweihir ( 88907 )

          It does indeed in no way fulfill the requirements for a "clean room" clone. And, worse, even if it did, there would be no way to prove it. For a proof of a "clean room" reimplementation, you need to demonstrate conclusively that the implementors never came into contact with the original code.

          Also remember that the result has no new copyright or ownership of its own. It either has none at all or retains the original one.

            That may be the case in the US, but not in Europe, where we have interoperability privileges under the EU software copyright directive.
            This is why your steward organisation should be based in Europe or any other place that grants you legal safety.

            To satisfy US requirements the trick is usually to separate research from implementation.

            that is, you have one project that documents functionality, and another independent project that implements the spec.

            Adversarial interoperability of course needs to get strengthened.

          Once it has seen that original code in its training, it's quite difficult to convince me that it didn't rely on that code in any way.

          The problem with this is a question of learning vs copying. From a copyright point of view, virtually all the great artists of history studied under someone else. Should the Sistine Chapel painting be attributed to Domenico Ghirlandaio instead of Michelangelo simply because the latter learned and studied under the former? It will most definitely carry the former's influence, just like everything in life carries influence.

        • This really does not matter. Copyright protects only the expression of software. You are free to clone software with independent code.

        • by allo ( 1728082 )

          If the problem is that the original is in the training data (and memorized) then you should be able to find recognizable parts in the output.
          The point here is more that you get really fast clean-room reverse engineering that produces *different* code with the same functionality.
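A minimal sketch of the memorization check suggested above: flag verbatim overlap between an original source file and generated output by comparing token n-grams. The whitespace tokenizer and window size are illustrative choices, not a legal test for infringement.

```python
def ngrams(tokens, n=8):
    """All length-n windows of a token sequence, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(original: str, generated: str, n: int = 8) -> float:
    """Fraction of the generated file's n-grams that appear verbatim
    in the original. High values suggest memorized passages."""
    orig = ngrams(original.split(), n)
    gen = ngrams(generated.split(), n)
    if not gen:
        return 0.0
    return len(gen & orig) / len(gen)

original = "def add(a, b):\n    return a + b\n" * 4
clone = "def add_numbers(x, y):\n    # sum two values\n    return x + y\n" * 4
verbatim = original

print(overlap_ratio(original, clone))      # 0.0: independently expressed
print(overlap_ratio(original, verbatim))   # 1.0: fully memorized
```

Real memorization detectors normalize identifiers and formatting first; a raw token match like this only catches the most blatant verbatim copying.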

          The next step to make it bulletproof would be to use two distinct models for writing the spec and implementing the spec.
          The software world is changing, because automated code writing is becoming good. It is interesting where things will

      • From their own website... They offer "Full legal indemnification* *Through our offshore subsidiary in a jurisdiction that doesn't recognize software copyright" So, apparently, problem solved ;-D

        I don't think it works that way. If I'm using copyrighted software without a license, the source of that software does not change the fact that I am in violation. It's about use, not where you got it.

    • because it seems there's going to be a lot more IP infringement and it won't be limited to open source

      It'll get real interesting when we start finding out the actual Intellectual in all those Property claims, is AI.

    • Really, someday anyone with a decently powerful desktop will be able to use AI to clone any Microsoft application so that it runs natively on Linux
    • I certainly hope so! At least the lawyers!

      If AI is here to replace human labor, lawyers are very, very ripe for cloning. What they do is mostly just filling in blanks and summarizing documents already. AI is really, really good at that kind of thing. And it would be a lot cheaper to use AI than to hire a lawyer.

    • by AmiMoJo ( 196126 )

      I wonder if it's already happened. I can imagine a law firm creating an LLM based on the writings and court transcripts of a particular judge, and then using it to test out different strategies to see which is likely to be most effective with them.

  • Clean room? (Score:5, Insightful)

    by Pinky's Brain ( 1158667 ) on Wednesday April 01, 2026 @05:18PM (#66072886)

    Even if you use an AI to extract an extremely condensed specification out of the source code, it's hardly clean room if the LLM was pre-trained on the source code anyway.

    • If it really was clean room, they would be able to do replicas of closed source projects just as easily. Or, indeed, just about anything.
      • With open source you can document a lot more than with closed source.

        • Assuming the hype is real, I wonder if it's possible to train these algorithms on machine code, or if they need the semantics and expressiveness of a HLL.

      • by allo ( 1728082 )

        If you write the spec, it can. The point here is that the first step is to convert code into a well-written specification. With closed source you need a human to write the specification (or a model working a lot longer using screenshot tools and whatever; I bet we will see such systems in the future).

    • This "clean room", to draw an analogy to medicine, is like if your "sterile surgical dressings" were made of syphilis viruses knitted together into a fabric.
    • Re:Clean room? (Score:5, Interesting)

      by Waffle Iron ( 339739 ) on Wednesday April 01, 2026 @08:32PM (#66073176)

      Even if you use an AI to extract an extremely condensed specification out of the source code, it's hardly clean room if the LLM was pre-trained on the source code anyway.

      I once worked at a place that had a clean room process to create code compatible with a proprietary product. Anybody who had ever seen the original code or even loaded the original binary into a debugger was not allowed to write any code at all for the cloned product. The clone writers generally worked only off of the specifications and user documentation.

      There were a handful of people who were allowed to debug the original to resolve a few questions about low-level compatibility. The only way they were allowed to communicate with the software writers was through written questions and answers that left a clear paper trail, and the answers had to be as terse as possible (usually just yes or no). Everyone knew that these memos were highly likely to be used as evidence in legal proceedings.
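The written Q&A discipline described above could be sketched as a logged channel that only accepts terse answers. The class and method names here are invented for illustration; a real process would add signatures, timestamps, and legal review.

```python
from dataclasses import dataclass, field

# The only replies the debugging team may give the clone writers.
ALLOWED_ANSWERS = {"yes", "no"}

@dataclass
class CleanRoomChannel:
    """A paper-trail Q&A barrier between the two teams."""
    log: list = field(default_factory=list)

    def ask(self, question: str) -> int:
        """Clone writers submit a written question; returns its log id."""
        self.log.append({"question": question, "answer": None})
        return len(self.log) - 1

    def answer(self, qid: int, answer: str) -> None:
        """Debuggers may reply, but only with a terse permitted answer."""
        if answer.lower() not in ALLOWED_ANSWERS:
            raise ValueError("answers must be terse: yes or no")
        self.log[qid]["answer"] = answer.lower()

channel = CleanRoomChannel()
qid = channel.ask("Does the original return 0 on an empty input buffer?")
channel.answer(qid, "yes")
print(channel.log[qid])  # the full exchange survives for legal review
```

The point of the structure is exactly what the comment describes: every exchange leaves a record terse enough that it cannot smuggle expression across the barrier.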

      I highly doubt that any AI tech bros have ever been this rigorous, and I'd bet that most of these AIs have been trained on the exact same source code that they are cloning.

      • I'd bet that most of these AIs have been trained on the exact same source code that they are cloning.

        That may have even happened indirectly. Consider a computer science textbook discussing some classic OS topic and it offers a snippet of Linux code as a sample implementation. I think such exposure would disqualify a person from working on a clean room implementation team.

        • No, it's been documented that the actual code from open source projects was downloaded and processed. Look it up. There's no clean room anything going on here, the people mentioned in TFA are deluding themselves.
    • It doesn't have to be "clean room." If the new code is distinctly different from the original, it would be extremely difficult to claim copyright infringement.

      It's not illegal to make a Word Processor just because Word is copyrighted.

  • Software Cloning (Score:5, Interesting)

    by silentbozo ( 542534 ) on Wednesday April 01, 2026 @05:23PM (#66072898) Journal

    Can it clone proprietary software and turn it into an open source project?

    If so, then I think the tradeoff is fair.

    • Maybe have it create and publish the source code of 64-bit Windows 7, w/ no allowances for any assembly language? Then port it to all non-x86 CPUs - RISC-V, Arm, and even legacy NT hardware like Alpha and MIPS

      • Maybe have it create and publish the source code of 64-bit Windows 7, w/ no allowances for any assembly language? Then port it to all non-x86 CPUs - RISC-V, Arm, and even legacy NT hardware like Alpha and MIPS

        And PowerPC. Damn, PowerPC never gets any respect. ;-)

    • by Sloppy ( 14984 )

      So begin the Obfuscated Object Code Compiler wars, to keep robots from writing machine language decompilers. The next few years are gonna be a wild ride!

    • Can it clone proprietary software and turn it into an open source project?

      I think the answer is no if you don't have clean access to the proprietary software, e.g., if you decompile or reverse engineer it in violation of a license agreement that you agreed to. That taints the spec, which taints the clean room reimplementation. I think this also applies to leaked software - if you know it's someone's trade secret, but you use it anyway to create a competing product, you can be sued.

      • I think this also applies to leaked software - if you know it's someone's trade secret, but you use it anyway to create a competing product, you can be sued.

        It depends on if you were the person who disclosed or used the trade secret in violation of some sort of non-disclosure or maintenance of IP agreement. However once in the wild, other persons may use the info.

        If a trade secret is disclosed then the IP protection is lost. The disclosure of a trade secret does not have to be intentional or authorized.

    • Can it clone proprietary software and turn it into an open source project?

      If so, then I think the tradeoff is fair.

      There is no tradeoff at all. This process takes Open Source code and turns it into unmaintainable gibberish. This same product also takes Proprietary software and turns it into unmaintainable gibberish.

      Nothing was taken from Open Source, but something was taken from Proprietary.

      You are further from a usable product if you use this on Open Source.

      You are closer to a usable product if you use it on Proprietary software.

  • by Local ID10T ( 790134 ) <ID10T.L.USER@gmail.com> on Wednesday April 01, 2026 @05:24PM (#66072900) Homepage

    "legally distinct code with corporate-friendly licensing. No attribution. No copyleft. No problems."

    They can claim that it is legally distinct, but until they win the lawsuit and appeals to set a legal precedent, it is not safe to make that assumption.

    • Because it's AI, there's no copyright either. Honestly a fair trade, especially if AI can clone proprietary programs just as easily.

      • by Ksevio ( 865461 )

        That's kind of a weird concept though - Can you remove a license from code by just passing it through an LLM?

    • "legally distinct code with corporate-friendly licensing. No attribution. No copyleft. No problems."

      They can claim that it is legally distinct, but until they win the lawsuit and appeals to set a legal precedent, it is not safe to make that assumption.

      The above is subject to misinterpretation. The copyright owner must demonstrate it's a derivative and win in court. The owner must prove infringement; the publisher does not need to prove innocence.

  • by Gravis Zero ( 934156 ) on Wednesday April 01, 2026 @05:27PM (#66072912)

    Despite the open source spin, source code is not required to do this, since source code can also be generated from binaries. It shouldn't be shocking by now to learn that, with the addition of AI to "make sense" of the generated code, you can fully automate breaking down an executable into functional source code. As such, even large, sophisticated, and complex programs are also targets.

    The real question is, who wants to deal with a massive amount of AI slop code?

    • Re: (Score:2, Insightful)

      by gweihir ( 88907 )

      While true, legally it makes no difference whether you steal the sources or the binary. It is still stolen. And a clean-room implementation requires the code-writers to never have seen the original in any form. You cannot have an engineer analyze the original and then write a copy. Hence it is immediately plausible that having an AI train on the original, or ingest it in a query, and then write a new version is not a clean-room clone (and only those are legal by default) at all.

      I do agree on the slop.

      • While true, legally it makes no difference whether you steal the sources or the binary. It is still stolen.

        Why would you steal it if you can simply license it?

        And a clean-room implementation requires the code-writers to never have seen the original in any form. You cannot have an engineer analyze the original and then write a copy.

        As the article explains, it's a clean room implementation because you use two different instances of an AI.
        * AI 1 documents how the code works in a human readable descriptions. (i.e. does the reverse engineering)
        * AI 2 constructs an entirely new codebase from the human readable descriptions in the documentation. (i.e. does the forward engineering)

        Since AI 2 has never seen or analyzed the original code/binary and has only ever read the documentation about it...
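The two-session separation described above can be illustrated with plain Python functions standing in for the two AI sessions; the spec format is invented for illustration, and a real documenter would emit prose rather than raw I/O pairs.

```python
def stage1_document(original_fn, sample_inputs):
    """'Documenter': probes the original as a black box and records
    behavior. The spec deliberately contains no source code, only
    observed input/output pairs."""
    return {"behavior": [(x, original_fn(x)) for x in sample_inputs]}

def stage2_implement(spec):
    """'Coder': builds a new implementation from the spec alone,
    never touching the original."""
    table = dict(spec["behavior"])
    # A real reimplementation would generalize; a lookup table keeps
    # the sketch honest about what the spec actually conveys.
    return lambda x: table[x]

original = lambda x: x * x + 1  # stands in for the original project
spec = stage1_document(original, [0, 1, 2, 3])
clone = stage2_implement(spec)
print([clone(x) for x in [0, 1, 2, 3]])  # [1, 2, 5, 10]
```

The thread's objection survives the sketch: the separation only means something if stage 2's "coder" was never trained on the original in the first place.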

        • by gweihir ( 88907 )

          You not only need to do a clean-room, you need to be able to prove it and that set-up does not allow it.

          • You not only need to do a clean-room, you need to be able to prove it and that set-up does not allow it.

            Actually, this setup would be more provable than having two people do it because you can literally record the entire process from start to finish.

            Have two different computers with no hardware in common.
            Computer 1 interprets the program and generates the documentation, saving it to a USB drive.
            You unplug the USB drive and move it over to Computer 2.
            Computer 2 reads the documentation and generates a new code base.
            You can read the documentation and there was no other means of communication.

            If you don't think a

            • by gweihir ( 88907 )

              You cannot do this with two people. Not possible. You need at the very least three (but in practice a lot more), and the middle one is the info barrier. The middle makes sure no details about the implementation leak; the middle team is under oath, likely needs to be more technologically competent than the analysis and implementation teams, and also needs to be legally competent.

              And that is why you cannot automate this process.

  • by unixisc ( 2429386 ) on Wednesday April 01, 2026 @05:30PM (#66072922)

    That way, make that OS at least a stable, working one, instead of the alpha stage that it's been in for 30 years

    • by organgtool ( 966989 ) on Wednesday April 01, 2026 @05:41PM (#66072946)
      How do you expect ReactOS to be stable and working when the purpose is to achieve feature-parity with Windows?
      • Just make sure you are one major release behind Microsoft at all times.

        Windows 95? Garbage. 98, especially SE? Just fine. Windows Me was hot trash, and XP, especially Service Pack 2, just fine. Vista garbage, 7 just fine. 8 was a dumpster fire. 10 was okay. And now 11 is right back where we started.

        Although honestly I don't know if we will ever get a working Windows 12. Microsoft has very very little competition anymore. Basically just Apple and there's a laundry list of reasons why that's a pr
  • by SlashbotAgent ( 6477336 ) on Wednesday April 01, 2026 @05:48PM (#66072954)

    git clone https://github.com/YourStuff/A... [github.com]

    But, when I put my name on it everybody gets all pissy.

    • by gweihir ( 88907 )

      But, when I put my name on it everybody gets all pissy.

      Yeah, no idea why that happens...

  • by PPH ( 736903 ) on Wednesday April 01, 2026 @06:01PM (#66072970)

    No attribution. No copyleft. No problems.

    No copyright either if it's AI generated. So, no "corporate-friendly licensing".

    • Yes, this is my understanding as well. It has been tested in court: AI-generated materials cannot be copyrighted. Cases such as Thaler v. Perlmutter have cemented that AI systems cannot be listed as authors.
  • by williamyf ( 227051 ) on Wednesday April 01, 2026 @06:04PM (#66072972)

    for clean room implementations.

    If the AI model was trained using the OG software project that is being replicated, they are screwed.

    That should be very easy to see: in the discovery phase, just ask for a list of all the software that was used to train the AI model. It's a yes/no answer. If the AI saw the OG software, then there was no clean room; the room was dirty, very, very dirty.

    • Thank you, that's exactly what I was trying to say. If it is cloning anything it was trained on, then it is definitely NOT a clean room design, and therefore infringes on the copyright!
    • by gweihir ( 88907 )

      Indeed. The beauty of this April fools is that it has a high level of credibility ... to the stupid.

    • It seems like this is a problem easily rectified -- someone could prepare a model where any code with a viral license is filtered from the training set.
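A hedged sketch of that filtering idea: scan each file's license header and drop anything matching a copyleft pattern before it enters the training set. The patterns below are a rough heuristic, not a substitute for real license detection.

```python
import re

# Illustrative copyleft markers; a real pipeline would use a proper
# license scanner and the SPDX license list.
COPYLEFT_PATTERNS = [
    re.compile(r"GNU (Affero )?General Public License", re.IGNORECASE),
    re.compile(r"\bGPL(v[23])?\b"),
    re.compile(r"Mozilla Public License", re.IGNORECASE),
]

def is_copyleft(source: str, header_lines: int = 30) -> bool:
    """True if a copyleft marker appears in the file's header."""
    header = "\n".join(source.splitlines()[:header_lines])
    return any(p.search(header) for p in COPYLEFT_PATTERNS)

corpus = {
    "a.c": "/* Licensed under the GNU General Public License v2 */\nint main(){}",
    "b.py": "# MIT License\nprint('hi')",
}
clean = {name: src for name, src in corpus.items() if not is_copyleft(src)}
print(sorted(clean))  # ['b.py']
```

Even a perfect filter only addresses license terms; it does not settle whether training on permissively licensed code satisfies its attribution requirements.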

  • by gurps_npc ( 621217 ) on Wednesday April 01, 2026 @06:05PM (#66072976) Homepage

    If the AI can clone free software, then it should be able to clone non-free software. The real question is whether we should bother copyrighting any software if it can be so easily duplicated.

    Nobody is going to copyright my voice singing - there are so many other better singers.

    If software becomes that easy to create, then it loses its value.

    Hm - perhaps someone should clone all the software we install on items we buy that comes with licenses that prevent repair. We own the hardware and we usually hate the software that comes on smart appliances. A cheap replacement for it may screw with their illegal and unethical attempts to control what you do with stuff you own and they do not.

    • It has access to the open source source. It doesn't have access to the non-free source. Of course, anything it clones by virtue of being trained on what it's cloning IS copyright infringement! This might create the case that breaks the back of AI-generated content. It's all infringing on what it was trained with!
      • by Holi ( 250190 )

        If it is using the source to clone, it is not clean room clone and thus violates copyright.

  • If it's trained on the actual open source software it's cloning, then it isn't a "clean room design", is it?
    • by gweihir ( 88907 )

      Not at all. If somebody actually took this seriously, they would likely be in a world of legal hurt. But remember the date...

  • I guess that the "proprietary code" isn't so proprietary at all.

  • by Princeofcups ( 150855 ) <john@princeofcups.com> on Wednesday April 01, 2026 @06:41PM (#66073032) Homepage

    So it's slow as fuck, with memory leaks, impossible to maintain, lacking comments, nasty race conditions, 10 times bigger than the original, uses 10 times the memory, freezes trying to open files.... you know, the coding stuff.

    Let me know when we can see some head-to-head QA. Hey, maybe we are there. But I've not seen anything more than vague "proofs of concept." I still want to see AI produce microcode for a new undocumented chip/board. Do you read the API to it like a nursery rhyme?

    Or to put it another way, if it relies on samples of code to exploit, how is it going to produce NEW code?

    • You haven't actually seen AI-generated code, have you!

      These days, AI generates code that is readable and has meaningful comments (well, as meaningful as most comments written by human coders anyway). AI tends to be good about properly structuring code to eliminate memory leaks. The code isn't necessarily blazing fast, but it seems to compare with typical human-written code. And if you ask it to improve the performance, it often can do so successfully.

      The AI code you are describing is so 2025.

    • Ignorance is a state of bliss for you.

      AI readily generates code that exceeds Jr level programmer code by a wide margin.

      It can also produce utter garbage.

      But it is in no way equivalent to monkeys on a keyboard. Its success rate is significantly higher than that.

  • For all the open source projects that were turned into commercial versions by introducing proprietary elements, it seems AI can be used to replicate the proprietary components back as open source.

  • There is no reason why billionaires should have a separate set of laws.
  • There's a copy of the old Windows XP leak out there on Github. So an AI trained on the Github repos is liable to have been trained on Windows XP code. And that is ultimately a liability for people who use this tech.
  • by Sean Clifford ( 322444 ) on Wednesday April 01, 2026 @09:57PM (#66073272) Journal

    Well, Hell, *I* can clone OSS in seconds via a pull. Jeebus. AI blah blah blah AI staff cuts blah blah blah paradigm shift....yawn.

  • Only a software person could come up with a name like "malus.sh".
  • "Clean-room" means you have one group of engineers study existing code and create a specification and then another group of engineers takes that specification and writes new code that does what the original code did. This is because copyright protects expression, not ideas, AND that independent creation of the same expression is not infringement either.

    If you have the same person reading the old code and writing the new code, then to whatever extent the expression is similar there is no protection under copyright.

    • by gweihir ( 88907 )

      No, this is definitely not "clean room". An actual clean room clone requires a very competent analysis team that writes a spec. A ton of legal people that verify the spec does not contain descriptions of the original code and that can attest so under oath. And an implementation team that has never seen the original code and only gets said spec. It is a huge and very fragile undertaking. An AI that may have seen the code does not cut it in any way.

      But remember the date. This is a really good April Fools'.

  • If it can clone open source then it should be able to clone closed source applications. Unless it's just taking the existing code and re-formatting it.

    • There is this thing called reverse engineering. I have found AI to be surprisingly helpful at it with binary payloads from various devices. I'm sure it could reverse engineer compiled binaries, too. Whether that's legal is another issue. But when has the law ever stopped AI companies?

  • by gweihir ( 88907 ) on Thursday April 02, 2026 @02:36AM (#66073506)

    And that is what makes this satirical: The result has no new copyright whatsoever, and it only gives the appearance of working. It is also unclear whether it is actually legal to do, or whether it may remain partially or fully under the original copyright and ownership, due to the model probably having been trained on the original OSS code.

    As some people will probably take this seriously, it bears pointing out that this is a technological and legal nightmare. It is a very cool satirical project though.

  • by TrueJim ( 107565 ) on Thursday April 02, 2026 @03:30AM (#66073572) Homepage

    If the clone is AI-generated, I don't think it can be copyrighted, based on [Thaler v. Perlmutter, 2023]. Calling the clone "proprietary" is a slight misstatement. It could maybe be protected as a trade secret, but I don't think it's copyrightable, based on what courts have ruled so far.

  • Clean room design is a legal strategy, but it is not a legal requirement. There are other methods that can be used for creating works not considered to be derivative works.

    Also a reminder: words used in law don't have the same meaning as in everyday language. The law usually narrows the meaning explicitly, or implicitly via case law.

    GNU has used this to their advantage to clone most of the shell runtime utilities, so why shouldn't the same be used to replace GNU licensed code?

"Marriage is low down, but you spend the rest of your life paying for it." -- Baskins

Working...