
New Method To Detect and Prove GPL Violations

qwerty writes "A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft and could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API Birthmark observes the interaction between an application and (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and cannot be easily foiled using code obfuscation techniques, as shown in the paper (PDF). Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks from popular open-source frameworks, GPL-violating applications could be identified."
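
As a rough illustration of the idea in the summary, the sketch below treats a birthmark as the set of short call sequences a program makes into its runtime library, then measures how much of one birthmark shows up in another program. The trace format, the window length `k`, and the function names are assumptions made for this example; none of them are taken from the paper.

```python
from typing import Iterable, Set, Tuple

Birthmark = Set[Tuple[str, ...]]


def birthmark(calls: Iterable[str], k: int = 3) -> Birthmark:
    """Return the set of k-long windows over a recorded sequence of library calls."""
    trace = list(calls)
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def containment(original: Birthmark, suspect: Birthmark) -> float:
    """Fraction of the original's windows that also appear in the suspect."""
    if not original:
        return 0.0
    return len(original & suspect) / len(original)


# Hypothetical traces: names of library methods in the order they were called.
library_trace = ["File.open", "Reader.read", "String.split", "Map.put", "File.close"]
suspect_trace = ["Log.init"] + library_trace + ["Log.flush"]
unrelated_trace = ["Socket.connect", "Socket.write", "Socket.read", "Socket.close"]

mark = birthmark(library_trace)
print(containment(mark, birthmark(suspect_trace)))    # every window is found again
print(containment(mark, birthmark(unrelated_trace)))  # no window is found
```

Only calls that cross into the library are recorded here, so renaming or rearranging the suspect's own internal code would leave these windows unchanged. How high the containment score must be before it counts as evidence is a judgment call that this sketch does not settle.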

  • by mark-t ( 151149 ) <markt AT nerdflat DOT com> on Saturday August 25, 2007 @02:34PM (#20355117) Journal

    What is the false positive rate for this method? What if two programs just happen to do the same thing and the authors happened to choose similar ways to do it? Would this method conclude that one originated with the other? It's not a copyright violation because neither is a derivative work of the other.

    Also, it occurs to me that this method would probably not be as useful as expected for detecting GPL violations. I would think it would only be effective for checking where you have source code available, or at the very least enough symbol table information to make comparisons, which you are not likely to have if somebody is violating the GPL because that implies no source code anyways (and almost certainly no symbol table information for the binary).

  • Re:No, really (Score:3, Interesting)

    by Reziac ( 43301 ) * on Saturday August 25, 2007 @02:45PM (#20355177) Homepage Journal
    That was akin to my first thought: If opensource code is really so superior to closed source code, and if the world would be better off if all apps had been built from those codebases, then shouldn't we *encourage* it to be "pirated", for everyone's net benefit??

  • Other languages (Score:4, Interesting)

    by Mike McTernan ( 260224 ) on Saturday August 25, 2007 @03:02PM (#20355309)
    I looked through the paper, and it is cool stuff. But I couldn't see where it supposed the system would work well for other languages, and I wonder if it really would be so good.

    Java has a very large standard library that is always dynamically linked, and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standard environment due to the Virtual Machine, whereas other languages may have varying ABIs, type sizes, and other properties that could add significant noise to the birthmark.

    That said, system calls are always hookable and reasonably standard, so maybe this technique could be applied successfully there for malware detection or similar?
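
    To make the system-call idea concrete, here is a minimal sketch that pulls call names out of a log written in the style of `strace` output and turns them into short sequences that could be compared between programs. The sample log is invented, and the assumption that each relevant line starts with `name(` is a simplification; real traces contain signals, unfinished calls, and other lines that would need more care.

```python
import re
from typing import List, Set, Tuple

# Matches the call name at the start of an strace-style line,
# e.g. 'close(3) = 0'. Lines that do not start this way are skipped.
SYSCALL = re.compile(r"^(\w+)\(")


def syscall_names(log: str) -> List[str]:
    """Extract system call names, in order, from an strace-style log."""
    names = []
    for line in log.splitlines():
        match = SYSCALL.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def ngrams(names: List[str], k: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of k-long windows over the call names."""
    return {tuple(names[i:i + k]) for i in range(len(names) - k + 1)}


# Invented sample log, for illustration only.
sample_log = """
openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3
read(3, "127.0.0.1 localhost\\n", 4096) = 20
close(3) = 0
+++ exited with 0 +++
"""

names = syscall_names(sample_log)
print(names)
print(sorted(ngrams(names)))
```

    Dropping the arguments and return values discards file descriptors, addresses, and sizes, which is one way to reduce the platform-specific noise mentioned above, at the cost of making unrelated programs look more alike.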
  • by Just Some Guy ( 3352 ) <kirk+slashdot@strauser.com> on Saturday August 25, 2007 @03:12PM (#20355371) Homepage Journal

    How did you know they were cheating and didn't derive their similar approaches from a common origin (presumably material that was presented in class or else from the textbook)?

    Amen to that. This is an old story, but I think it bears repeating. A friend of mine and I got "caught" turning in identical code for an assignment. I mean, identical. Same structures, variables, types, layout - everything. However, we wrote our programs separately and never saw each other's until our teacher asked about it.

    It sounds improbable, but consider that:

    1. We both directly transcribed variable names from the homework assignment. A sentence like "it is a fatal error condition for the user to specify a negative number of tasks" became "assert(numtasks >= 0);".
    2. We used the same editor and the same indenting style.
    3. We had done much of our homework together in previous classes because we tended to take the same approach to solving problems.
    4. The assignment wasn't terribly complex to begin with, so the resulting code was only a few pages long.

    We had a teacher who trusted us and we were both good students with good test grades, so it was dismissed as a humorous coincidence. I'm glad a human was willing to listen to our explanation and not just go along with the findings of an automated tester.

  • by Just Some Guy ( 3352 ) <kirk+slashdot@strauser.com> on Saturday August 25, 2007 @03:37PM (#20355553) Homepage Journal

    I take it your code was flawless?

    Of course! ;-)

    people who write flawless code can easily prove their innocence by answering a couple of questions about the implementation on the spot.

    I think there was a bit of that, too: (pointing at me) "why did you do this?" "Because of this requirement in the last paragraph." (Pointing at friend) "and why didn't you use this approach?" "That wouldn't have worked because of this part here."

  • Re:Heh.. (Score:1, Interesting)

    by Anonymous Coward on Saturday August 25, 2007 @03:43PM (#20355601)
    Exactly. How many anti-RIAA stories are posted on /. because they are trying to detect and sue people for copyright violation? But when it's your property that's being stolen, it's good to detect violators and threaten lawsuits.
  • Re:No, really (Score:4, Interesting)

    by The Bungi ( 221687 ) * <thebungi@gmail.com> on Saturday August 25, 2007 @04:49PM (#20356189) Homepage

    You know, I'm absolutely tired of the BSD trolls

    If by that you mean "you have a different definition of what freedom is, therefore I don't like you" then sure, I'm a "BSD troll" or whatever.

    your definition of "freedom" is ludicrous.

    GPL -> Distribution restrictions.
    BSD -> No restrictions.
    No restrictions -> More freedom.
    More freedom -> Possible unsavory side effects that people choose to live with

    Isn't logic great?

    The GPL definition of freedom is that software and its derivatives must always, under all conditions, be free.

    BSD has a similar one, except that it doesn't place restrictions on how that happens. No one can make BSD-licensed software "non free", it will always be available to everyone. The only difference is that it might not benefit from coerced third party improvements, but that's what you sign up for.

    it simply distributes freedoms in a different manner

    The Kool-Aid is strong with this one.

    But don't go around accusing the GPL of limiting freedoms when it gives others freedoms that the BSD could never guarantee.

    BSD licenses guarantee absolutely nothing. Here's the code, do whatever the heck you want with it. The perceived benefits to using the GPL are nice, but please don't insult people's intelligence by claiming they result in more freedom. A restriction to ensure X or Y is still that - a restriction. The distribution restrictions on the GPL are designed to further Stallman's social causes (some of which I actually agree with). If you feel that's fine, then by all means use the GPL. That's your choice.

  • by anothy ( 83176 ) on Saturday August 25, 2007 @04:50PM (#20356215) Homepage
    just to demonstrate that this sort of overlap isn't just CS undergrads doing homework assignments, take a look at Ken Thompson's Turing award lecture [bell-labs.com], particularly this section:

    In the ten years that [Dennis Ritchie and I] have worked together, I can recall only one case of miscoordination of work. On that occasion, I discovered that we both had written the same 20-line assembly language program. I compared the sources and was astounded to find that they matched character-for-character.
    that would clearly fail this test, but it's simply the result of two guys working very closely together with similar styles for a very long time.
  • by fmobus ( 831767 ) on Saturday August 25, 2007 @06:12PM (#20356937)

    In my university, this is the method most teachers will use when they suspect something. Ask each student something about the implementation, how it should be changed to achieve something slightly different. In some cases, when they allow you to form groups to solve the assignment, they will ask each student in the group about the implementation.

    Sounds to me the best way to catch copiers and leechers.

  • by Anonymous Coward on Saturday August 25, 2007 @07:26PM (#20357401)

    A couple years ago, a manager outsourced some programming work to India. When I reviewed their work, I was impressed, but the code was inconsistent (quality, indent style, variable names, etc). I figured maybe parts were written by a new programmer. A couple days later, I accidentally discovered that a lot of the code (the part that impressed me) had been copied from a GPL program. I alerted my manager, but he didn't care. I alerted the outsource company, they didn't care. I alerted our legal department, and they seemed to care a lot.

    Long story short, the manager got fired and I replaced him. We ended up using the original GPL software with some modifications (which were contributed back).

  • Re:No, really (Score:3, Interesting)

    by Nazlfrag ( 1035012 ) on Sunday August 26, 2007 @01:31AM (#20359535) Journal
    If someone wants to invest time and effort into a closed fork, good on them. Everyone else still has access to the original branch, and the creator of the fork isn't messing with any of those rights. I don't see any imbalance; the authors are not falling victim to anything, as their original works are intact. The closed branch people are worse off - they aren't benefiting from the open model anymore. It's their loss, their mistake to make if they want to. The inherent superiority of open source makes GPL restrictions unnecessary and, well, too restrictive.

    Of course, everyone is entitled to their view, which is why we should have a multitude of licenses: the GPL for those who want to protect the open nature of derivatives, other licenses for those who care little about derivatives and focus on the work at hand. I guess I'm just trying to say that it should be up to the author of a work whether they wish to receive the same benefits back, and the term Open Source shouldn't be a narrow subset, such as GPL-compatible licenses only.

  • Re:No, really (Score:3, Interesting)

    by be-fan ( 61476 ) on Sunday August 26, 2007 @08:17PM (#20366635)
    MS has 70,000 employees, most of whom are mediocre. In fact, that's almost the essence of their development model --- throw thousands of crappy developers at a problem and excrete a solution that is just workable enough to make some money.
