New Method To Detect and Prove GPL Violations
qwerty writes "A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft that could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API Birthmark observes the interaction between an application and (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and cannot be easily foiled using code obfuscation techniques, as shown in the paper (PDF). Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks from popular open-source frameworks, GPL-violating applications could be identified."
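To make the "capture a birthmark, then search for it" idea concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a birthmark is simply the set of short windows (k-grams) over a recorded sequence of library calls, and the traces, call names, and the `containment` helper are all made up for illustration.

```python
def kgrams(trace, k=3):
    """Return the set of all length-k windows of an API call trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def containment(birthmark_trace, suspect_trace, k=3):
    """Fraction of the birthmark's k-grams that also occur in the suspect."""
    mark = kgrams(birthmark_trace, k)
    if not mark:
        return 0.0
    return len(mark & kgrams(suspect_trace, k)) / len(mark)


# Hypothetical traces of calls into the runtime library (invented names).
library_trace = ["File.open", "Reader.read", "Map.put", "Map.get", "File.close"]
suspect_trace = ["Log.init", "File.open", "Reader.read", "Map.put",
                 "Map.get", "File.close", "Log.flush"]
unrelated_trace = ["Socket.connect", "Socket.send", "Socket.close"]

print(containment(library_trace, suspect_trace))    # every window is found
print(containment(library_trace, unrelated_trace))  # no window is found
```

A high score only says the suspect makes the same calls in the same order; whether that amounts to copying is a separate question.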
A couple of things.... (Score:4, Interesting)
What is the false positive rate for this method? What if two programs just happen to do the same thing and the authors happened to choose similar ways to do it? Would this method conclude that one originated with the other? It's not a copyright violation because neither is a derivative work of the other.
Also, it occurs to me that this method would probably not be as useful as expected for detecting GPL violations. I would think it would only be effective for checking where you have source code available, or at the very least enough symbol table information to make comparisons, which you are not likely to have if somebody is violating the GPL because that implies no source code anyways (and almost certainly no symbol table information for the binary).
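On the false-positive question, a toy illustration may help. It assumes, purely for the sake of argument, that similarity is measured as Jaccard overlap between windows of library calls; the two traces and the `jaccard` helper are invented, and nothing here reflects measured rates from the paper.

```python
def windows(trace, k):
    """Set of all length-k runs of consecutive calls in a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def jaccard(trace_a, trace_b, k):
    """Shared windows divided by all distinct windows in either trace."""
    a, b = windows(trace_a, k), windows(trace_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two hypothetical, independently written programs that both start with
# the same common idiom (open, read, close) and then diverge.
program_a = ["File.open", "Reader.read", "File.close", "List.add", "List.sort"]
program_b = ["File.open", "Reader.read", "File.close", "Map.put", "Map.get"]

print(jaccard(program_a, program_b, k=2))  # short windows: idiom overlaps
print(jaccard(program_a, program_b, k=4))  # longer windows: no overlap
```

In this toy case the shared idiom produces overlap only when the windows are short, which suggests that window length and a score threshold would both matter for telling coincidence from copying.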
Re:No, really (Score:3, Interesting)
Other languages (Score:4, Interesting)
Java has a very large standard library that is always dynamically linked, and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standard environment due to the Virtual Machine, whereas other languages may have varying ABIs, type sizes, and other properties that could add significant noise to the birthmark.
That said, system calls are always hookable and reasonably standard, so maybe this technique could be applied successfully there for malware detection or similar?
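As a rough picture of what "hooking" a dynamically resolved library looks like, here is a small sketch that wraps a module and logs every call made through it. This is in-process wrapping, not system call tracing (which would need an OS-level tool), and the `RecordingProxy` class and `hypotenuse` function are hypothetical names used only for this example.

```python
import math


class RecordingProxy:
    """Wrap a module or object and log every call made through it."""

    def __init__(self, target, name, trace):
        self._target = target
        self._name = name
        self._trace = trace

    def __getattr__(self, attr):
        # Only reached for names not set in __init__, i.e. the target's API.
        value = getattr(self._target, attr)
        if not callable(value):
            return value

        def recorded(*args, **kwargs):
            self._trace.append(f"{self._name}.{attr}")
            return value(*args, **kwargs)

        return recorded


trace = []
m = RecordingProxy(math, "math", trace)


def hypotenuse(a, b):
    """Application code that only talks to the library via the proxy."""
    return m.sqrt(m.pow(a, 2) + m.pow(b, 2))


result = hypotenuse(3, 4)
print(result)  # 5.0
print(trace)   # ['math.pow', 'math.pow', 'math.sqrt']
```

With static linking there is no such seam to intercept, which is the difficulty raised above for C.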
Re:new use of old trick (Score:5, Interesting)
Amen to that. This is an old story, but I think it bears repeating. A friend of mine and I got "caught" turning in identical code for an assignment. I mean, identical. Same structures, variables, types, layout - everything. However, we wrote our programs separately and never saw each other's until our teacher asked about it.
It sounds improbable, but consider that:
We had a teacher who trusted us and we were both good students with good test grades, so it was dismissed as a humorous coincidence. I'm glad a human was willing to listen to our explanation and not just go along with the findings of an automated tester.
Re:new use of old trick (Score:4, Interesting)
Of course! ;-)
I think there was a bit of that, too: (pointing at me) "why did you do this?" "Because of this requirement in the last paragraph." (Pointing at friend) "and why didn't you use this approach?" "That wouldn't have worked because of this part here."
Re:Heh.. (Score:1, Interesting)
Re:No, really (Score:4, Interesting)
If by that you mean "you have a different definition of what freedom is, therefore I don't like you" then sure, I'm a "BSD troll" or whatever.
GPL -> Distribution restrictions.
BSD -> No restrictions.
No restrictions -> More freedom.
More freedom -> Possible unsavory side effects that people choose to live with
Isn't logic great?
BSD has a similar one, except that it doesn't place restrictions on how that happens. No one can make BSD-licensed software "non free", it will always be available to everyone. The only difference is that it might not benefit from coerced third party improvements, but that's what you sign up for.
The Kool-Aid is strong with this one.
BSD licenses guarantee absolutely nothing. Here's the code, do whatever the heck you want with it. The perceived benefits to using the GPL are nice, but please don't insult people's intelligence by claiming they result in more freedom. A restriction to ensure X or Y is still that - a restriction. The distribution restrictions on the GPL are designed to further Stallman's social causes (some of which I actually agree with). If you feel that's fine, then by all means use the GPL. That's your choice.
Re:new use of old trick (Score:3, Interesting)
Re:new use of old trick (Score:2, Interesting)
In my university, this is the method most teachers will use when they suspect something. Ask each student something about the implementation, how it should be changed to achieve something slightly different. In some cases, when they allow you to form groups to solve the assignment, they will ask each student in the group about the implementation.
Sounds to me like the best way to catch copiers and leechers.
Re:Consider the student's culture, too! (Score:2, Interesting)
A couple years ago, a manager outsourced some programming work to India. When I reviewed their work, I was impressed, but the code was inconsistent (quality, indent style, variable names, etc). I figured maybe parts were written by a new programmer. A couple days later, I accidentally discovered that a lot of the code (the part that impressed me) had been copied from a GPL program. I alerted my manager, but he didn't care. I alerted the outsource company, they didn't care. I alerted our legal department, and they seemed to care a lot.
Long story short, the manager got fired and I replaced him. We ended up using the original GPL software with some modifications (which were contributed back).
Re:No, really (Score:3, Interesting)
Of course, everyone is entitled to their view, which is why we should have a multitude of licenses: the GPL for those who want to protect the open nature of derivatives, and other licenses for those who care little about derivatives and focus on the work at hand. I guess I'm just trying to say that it should be up to the author of a work whether they wish to receive the same benefits back, and the term Open Source shouldn't be a narrow subset, such as GPL-compatible licenses only.
Re:No, really (Score:3, Interesting)