GNU is Not Unix Programming IT Technology

New Method To Detect and Prove GPL Violations

qwerty writes "A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft and could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API Birthmark observes the interaction between an application and (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and cannot be easily foiled using code obfuscation techniques, as shown in the paper (PDF). Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks from popular open-source frameworks, GPL-violating applications could be identified."
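As a rough illustration of the idea in the summary (not the algorithm from the paper), a birthmark can be modeled as the set of short windows of library calls observed during a run. A suspect program is then scored by how much of the original's birthmark it also exhibits. The call names, the window size, and the scoring function below are all assumptions made for this sketch.

```python
def birthmark(call_trace, window=3):
    """Return the set of length-`window` call sequences seen in a trace."""
    return {
        tuple(call_trace[i:i + window])
        for i in range(len(call_trace) - window + 1)
    }


def containment(original, suspect):
    """Fraction of the original's birthmark that also appears in the suspect."""
    if not original:
        return 0.0
    return len(original & suspect) / len(original)


# Hypothetical traces of library calls recorded at run time.
library_trace = ["open", "read", "parse", "validate", "close"]
# The suspect wraps the same library behavior in extra calls of its own.
suspect_trace = ["init", "open", "read", "parse", "validate", "close", "exit"]
unrelated_trace = ["open", "write", "flush", "close"]

lib_mark = birthmark(library_trace)
suspect_score = containment(lib_mark, birthmark(suspect_trace))
unrelated_score = containment(lib_mark, birthmark(unrelated_trace))
```

Here the suspect contains every window of the library's trace and the unrelated program contains none. Real traces would be far longer, and any cutoff for "suspicious" would have to be calibrated against known-independent programs.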
This discussion has been archived. No new comments can be posted.

  • new use of old trick (Score:5, Informative)

    by toolslive ( 953869 ) on Saturday August 25, 2007 @01:30PM (#20355077)
    I used to be a research assistant, and at university, we used this technique to see if students copied their assignments. They could rename variables, move pieces of text, change comments as much as they liked, but the execution profile stayed the same. We caught a lot of students, and they never figured out how we did it.
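A minimal sketch of why renaming does not help, assuming "execution profile" means the ordered list of built-in functions a submission calls (the comment does not say what was actually measured). The helper name and the three sample submissions are invented for illustration.

```python
import sys


def c_call_profile(func, *args):
    """Run func(*args) and return the names of built-in functions it called, in order."""
    calls = []

    def profiler(frame, event, arg):
        # "c_call" fires whenever a built-in (C-implemented) function is invoked.
        if event == "c_call":
            calls.append(arg.__name__)

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    # Ignore the call that switches the profiler off again.
    return [name for name in calls if name != "setprofile"]


# Hypothetical submission A.
def average_a(xs):
    total = sum(xs)
    return total / len(xs)


# Hypothetical submission B: A with renamed variables and extra lines.
def average_b(numbers):
    s = sum(numbers)
    n = len(numbers)
    return s / n


# Hypothetical submission C: a genuinely different approach.
def average_c(numbers):
    total = 0
    for x in numbers:
        total += x
    return total / len(numbers)


data = [3, 4, 5]
profile_a = c_call_profile(average_a, data)
profile_b = c_call_profile(average_b, data)
profile_c = c_call_profile(average_c, data)
```

Submissions A and B yield the same profile despite the cosmetic changes, while C differs. On an assignment this small a matching profile proves very little by itself, which is the objection raised in the replies.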
    • by mark-t ( 151149 ) <markt AT nerdflat DOT com> on Saturday August 25, 2007 @01:44PM (#20355171) Journal
      How did you know they were cheating and didn't derive their similar approaches from a common origin (presumably material that was presented in class or else from the textbook)? My experience with marking for a computer science professor showed that about 80% of the students approached any given programming assignment almost exactly the same way in terms of their final implementation... their common origin being something the teacher described during a lecture.
      • by Just Some Guy ( 3352 ) <kirk+slashdot@strauser.com> on Saturday August 25, 2007 @02:12PM (#20355371) Homepage Journal

        How did you know they were cheating and didn't derive their similar approaches from a common origin (presumably material that was presented in class or else from the textbook)?

        Amen to that. This is an old story, but I think it bears repeating. A friend of mine and I got "caught" turning in identical code for an assignment. I mean, identical. Same structures, variables, types, layout - everything. However, we wrote our programs separately and never saw each other's until our teacher asked about it.

        It sounds improbable, but consider that:

        1. We both directly transcribed variable names from the homework assignment. A sentence like "it is a fatal error condition for the user to specify a negative number of tasks" became "assert(numtasks >= 0);".
        2. We used the same editor and the same indenting style.
        3. We had done much of our homework together in previous classes because we tended to take the same approach to solving problems.
        4. The assignment wasn't terribly complex to begin with, so the resulting code was only a few pages long.

        We had a teacher who trusted us and we were both good students with good test grades, so it was dismissed as a humorous coincidence. I'm glad a human was willing to listen to our explanation and not just go along with the findings of an automated tester.

      • Re: (Score:3, Interesting)

        by anothy ( 83176 )
        just to demonstrate that this sort of overlap isn't just CS undergrads doing homework assignments, take a look at Ken Thompson's Turing award lecture [bell-labs.com], particularly this section:

        In the ten years that [Dennis Ritchie and I] have worked together, I can recall only one case of miscoordination of work. On that occasion, I discovered that we both had written the same 20-line assembly language program. I compared the sources and was astounded to find that they matched character-for-character.

        that would clearly fai

    • by Azarael ( 896715 )
      Or you could just use MOSS http://theory.stanford.edu/~aiken/moss/ [stanford.edu] (or other) like everyone else.
      • by SnowZero ( 92219 )
        How do you know he was a teaching assistant after 1998? AFAICT Moss wasn't available before then, and didn't seem to become popular until the 2003 paper, and we certainly didn't know about it when I was a TA on the east coast in early 1999. Not that it would have mattered for us, but it's also not compatible with the GPL (non-commercial use only), so you couldn't link it in as part of the rest of your submission system if it used anything GPL, unless you were careful.

        So, there are some quite normal reaso
        • by Azarael ( 896715 )
          If it wasn't available then, that is certainly reasonable.
          As far as the GPL goes, I'm pretty sure that's irrelevant since, as far as I know, MOSS is a free service available for educational use.
          • Re: (Score:3, Informative)

            by SnowZero ( 92219 )
            Well, for the class I TA'ed, it was probably available, just not widely popular yet. Of course, cheaters are usually easy to catch, so even simple systems work pretty well. So, in their attempt to save time and effort, cheaters are often bad at covering up their tracks. Anything that yields possible hits can be verified by human inspection. Why are almost all cheaters so lazy? Because if they weren't, they'd just do the assignment.

            Cheaters in my classes tended to: (1) not correct misspellings or bu
            • by Azarael ( 896715 )
              Exactly true, and in most cases the effort needed to cover your tracks well is equivalent to doing the assignment anyway.. thus the laziness factor.
    • by kasperd ( 592156 )
      Whether that approach gives false positives depends on the size and complexity of the piece of code they had to write. As a teaching assistant I have seen assignments that looked even more like copying than what you described, but even in that case they were eventually accepted. One time the students had to add some functionality to an assembler. All groups were given the same code to start with and just had to add one clearly defined piece of functionality. There are really not many ways to do that, so havi
  • No, really (Score:3, Informative)

    by Plunky ( 929104 ) on Saturday August 25, 2007 @01:31PM (#20355093)
    let's just set the code free. let's not chase it down the street to make sure it stays free, just let it go as it will.
    • Re: (Score:3, Interesting)

      by Reziac ( 43301 ) *
      That was akin to my first thought: If opensource code is really so superior to closed source code, and if the world would be better off if all apps had been built from those codebases, then shouldn't we *encourage* it to be "pirated", for everyone's net benefit??

      • Re: (Score:2, Insightful)

        by Anonymous Coward
        You can use the BSD license for your code if you unconditionally believe that "more copies of good code = better world". Heck, in many countries you can put code directly in the public domain. For those who think that authors of good (open) code need to be able to get an advantage in return for their generosity, so that they can keep being generous and produce more good code, there's the GPL, and that needs some level of enforcement.
      • Re: (Score:2, Insightful)

        That was akin to my first thought: If opensource code is really so superior to closed source code, and if the world would be better off if all apps had been built from those codebases, then shouldn't we *encourage* it to be "pirated", for everyone's net benefit??

        One of the strengths of open source is that improvements are shared. If one company just makes some improvements to an open source project and then redistributes it in a way that violates the terms of the license designed to keep it open, that onl
      • Re:No, really (Score:4, Insightful)

        by TheRaven64 ( 641858 ) on Saturday August 25, 2007 @02:34PM (#20355533) Journal
        For Open Source code, you are right. The Open Source movement believes in the superiority of the 'bazaar' development mode. If you try to create a closed fork then you are going to fall behind the open version, and have to spend a lot of time and effort merging changes from the main tree.

        The Free Software movement, however, believes that code which protects the user's freedoms to use, modify and distribute it is intrinsically superior, and that people who wish to write code that does not respect these freedoms should not be aided by being able to use the work of those who do.

        As such, an Open Source advocate would not mind, because the closed copy would quickly become inferior. A Free Software advocate would object, because their work would be being used for (in their view) unethical purposes (denying end users their freedoms).

    • Re:No, really (Score:4, Insightful)

      by The Bungi ( 221687 ) * <thebungi@gmail.com> on Saturday August 25, 2007 @01:58PM (#20355281) Homepage
      That won't do. The GPL is really more of a social instrument than a software license, so for people like Stallman a BSD-style license (which is just one step above public domain and true freedom) would be unacceptable. A lot of bandwidth and keyboard lubricant has been spent over the years to ensure that everyone thinks the GPL is the "best" software license - and the thousands of developers that buy into the FSF "freedom, with caveats" spiel by using the GPL (because well, that's what everyone uses) without really understanding what it's for are part of that problem.

      As you can imagine I really don't like the GPL or the FSF or Richard Stallman or any of his friends too much. While I recognize their contributions I think that they've fallen into the trap of trying to force everyone to convert to what has become a quasi-religion where the Inquisition is more important than celebrating mass.

      • Re: (Score:3, Funny)

        by Anonymous Coward
        keyboard lubricant

        I've never heard it called that before.
      • by jez9999 ( 618189 )
        Making the code freer than the GPL makes e.g. Microsoft's embrace, extend, extinguish a whole lot easier. Now they just have to copy/paste and slightly modify the code, compile it, and pass it off as theirs. Some of us don't like that.
        • I consider BSD to be a superior server environment to Linux, and so far it's doing quite well.
      • Re:No, really (Score:5, Insightful)

        by Daishiman ( 698845 ) on Saturday August 25, 2007 @02:52PM (#20355683)

        You know, I'm absolutely tired of the BSD trolls that claim that the BSD license is "freer", not because I have a beef with the BSD, simply because your definition of "freedom" is ludicrous.

        There are no absolute freedoms. Freedom to infringe on others' rights or freedoms gives more freedom to yourself, but limits it to other members of society. So long as there are things that cannot be owned or achieved communally without side effects to others, freedoms have a limit, that is the actions that you cannot do so that others can do them.

        The GPL definition of freedom is that a software and derivatives must always, under all conditions, be free. Yes, it is a restriction to the developer who would wish to close up his source and use a GPLed piece of code, but it is an additional freedom to all the users who now have access to this source, which would have otherwise been denied.

        Analogy time: the King is free to treat his peasants as dogs if he wished and if he has sufficient power to repress any opinions the peasants would have about that. The peasants, however, are limited by the freedoms the king has. Therefore the balance of freedoms for a more equal society would be that the king's freedoms be limited in order to allow the peasants to live their life.

        So as you said, the GPL is also a social instrument, but it is no less free than the BSD; it simply distributes freedoms in a different manner. If you have a problem with that, use whichever license you wish to use. But don't go around accusing the GPL of limiting freedoms when it gives others freedoms that the BSD could never guarantee.

        • Re:No, really (Score:4, Interesting)

          by The Bungi ( 221687 ) * <thebungi@gmail.com> on Saturday August 25, 2007 @03:49PM (#20356189) Homepage

          You know, I'm absolutely tired of the BSD trolls

          If by that you mean "you have a different definition of what freedom is, therefore I don't like you" then sure, I'm a "BSD troll" or whatever.

          your definition of "freedom" is ludicrous.

          GPL -> Distribution restrictions.
          BSD -> No restrictions.
          No restrictions -> More freedom.
          More freedom -> Possible unsavory side effects that people choose to live with

          Isn't logic great?

          The GPL definition of freedom is that a software and derivatives must always, under all conditions, be free.

          BSD has a similar one, except that it doesn't place restrictions on how that happens. No one can make BSD-licensed software "non free", it will always be available to everyone. The only difference is that it might not benefit from coerced third party improvements, but that's what you sign up for.

          it simply distributes freedoms in a different manner

          The Kool-Aid is strong with this one.

          But don't go around accusing the GPL of limiting freedoms when it gives others freedoms that the BSD could never guarantee.

          BSD licenses guarantee absolutely nothing. Here's the code, do whatever the heck you want with it. The perceived benefits to using the GPL are nice, but please don't insult people's intelligence by claiming they result in more freedom. A restriction to ensure X or Y is still that - a restriction. The distribution restrictions on the GPL are designed to further Stallman's social causes (some of which I actually agree with). If you feel that's fine, then by all means use the GPL. That's your choice.

          • Re:No, really (Score:4, Insightful)

            by Daishiman ( 698845 ) on Saturday August 25, 2007 @05:21PM (#20357013)

            GPL -> Distribution restrictions. BSD -> No restrictions. No restrictions -> More freedom. More freedom -> Possible unsavory side effects that people choose to live with

            GPL -> Code will always be open and derivatives will stay that way
            BSD -> Code can be closed off and new improvements to it can remain closed off forever.
            Always open code -> More freedom
            Sometimes open code -> Permanent loss of freedom with regards to that code.
            Indeed, logic is great.

            BSD has a similar one, except that it doesn't place restrictions on how that happens. No one can make BSD-licensed software "non free", it will always be available to everyone. The only difference is that it might not benefit from coerced third party improvements, but that's what you sign up for.

            I never said that you can't sign up for that if so you wish, but code is always used within contexts, and when used in the context of proprietary software, any improvements on the code will be lost, any bug fixes will be lost, any added functionality will be lost.

            Sure, some people will build upon it, but losing the obligation of putting the improvements back into the codebase means that it will eventually stagnate, and that the improvements that could have been used for the good of everyone who contributed can be denied at will. Look at FreeBSD with OS X: Apple got the foundation of their OS for free, and after that they simply closed up the rest at will. Perhaps the Apple folks got to improve their memory management, or add some new DRM techniques. Whatever they've done, the FreeBSD devs will never get to see it.

            If they don't mind as users and developers to see their work used to create a proprietary, vendor-locked platform then it's their prerogative; as a user and dev I prefer to make sure that my code is an established base of constant improvement. With the GPL they're empowered and free to do that; with BSD new parties are empowered to do whatever and completely ignore original creators aside from the required attributions.

            Notice that I'm not saying the BSD license is more free; it is equally free, but shifting freedom to new developers and vendors to be, IMO, lazy bastards and profiting for nothing, while GPL shifts it to original developers, contributors and users to get reciprocal treatment from others. You're free to think that the former is more important; I believe the latter brings greater benefits to everyone in the long term.

            BSD has a similar one, except that it doesn't place restrictions on how that happens. No one can make BSD-licensed software "non free", it will always be available to everyone. The only difference is that it might not benefit from coerced third party improvements, but that's what you sign up for.

            No one is coercing anyone here. If you had read and understood the GPL, and it looks like you haven't, you'd know that the conditions apply only to those who want to redistribute software. If you want to keep your patches to yourself you can do that and it's your right, but if you're going to be using others' code to sell it or gain from it you have to abide by the creator's conditions. Going back to my point about freedom, perhaps as distributor you have less leeway regarding your changes, but your users have just gained the guarantee that they'll always be able to see and change the code. The BSD could not have done that.

            BSD licenses guarantee absolutely nothing. Here's the code, do whatever the heck you want with it. The perceived benefits to using the GPL are nice, but please don't insult people's intelligence by claiming they result in more freedom. A restriction to ensure X or Y is still that - a restriction. The distribution restrictions on the GPL are designed to further Stallman's social causes (some of which I actually agree with). If you feel that's fine, then by all means use the GPL. That's your choice.

            You hit the nail on the head. Th

            • by Raenex ( 947668 )
              Access to source code != freedom. It is a convenience.

              Copyright denies freedom, and it is debatable whether this is ultimately justified, but you don't hear people claiming that the benefits of copyright enable "freedoms" of copyright holders. Yet that's the same kind of nonsense that the FSF wants everybody else to swallow.
              • by Jonner ( 189691 )
                I'm pretty sure that most of the people at the FSF would share your skepticism of the value of copyright. The whole point of copyleft is to use the existing legal system to promote the FSF's ideals. If there weren't copyright, the GPL would carry no weight and wouldn't be needed in the first place, a situation that RMS would have undoubtedly preferred.
                • by Raenex ( 947668 )

                  If there weren't copyright, the GPL would carry no weight and wouldn't be needed in the first place, a situation that RMS would have undoubtedly preferred.

                  You're ignoring the main point of contention here: FSF considers access to source code one of the fundamental "freedoms" and "rights" of users. That aspect can only be enforced with copyright and the GPL. Alternatively, if there was no copyright, the FSF would require a law mandating something like the GPL, which in my view would be decidedly anti-freedom.

                  I fully agree with the FSF that being able to copy, use, and modify is an aspect of freedom. But being entitled to source is not.

                  • by bky1701 ( 979071 )
                    While they may claim that, I think the source code part is more about functionality. Since we lack the same development abilities as companies like Microsoft, the best way to develop is the open method; it avoids "dead ends" when a project dies, helps find errors and generate fixes to them, allows more people to help than would normally be 'allowed', allows code reuse and gives free software another selling point (you can change it easily to do whatever you want).

                    It's more about competition than ideals a
                  • by Peaker ( 72084 )

                    I fully agree with the FSF that being able to copy, use, and modify is an aspect of freedom. But being entitled to source is not.

                    How can you practice the freedom to modify a program without the source?
                    • by Raenex ( 947668 )

                      How can you practice the freedom to modify a program without the source?
                      Freedom would entail being free to modify whatever you receive. It does not entail being entitled to receiving the source. That would be a matter of convenience, not freedom.
          • by fsmunoz ( 267297 )
            BSD -> No restrictions.

            There are restrictions, using the BSD license in the OSI page [opensource.org]:

            # Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
            # Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

            So some people might feel that their liberty is diminished by having to retain copyright notices and putting disclaimers in their stuff. If absolute "freedom" - actually the ability to use something without any kind of restriction whatsoever - is paramount why doesn't the BSD community (whose contribution to free software is important) put their code in the Public Domain?

            The rest of the GPL vs BSD license discussi

          • by kaffiene ( 38781 )
            You failed to answer any of the points raised by the poster you responded to. His post was well thought out, unlike your reply. How about answering the points raised rather than making cute quips like "The Kool-Aid is strong with this one".

            There are plenty of people who ARE in the GPL camp, philosophically, so attempting to wave them off with glib comments is simply not good enough. If you can't be assed to do anything more than make glib comments, then the "BSD Troll" label is probably appropriate.
        • And my problem with the GPL trolls is their belief that inanimate software deserves freedom more than developers do. Look, GPL offers a fair bargain: I'll show you mine if you show me yours. BSD fulfills its purpose too: an unconditional donation of code to the world. The real freedom is that we have a choice between the two, and to make our own license if we want to.
        • by trifish ( 826353 )
          it is no less free than the BSD

          GPL certainly is less free than BSD.

          Need an example? If I create software under the GPL I cannot take any code under other open source licenses (MPL, BSD, Apache, etc.) and integrate it as part of my GPLed program. Why? Because the GPL requires that all parts of my software are released under the GPL! See?

          The other licenses allow me to combine open source code covered by different licenses in one single product. That's why they give us more freedom than the GPL.
          • Now you're using a different situation as an example.

            The point is that there is no universally acknowledged "absolute freedom, guaranteed" and opinions are always subject to bias.
            You, for example, have had a run-in with the restrictions of the GPL and so you have an impression of the GPL restricting your freedom.

            Then there are other people who have been frustrated about not being able to change some piece of software, because some of the source is available and some isn't. Now they feel burnt. Maybe they ev
            • by trifish ( 826353 )
              Let's try a different approach, so that everyone understands:

              Simply, if we remove from the GPL the requirement that the whole program must be under the GPL, and if we add a new requirement that other licenses covering other parts of the program must require complete source code to be open, will we have more freedom? Yes, we definitely and obviously will.

              The "if-you-are-not-with-us-you-are-against-us" attitude reminds me of the communists. The GPL insists that it should be the only license in the world...
              • Uhh, no. If you were a bit more aware of the subject, you would know that there are "GPL-compatible" licenses, where you can link code between them and the clauses of each license permit so without trouble.

                I'm actually quite surprised how little most people know on the subject. If you want a license that frees your code while letting others link proprietary code to it, you have the LGPL, which allows just that, and is IMO the best license for core libraries such as GTK and libc.

                Sounds to me like you people s

                • by trifish ( 826353 )

                  Uhh, no. If you were a bit more aware of the subject, you would know that there are "GPL-compatible" licenses

                  Uh, no. The GPL explicitly requires the entire program to be covered by the GPL. So there is simply no license "compatible with GPL" in this regard (apart perhaps from the LGPL, which I find almost ironical).

                  where you can link code between them and the clauses of each license permit so without trouble.

                  Yes, all open source licenses allow you to do that, except the GPL.

                  I'm actually quite surprised how little most people know on the subject.

                  You're right, and, no offence, but you are a good example of that.

                  If you want a license that frees your code while letting others link proprietary code to it, you have the LGPL, which allows just that, and is IMO the best license for core libraries such as GTK and libc.

                  Sounds to me like you people should take a look at the LGPL. The FSF and Stallman realized that some of the freedoms of the GPL might translate into unreasonable restrictions to others, and that's why they created that license.

                  Nobody has talked about LGPL. We (including the grand parent poster) have talked about the GPL. So please try to stay on-topic. The topic was GPL vs. BSD.

          • by Peaker ( 72084 )

            Need an example? If I create software under the GPL I cannot take any code under other open source licenses (MPL, BSD, Apache, etc.) and integrate it as part of my GPLed program. Why? Because the GPL requires that all parts of my software are released under the GPL! See?

            Sure you can. The GPL applies to those who don't have the right to distribute the software in the first place. You have the copyright to your own work, so the GPL limitations only apply to whoever you, as the copyright holder, want them to a

      • Re: (Score:3, Insightful)

        by Kjella ( 173770 )
        Oh I think everybody understands it just fine because it's basically "Modify it any way you want. If you distribute it, source code goes with it". Ok so it's not free as in public domain, but who really has a problem with the GPL? Only those that want to take source code and not distribute source code. Which is fine, I'd love it if someone did my work so I could download it off the Internet too. I just don't see why anyone should bother to listen to them, no matter how many strawmen are being used about "real
      • The GPL is really more of a social instrument than a software license, so for people like Stallman a BSD-style license (which is just one step above public domain and true freedom) would be unacceptable.

        Not so fast. The GPL FAQ [gnu.org] states that there exist situations where a permissive license is appropriate, in particular short programs [gnu.org] and web site templates [gnu.org]. Mr. Stallman has also endorsed the use of a permissive license for a library designed as the reference implementation of a Free file format that replaces patented file formats [xiph.org].

      • Re: (Score:3, Insightful)

        by DaleGlass ( 1068434 )
        The GPL vs BSD "freedom" argument is really boring semantics. Whether the GPL is freedom, slavery, communism or whatever else you want to call it is irrelevant to me: It does precisely what I want, which is why I use it.
      • by fsmunoz ( 267297 )
        The GPL is really more of a social instrument than a software license

        I agree with you here.

        so for people like Stallman a BSD-style license (which is just one step above public domain and true freedom) would be unacceptable.

        Here I disagree: it's not unacceptable at all, only less preferred. It's a free license, but lacks the "social instrument" provisions that you mentioned, but it *is* a free license nonetheless. From the FSF licences page [fsf.org]:

        If you are contemplating writing a new license, please contact the FSF by writing to . The proliferation of different free software licenses means increased work for users in understanding the licenses; we may be able to help you find an existing free software license that meets your needs. We try to list the most commonly encountered free software license on this page, but cannot list them all; we'll try our best to answer questions about free software licenses whether or not they are listed here.

        Modified BSD license

        This is the original BSD license, modified by removal of the advertising clause. It is a simple, permissive non-copyleft free software license, compatible with the GNU GPL.
        If you want a simple, permissive non-copyleft free software license, the modified BSD license is a reasonable choice. However, it is risky to recommend use of "the BSD license", because confusion could easily occur and lead to use of the flawed original BSD license. To avoid this risk, you can suggest the X11 license instead. The X11 license and the revised BSD license are more or less equivalent.
        This license is sometimes referred to as the 3-clause BSD license.

        From the What is Free Software page [gnu.org]:

        In the GNU project, we use copyleft to protect these freedoms legally for everyone. But non-copylefted free software also exists. We believe there are important reasons why it is better to use copyleft, but if your program is non-copylefted free software, we can still use it.

        ... and from

      • What bothers me is that all of this discussion, all these constant debates on Slashdot over which license says what, all of the millions of comments on the GPLv3... that all represents time *not spent writing actual code*.
      • The GPL is really more of a social instrument than a software license


        In what way is any license not a social instrument?
      • by be-fan ( 61476 )
        If you think Stallman has made this into a religious thing, you really don't have a very good understanding of the history of the GPL.

        I'm not going to claim to have a deep insight into Stallman's mind, but based on historical events in the public record, I think it's fair to say that the GPL has an extremely practical intention behind it.

        Stallman is a product of the computer industry of the 1980s. This was when "open" in the commercial software industry meant nothing similar to what it does now, and when lo
    • Re: (Score:3, Insightful)

      by marcello_dl ( 667940 )
      The code doesn't need freedom. People need freedom. Let the bad guys incorporate GPLed stuff and they are likely to become an issue because they'll enhance it and defend it as if it were all their own, against similar enhancements done to the GPLed branch.

      Besides, if I were to buy software from a company I'd like to know if it's stuff they designed and know line by line or if they just rebranded things I could obtain for free elsewhere.

      I say, if you can expose them, do it.
    • More power to you. I believe that's the MIT or BSD license you're looking for.

      This is about GPL violations, which is as much about guaranteeing the future freedom of the code as it is the freedom to tinker with the code in the first place.

      If you don't like it, that's fine, use something else.
  • by mark-t ( 151149 ) <markt AT nerdflat DOT com> on Saturday August 25, 2007 @01:34PM (#20355117) Journal

    What is the false positive rate for this method? What if two programs just happen to do the same thing and the authors happened to choose similar ways to do it. Would this method conclude that one originated with the other? It's not a copyright violation because neither is a derivative work of the other.

    Also, it occurs to me that this method would probably not be as useful as expected for detecting GPL violations. I would think it would only be effective for checking where you have source code available, or at the very least enough symbol table information to make comparisons, which you are not likely to have if somebody is violating the GPL because that implies no source code anyways (and almost certainly no symbol table information for the binary).

    • More to the point, what is the false negative rate? There is tons of really useful code out there that doesn't make any system or library calls at all. It just takes data, processes it in some way, and hands back the results. That description could apply to anything from an image decoder library like libpng to a full-blown 3D graphics engine.
      • by arth1 ( 260657 ) on Saturday August 25, 2007 @02:11PM (#20355359) Homepage Journal
        My guess is that it would work much better for Java and possibly C++ than for more concise languages which don't have tonnes of implicit calls and inheritances. And even with OO languages like Java, I'd think that simply adding a try in the middle would change the fingerprint quite a bit.
        Also worth considering is what a compiler optimiser might do -- they can be quite good at rearranging code in different ways depending on whether optimising for speed or code size, and what the target is. That's probably another reason why this might work better with Java, which only has a rather rudimentary JIT optimiser.

        If this tool can help identify some infringing code, that's well and good, but I wouldn't rely on it, wouldn't think it would add much if any legal weight, and neither would I think it could replace a thousand eyes.

        Anyhow, the real problem, as I see it, with identifying open source code pilfered and added to a closed source project is that you generally aren't allowed to reverse engineer the code itself to see what it actually does. So even if you're Very Damn Sure that a piece of commercial software illegally uses open source and sells it as its own closed source, you're not allowed to investigate and come up with evidence. You'll have to file a suit and get a judge to order the code examined, and with only a good hunch to go on, and no way to document a financial loss, and probably not having too deep pockets yourself, that's rather unlikely to go anywhere.
        Which is why I think it's important that we support institutions like FSF, which can occasionally fight the battle on behalf of the little guy.

        Regards,
        --
        *Art
        • by mark-t ( 151149 )

          Anyhow, the real problem, as I see it, with identifying open source code pilfered and added to a closed source project is that you generally aren't allowed to reverse engineer the code itself to see what it actually does.

          The number of cases where this is actually enforceable is far outweighed by the number of cases where it isn't. Reverse engineering by itself isn't illegal anyways... so evidence of copyright infringement acquired by reverse engineering wouldn't be inadmissible.

      • Re: (Score:3, Informative)

        by TheRaven64 ( 641858 )

        There is tons of really useful code out there that doesn't make any system or library calls at all. It just takes data, processes it in some way, and hands back the results
        Are you sure? You know that read and write are system calls? And that printf, sqrt, exp, etc are all library functions? Even trivial code makes a lot of system calls. A hello world program, in C, on Linux, makes 27 system calls (number from strace).
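
        For anyone who wants to check a count like this themselves, here is a minimal sketch in Python that tallies system calls from the text output of strace. The helper name count_syscalls, the regular expression, and the shortened sample trace are assumptions made up for illustration; real strace output has more line shapes (signals, unfinished and resumed calls) than this handles.

```python
import re
from collections import Counter

# Matches lines that look like "name(args...) = result"; other strace
# line shapes (signals, exit notices, resumed calls) are skipped.
SYSCALL_LINE = re.compile(r"^(\w+)\(")

def count_syscalls(trace_text):
    """Return a Counter mapping syscall name -> number of occurrences."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = SYSCALL_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical, shortened trace; not captured from a real run.
SAMPLE_TRACE = """\
execve("./hello", ["./hello"], 0x7ffc) = 0
brk(NULL) = 0x55d0
write(1, "hello world\\n", 12) = 12
exit_group(0) = ?
+++ exited with 0 +++
"""

if __name__ == "__main__":
    counts = count_syscalls(SAMPLE_TRACE)
    print(sum(counts.values()), "system calls:", dict(counts))
```

        Feeding it the saved output of a real strace run instead of SAMPLE_TRACE gives the per-call breakdown behind a total like the one quoted above.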
    • Probable cause? If two programs execute in virtually identical ways, there is a reason to investigate. It doesn't catch them all, but it is better than nothing.
  • by Ungrounded Lightning ( 62228 ) on Saturday August 25, 2007 @01:35PM (#20355123) Journal
    An identical library call signature for a nontrivial part of the execution could be produced by a clean-room analysis or even independent development of an equivalent component. Neither of these is a GPL violation.

    This is not to say that the technique wouldn't be useful for hunting down GPL violations. But a positive is not definitive by itself.

    Meanwhile code obfuscation (even automatically generated obfuscation) could easily modify at least the timing, if not the order, of such calls.

    Nevertheless this is a powerful tool: A hunk of GPL code that hasn't had its flow obfuscated systematically (even code that HAS been obfuscated but not systematically) will have large swaths of code that trip the detector. And it doesn't require reverse engineering until after the alarm goes off.

    Good job, guys.
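
    To make the "trips the detector" idea concrete, here is a rough sketch in Python of searching one program's call trace for another's birthmark. It assumes, purely for illustration and not as the paper's definition, that a birthmark is the set of k-length windows over the observed sequence of library calls. All call names and traces below are hypothetical.

```python
def call_windows(calls, k=3):
    """Return the set of all k-length windows over a sequence of call names."""
    return {tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)}

def containment(birthmark, suspect_windows):
    """Fraction of birthmark windows that also occur in the suspect."""
    if not birthmark:
        return 0.0
    return len(birthmark & suspect_windows) / len(birthmark)

# Hypothetical traces: names of library calls in the order they were observed.
library_trace = ["File.open", "Reader.read", "Parser.parse", "Reader.close"]
suspect_trace = ["Logger.init", "File.open", "Reader.read", "Parser.parse",
                 "Reader.close", "Logger.flush"]
unrelated_trace = ["Socket.connect", "Socket.send", "Socket.recv", "Socket.close"]

if __name__ == "__main__":
    mark = call_windows(library_trace)
    print("suspect:", containment(mark, call_windows(suspect_trace)))
    print("unrelated:", containment(mark, call_windows(unrelated_trace)))
```

    A high containment score only says the call pattern shows up in the suspect. As noted above, a clean-room or independent implementation could produce the same pattern, so a hit is a reason to look closer rather than proof.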
    • Meanwhile code obfuscation (even automatically generated obfuscation) could easily modify at least the timing, if not the order, of such calls.

      (Yes I know that the article says it can't. But that refers to the usual sort, which is directed at hiding the similarity from someone reading the source. I'm talking about obfuscation directed at tools reading the routine-call signature.)
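
      A small Python sketch of why call order matters here: if a detector compares ordered windows of calls, swapping two independent calls lowers the score, while a comparison that ignores order is unaffected. The window size, the Jaccard measure, and the call names are all assumptions chosen for illustration and are not taken from the paper.

```python
def ordered_windows(calls, k=2):
    """Return the set of all k-length windows over a sequence of call names."""
    return {tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)}

def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical call traces.
original = ["Config.load", "Cache.init", "Db.connect", "Db.query"]
# Hypothetical obfuscated variant: the first two independent calls are swapped.
reordered = ["Cache.init", "Config.load", "Db.connect", "Db.query"]

# Order-sensitive comparison: only one of five distinct windows is shared.
order_sensitive = jaccard(ordered_windows(original), ordered_windows(reordered))
# Order-insensitive comparison: the same four calls appear in both traces.
order_insensitive = jaccard(set(original), set(reordered))

if __name__ == "__main__":
    print("order-sensitive:", order_sensitive)
    print("order-insensitive:", order_insensitive)
```

      The trade-off is that the order-insensitive comparison is harder to fool by reordering, but it is also more likely to match two unrelated programs that happen to use the same calls.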
    • This is not to say that the technique wouldn't be useful for hunting down GPL violations. But a positive is not definitive by itself.

      Indeed. The title of this slashdot article would be pretty much dead on if the words "and Prove" were taken out of "New Method to Detect and Prove GPL Violations".

  • by koh ( 124962 ) on Saturday August 25, 2007 @01:35PM (#20355127) Journal
    GGA! The GNU Genuine Advantage program!

  • by fishthegeek ( 943099 ) on Saturday August 25, 2007 @01:54PM (#20355239) Journal
    Pitchfork? ... Check
    Torch? ... Check
    Map of Corporate Castle locations? ... Check
    FSF Lawyers programmed to be speed dialed in emergencies? ... Check
    Desire to burn the non-believers? ... Check

    Okay, I'm ready! What IRC Channel are we meeting in?
  • Other languages (Score:4, Interesting)

    by Mike McTernan ( 260224 ) on Saturday August 25, 2007 @02:02PM (#20355309)
    I looked through the paper, and it is cool stuff. But I couldn't see where it supposed the system would work well for other languages, and I wonder if it really would be so good.

    Java has a very large standard library that is always dynamically linked, and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standard environment due to the Virtual Machine, whereas other languages may have varying ABIs, type sizes, and other properties that could add significant noise to the birthmark.

    That said, system calls are always hookable and reasonably standard, so maybe this technique could be applied successfully there for malware detection or similar?
  • When people go to these lengths to prove misuse of commercial licenses, they're called fascists. When it's done to prove misuse of free licenses, it's OK.

    I see the community is still working as it always has.
    • by sepluv ( 641107 )

      It may be news to you but non-commercial licenses are AFAIK universally considered non-free (whereas you seem to imply the two are mutually exclusive). And when has anyone ever had any problem with people going to lengths (whatever that means) to prove license violations?

      I've certainly never heard anyone complaining about people coming up with evidence of violations. In fact, what I've come across a lot of is the opposite: asking people who are making vague libelous accusations about someone "stealing"

  • Very Cool (Score:2, Insightful)

    by maz2331 ( 1104901 )
    This is very cool and potentially useful. By itself, it wouldn't be enough to force compliance or win a violation suit, but it could well be enough to meet the threshold for filing a suit and forcing source code analysis in discovery. Really, it is a great tool to have to ensure that open source license terms are respected by removing the "code anonymity" inherent in a binary.
  • Instead of coding open source projects, now we're coding projects to detect license violations.

    Next, the Open Source Business Software Alliance and raids by the Secret Service...

    When is the last time we read anything about open source that wasn't about licensing?

    When did it stop being about the code and the value?
    • by sepluv ( 641107 )

      Instead of coding open source projects, now we're coding projects to detect license violations.

      Well one person has as part of some academic research. You see, the beauty of FLOSS development is everyone can code what they enjoy coding, and that you don't have to help anyone but can instead do something you prefer.

      I won't even bother addressing your incoherent comment about the Secret Service, but would be interested in what you are smoking.

      When is the last time we read anything about open source that wasn't about licensing?

      In around 95% of stories about it. For instance the last FLOSS story on here was about a new release of WINE and the one before that about possible moral iss

    • When is the last time we read anything about open source that wasn't about licensing?

      In reality, it's pretty much impossible to discuss open source without it being about licensing - because licenses are the legal expression of the open source philosophy. If you are discussing the code, you are discussing the application or the language, not open source.
    • Instead of coding open source projects, now we're coding projects to detect license violations.

      No, we're not. Those guys [uni-sb.de] are developing a method to detect license violations, and despite Slashdot's implications, I can't personally see any reference they've made in their project to GPL, open source or free software.

      It's only a net loss for open source projects if they were otherwise going to be working on something more beneficial for open source, and they probably weren't. Usually people who develop op

    • by petrus4 ( 213815 )
      Yep...it ought to be the new motto of the FSF.

      "Enjoy the FREEDOM, and don't worry about the cognitive dissonance. It goes away if you don't think about it, eventually."
      • by Peaker ( 72084 )
        Wow, it's hard to believe how many people on Slashdot still think that there is a dissonance with the GPL.

        It's been explained well at least 500 times in every such license discussion, but some people are dense enough to still not get it.

        BSD->More freedom to software developers (few), less to users (many)
        GPL->More freedom to software users (many), less to developers (few)

        GPL generates more net freedom.

        It's as simple as that.
    • by Peaker ( 72084 )
      Yeah!

      We should have a Slashdot article about the new query_database function in GenericProject 3.2!

      Talk about code, baby!
  • How well does it work with the Wine versus Windows comparison?
  • a new method to detect code theft

    I realise this is going off on a tangent, but I'm concerned about the use of the word theft. Usually I'm one of the first people to jump up and down when I hear the RIAA or MPAA accuse people of stealing, and I've noticed that quite a few other people on Slashdot do the same. I think it's mis-representative of the paper to represent copyright infringement as anything other than exactly what it is, which is copyright infringement.

    Language is what it is, and it changes ov

    • Language is what it is, and it changes over time, but I'd be really disappointed if this one was let to slip, because rather than the language changing because it's more convenient or better, it's changing because a group of powerful corporations want to confuse the issue for their own control and commercial benefit.

      I used to think the same, but you can check modern dictionaries. The word theft already includes copyright infringement.

      The battle was lost. The best way to act is to simply declare that some t

  • False positives....

    The story is presented with a stage light focused on linux but then the house lights come up and show linux in jail along with most of the audience.

    This is just one paper for one Automated Software Engineering (ASE) conference.

    But if you really want to ensure software becomes genuinely free, then the level of automated software development will have to become easy enough for the typical user to apply it. Much like most anyone knows how to use a calculator and uses it as they need.

    There is
  • They should have known this earlier, but now it's too late.
