Napster, Audio Fingerprinting, and the Future of P2P

Posted by CmdrTaco on Sun Jul 13, 2003 10:17 AM
from the well-I'll-believe-it-when-I-see-it dept.
mjmalone writes "Napster founder Shawn Fanning is poised for a comeback. It seems the now 22-year-old Fanning has developed technology which creates "audio fingerprints" of individual tracks and compares them against fingerprints in his firm's database to determine legality. A fee may be set and collected on a copyrighted track by its rightful owner. Fanning is actively recruiting industry support as well as pushing the idea to P2P services such as Kazaa and Grokster." This isn't exactly new technology, but it's still interesting to see what Fanning is up to these days besides movie cameos.
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • well.. (Score:4, Interesting)

    by dema (103780) on Sunday July 13 2003, @10:18AM (#6428041) Homepage
    You have to give him credit. At least someone out there is actually trying to make p2p legit, and not just throwing their weight around *cough* RIAA *cough*
    • by Anonymous Coward on Sunday July 13 2003, @10:28AM (#6428081)
      Umm no.

      This is not going to make P2P "legit".

      This is going to further destroy legit and non infringing usage of P2P. Now, RIAA will still say "p2p has no purpose other than piracy ban it"! And if people start paying for music from these services, guess what LEGITIMATE users of p2p suffer.

      Shawn Fanning did not invent P2P. Before Napster we used to have IRC/DCC bots etc. and web search pages. Shawn Fanning made downloading MP3s easier for the masses because of his Windows client that automagically shared files you had downloaded. He's great but he's no God.
      • Shawn Fanning did not invent P2P.

        I'm sure lots of people around here already know this, but Shawn Fanning's service wasn't even P2P; it used a client-server model, which turned out to be its Achilles' heel. Killing a service based on that model is a simple matter of removing the servers, the vast majority of which were owned by Napster. That's why P2P has become the preferred method for trading: it suffers from no such weakness; all nodes have to be individually removed.
    • You have to give him credit. At least someone out there is actually trying to make p2p legit

      P2P is, has, was, and always will be legit. It doesn't need support, approval, or acknowledgment.

      If we insist on clinging to greed, laziness, and possession as a way of life....there's no reason to question building tools which vastly facilitate thievery.

      The RIAA has been stealing millions a year while defending a facade of legitimate service. In fact, this is what capitalism has become in this country. When
  • by Radrik (79810) on Sunday July 13 2003, @10:22AM (#6428059)
    if (md5sum("myfile.mp3") == md5sum("Limp_Bizkit-Crap.mp3"))
        cout << "PIRATE!";
    • And after changing a little ID3 tag (or altering a part of the file itself), the md5 sum is vastly different from what you have stored. Not to mention that this'll require you to have md5s of that song in a lot of formats, and with a lot of different bitstreams. That doesn't work. ;) I'd actually like to know how he solved this problem. The only thing I can think of is to compare the music itself, in a way that's smart enough to ignore minor differences. Not that I have a clue of how to do that.
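To make the point about tag edits concrete, here is a minimal sketch in Python. The "audio" bytes and the 128-byte ID3v1-style tags are made up for illustration; nothing here reads a real MP3.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as a hex string."""
    return hashlib.md5(data).hexdigest()

# Stand-in for an MP3: invented "audio" bytes followed by a
# 128-byte ID3v1-style tag (an assumption for this sketch).
audio = bytes(range(256)) * 4
tag_a = b"TAG" + b"Some Title".ljust(125, b"\x00")
tag_b = b"TAG" + b"Some Titel".ljust(125, b"\x00")  # one small edit in the tag

file_a = audio + tag_a
file_b = audio + tag_b

same_audio = file_a[:-128] == file_b[:-128]        # audio part is byte-identical
same_md5 = md5_hex(file_a) == md5_hex(file_b)      # whole-file checksums differ
print(same_audio, same_md5)  # True False
```

The audio is untouched, yet the whole-file checksum no longer matches, which is why a checksum list alone cannot identify a song.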

      On the other han
      • by gordyf (23004) on Sunday July 13 2003, @11:17AM (#6428305)
        This is not an md5, this is spectral analysis "fingerprint" of the song. Thus they can identify the song no matter what the encoding (within reason, of course, but you wouldn't want to listen to a song so badly encoded that it can no longer be recognized anyway).

        See http://musicbrainz.org/ for some software that uses the same technology to help you tag your MP3s.

        I'm sure someone will come up with some software that, say, rearranges the MP3 frames of a song, foiling the fingerprinting but allowing the song to be restored on the other end..
        • I'm sure someone will come up with some software that, say, rearranges the MP3 frames of a song, foiling the fingerprinting but allowing the song to be restored on the other end..

          PGP

          Yeah, there are issues re: p2p, but the tech is there.

          Also there's Freenet Project which obfusicates the source. You can ID it, but you can't get rid of it.
            • I haven't RTFA, but I'm assuming this fingerprinting would analyze the analog signature, not the digital one. If this is true, then any publicly playable format would be ID'able even if the digital format were obfuscated (damn I spelled that wrong in my earlier post). If you can hear it, you could ID it.

              If you want to avoid being ID'ed, I think you would have to hide one of three things:
              • The source
              • The content
              • The distribution method

              Most P2P relies on all three bits of info being readily available. Free

        • OK, I propose the 3PM format: The header is the same as an MP3 file. The ID3 tag is the same as an MP3 file. The audio data is byte-swapped, so the four bytes ABCD are stored DCBA. If the length of the audio data is not a multiple of four bytes, the last 1-3 bytes are left untouched. The extension can be .mp3 or .3pm, depending on how easy you want to make it for Shawn to filter out byte-swapped files.
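The byte swap described for "3PM" can be sketched in a few lines of Python. The function name `swap_groups` is hypothetical, and the sketch only handles the audio payload; locating the header and ID3 tag in a real file is left out.

```python
def swap_groups(audio: bytes) -> bytes:
    """Reverse each complete 4-byte group (ABCD -> DCBA).

    Any 1-3 trailing bytes that do not fill a group are copied unchanged.
    """
    full = len(audio) - (len(audio) % 4)   # length covered by whole groups
    out = bytearray()
    for i in range(0, full, 4):
        out += audio[i:i + 4][::-1]
    out += audio[full:]                    # leftover bytes, untouched
    return bytes(out)

sample = b"ABCDEFGHIJ"
swapped = swap_groups(sample)
print(swapped)                          # b'DCBAHGFEIJ'
print(swap_groups(swapped) == sample)   # True: applying it twice restores the data
```

Because the transform is its own inverse, anyone who knows the scheme, including a fingerprinting service, can undo it just as cheaply as the player does.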
      • How about regenerating the waveform, taking a Fourier transform of it, then using some algorithm to reduce that to 10 to 20 32-digit integers, different enough across songs that it would not produce too many false positives... OK, maybe the spectrum of two hard rock songs isn't all that different. Does anyone know?
    • by RobPiano (471698) * on Sunday July 13 2003, @12:12PM (#6428601)
      But audio fingerprinting is very much a reality, and nobody uses a checksum.

      There are many good papers on this. I particularly like the one on "AudioDNA"; visit your local Google. You see, with audio fingerprinting we are actually able to take a song that has been rerecorded onto an analog tape and slightly time-stretched, and still tell that it's the original song. It doesn't rely on bytes, but instead on qualities of the audio signal.

      There are many ways to do this, but one solid method is to analyze the audio signal for acoustic events that are resistant to change. Make a listing of these events and store their locations in time as a chain. Even if you only have a small segment of the chain you can search for it with techniques similar to the ones they use in biology (nobody looks for a complete DNA chain). It's a little difficult to explain without knowing something about signal processing, so I suggest just searching the web. Here are a few good topics:

      Music Information Retrieval - (MIR)
      Audio Finger Printing
      Audio DNA
      CUIDADO
      ISMIR
      MPEG-7

      Oh and try not to insult all the people who research this stuff by claiming some goof at Napster invented it.

      Rob
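The "chain of events" idea above can be illustrated with a toy sketch. This is not AudioDNA or any published algorithm: the event times are invented, and comparing ratios of consecutive gaps is just one simple assumption that makes the match survive a uniform time stretch.

```python
def interval_ratios(times):
    """Ratios of consecutive gaps between events.

    A uniform time stretch or shift leaves these ratios unchanged.
    """
    gaps = [b - a for a, b in zip(times, times[1:])]
    return [g2 / g1 for g1, g2 in zip(gaps, gaps[1:])]

def find_fragment(song_times, fragment_times, tol=0.05):
    """Return the index of the song event where the fragment lines up, or -1."""
    song = interval_ratios(song_times)
    frag = interval_ratios(fragment_times)
    if not frag:
        return -1
    for start in range(len(song) - len(frag) + 1):
        window = song[start:start + len(frag)]
        # Accept the window if every ratio agrees within the relative tolerance.
        if all(abs(s - f) <= tol * f for s, f in zip(window, frag)):
            return start
    return -1

# Invented event times (seconds) for a "known" song.
song_events = [0.0, 0.5, 1.3, 1.6, 2.8, 3.0, 4.1, 4.9]
# A fragment covering events 2..6, played 4% slow and starting at a different offset.
fragment = [10.0 + 1.04 * (t - 1.3) for t in song_events[2:7]]
print(find_fragment(song_events, fragment))  # 2
```

Only a short run of events is needed to locate the fragment, which mirrors the DNA-segment analogy; a real system would also have to cope with missing and spurious events.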
  • question (Score:5, Funny)

    by the uNF cola (657200) on Sunday July 13 2003, @10:25AM (#6428071)
    Will it be able to tell the diff between...

    Backstreet boys, N'Sync and other boy bands?
    Creed, Nickelback and other "rock bands"?
    50 Cent and DMX?

    I wonder if record companies will accept mistakes when differentiating between these artists :)
  • by ergo98 (9391) on Sunday July 13 2003, @10:25AM (#6428075) Homepage Journal
    I recall that in its dying days Napster was talking about adding this [internet.com] to appease the recording industry. The variation then was from a company called Relatable [relatable.com]. Sounds like Shawn is stuck in a recursive loop.
    • To see the tech being used in the wild, check out the Neuros MP3 Player [neurosaudio.com]. It is good to see it in a positive context (helping to find songs to buy, not reimbursing for theft)...

      Reviews I have read say it only accurately identifies tracks which you'd probably know anyway (Eminem etc., basically high chart stuff), but that it has the potential to grow and become an effective service...

  • The Parsons Code (Score:5, Informative)

    by Ian Jefferies (605678) on Sunday July 13 2003, @10:28AM (#6428084)
    I remember seeing a book once that helped you identify songs by whether the sequence of notes at the beginning of the piece went up, down or stayed the same pitch when compared to the previous note. It was about the size of a telephone directory.

    A quick Google finds out that it's called the Parsons Code, with a lot more information here [bbc.co.uk].

    Presumably the fingerprinting scheme works in a similar fashion (over a larger portion of the song, and probably over multiple fragments of the song as well).

    Ian.
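The up/down/same scheme described above is easy to sketch in Python. The convention used here (a leading `*` for the first note, then `u`, `d` or `r` for up, down or repeat) is the usual way Parsons code is written; the pitch numbers are MIDI note numbers chosen for illustration.

```python
def parsons_code(pitches):
    """Encode a melody as Parsons code: '*' then u/d/r per note transition."""
    if not pitches:
        return ""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("u")   # pitch went up
        elif cur < prev:
            code.append("d")   # pitch went down
        else:
            code.append("r")   # pitch repeated
    return "".join(code)

# Opening of "Twinkle, Twinkle, Little Star" as MIDI note numbers (C C G G A A G F F E E D D C).
twinkle = [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60]
print(parsons_code(twinkle))  # *rururddrdrdrd
```

Since only the direction of each step is kept, the code is the same in any key and at any tempo, which is also why many different tunes can share the same short prefix.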
  • by Krapangor (533950) on Sunday July 13 2003, @10:29AM (#6428086) Homepage
    At Napster he was basically a strawman to make this company look "rebel" and "young" instead of the copyright stealing money-donkey for fat, greedy investors that it was.
    Such people don't "change sides" or commit "treason". They don't have any morals at all and work basically for any bloke who has money in his pockets. And Fanning thinks that this bloke is the music industry. I wonder, however, if they'll take him. Elephants are said to have good memory and to be unforgiving.

    And for this P2P thing: does anyone here really think that the music industry will just lean back and watch their profits flush away through DSL customer lines?

  • Why comply? (Score:5, Interesting)

    by Pacer (153176) on Sunday July 13 2003, @10:30AM (#6428097) Homepage Journal
    Or P2P networks could NOT verify "legality," NOT pay Fanning anything, remain distributed enough to avoid any serious legal problems, shift the responsibility (rightfully) onto users, and music will remain -- as it was and ever shall be -- completely free.

    Avast, me hearty! Arrrr!
    • There is the odd concept that it's the right thing to do. You might not like it if someone came in and trashed the underpinnings of how you make a living.

      There's also the matter of self-interest. If you like music you'd better make certain that musicians get rewarded for making it.
      • Sure, however you have to consider the general backlash against the obsessive/repressive control that the current crop of large copyright holders are imposing/attempting to impose.

        Seen in such light, it may still seem like the 'right thing to do'.

        The side-effect is, unfortunately, that individual artists may have a more difficult time making a living with music, etc.

        IMHO the large copyright holders are eliminating the business model of selling copyrighted 'property' because they are not meeting the market
        • ...you start to see why the assumptions of the underlying system (begins with a c and ends with -ism) little by little will fail.

          I was going to argue with you, but I can't figure out what you're talking about. Capitalism? Communism? Consumerism?

          My first assumption was that you meant capitalism. You seem to say that the capitalist system is failing, but the example you provide is a perfect illustration of capitalism at work. Music isn't worth US$20 per CD to consumers, so that model is failing and will ada
  • before the "audio fingerprint" changes? Say, speed it up by 5%, filter out some of the bass and drum, and profit.

    • Err, the current mass of shitty 128kbps mp3 files made by your average aol loser is bad enough. If your method allows flying under the fingerprint radar, fine. But I wouldn't want to download that crap then.

      Those people who care about quality you could catch with a simple md5 check, because they release lossless [sf.net] ripped by EAC [exactaudiocopy.de] with offset-corrected settings et al.

    • +5% is a mighty big shift in speed as well as pitch (think about it, most DJ turntables pitch +/-8%, and a full shift is quite noticeable); if it takes that much to defeat the fingerprinting it is most likely not worth it.
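For a sense of scale, the pitch change from a plain speed change can be worked out directly. This sketch assumes the speed-up is done turntable-style, by playing the same samples faster, so pitch rises with tempo; the function name is made up.

```python
import math

def speed_to_semitones(speed_ratio: float) -> float:
    """Pitch shift in semitones caused by playing audio at speed_ratio times normal speed."""
    return 12 * math.log2(speed_ratio)

for pct in (5, 8):
    shift = speed_to_semitones(1 + pct / 100)
    print(f"+{pct}% speed -> {shift:.2f} semitones")
# +5% speed -> 0.84 semitones
# +8% speed -> 1.33 semitones
```

So +5% is most of a semitone, and the full +8% of a DJ deck is well over one.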

      Also, it depends on the implementation. Perhaps it takes possible shifting in the music into account? Perhaps the fingerprinting algorithm will shift all tracks to a constant BPM first? I'm sure with a little thought, such a workaround could be easily defeated, especially considering the na
    • If slowing down the audio is enough to escape the fingerprint, this would be preferable. Slowing down the music would just interpolate the existing audio, instead of removing information, which would happen if it's sped up. Playback would go through a WinAmp filter and things would sound normal again.
      • Depends a lot on how the fingerprint is generated.

        If you alter the speed up or down by just changing the sample rate, neither would lose data, but only a fool's fingerprint generator would be fooled.

        And re-encoding the data either way would lose information unless you intended on doing it from the source (i.e. CD).

        Also, does anybody use Winamp anymore? Ick!
  • What if you introduced a little bit of static or something into the MP3? Not enough to be annoying, or maybe even really perceptible, but just enough to throw the "fingerprinting" off. I wonder if the technology is good enough to detect that. Also, if you were to record a song from vinyl, clean it up, and post it online, it might be different from the "official" version of the file. Maybe the technology might be able to detect the general pattern of the song, rather than exact sounds, but if not, Fanning's technology might not work out.
  • by BillsPetMonkey (654200) on Sunday July 13 2003, @10:32AM (#6428107)
    So you have a way of authenticating that a song is legitimately bought? An audit trail for each track? Wonderful. It's not going to be taken apart and cracked within a week is it? No-one's going to take our model and release a free implementation with much wider popular appeal are they? Are you sure? Great! We'll buy your company and give you generous stock options then.

    Please excuse me now, my pet unicorn needs feeding ...
    • Watermarking or fingerprinting files will change nothing so long as there are no easily accessible, legit ways to cheaply download MP3s.

      But give us a reliable source of quality MP3s, at a reasonable price (like 10-25 cents apiece) -- and it's no longer worth the time or trouble to chase after files of unknown quality via P2P.

      Yeah, a few people will hack out the watermark and release them as freebies, but there again -- is it worth my time to hunt for and download freebie.mp3 (assuming anyone hacked the

  • by Anonymous Coward on Sunday July 13 2003, @10:37AM (#6428121)
    He has a good business plan: create a big problem, then solve it.
  • by Cyno01 (573917) <Cyno01@hotmail.com> on Sunday July 13 2003, @10:41AM (#6428134) Homepage
    I use the MusicBrainz [musicbrainz.org] tagger sometimes, and it works by comparing the audio signature of a song to the one in its database. This seems like the same sort of idea, but MusicBrainz tags files completely wrong a good percentage of the time, even listing the wrong artist - title info as a 100% match. I think this kind of technology has a ways to go before it could hold up in court or whatever.
  • A good idea, but.... (Score:5, Interesting)

    by Ride-My-Rocket (96935) on Sunday July 13 2003, @10:42AM (#6428144) Homepage
    the cat's already out of the bag. The real issue here is the existence of the middleman in the music industry. Prices of CDs are artificially inflated by the middleman (the music outfits behind the RIAA), because they control most of the musical output in this country. Consumers want this music, and some continue to purchase at these inflated prices. But when you can get the same music, albeit illegal, from an alternate provider (KaZaA et al.), why bother paying those prices at all?

    The solution is to bring the price of music back down to a reasonable level. If consumers are able to more directly compensate the artist for their music, and they can do so at a more granular level (i.e. purchase tracks, vs CDs), and the ease of use is comparable to the p2p networks, then I bet you'll see a rebound in purchases. Granted, not all the people who use p2p will buy legit copies -- but I bet you'll see a significant rebound.

    This country is long overdue for some overhauls on copyright / fair use law. The RIAA likens consumers who use p2p to criminals, but the RIAA backers have already been convicted for price fixing and routinely screw the artists they purport to represent out of cash. Criminals calling their target market criminals? Even if they're right, it's a matter of the pot calling the kettle black.

    The days when the music industry could rob consumers without consequence are coming to an end. Exactly how it turns out is anybody's guess, but consumers are on to the RIAA's schemes and have found a way to get their music without their shenanigans. Expect year-over-year sales to continue to fall until some of these leviathans go belly-up, and artists gain more control over production and licensing -- the way it should be.
  • not impressed (Score:5, Interesting)

    by mooface (674033) on Sunday July 13 2003, @10:45AM (#6428159)
    The concept of audio "fingerprinting" is an interesting one, but likely outside of Fanning's (and his local folks') experience or abilities. Fingerprinting has to rely on one of two things. The first is the artificialities of files -- things like file length, name, checksums, etc. All of these are easily overcome, and likely not robust to differing compression, bit rates, etc. The second thing it could rely on is data content -- that is, things like how many beats per minute, the time/frequency pattern in segment(s) X, Y, etc. I'll call this analysis of content. Unfortunately, simple analysis of content and watermarking schemes are very easily detected and overcome (remember the Felten/audio protection challenge?). TRUE analysis of content (when certain instruments play, their timing, the singing, etc.) is a very difficult signal processing problem that won't be overcome without serious mathematics. And as much as I like Fanning, I don't think he's got the juice for it. Just my $0.02
  • Fingerprint for free (Score:5, Informative)

    by Davak (526912) on Sunday July 13 2003, @11:04AM (#6428244) Homepage
    MusicBrainz [musicbrainz.org] already has a free music fingerprint program. It identified about 60-70% of my songs correctly. It also will rename your files and update the ID tags.

    The 30-40% it did not find... I could easily find by doing some searching manually through the program.

    It was a nice way to completely identify my mp3 collection. Yes, it's a legal collection, but I wanted an easy way to rename the files and id tags.

    Anyhoo... the program is pretty buggy so save often. Help the cause.

    Enjoy.

    DavaK
  • The problem... (Score:2, Insightful)

    by Anonymous Coward
    What good is this going to do?

    I thought the whole filesharing problem comes from people wanting to download music for free instead of paying for it. IMHO, the problem is not that there is trouble IDENTIFYING copyrighted songs, it's that it's hard to get people to PAY for them.

    Imagine this -- you have a network that identifies what you try to upload, and if it determines that the file is copyrighted, it charges you a fee. What do you do? Well, what did millions of people do when Napster tried to limit w
  • Audio fingerprinting is not something like a hash function that leads to a deterministic identifier. It is more like a web search engine that finds the best fuzzy match.

    If you use audio fingerprint scores in the aggregate, for example to see what's popular, it works. If you depend on any one audio fingerprint matchup being accurate, especially accurate enough to use for legal notices, it doesn't make sense.
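The difference between a hash lookup and a best fuzzy match can be sketched as follows. The 32-bit fingerprints and song titles are invented, and Hamming distance is only one plausible similarity measure, used here as an assumption.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def best_match(query: int, database: dict, bits: int = 32):
    """Return (title, similarity) of the closest fingerprint; similarity is in [0, 1]."""
    title, fp = min(database.items(), key=lambda item: hamming(query, item[1]))
    return title, 1 - hamming(query, fp) / bits

# Invented fingerprints for three songs.
database = {
    "Song A": 0xDEADBEEF,
    "Song B": 0x12345678,
    "Song C": 0x0F0F0F0F,
}

query = 0xDEADBEEF ^ 0b101             # "Song A" with two bits flipped by re-encoding
print(query in database.values())      # False: an exact lookup finds nothing
print(best_match(query, database))     # ('Song A', 0.9375)
```

The fuzzy search always returns some best candidate with a score, never a yes or no, so the threshold at which a score counts as "the same song" is a policy decision rather than a property of the data.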

    Music is a semantic object. Saying whether two pieces of music are the same thing depends on stuf
  • Ultimately, it will be up to the consumer to choose if something is worth paying for or not. Those intelligent enough to understand that their favourite artist needs cash to survive will pay whatever they deem appropriate or a reasonable fixed price. As long as content is available on the internet, forget trying to control it. And even if you wanted to enforce copyright laws, you can't prove that the targeted IP/account hasn't been hijacked/trojaned or that it was a specific person behind the computer.
    As soon
  • by Anonymous Coward
    Shawn's business model seems a tad flawed. His new software has already been written, and an SDK is freely available here [musicbrainz.org]. Source code for both the Linux and Windows clients (which includes the fingerprinting code) is a click away under their downloads section. Red Hat and Debian packages are there too, as well as Ruby and Perl bindings.. so fire up apt-get and go to it!
  • Perhaps we can take the time to look at the root causes of the whole P2P/ music industry / RIAA debacle. We all know the context. But what are the hidden assumptions? Can we reanalyse these? And can we find a new model for buying and selling digital media that does not pit the greedy tycoons against the valiant hackers? I think so...

    1. The first assumption is that consumption is completely inelastic. In other words, people will pay whatever the goods they want cost. (Assumption of the media industry.)

    2. The second is that value is constant. In other words, digital theft by a million people is equal to physical theft of a million CDs. (Assumption of the media industry, who come up with bizarre figures as to the "loss" sustained thanks to illegal file sharing.)

    3. The third is that digital content has no value. In other words, digital theft is not theft because bits and bytes have no value. (Assumption of the file swappers.)

    All these assumptions are wrong.

    First, consumption is almost completely elastic. People will spend every disposable penny they have. If goods are cheaper, they will buy more of them. Raising the price of goods simply decreases demand.

    Secondly, value is not constant; it almost always rises with rarity. In other words, the more of an item is available, the less it is worth.

    Thirdly, of course digital content has value: that people go to great lengths to acquire it demonstrates this. However, its value is subject to the law of rarity.

    What does this all mean?

    Firstly, whether or not people illegally share music (and the same applies to movies), the value of media is going down inexorably thanks to the huge volume produced. And I'm not speaking of the cost of manufacture, but the perceived value, the price people are willing to pay. Diamonds cost practically nothing to produce, their value comes from their rarity.

    Secondly, an industry faced with this value equation has several options. They can try to restrict supply and eliminate competition, which is what the music industry has done for about 20 years since the CD eliminated the production bottleneck. In a competitive market they will lower their prices so that consumers stay loyal. We have also seen this. Finally, they can ignore reality and die.

    Thirdly, one of the ironies about digital distribution is that it eliminates the rarity variable. This means that the value of any object distributed digitally will inevitably tend towards zero. I can download music from the Net but I value my own (irreplaceable) CD collection much more.

    I believe that even the 'pay as you go' model is doomed to failure. The only sustainable model is one in which prices are set by the market and production by the producers.

    So, what I propose (or rather, predict, for this is almost inevitable) is a media market that works as follows:

    1. The producer of a work creates a specific number of instances of the work. This can be as large or small as they want, but they cannot change the quantity afterwards.

    2. The instances are individually serialized so as to be traceable to their owner. They can be copied freely.

    3. These instances are now auctioned and can be resold in an open market.

    This scheme can be applied to music, writing, photographs, almost any digital creation. Imagine a famous writer produces a short story. They issue a series of 1000. Now, you can buy one of these copies. It will be, forever, an original that is certified and unlosable. The price is set by auction, and the rights to these copies can be traded in an open market. What's the cost of a 2003 Madonna? Around $1.20, these days. And a 1998 Leftfield? Up to $30, if you can get them. In fact, you have paid not for a real thing, but for a slice of rarity. Sound strange? What about shares and options...?

    There is only one requirement for such a market, and that is the market place. All the rest follows from the natural laws of supply and demand.

    • Diamonds cost practically nothing to produce, their value comes from their rarity.

      Interesting comparison. Do some reading about DeBeers. They pretty much restrict diamond supply worldwide to keep the prices up. The music industry is trying to pull off the same thing, but it's much harder to restrict bytes than rocks.

      For some reason, demand for real diamonds is high, too. Science is now able to create diamonds that are molecularly nearly identical to natural diamonds. I asked a lady friend if she would ca
  • Beat me and my friends at a game of caps last night... there is no end to his talents ;-)
  • by Hao Wu (652581) on Sunday July 13 2003, @12:31PM (#6428733) Homepage
    Sure- I'll volunteer all my files to be tested by some random company, and they can tell me whether I owe them money or broke the law in some way.

    Just contact me here: Hao_Wu@not-likely-to-happen.com, care of GET FUCKED.
  • dead horse (Score:3, Insightful)

    by August_zero (654282) on Sunday July 13 2003, @12:36PM (#6428757)
    Nothing is worse than somebody who is too stupid to realize that their 15 minutes are up.

    • Nothing is worse than somebody who is too stupid to realize that their 15 minutes are up.

      You're forgetting about those who think their 15 minutes have started but they haven't. ;-)
  • by Dr.Dubious DDQ (11968) on Sunday July 13 2003, @01:12PM (#6428953) Homepage

    About these 'fingerprints' - are they SIMILAR for similar pieces of music? Or are they only useful for identifying the one piece of music that each fingerprint is for?

    If the 'fingerprints' are similar enough, you could ALSO use the technique to search for songs that you may have never heard but match the general style of music that you like. Sounds like something independent musicians could really benefit from ("Hey, I'd never heard of THESE guys before, but their music is exactly the style that I like....")

    And if this is NOT the case, is anyone working on a "music style" analysis of some sort that could be stored in a 'searchable' fashion? (i.e. take your favorite song, run it through an analysis program to get its 'fingerprint', then feed that 'fingerprint' to a search engine to get a listing of similar songs...)

  • As a class project, a friend and I built a music recognition database. You can read our paper [trevorstone.org].

    The general approach is fairly straightforward. You extract a set of "features" (typically several Mel Frequency Cepstral Coefficients, or MFCCs) from each sample of the song, say 10ms. You then pick several (say, 16) arbitrary points and iteratively generate that many "average" feature vectors, along with their weights so that they all sum to a one vector. This data is turned into a Hidden Markov Model (HMM). To see what audio you have, you run it through each of the possible HMMs and see which produces the greatest likelihood.
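The "run it through each model and keep the most likely one" step can be shown with a heavily simplified sketch. To stay short it swaps the HMM for a single diagonal Gaussian per song, and the two-dimensional feature vectors are invented rather than real MFCCs, so treat it as an illustration of maximum-likelihood identification only, not of the system in the paper.

```python
import math

def fit_gaussian(frames):
    """Per-dimension mean and variance of a list of feature vectors."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)  # floor avoids zero variance
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(frames, model):
    """Total log-likelihood of the frames under a diagonal Gaussian model."""
    means, variances = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def identify(frames, models):
    """Name of the model that gives the unknown frames the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))

# Invented feature frames for two known songs (stand-ins for MFCC vectors).
training = {
    "song_a": [(1.0, 5.0), (1.2, 4.8), (0.9, 5.1), (1.1, 5.2)],
    "song_b": [(4.0, 1.0), (3.8, 1.3), (4.1, 0.9), (4.2, 1.1)],
}
models = {name: fit_gaussian(frames) for name, frames in training.items()}

unknown = [(1.05, 4.9), (0.95, 5.05)]
print(identify(unknown, models))  # song_a
```

The linear scan over `models` is exactly the part that stops scaling once there are hundreds of thousands of songs.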

    This method is typically applied to speaker recognition, where a linear search through HMMs is reasonable. This obviously isn't the case when you know about hundreds of thousands of songs, so a large part of the challenge is narrowing the field of HMMs to check (which is one of the focuses in our paper). Relatable [relatable.com], who were working with Napster a long time ago, have clusters that can classify 1,000 songs per second; I'm pretty sure they use this technique.

    This technique has several important features. First, it doesn't depend on any properties of files themselves. Checksums would be trivial to beat, looking at a file's length could be circumvented by inserting silence, etc. Since this creates an average of sample data, a song would need to be changed quite a bit to fail to match. (The system is robust to, for instance, changes in bitrate, slowing the music down, and rearranging bits of the song or putting it in reverse.) We didn't have enough "derivative" music to test how it handles sampled music vs. the original -- it depends how much is changed.

    Finally, this sort of system is useful for much more than song identification. You can build a model for an artist or genre and determine how to classify the song. One of my focuses in the paper is unsupervised genre classification -- my tests indicated some fairly reasonable groupings. This technique could be used for music recommendation -- "You like Dropkick Murphys? Well, they sound like Flogging Molly, so you might want to check them out."