Media Music

Analyzing YouTube's Audio Fingerprinter 116

Al Benedetto writes "I stumbled across this article which analyzes the YouTube audio content identification system in depth. Apparently, since YouTube's system has no transparency, the behaviors had to be determined based on dozens of trial-and-error video uploads. The author tries things like speed/pitch adjustment, the addition of background noise, as well as other audio tweaks to determine exactly what you'd need to adjust before the fingerprinter started misidentifying material. From the article: 'When I muted the beginning of the song up until 0:30 (leaving the rest to play) the fingerprinter missed it. When I kept the beginning up until 0:30 and muted everything from 0:30 to the end, the fingerprinter caught it. That indicates that the content database only knows about something in the first 30 seconds of the song. As long as you cut that part off, you can theoretically use the remainder of the song without being detected. I don't know if all samples in the content database suffer from similar weaknesses, but it's something that merits further research.'"
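
The muting test quoted above amounts to silencing one time window of a track and leaving the rest untouched. A minimal sketch of that step follows, assuming the audio is already decoded into a plain list of integer samples; the function name `mute_range` and the tiny sample rate in the demo are hypothetical and do not come from the article.

```python
def mute_range(samples, sample_rate, start_s, end_s):
    """Return a copy of `samples` with the window [start_s, end_s) silenced.

    `samples` is assumed to be a list of integer PCM values for one channel;
    `sample_rate` is in samples per second. Times are in seconds.
    """
    # Convert times to sample indices and clamp them to the track length.
    start = max(0, int(start_s * sample_rate))
    end = min(len(samples), int(end_s * sample_rate))
    # Silence is represented as zero-valued samples.
    return samples[:start] + [0] * max(0, end - start) + samples[end:]


# Demo with a toy "track": 5 seconds at 2 samples per second.
track = [1] * 10
head_muted = mute_range(track, 2, 0, 1.5)    # silence the opening window
tail_muted = mute_range(track, 2, 1.5, 5)    # silence everything after it
print(head_muted)
print(tail_muted)
```

Uploading one variant with the head muted and one with the tail muted is what lets the author infer which part of the song the reference fingerprint covers.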

  • music ip? (Score:5, Informative)

    by FredFredrickson ( 1177871 ) * on Wednesday April 22, 2009 @02:13PM (#27677041) Homepage Journal
    There's the open-source library libOFA, developed by MusicIP (http://code.google.com/p/musicip-libofa/), which happens to create PUIDs from the first 135 seconds of audio in a track. It's used in the MusicIP Mixer (for mood mixes) but is also used by music database projects such as MusicBrainz.

    From what I've seen, it's pretty decent audio fingerprinting, but I'm sure it would be subject to the same limitation: if you remove the first 30 seconds of a clip, it would produce a very different fingerprint.

    There's no reason to believe YouTube isn't using this library or a derivative. There's also no reason to believe this result isn't intended. If the first 30 seconds of a song are missing, maybe that makes YouTube confident that it could be considered fair use.

    Either way, I could imagine that creating fingerprints from different sections of a song has the same problem an MD5 hash would: each fingerprint would be entirely different. If you don't just compare bit-to-bit, it'll be impossible to catch ALL permutations. And the fact is, that's a lot of computing power anyhow.
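
The MD5 comparison in the comment above can be illustrated with a short sketch. It only demonstrates the property of an exact cryptographic hash (trim a little from the front and the digest is unrelated); it says nothing about how libOFA or YouTube actually compute fingerprints. The byte string standing in for audio and the helper names are made up for the example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count positions where two equal-length hex digests differ."""
    return sum(x != y for x, y in zip(a, b))


# Stand-in for decoded audio bytes (an assumption, not real audio).
track = bytes(range(256)) * 64
# The same "track" with a short section removed from the start.
trimmed = track[30:]

full_digest = md5_hex(track)
trimmed_digest = md5_hex(trimmed)

print(full_digest)
print(trimmed_digest)
print("differing hex digits:", differing_hex_digits(full_digest, trimmed_digest))
```

Because the two digests share no useful structure, a database of exact hashes would need one entry per possible excerpt, which is the computing-power problem the comment points at.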
  • by bipbop ( 1144919 ) on Wednesday April 22, 2009 @04:05PM (#27678335)
    YouTube uses Audible Magic's audio fingerprinting technology, which is based on this patent by MuscleFish: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5918223.PN.&OS=PN/5918223&RS=PN/5918223 [uspto.gov]
  • by BabyDuckHat ( 1503839 ) on Wednesday April 22, 2009 @04:38PM (#27678793)
    This is actually very useful information for someone looking for ways to defeat the filter, in that it lists the features of the audio that are used for generating the fingerprint. A successful workaround would most likely require modifications to several aspects of the signal.

    From the patent:

    The feature vector thus consists of the mean and standard deviation of each of the trajectories (amplitude, pitch, brightness, bass, bandwidth, and MFCCs, plus the first derivative of each of these). These numbers are the only information used in the content-based classification and retrieval of these sounds. It is possible to see some of the essential characteristics of the sound by analyzing these numbers.
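
The patent excerpt above describes a feature vector built from the mean and standard deviation of each per-frame trajectory and of its first derivative. A rough sketch of that summarization step is shown below. It assumes the trajectories (amplitude, pitch, and so on) have already been extracted as lists of per-frame numbers, uses a simple frame-to-frame difference as the derivative, and uses the population standard deviation; the patent excerpt does not specify those details, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def first_derivative(trajectory):
    """Frame-to-frame differences (an assumed stand-in for the derivative)."""
    return [b - a for a, b in zip(trajectory, trajectory[1:])]


def summarize(trajectory):
    """Mean and std of a trajectory, then mean and std of its derivative.

    Needs at least two frames so the derivative is non-empty.
    """
    deriv = first_derivative(trajectory)
    return [mean(trajectory), pstdev(trajectory), mean(deriv), pstdev(deriv)]


def feature_vector(trajectories):
    """Concatenate the summaries of all trajectories in a fixed (sorted) order."""
    vector = []
    for name in sorted(trajectories):
        vector.extend(summarize(trajectories[name]))
    return vector


# Toy per-frame values, invented for illustration only.
example = {
    "amplitude": [1.0, 2.0, 3.0, 4.0],
    "pitch": [440.0, 440.0, 442.0, 441.0],
}
print(feature_vector(example))
```

Since only these summary statistics survive, a change that shifts a single trajectory may leave most of the vector intact, which fits the comment's point that a workaround would have to alter several aspects of the signal.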
  • Re:Yeah (Score:4, Informative)

    by Anonymous Coward on Wednesday April 22, 2009 @04:42PM (#27678853)

    Dear Pandora Visitor,

    We are deeply, deeply sorry to say that due to licensing constraints, we can no longer allow access to Pandora for listeners located outside of the U.S. We will continue to work diligently to realize the vision of a truly global Pandora, but for the time being we are required to restrict its use. We are very sad to have to do this, but there is no other alternative.

    If you believe we have made a mistake, we apologize and ask that you please contact us at pandora-support@pandora.com

    If you are a paid subscriber, please contact us at pandora-support@pandora.com and we will issue a pro-rated refund to the credit card you used to sign up. If you have been using Pandora, we will keep a record of your existing stations and bookmarked artists and songs, so that when we are able to launch in your country, they will be waiting for you.

    We will be notifying listeners as licensing agreements are established in individual countries. If you would like to be notified by email when Pandora is available in your country, please enter your email address below. The pace of global licensing is hard to predict, but we have the ultimate goal of being able to offer our service everywhere.

    We share your disappointment and greatly appreciate your understanding.

  • Re:Yeah (Score:2, Informative)

    by ausekilis ( 1513635 ) on Wednesday April 22, 2009 @04:46PM (#27678949)
    Perhaps a better link for information: Music Genome Project [wikipedia.org]. A little more detail from Pandora's blog [google.com].
