US Intelligence Unit Launches $50k Speech Recognition Competition

coondoggie writes The $50,000 challenge comes from researchers at the Intelligence Advanced Research Projects Activity (IARPA), within the Office of the Director of National Intelligence. The competition, known as Automatic Speech recognition in Reverberant Environments (ASpIRE), hopes to get the industry, universities or other researchers to build automatic speech recognition technology that can handle a variety of acoustic environments and recording scenarios on natural conversational speech.
  • by korbulon ( 2792438 ) on Thursday November 20, 2014 @08:04AM (#48425093)
    "Go fuck yourself."
    • My exact thoughts....I hope developers heed this and they get a total of 0 entries.
    • "Go fuck yourself."

      Better idea. Enter the competition, use already well-developed commercial software (or write a program to average the results of several commercial programs), and easily win the competition. It's not like anyone is going to create software worth millions and give it away for a tiny prize.

    • by Kirth ( 183 )

      They would do better to start recognizing people's right to free speech.
      Which includes the right not to be pestered by the government (as in: put under surveillance) for it.

  • by Roodvlees ( 2742853 ) on Thursday November 20, 2014 @08:04AM (#48425095)
    Haven't Microsoft, Apple and Google already spent billions of dollars on this?
    Seems they are appealing to any random developer who might have an idea.
    • by bouldin ( 828821 )

      Haven't Microsoft, Apple and Google already spent billions of dollars on this?

      All the speech recognition software I've used has relied on a controlled environment (e.g. yelling directly into your phone with almost no reverberation, no competing conversations, very little background noise).

      Reverberation *should* be the easiest kind of noise to remove, because it has a simple mathematical model:

      S(t) = signal(t) + f(signal(t - delay))

      Where f() is a pretty simple function that may attenuate some frequencies mor

      • by ranton ( 36917 )

        All the speech recognition software I've used has relied on a controlled environment (e.g. yelling directly into your phone with almost no reverberation, no competing conversations, very little background noise).

        ...

        Modelling all the other kinds of background noise is much, much harder.

        I agree, but the issue is this problem is harder than those that industry leaders are putting billions of dollars of R&D money into. What is $50k really going to accomplish? There are Kaggle competitions that pay out more than that for far more trivial problems (like a marginal increase in CTR prediction).

      • Reverberation *should* be the easiest kind of noise to remove, because it has a simple mathematical model:

        S(t) = signal(t) + f(signal(t - delay))

        It's not that simple, a reverberant space can have dozens of different discrete delay taps, add secondary (and tertiary, etc) reflections and the resulting spectral envelope is just a fog with an effectively continuous system of delay. Also keep in mind that all "functions that attenuate frequencies" are themselves just delays whose length is a function of a parti
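
To make the two positions above concrete, here is a minimal sketch in plain Python. It assumes a drastic simplification: each reflection is a scalar gain at an integer sample delay, whereas the f() in the original comment is frequency-dependent. The names `convolve`, `single_echo`, and `multi_tap` are hypothetical helpers for illustration only. The single-echo model is then just the special case of a room impulse response with one tap.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def single_echo(signal, delay, gain):
    """S(t) = signal(t) + gain * signal(t - delay), with a scalar gain standing in for f()."""
    h = [0.0] * (delay + 1)
    h[0] = 1.0          # direct path
    h[delay] += gain    # one reflection
    return convolve(signal, h)


def multi_tap(signal, taps):
    """Same idea with several reflections; taps is a list of (delay, gain) pairs."""
    h = [0.0] * (max(d for d, _ in taps) + 1)
    h[0] = 1.0          # direct path
    for d, g in taps:
        h[d] += g
    return convolve(signal, h)


dry = [1.0, 0.0, 0.0, 0.0]                      # a single click as the dry signal
one = single_echo(dry, delay=2, gain=0.5)       # click plus one echo
many = multi_tap(dry, [(1, 0.6), (2, 0.5), (3, 0.3)])  # click plus a short tail
```

With dozens of taps plus their secondary reflections, the impulse response stops looking like a few isolated spikes, which is the "fog" the reply describes; undoing it means estimating that whole response rather than one delay and one gain.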

  • Call Nuance [wikipedia.org] and tell them you are going to make a money injection in their R&D dept.

    I'm sure your 50k will make a real impact, when added to their 1.9 billion dollar revenue.

    • by Anonymous Coward

      Thing is, every huge company has a core of an idea (perhaps built by the founders on a weekend), that they're just milking for all it's worth... the $50k might motivate a lone wolf developer to build something that's qualitatively better than the multibillion-dollar company's core idea.

      For example, right now, all sound is filtered, transformed (frequency bands), quantized, and then those values are used to train a hidden markov model... that works for speakers in a quiet room---but doesn't for noisy environments or

      • Thing is, every huge company has a core of an idea (perhaps built by the founders on a weekend), that they're just milking for all it's worth... the $50k might motivate a lone wolf developer to build something that's qualitatively better than the multibillion-dollar company's core idea.

        You may be right, let's offer $50k to whoever sends another probe to a comet. Sure, it cost the ESA $1.4 billion, but a lone wolf could find a qualitatively better way to do the mission. By February 4, 2015.

        Slashdot is the last place where I expected to see an extremely difficult problem underestimated just because it's a computing problem.

      • I remember a demo out of IBM, I believe, for recognizing controlled vocabulary in high-noise environments. It handily OUT-performed humans -- listening to the test audio, you couldn't really be sure there was a human voice at all, but the software detected and interpreted the speech with high accuracy.

        This demo would have been circa 2000. I can't help imagining that there's been more progress since then.

        The proposed task, where the interference is correlated with the original sound, seems like fertile groun

        • by bouldin ( 828821 )

          The proposed task, where the interference is correlated with the original sound, seems like fertile ground for superhuman performance again. The original signal gets replicated and redundantly presented. Our brains are hard-wired to be confused by that, but it seems like a well-designed speech-recognition system could take advantage of it.

          Mammalian auditory systems actually have a lot of wiring that seems dedicated to processing reverberation.

          I'm not familiar with the IBM demo you mention, but the key there

          • I'm not familiar with the IBM demo you mention, but the key there is the controlled vocabulary. It was probably also trained on the speaker's voice. Those are huge constraints.

            I'm remembering that it was controlled-vocabulary, but speaker-independent. I think it was trained on spoken digits -- a very small vocabulary. It's been a long time, and I may be misremembering even the most basic details. Still, it was impressive to hear it picking out numbers where all I could hear was noise.

  • by MtHuurne ( 602934 ) on Thursday November 20, 2014 @08:23AM (#48425165) Homepage

    So they want a complex problem solved in 2 months (first test on Feb 4 and there are holidays in between), for which they will pay a relatively low amount and only to the winners. Even if the result wouldn't be used for spying, I don't think there would be many takers.

    • I am sick of these "challenges" that effectively try to get programmers to work for well below market rates. As if we're like children, a "challenge" is supposed to make us set aside months or years of income to work on a really difficult problem that if we had to actually go out and do for a company in the job market, we'd be paid $100K/year or more. I think they probably attract young people who don't understand the value of their own time or skills, or who are more easily lured by childish notio

      • by dj245 ( 732906 )

        I am sick of these "challenges" that effectively try to get programmers to work for well below market rates. As if we're like children, a "challenge" is supposed to make us set aside months or years of income to work on a really difficult problem that if we had to actually go out and do for a company in the job market, we'd be paid $100K/year or more.

        You're completely missing the point. They've found the Stargate and egyptologists are a dime a dozen. They need to form an elite team of programming and AI experts who will decode the symbols on the Stargate and defeat Apophis. This is just a fancy recruitment test.

        • Then they should code the tests into the next Call of Duty game. They can call it Call of Duty: Prometheus, and feature alien worlds and starships.

        • This is just a fancy recruitment test

          I don't think I've missed the point, as I'm saying the same thing - I just think it's a lousy way to do recruitment. Analogy time: Say you want to hire a sex worker. Here are two methods:

          1. Go find one that looks reasonable, initiate a negotiation. If you can find a mutually agreeable rate, hire her, otherwise continue looking for another one.

          2. Issue a "challenge" to all sex workers. Declare that every day for the next 30 days, every applicant must give you a free b

      • Well, they're not doing it on purpose. The DOD is just used to its contractors massively under-bidding to win the contract and then exploding the budget with 1000 MBAs united in the goal of shareholder profit maximization. Just enter the contest and when it comes time to demonstrate the algorithm say the schedule has slipped to April...2022 and you'll need an extra $3 billion. You'll see, they won't even blink!
      • You might have a look at the IARPA releases on this, especially https://www.innocentive.com/ar... [innocentive.com]. Programmers are *not* being asked to release their software rights: "To receive an award, Solvers will not have to transfer their IP rights or grant a license to the Seeker – the purpose of the Challenge is to gauge how far recent advances in speech recognition have come in solving this important problem. With broad participation, this Challenge has the potential to provide IARPA with insights on the be

    • So they want a complex problem solved in 2 months (first test on Feb 4 and there are holidays in between), for which they will pay a relatively low amount and only to the winners. Even if the result wouldn't be used for spying, I don't think there would be many takers.

      Relatively low amount? For $50k it would have to be coded by volunteers and prison inmates.

      "It's breaking rocks with a hammer, being stabbed in the laundry, or coding the speech recognition thing."
      "Hmm, the laundry thing seems superfun but I'll pick hammering rocks. Give the coding gig to the guys in death row. They have nothing to lose anyway."

  • Given my own personal experience with voice recognition, it's not a problem we can throw money at. We can throw money AWAY trying, but we haven't improved much in many, many years of trying.

    I don't have particularly poor speech or an unusual accent, and English speakers all understand me, even foreign English speakers like the one I live with. But speech recognition has always been an absolute flop unless I want to learn how to talk to the computer, which is the exact opposite of what I want to happen.

    Si

    • by bouldin ( 828821 )

      Telling the difference between "eight" and "A" is much more involved than just context matching on a rough FFT of my voice.

      To do it properly, we're really looking into problems that are the equivalent of the higher functions of AI.

      Maybe the problem isn't with the AI techniques we're using, it's with the FFT.

      FFT assumes a very periodic, stable signal. It doesn't handle transients well at all.
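
A small sketch of the transient point, using a naive O(n²) discrete Fourier transform in plain Python (the same quantity an FFT computes, just slower). The signal length, tone frequency, and click position are arbitrary illustrative choices, and `dft_magnitudes` is a hypothetical helper name. A steady tone lands in a pair of bins, while a one-sample click smears equally across every bin, so the magnitude spectrum alone says nothing about when the click happened.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitude of each DFT bin of the real-valued sequence x (naive O(n^2) sum)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


N = 32
# Stationary signal: exactly 4 cycles of a sine across the analysis window.
tone = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
# Transient: a single-sample click at an arbitrary position.
click = [1.0 if t == 10 else 0.0 for t in range(N)]

tone_mag = dft_magnitudes(tone)    # energy concentrated in bins 4 and N - 4
click_mag = dft_magnitudes(click)  # flat: every bin has the same magnitude
```

Real recognizers typically work on short overlapping windows rather than one long transform, which trades frequency resolution for time resolution; this sketch only illustrates why a single long FFT is a poor fit for transients.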

  • When US Intelligence says something so clearly stupid, you always have to look for the subtext. The hidden message. The truth crouching behind the apparent idiocy.

    In this case, the hidden message seems to be "we are incompetent in even the simplest basics of our main task".

  • by Ol Olsoc ( 1175323 ) on Thursday November 20, 2014 @08:40AM (#48425241)
    First person arrested will be Stephen Hawking.
  • As usual with competitions like this, you shouldn't settle for the prize money if you develop such a thing because it's worth quite a bit more.

  • Fifty THOUSAND dollars!

  • by Anonymous Coward

    Coincidentally, this competition, by its very introduction also reveals a method for making massive automated eavesdropping difficult. Unless it produces a success, that is.

  • What? They didn't want to hire the same people who built the Obamacare site? I'm shocked!
  • by l3v1 ( 787564 ) on Thursday November 20, 2014 @09:55AM (#48425671)
    So, who wants to be the one who improves the automatic speech2text capabilities of automatic wiretapping systems in the US for a few bucks? :))
  • Let's see...for $50K...I could probably write up a quick mobile app ($1K) that feeds microphone input into a streaming acceptance service on a server ($3K), that chops it up into wav files for Mechanical Turk processing. Fund that long enough to pass the POC stage ($2K), ride some odds (25%) and cash the check before the tech collapses = $6K for possible $12.5K win = $6.5K possible profit? Er...still no.
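
The back-of-the-envelope numbers in the comment above can be checked with a few lines of Python. The cost figures, prize, and 25% odds are the commenter's own estimates; the dictionary keys and variable names are just labels chosen for illustration.

```python
# Cost estimates quoted in the comment, in dollars.
costs = {
    "mobile_app": 1_000,
    "streaming_service": 3_000,
    "poc_funding": 2_000,
}
prize = 50_000
win_probability = 0.25  # the commenter's guess at the odds

total_cost = sum(costs.values())              # 6000
expected_winnings = prize * win_probability   # 12500.0
expected_profit = expected_winnings - total_cost  # 6500.0
```

The arithmetic matches the comment: $6K outlay against an expected $12.5K, for an expected profit of $6.5K, with a 75% chance of simply losing the $6K.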

  • That acronym is an utter failure. It doesn't even work.
  • Course we'll have a list of entrants! And that is probably a good thing!

    Where do people who would do this come from? Is it child abuse?
