Music Media

eDigital MXP100 with Voice Control 150

An anonymous reader writes: "Here is a lengthy review of eDigital's 1GB flash MP3 portable that is as much a review on Lucent's remarkable speech recognition technology VoiceNav as it is on the player. VoiceNav offers speaker-independent recognition, meaning it doesn't have to learn each individual user's particular speech patterns like IBM's ViaVoice. Just say the name of a music track into the player's microphone and VoiceNav pulls up and plays that song. In ideal conditions the reviewer was able to twice run through a list of 14 song titles without fail. This included titles with "non-real word" band names like Sum41 and U2. Neat technology that could make its way into PDAs soon. The player is a pretty good one too, using IBM's Microdrive for storage."
This discussion has been archived. No new comments can be posted.

  • by Mdog ( 25508 ) on Sunday February 10, 2002 @04:59PM (#2983516) Homepage
    I think I'm feeding the trolls on this one, but I can't understand why you think a company would spend money on adding support for that format unless it would be a selling point. I grant that MP3 is worse than Ogg, but can you honestly say that Ogg is big enough in the "real world" for a company to go to the trouble of supporting it? The vast majority of my Linux-using friends still use MP3, and you can bet almost no one in the Windows world uses Ogg.
  • by oregon ( 554165 ) on Sunday February 10, 2002 @05:03PM (#2983536) Homepage
    From the article ...

    Test 2 - Walking outside with occasional traffic passing by. All track names said in proper order. - Result: very good to excellent
  • by d5w ( 513456 ) on Sunday February 10, 2002 @05:15PM (#2983593)
    Voice recognition on computers has been around for a while now with products like Dragon, ViaVoice, etc. All of these programs are clunky, somewhat bloated, and need to be trained to individual speakers. A truly speaker-independent voice recognition system could be just what the doctor ordered for Lucent.
    This kind of thing comes up every time speech recognition is mentioned here, and it's largely missing the point. Desktop speech recognition, as handled by Dragon NaturallySpeaking, is a very different problem from simple commands and list selection, and it has very different solutions. If you have to recognize and transcribe arbitrary sentences in a given language you have to handle a much larger search space in basically every dimension -- so much larger that the optimal search techniques can be very different, and (as in your comment) the resources required to implement those techniques will be incomparable.

    I won't say the problems are fundamentally different, because the fundamentals are much the same between the two domains; but nearly every detail of the implementation of those fundamentals is likely to be different.

  • by Anonymous Coward on Sunday February 10, 2002 @05:16PM (#2983600)
    IBM's voice recognition line extends past ViaVoice. We offer several products, including an embedded product, that do not require any training. Only the highest end dictation product requires training because of the demands on it to understand what you just said from tens of thousands of words. If all you can say is a hundred or so phrases like "play", "stop", "rewind", "livin' la vida loca", etc. then it's a lot easier to make a prediction and training is a waste of time. At that point it's just a matter of microphone quality and filtering out the background noise. We can even do untrained natural language voice recognition in situations like this with the proper processor power. Since we know what you're by and large going to say, we can pick out enough from the whole free-form sentence to get the gist of what you meant without any training.

    And believe me we're getting to the point where training isn't needed for dictation either :)
  • by sean23007 ( 143364 ) on Sunday February 10, 2002 @07:32PM (#2984104) Homepage Journal
    The error rate will grow exponentially with the number of songs, because statistically more songs will be phonetically similar the more you add. (bad way to say it, but you prob get the point)

    See sig. Wow.
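
Two points from the comments above lend themselves to a rough illustration: picking from a short, known phrase list is far easier than open dictation, and a longer list tends to contain entries that are easier to confuse. The sketch below is only a toy under loud assumptions. It compares spelled-out text with Python's standard `difflib` as a stand-in for acoustic or phonetic similarity, and it is not how VoiceNav or any IBM product works. The phrase list reuses the examples from the comment; `pick_phrase`, `closest_pair_similarity`, and the 0.6 cutoff are hypothetical names and values chosen for illustration.

```python
import difflib

# Hypothetical closed vocabulary, reusing the phrases quoted in the comment above.
VOCABULARY = ["play", "stop", "rewind", "livin' la vida loca"]


def pick_phrase(heard, vocabulary, cutoff=0.6):
    """Return the vocabulary entry most similar to `heard`, or None.

    `heard` stands in for a noisy transcription. Text similarity is an
    assumption here; a real recognizer would compare acoustic features.
    """
    matches = difflib.get_close_matches(heard.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def closest_pair_similarity(vocabulary):
    """Return the highest similarity ratio between any two distinct entries.

    A value near 1.0 means at least two entries are nearly identical as
    text, which is the situation where a wrong pick becomes more likely.
    """
    best = 0.0
    for i, first in enumerate(vocabulary):
        for second in vocabulary[i + 1:]:
            ratio = difflib.SequenceMatcher(None, first, second).ratio()
            best = max(best, ratio)
    return best


if __name__ == "__main__":
    # A misspelled input still resolves because only four candidates exist.
    print(pick_phrase("living la vida loka", VOCABULARY))
    # Input that resembles nothing in the list is rejected.
    print(pick_phrase("xyzzy", VOCABULARY))
    # Appending a near-duplicate title can only raise the closest-pair score.
    print(closest_pair_similarity(VOCABULARY))
    print(closest_pair_similarity(VOCABULARY + ["livin' la vida local"]))
```

The closest-pair score can never drop when a title is appended, since every earlier pair is still compared. That gives a small-scale picture of why a bigger song list is harder to tell apart, though it says nothing about how fast real error rates grow.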
