Music Media

eDigital MXP100 with Voice Control

Posted by michael
from the speak-your-mind dept.
An anonymous reader writes: "Here is a lengthy review of eDigital's 1GB flash MP3 portable that is as much a review on Lucent's remarkable speech recognition technology VoiceNav as it is on the player. VoiceNav offers speaker-independent recognition, meaning it doesn't have to learn each individual user's particular speech patterns like IBM's ViaVoice. Just say the name of a music track into the player's microphone and VoiceNav pulls up and plays that song. In ideal conditions the reviewer was able to twice run through a list of 14 song titles without fail. This included titles with "non-real word" band names like Sum41 and U2. Neat technology that could make its way into PDAs soon. The player is a pretty good one too, using IBM's Microdrive for storage."
This discussion has been archived. No new comments can be posted.

  • Have you seen any hardware player for the Ogg Vorbis [xiph.org] format?
    • I think I'm feeding the trolls on this one, but I can't understand why you think a company would spend money on adding support for that format unless it would be a selling point. I grant that mp3 is worse than ogg, but can you honestly say that ogg is big enough in the "real world" for a company to go to the trouble of supporting it? The vast majority of my Linux-using friends still use mp3, and you can bet almost no one in the Windows world uses ogg.
    • I have yet to see one - there was a rumor that somebody was going to release a jukebox with no advertising for it, but support in the firmware.

      I don't think we'll see players for Ogg Vorbis until the people behind the format realize that they desperately need to change the name. Unfortunately, (ugh) marketing counts for quite a bit in the real world, and 'Ogg Vorbis' is a name only a geek could love.
      • Granted, I don't use Ogg Vorbis. I think I looked into it a while back, but I spent too long ripping all my CDs to switch. That's the real issue. Even a batch mp3>ov converter wouldn't work. I don't want to recompress an already lossy compression.

        As for the name, I think ogg would be better to say than mp3. Ogg= 1 syllable, mp3 = 3. Plus, instead of ripping CDs, you can ogg them. Ogg players. No, in terms of names, ogg has mp3 beat.

        The problem is mp3 is "good enough", and already entrenched.
        • Ogg is just the name of the, uh, 'group' doing the work. The actual audio format is called Ogg Vorbis, in contrast with Ogg Tarkin, their proposed video codec.

          So your syllable count is really incorrect :P
          • Well, maybe they should change it, then :)

            Like I said, I'm not way into Vorbis (there, better?), so I got that wrong. I don't believe I've *ever* heard of Tarkin, although I vaguely remember hearing about a sibling video codec.

            So in all my examples, change it to "vorb". Vorb that CD. Still doesn't sound as good as ogging it, though :)

        • It's easy to configure grip to automatically rip and sort any CD you put in the drive. That way you just spend some time swapping CDs around. You don't need to do it all at once.
      • by Anonymous Coward
        I don't think we'll see players for Ogg Vorbis until the people behind the format realize that they desperately need to change the name. Unfortunately, (ugh) marketing counts for quite a bit in the real world, and 'Ogg Vorbis' is a name only a geek could love.

        And MP3 isn't a geeky name? A player isn't going to lose sales just because it supports a format with a weird name. It's not like Vorbis will be the only format supported - it will still be sold as an "MP3 player".

        A free fixed-point decoder would be more helpful than a new name. Fixed-point Vorbis decoders (for ARM processors) exist, but they need to be licensed. Most companies probably couldn't justify the licensing cost, but if it was free, they'd be more likely to add support. At least one company has stated (on the Vorbis mailing list) that they will support Vorbis when a free decoder is available.

        There are also rumors that iRiver's MP3 player will support Vorbis in a future firmware upgrade, but I haven't heard much about that.

    • by Anonymous Coward
      Ogg is _NOT_ better than MP3 from a market standpoint. Every 6 months there will be some new format that improves the compression and sound quality. Many times, geeks are too focused on the technical aspect, and not the market aspect. Ubiquity is the key. MP3 is good enough, and it is here to stay. WMA is also catching on, but you'll notice that even that took years to happen.

    • I believe the standard response whenever this question comes up (once a week or so it seems) is:

      There cannot be support for Ogg Vorbis in a hardware device until someone writes an integer-only decoder. These units do not have FPUs.

      Of course I don't know if anyone has written such a decoder recently, but I see the same response so often I thought I'd repeat it ;-)
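
For readers wondering what "integer-only" means in practice, here is a minimal sketch of fixed-point arithmetic, the usual workaround on chips without an FPU. It is not taken from any Vorbis decoder. The Q15 format and the helper names (`to_q15`, `q15_mul`, `q15_to_float`) are assumptions chosen for illustration.

```python
# Q15 fixed-point: a real value x in [-1, 1) is stored as the integer round(x * 2**15).
# Hypothetical helpers for illustration only; not from any real decoder.
Q = 15
ONE = 1 << Q

def to_q15(x: float) -> int:
    """Convert a float to Q15 (done offline, e.g. when building lookup tables)."""
    return int(round(x * ONE))

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values using integer operations only."""
    # The raw product carries 30 fractional bits; add half an LSB to round,
    # then shift right by 15 to get back to Q15.
    return (a * b + (1 << (Q - 1))) >> Q

def q15_to_float(a: int) -> float:
    """Convert back to float, only to inspect results on a desktop machine."""
    return a / ONE

half = to_q15(0.5)
quarter = q15_mul(half, half)
print(quarter, q15_to_float(quarter))
```

A decoder built this way replaces every floating-point multiply with an integer multiply and a shift, which is why someone has to rewrite the reference code rather than just recompile it.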
  • This technology is just cool, with some pretty serious applications.

    I remember sitting around with some voice recognition program (can't remember what one) about 5 years ago, running through all of the little training things to get it to learn my speech patterns.

    I find it kind of strange that it's first appearing in an MP3 player, but I suppose that's the kind of market where a lot of innovation is going to be. I just wonder how long it's going to be until we start seeing this in more practical applications, instead of just being a convenience thing.
    • I think they make it pretty clear in the review that the voice recognition technology does not need to learn any speech patterns.

      "Unlike a software product like IBM's ViaVoice, which needs to learn each individual user's particular speech patterns over time through regular use, VoiceNav requires no such learning input. That's a heck of an accomplishment if it works well."
  • Now all somebody has to do is link this with a domain-specific natural language parser, and put it in my car stereo (along with a decent amount of storage) and I'll pay any amount for it.
  • Does the voice recognition filter itself out? When U2 sings "one" I don't necessarily want it switching to Aimee Mann's "one" and vice versa.
    • According to the article, voice rec doesn't work when it's playing. So no stop or pause voice commands, and it won't switch songs accidentally...

    • Re:Filters (Score:3, Informative)

      by d5w (513456)
      Does the voice recognition filter itself out? When U2 sings "one" I don't necessarily want it switching to Aimee Mann's "one" and vice versa

      From the review:

      Navigation using VoiceNav only operates when a song is not playing (manual controls will allow navigation when a tune is pumping), therefore there is no "Stop" or "Pause" command.
      So they punted on that problem.

      On another front, it looks like "one" isn't likely to produce useful responses from the speech recognition in any case. The only times the reviewer seems to have gotten acceptable recognition of track names were when saying the entire artist and title.

  • Anyone who has their song names in Japanese characters might be SOL for the voice part, unless it can read kanji/hiragana/katakana.
  • I wonder how well it would work on, say, the side of a highway. If it worked well this would be a nice little toy for those of us who run (or bike) around.
  • yelling (Score:1, Funny)

    by Niksie3 (222515)
    We tried and found that the background din of music, talking, and slamming weights was too much for VoiceNav. Once in a blue moon we got the track to shift, but not until speaking loud enough to draw the gaze of a few patrons who wondered why we were yelling at our MP3 player.

    heh, I loved that part
    • Yeah, this brings up the only problem with this technology- do you really want to be observed on public transport saying, apparently to yourself:

      "Hit me baby one more time"

      graspee

  • It seems really nice now, but when you have the machine in hand, the VC really sucks. As stated in the review, it's really picky with order. The best thing though would be when you use it in a really crowded place. I'd get it just to get looks from people when I'm yelling at the player. Interesting stuff. "Play Bach damnit! I want to hear something soothing!"
  • by Anonymous Coward
    I guess I have too many obscure mp3s, but how can the voice control differentiate:

    Daydream Boat.mp3
    Day Dreamboat.mp3

    Alpha Betray.mp3
    Alphabet Ray.mp3

    Mont Anagram.mp3
    Montana Gram.mp3
    ...
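
The collision in those file names can be shown with a crude sketch. Dropping the word breaks stands in for a real phoneme string here, which is a strong simplification. The function name `spoken_key` is made up for this example and has nothing to do with how VoiceNav works internally.

```python
from collections import defaultdict

# Titles taken from the comment above (file extensions dropped).
titles = [
    "Daydream Boat", "Day Dreamboat",
    "Alpha Betray", "Alphabet Ray",
    "Mont Anagram", "Montana Gram",
]

def spoken_key(title: str) -> str:
    """Crude stand-in for a phoneme string: lowercase letters, word breaks removed."""
    return "".join(ch for ch in title.lower() if ch.isalpha())

# Group titles that collapse to the same key.
groups = defaultdict(list)
for t in titles:
    groups[spoken_key(t)].append(t)

collisions = {k: v for k, v in groups.items() if len(v) > 1}
for key, names in collisions.items():
    print(key, "->", names)
```

Every pair collapses to a single key, so a recognizer that hears continuous speech has nothing left to tell them apart except pauses and stress.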
  • Bearing in mind that speech recognition is not yet the equivalent of the chatty computer on TV's Star Trek...
    Let us be grateful for small favors! [wavsource.com]
    • It's always puzzled me, why does Data have to ask the ship for things?
      Why doesn't he have a tricorder built into him?

      It's like "we need to make the equipment purposely braindead or else the viewers won't know wtf is going on."

      Same as the coppers in The Bill always telling each other what they are doing. "I'm going to put this in a bag to stop anyone's fingerprints getting on it". I mean, didn't these coppers ever go to cop school? (or watch The Bill and see what the coppers said to each other, er out of stack space
      • It's always puzzled me, why does Data have to ask the ship for things? Why doesn't he have a tricorder built into him?

        It's like "we need to make the equipment purposely braindead or else the viewers won't know wtf is going on."

        You've answered your own question. Bad TV shows don't trust their audiences to figure things out. So the characters waste a lot of time telling each other things they already know, but the audience might not. That's also why characters tend to be stereotypes. (Hence the "teaching black actors to act black" scene in Hollywood Shuffle [imdb.com].) Picard is brave-but-awkward, Worf is absurdly obsessed with "honor", yada yada.
        • To be fair, if "Data" regularly told the ship what to do without uttering the commands, viewers simply wouldn't know he's doing it. Not because viewers are dumb (well, not exclusively) but because it would probably be unreasonably difficult to convey the transaction otherwise. Of course, the dramatic element can't be ignored either.

          Consider this point also, why is it that many high level computer protocols (HTTP, POP3, IMAP, etc.) use ASCII strings? Technically, it isn't necessary and it is a lot less efficient. But there are many benefits to it, and there might be similar benefits in the future to spoken language interfaces between machines as well.
  • Lucent - formerly Bell Labs - really needs a shot in the arm. Its stock price has been battered big-time lately for reasons unconnected to the dot-bomb phenomenon. Voice recognition on computers has been around for a while now with products like Dragon, Via Voice, etc. All of these programs are clunky, somewhat bloated, and need to be trained to individual speakers. A truly speaker-independent voice recognition system could be just what the doctor ordered for Lucent.

    I searched Google for "VoiceNav" and the only references that came back were those connected to the MXP-100. I wonder if this is brand new. On the down side, if this does represent a breakthrough of sorts, Lucent probably holds patents on the technology that they will milk for all they're worth. The old Bell Labs used to have fairly liberal licensing policies for some of their stuff (UNIX anyone?) but now they're profit-driven. Shareholders might not look favorably on giving away a possible golden goose. I would love to see the magic behind this technology in an Open Source form.
    • by d5w (513456) on Sunday February 10, 2002 @05:15PM (#2983593)
      Voice recognition on computers has been around for a while now with products like Dragon, Via Voice, etc. All of these programs are clunky, somewhat bloated, and need to be trained to individual speakers. A truly speaker-independent voice recognition system could be just what the doctor ordered for Lucent.
      This kind of thing comes up every time speech recognition is mentioned here, and it's largely missing the point. Desktop speech recognition, as handled by Dragon NaturallySpeaking, is a very different problem from simple commands and list selection, and it has very different solutions. If you have to recognize and transcribe arbitrary sentences in a given language you have to handle a much larger search space in basically every dimension -- so much larger that the optimal search techniques can be very different, and (as in your comment) the resources required to implement those techniques will be incomparable.

      I won't say the problems are fundamentally different, because the fundamentals are much the same between the two domains; but nearly every detail of the implementation of those fundamentals is likely to be different.

      • Re:Voice Recognition (Score:2, Informative)

        by marphod (41394)
        Actually, I work in this field.

        Dragon, ViaVoice, etc. are dictation recognizers. They work by analyzing the speech data, and attempting to do phoneme matching to generate words, from a huge dictionary, and then do word matching.

        This isn't an overly exciting model for different reasons. Large vocabulary recognizers have been around for 8-10 years. Nuance, SpeechWorks, Philips, and Temic end up being the big four in this market, although there is also a large vocabulary implementation of ViaVoice and others.

        These products take a fixed grammar set, compile it in a speaker-independent manner, and can be used to recognize the compiled grammar. Without getting overly technical, it is a very different speech recognition method from the dictation recognizers, as they aren't trying to recognize everything out of a dictionary, but simply out of what the known grammar is. The flexibility in how the user can phrase the requests is small, but for relatively simple tasks, it's a fine trade-off.

        Look at SprintPCS's VoiceCommand for example. (I was one of the writers of the product -- not the handset-based recognition, but the server-side voice-activated dialing solution). The idea is very similar, but we handle the concept a little differently.

        This type of device is just waiting to happen. With VoiceXML, designing tools like this will be standardized, but it's not anything new, just a use of existing technology.
        • Large vocabulary recognizers have been around for 8-10 years. Nuance, SpeechWorks, Philips, and Temic end up being the big four in this market, although there is also a large vocabulary implementation of ViaVoice and others.
          Um... You meant "small", didn't you?
          • No, I meant large.

            Small vocabulary recognizers are of the sort that have grammars in the range of dozens or, possibly, hundreds of utterances in grammar. The DTMF-replacement recognizers are small vocabulary recognizers ("press or say 1", for instance).

            While 'large' isn't overly descriptive, all of the recognizers I mentioned can handle grammars in the thousands of possible utterances. Not as large as dictation recognizers, but the theory of operation is vastly different.
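
The fixed-grammar idea described in this subthread can be sketched in a few lines. This is a toy under stated assumptions: the grammar is made up, and text similarity from Python's `difflib` stands in for acoustic scoring, which a real recognizer would do on audio features rather than on spelled-out words.

```python
from itertools import product
from difflib import get_close_matches

# Hypothetical toy grammar: each slot lists its allowed alternatives.
grammar = [
    ["play", "queue"],
    ["track", "album"],
    ["one", "two", "three"],
]

# "Compiling" the grammar here just means enumerating every legal utterance.
utterances = [" ".join(words) for words in product(*grammar)]

def recognize(heard: str):
    """Return the closest in-grammar utterance, or None if nothing is close enough."""
    matches = get_close_matches(heard, utterances, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(len(utterances))
print(recognize("play trak two"))
```

The point of the design is that the recognizer never searches a dictionary. It only has to decide which of a known, finite set of utterances is closest, or reject the input entirely.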
  • . . . otherwise, there'll be a special broadcast on radio, cable, and embedded in trojan MP3s one day. It'll be Jack Valenti's voice saying "Don't play non-SDMI compliant content anymore." :).
  • by Ezubaric (464724) on Sunday February 10, 2002 @05:13PM (#2983582) Homepage
    When I can't get voice rec to work, I usually end up speaking louder because the frustration is just too much. It's bad enough listening to people yapping down the street or in stores with those little embedded mikes and earphones. Can you imagine hordes of people walking down the street screaming:

    "Uncle Fucker"
    "Baby Got Back"
    "Cocaine"
    "Copacabana"

    The last is probably worst of all. We know Barry exists, but it's horrible to be reminded that people actually listen to him.
  • by Anonymous Coward on Sunday February 10, 2002 @05:16PM (#2983600)
    IBM's voice recognition line extends past ViaVoice. We offer several products, including an embedded product, that do not require any training. Only the highest end dictation product requires training because of the demands on it to understand what you just said from tens of thousands of words. If all you can say is a hundred or so phrases like "play", "stop", "rewind", "livin' la vida loca", etc. then it's a lot easier to make a prediction and training is a waste of time. At that point it's just a matter of microphone quality and filtering out the background noise. We can even do untrained natural language voice recognition in situations like this with the proper processor power. Since we know what you're by and large going to say, we can pick out enough from the whole free-form sentence to get the gist of what you meant without any training.

    And believe me we're getting to the point where training isn't needed for dictation either :)
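
Picking known phrases out of a free-form sentence, as the comment above describes, might look roughly like this on text. It is a sketch only: the command list echoes the examples in the comment, the function name `spot_commands` is invented, and a real system would spot phrases in audio rather than in a transcript.

```python
import re

# Hypothetical command set; the phrases echo the examples in the comment above.
COMMANDS = ["play", "stop", "rewind", "livin' la vida loca"]

def spot_commands(sentence):
    """Return the known phrases found in a free-form sentence, in order of appearance."""
    text = sentence.lower()
    found = []
    for phrase in COMMANDS:
        # Require non-letter boundaries so "display" does not count as "play".
        pattern = r"(?<![a-z])" + re.escape(phrase) + r"(?![a-z])"
        match = re.search(pattern, text)
        if match:
            found.append((match.start(), phrase))
    return [phrase for _, phrase in sorted(found)]

print(spot_commands("Could you please play Livin' La Vida Loca for me?"))
```

Everything outside the known phrases is simply ignored, which is why no per-speaker training is needed for this kind of task.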
  • by hovik (257174) on Sunday February 10, 2002 @05:20PM (#2983614)
    In ideal conditions the reviewer was able to twice run through a list of 14 song titles without fail.

    This doesn't mean much. Picking the correct one out of only 14 possibilities is quite easy. The reviewer should rather have tried a playlist with more than 3000 entries. The error rate will grow exponentially with the number of songs because, statistically, the more songs you add, the more of them will sound alike. (Bad way to say it, but you probably get the point.)
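
One way to put rough numbers on this is to count how many pairs of titles could be confused with each other. This is not a measured error rate, just a count of opportunities for confusion, and it assumes every pair is a candidate. The function name is made up for the example.

```python
def confusable_pairs(n: int) -> int:
    """Number of distinct title pairs among n titles (n choose 2)."""
    return n * (n - 1) // 2

for n in (14, 200, 3000):
    print(n, confusable_pairs(n))
```

By this count a 3000-song library has about 4.5 million pairs against 91 for the 14-song test. That growth is quadratic rather than exponential, but it still makes the reviewer's test a very gentle one.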
  • When will the posters read the damn article? This is not a 1GB flash system. When there is a way to fit a gig of flash into a CF2 slot I would like to be the first to know. The difference in both seek time and transfer speed between flash and a Microdrive is definitely non-trivial. The Microdrive is a small HD, that's it, and they have been around for a while.

    As for the test, I would like to know if anyone here has fewer than 15 MP3s that they would like to store on one of these. I want to see how it reacts when you have a few hundred songs and try to use the name recognition system. I have the odd feeling that it might not work so well.

  • Moving parts (Score:2, Interesting)

    by DodgyGeezer (83311)
    For me, the biggest attraction of MP3 players is the ability to have no moving parts. This makes it truly portable and useful in more situations than what we had previously. So, my question is, how reliable is this IBM Microdrive? How robust is it? If I'm training to run a marathon, is it going to survive all of the pounding?
    • Because the Microdrive is not really flash (it is an actual HD in there), it would probably not survive the pounding that running gives...

      In a post a while back someone mentioned that they dropped their Microdrive from a height of about 4 ft onto a carpeted floor and it never worked again. I would suspect that long-term pounding from running would have much the same effect.

      • ... and in the same post somebody said they dropped their Microdrive onto a hard floor, it bounced several times, and it still runs fine.

        Everybody has Walkmans that withstand the pounding of joggers. The Microdrive, since it has less mass than a CD, is probably less affected & no doubt has anti-shock features designed in.
        • Maybe so, but if your CD hits the laser unit on your Discman, it won't die horribly. A Microdrive (as with any hard drive) will, I think, be toast if the head touches the platter.
          • A Microdrive (as with any hard drive) will, I think, be toast if the head touches the platter.

            That's not the problem it used to be as I understand it. I was told the microdrives are using heads "printed" on mylar or something similar so that they won't carve impact craters on the platters.
        • The Microdrive's disk does have less mass than a CD, but that's not a good comparison since the two storage media are read differently.
          In my experience (I used to work in a photography store), IBM's Microdrives have much lower tolerance than a standard desktop or laptop hard drive. Of the 6 Microdrives we sold while I worked there, 3 broke.

          Reliability issues aside, a hard drive the size of a compact flash card really is a pretty amazing little piece of technology.
          • Interesting. I would have thought the smaller the drive the more able it was to withstand acceleration (what with the mass shrinking as the cube, but torsional strength as the square, of the feature size).

            Any experience with the Toshibas that Apple uses?
    • You don't need to use a Microdrive. Any CF card will do. Of course, CF cards are more expensive, but still the cheapest among the solid state formats.
  • Hrm, the thing doesn't look quite as cool as the iPod. Not that I don't hate Apple or anything, but there don't seem to be a lot of players out there that have both a high capacity and aesthetic styling approaching or surpassing the iPod. There are some cool-looking MP3 players, and there are some that are better technically than the iPod. But unfortunately, they don't seem to be in the same group. (Of course, given the price you could just get a real PDA that can play MP3s for about $100 more...)

    Personally, I doubt the voice nav in the current system is really that great, especially since you have to manually stop the music in order to use it. Of course with 200 or so songs it might come in handy (if it scales that well).
  • Hype Company (Score:4, Informative)

    by Anonymous Coward on Sunday February 10, 2002 @05:40PM (#2983686)
    eDigital has a long history of using hype and grossly misleading tactics to, IMO, defraud investors. So far they've lost tens of millions of dollars, and recently had to resort to taking a loan at a 49% interest rate [bloomberg.com] just to stay in business. Even the CEO has referred to the investors as a "cult".

    As for their history with their products, their much-hyped Treo barely sold any units in stores, and is now being sold by liquidators on eBay [ebay.com]. A lot of customers were a bit pissed that their players didn't come with any storage media!

    This wasn't intended as flamebait, but E.digital has a long history of using hype and misleading tactics to pursue little more than an incursion of investment money from gullible public investors. I didn't lose any money to them, but a lot of people did, and will continue to.

    In fact, they recently registered 20 million more shares [yahoo.com] so they can stay in business a while longer. They really don't deserve this kind of attention from Slashdot.

    For those considering investing in them, I'd say stay away. For those considering a product purchase, I'd recommend the same [pcworld.com].

  • Voice navigation systems are cool and they definitely have a "gee whiz" factor, but are they really useful? Sure they have a very short learning curve, but people tend to use alternative navigation methods after using the product for a while. I remember having voice nav way back in '93 with the Sound Blaster AWE32. [man.ac.uk] That was really cool back then, but nobody actually used it. Sure, voice nav on the computer is much more reliable now via products such as ViaVoice [ibm.com] and Dragon [dragonsys.com], but neither of those products is nearly as fast as a mildly experienced user pointing and clicking or, especially, using keyboard shortcuts.

    I have a lot of friends who have Sprint phones with voice nav. They all used it for the first week because it was "cool" but after a while, they went back to traditional methods. Another example is my father; he got the '02 Infiniti Q45 [infiniti.com], which has loads of tech toys built in. The voice nav is really cool but it's not nearly as fast as clicking a button.

    • The only thing I could see is that with up to 1 gig of MP3s..that's approximately 500 songs. It might be difficult to scroll through the list to find a particular song using those tiny buttons. Also, if you were driving or walking or doing something else, you don't want to have to keep looking down to change songs. But you're right, for the most part, consumers either love it or look at it as a fad.
      • Voice navigation systems are cool and they definitely have a "gee whiz" factor, but are they really useful?
        Yes! Yes! Everyone needs speech recognition! Everywhere!

        Oh, wait, I don't work in that business anymore. Never mind.

        That said,

        It might be difficult to scroll through the list to find a particular song using those tiny buttons
        List selection is one of the areas where speech recognition can really shine. The recognition task is usually fairly easy (or, in the case of phonetic ambiguity, impossible), and it fills a real gap in the other available interfaces. On the down side, though, when it goes wrong it's a pain to correct a mistake. "No, I meant that other one of the six thousand items in the list."
  • by flacco (324089) on Sunday February 10, 2002 @05:54PM (#2983732)
    ...only I pictured it with the ability to retrieve a song by just singing a bit of it or speaking some lyrics.
  • I guess there was really no reason not to add voice nav to the system. The DSP architecture they use for decoding is also pretty ideal for voice recognition apps. It's just a matter of adding some software they probably already own and want to test.
    I figure this gives them a cheap opportunity to test their voice rec. system where it won't cause too many problems if it doesn't work (you can still play MP3s) and no one will be too pissed.
  • Just say the name of a music track into the player's microphone and VoiceNav pulls out a rabbit...
  • "In ideal conditions the reviewer was able to twice run through a list of 14 song titles without fail. This included titles with "non-real word" band names like Sum41 and U2."

    Don't buy the "it-worked-for-me" argument. Especially with speech-recognition technology. A selective test is not a benchmark.

    This speaker-independent technology is based on recognition of phonemes. To be able to perform recognition, you first need to translate written entries into sequences of phonemes. For example, "Genesis" will become "JH EH1 N AH0 S AH0 S". Usually, this conversion is done by looking up in a phonetic dictionary. When there's a missing entry, a fallback strategy is to perform automatic grapheme-to-phoneme conversion, i.e. create phoneme strings based on the spelling of a word. This yields poor results for many languages, such as English, which have unpredictable grapheme-to-phoneme correspondence. So, either this technology uses a dictionary (within the PC application) or it uses a grapheme-to-phoneme engine. The problem with a dictionary is that it may be HUGE with all the music authors and titles available, and it evolves rapidly.

    Also, the training is usually done for only one language (sometimes two). This is called acoustic model training. Each phoneme of a given language will be trained as an HMM (Hidden Markov Model). You can only achieve limited results when using words made out of foreign phonemes. "Björk", for instance, will be phonetized "B Y AO1 R K" for English-speaking persons. If you happen to pronounce it correctly (i.e. in Icelandic), the engine won't be able to figure it out because the acoustic data is not modeled properly.

    I have strong doubts about this gadget because it requires dynamic dictionaries and multi-lingual support. I listen a lot to foreign music. I don't think this toy will work ok for me.
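
The dictionary-then-fallback strategy described above can be sketched as follows. The two dictionary entries are the ones quoted in the comment. The letter table is a deliberately naive, hypothetical fallback (one letter, one phoneme) meant to show why spelling-based guesses go wrong in English. None of this reflects what VoiceNav actually ships.

```python
# Tiny pronunciation dictionary; both entries are quoted from the comment above.
LEXICON = {
    "genesis": "JH EH1 N AH0 S AH0 S",
    "björk": "B Y AO1 R K",
}

# Hypothetical one-letter-one-phoneme fallback table, deliberately naive.
LETTER_TO_PHONE = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}

def phonemize(word: str) -> tuple:
    """Return (phoneme string, source): dictionary first, letter rules otherwise."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key], "dictionary"
    # Fallback: map each known letter to a single phoneme and ignore the rest.
    phones = [LETTER_TO_PHONE[ch] for ch in key if ch in LETTER_TO_PHONE]
    return " ".join(phones), "fallback"

print(phonemize("Genesis"))
print(phonemize("Knight"))
```

The fallback pronounces every letter of "Knight", silent ones included, which is the kind of poor result the comment predicts for words missing from the dictionary.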

  • If this thing ran off CDs and supported ogg vorbis I would buy this in an instant. As it is I'm forced to drool over the spiffy voice recognition and keep waiting...
  • Li-Ion rechargeable battery (3.7V/1200mAh) for over 12 hours of playback

    You can't really play through a gig's worth of MP3s on that.
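
A back-of-the-envelope check of that claim, assuming files encoded at 128 kbit/s and taking 1 GB as 10**9 bytes (both are assumptions, the spec line only gives the battery figure):

```python
CAPACITY_BYTES = 1_000_000_000   # 1 GB, taken as 10**9 bytes (assumption)
BITRATE_BPS = 128_000            # assumed encoding rate: 128 kbit/s
BATTERY_HOURS = 12               # "over 12 hours" from the quoted spec line

# Total seconds of audio that fit on the drive, converted to hours.
playtime_hours = CAPACITY_BYTES * 8 / BITRATE_BPS / 3600
print(f"{playtime_hours:.1f} h of audio vs {BATTERY_HOURS} h of battery")
```

Under those assumptions a full drive holds a bit over 17 hours of music, so one charge would not get through all of it. Higher bitrates shrink the gap, since the same gigabyte then holds fewer hours.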

  • It's tempting, but I won't go for it. I'm too much of a They Might Be Giants fan. I can see it now, sitting there in a public area with some weird looking device in my hand:
    "PUT YOUR HAND INSIDE THE PUPPET HEAD!"
    "...NO!" Someone speaks to me "Are you OK?"
    "Yeah Yeah," Yeh Yeh starts playing. "Ahh!"
    "DIG MY GRAVE"
    "Sir, are you sure you're alright?" [stopping]
    "Yeah, fine." suddenly person A asks person B for a light. "I've got a match."
    The thing starts playing again. Just then a Dirt Bike whizzes by and someone says "Man, that's a fast Dirt Bike." Guess what song starts playing. Then I stop it so I can play "I AM A HUMAN HEAD!" again getting more stares.

    Then what if I want to hear Chuck Berry? "MY DINGALING" *SMACK*

    No, for me this is nothing but trouble...

    --Josh
  • you might be interested in the fact that this has already been done [apple.com]
  • The only reason we haven't seen OGG Vorbis support on solid state players is that they would only lose money by doing so, at least for now. This is coming from someone who encodes all of his own CD's as .ogg's.

    Alas, I wish there were some incentive for player manufacturers to add the support. There are two ways I can see for this to happen:

    (a) Make adding it as trivial as possible. If adding .ogg support required only a few days of extra development time, you'd see it.

    (b) Increase the market share that OGG Vorbis has. This one is trickier, mainly because of the slim market that a good, lossy codec serves. What do I mean? Well, audiophiles aren't going to want to listen to any compressed format (though these dinosaurs claim their hissy records are better-sounding than Super Audio CD), and Joe Sixpack isn't going to notice any difference at all between .mp3 and .ogg.

    Having done numerous sound quality tests of OGG Vorbis and MP3 on my own equipment, I can say without a doubt that were all things considered equal, OGG would win out. Unfortunately, OGG has had a very late start, and is up against lots of other competitors who are all "good enough" for the average person, so its supporters will have to reduce the barriers to its use before anyone will care.
    • The only reason we haven't seen OGG Vorbis support on solid state players is that they would only lose money by doing so, at least for now. This is coming from someone who encodes all of his own CD's as .ogg's.

      Actually I think that the only thing stopping OGG Vorbis on hardware players is the lack of a free fixed-point decoding library. Right now you can find free floating-point decoding libraries, but not fixed-point. Most of the processors used in hardware players do not support floating-point operations. The CPUs only have an integer unit. When a fixed-point library is released, I think that you will find Ogg supported everywhere that MP3 is, since it should be trivial to add, and will only take up a little more ROM.
  • It is very impressive when you shout the word "Folder" talking like Apu Nahasapeemapetilon and it still works.

    Not really. I'd be impressed, though, if it picked up on the guy's name instead of his accent.
  • What if someone tries queuing up their favorite track from The Faint's Danse Macabre?
    • That's no big deal, French pronunciation is probably easier for the software to deal with than, say, Aphex Twin's DeltaMi-1=aSigman=1Di(n)(SigmajEC(i)Fij(n-1)+Fexti (n-1)).

      (Couldn't find the appropriate keystrokes for the above characters on this iBook keyboard I am stuck with for the moment....)
      • You obviously haven't seen the back cover of The Faint's album :)

        I tried desperately to find an image, but no luck. They're not French either, that is unless there's a lot of new-new-wave-indie-rock-frenchmen in Omaha.

        I do, however, concur that some of Aphex Twin's tracks might be a bit harder to pronounce.
        • Actually I've got the back covers of The Faint's albums, Danse and Blank-Wave, at home about 60 miles west. Yes, the print used for track titles is heavily artistic, some would say illegible, but that doesn't mean that they don't have normal English names. My point was that some songs have titles which are nearly impossible to pronounce, if not names that are intentionally impossible to say.

          I think just the ability to feed voice commands such as "Stop", "Track, Next", or "Mode, Shuffle" would be enough for me to consider paying a little extra.
          • Ah, fair enough then. I suppose there could be verbalizable words hidden in that gibberish, so I'll concede. I'd have to wonder what it would be like with something like The Microphones' "Window" with a bunch of the tracks having the same name.
  • Portable MP3 players of all things get the voice tech first. Why? Same with phones. The cell phones have the voice recognition, but if there are POTS phones that have it, they aren't exactly making commercials about it (not that I watch TV anyways)

    This feature would be no less useful on a desktop. It's definitely ideal for a small portable unit where working with a tiny display screen and buttons to switch between a large selection of songs can be tedious. However, being able to swap songs by simply speaking to your computer without forcing yourself to do a task switch could be helpful as well. Certainly, the 10-20 seconds you spend doing so isn't significant by itself, but this does add up over time. It's all about productivity, people!

    MP3 players are pioneering the way in other areas as well. Other than perhaps digital cameras, they provide a market for flash memory. And getting realtime playback, and hopefully soon widespread use of unrestricted realtime mp3 encoding for these units, will enhance their use beyond the simple playback of music. And of course, don't forget, anything that pisses off the RIAA is a good thing. :)

    -Restil
  • Because of the number of songs mp3 allows us to carry around, indexing the songs we have with us is a tricky thing. There are numerous indexing methods on MP3 players at the moment.. playlists on the iPod, simple numeric 'album' jumps on MP3-CD players, search facilities on in-car units etc.. but voice definitely simplifies matters.

    However, I spy a problem. Even if it doesn't require training to recognise a voice, I bet it's still limited to a subset of accents.

    You notice it with voice-recognition computer programs here in the UK. You speak normally and it rarely works.. put on the dullest most monotone American-style accent you can, and hey presto, up and running!

    So, to get one of these, is a prerequisite that I practice my 'dull American drone'?


  • Why is everyone surprised that an mp3 player has voice control? Cellphones [sonyericssonmobile.com] here in the UK have had that technology for ages, but now they are moving beyond "call pizza" to built-in mp3 players and radios [nokia.com]. At the current speed of development, it's gonna be this year that they merge these technologies and we end up with a voice-controlled mp3 player with PDA and cellphone with built-in cameras! [nokia.com] Yay, the end is near for all these fragmented devices [archos.com], and we will soon have that device we all want that fits in our pockets and does everything :)
  • One of the differences: CF (including IBM Microdrive) is removable. The iPod uses a non-removable 1.8'' Toshiba drive. The fact that it is removable may make the difference, for instance if you also have a digital camera which uses CF cards. Also, with CF you have a choice between a Microdrive and solid state. As to capacity, a 1.8'' 10GB drive just became available (double the capacity of the one used in the iPod), so even if IBM comes up with a 6GB Microdrive next year, the 1.8'' format is still the higher capacity. Finally, FireWire is way faster than USB. We need at least USB 2.
  • I might get modded down for this, but eDigital has just left a bad taste in my mouth..... And I wanted to share... ;)

    I personally see this as being *on* topic because before you buy something from eDigital let me tell you what you *might* just be in for.

    I'll do a condensed version of my story and just say "don't let this happen to you". I got a Treo 10 MP3 Jukebox from http://www.treoplayer.com for an xmas present. I'll be looking for a new xmas present.

    My Treo 10 was basically D.O.A.: the unit's hard drive would lock up during playback.

    It took me *one month* to get an RMA number.

    When I *finally did get* the RMA number and sent the unit back, I was to "promptly have a new unit sent" to me.

    This didn't happen. The Treo 10 is on back order and no replacements will be sent out until *APRIL*. Like I'm going to wait three months for a replacement.

    SO, I demanded a full refund. Their main support center said 'OK'. I got my credit email today and was told they were going to keep 15% for a "restocking fee" (?!?!?).

    So, I called -- again -- raised hell, and am finally getting a full refund.

    During this time, I went back to doing realtime recording of MP3's using my Sony MZR-900 (minidisc Walkman) and my digital soundcard. What I found was that the sound quality of my MP3's coming off my computer and onto my MD Walkman was *better* than anything coming out of the Treo 10. I guess there's something to be said for Sony's D/A chips. I also re-discovered how convenient the MD Walkman is. It and 3 Minidiscs easily fit in my coat pocket. I also have more than enough battery power to get through the day at the office.... And MDLP 4 mode is certainly livable enough for my needs. Hell, it *still* sounds better than a cassette tape walkman if you ask me, and I can 'boost' highs and lows via WinAMP to compensate for the sound loss during compression if I need to.

    So that's it. No more MP3 jukebox BS for me. I'll stick to what works. And if you *do* decide to get an MP3 jukebox - avoid eDigital like the PLAGUE! Their customer service is horrible and their product, when it *does* work, is only of passable sound quality.

  • In the article they don't mention the battery life on this thing. The Microdrive is GREAT, but it consumes a lot of battery power compared to a flash memory card.

    Does anyone have any info on this?

    Thanks!
  • In one form or another, speech recognition is going to be used more and more in the future, perhaps especially with handheld devices and tablet PCs. So, in light of this, who is working on Open Source speech recognition? I'm aware of CMU's Sphinx project, but last I saw it was quite obsolete technologically compared to commercial offerings. Is there any other Open Source work being done with cutting edge SR techniques?
  • by acoustix (123925)
    Nowhere in the actual article does it say that it uses "1GB flash" cards. However, the IBM Microdrive does store that much data (340 MB, 512 MB or 1GB).

    As far as I know the "SanDisk-compatible CompactFlash(TM) Cards" max out at 128 MB.

    They might want to update the article seeing how it may get some people's hopes up.
  • I hope voice recog is better than the last time I used it!

    Trying to load stairway to heaven:

    "Stairway...delete that...Stairway...delete that...no! Delete that!...Shit...delete that...delete that...delete that... Stairway...to...delete that...to...delete that...to...delete that...to...heaven...delete that...heaven...delete that...heaven...delete that...heaven...play...delete that...play...delete that...play...delete that...play...delete that..."

    :)

    I hate voice recognition.

  • They'll never get it to play track 2 from Windowlicker. (Although, I do love Amazon's attempt.)
  • The writing standards in this review by Richard Menta are amongst the worst I have seen. He repeats almost every piece of information at least once (so you say it supports MP3 and WMA?) and fails to mention some pretty crucial features of any mp3 player. For example, he mentions the lithium-ion batteries "had no trouble handling the power hungry Microdrives" - but how long did they actually last?!

    Also, testing the VoiceNav feature with 14 songs is laughable. You basically had to know the full artist+track name anyway, so why not just memorise the 14 tracks and refer to them by number? My mp3 player has over 3000 tracks at the moment and I have no confidence that VoiceNav could handle them, despite reading this review which gave it 4 stars. And seriously, on a crowded and noisy subway train who is going to yell "Achey breaky heart" into their shirt pocket?

    Not I, for more reasons than one.

  • We even tried various accents to throw the player off, everything from Brooklyn to southern to bad impersonations of various Monty Python and Simpson's characters. It is very impressive when you shout the word "Folder" talking like Apu Nahasapeemapetilon and it still works.

    I would have tested it only using the Comic Book Store guy's voice. It seems like the type of thing he would use. "Worst playlist, ever!"

