
Consonants Not Required 139

billybob2001 writes: "A report at the BBC explains how voice-control of computers can be more successful using grunts and sighs, as "voice recognition programs often failed to accurately capture words". Dr Takeo Igarashi, of Brown University suggests the use of "ahhhh" for skipping tracks on a cd, or adjusting tv volume, but I wonder what the effect would be on pr0n sites? Another suggestion is "uh oh" for undo. Perfect for online banking. Is this going to confuse your system or what?"
This discussion has been archived. No new comments can be posted.

  • It's cute, but... (Score:5, Interesting)

    by d5w ( 513456 ) on Thursday October 18, 2001 @08:57AM (#2446100)
    The computer can't distinguish words easily, so we'll give you a potentially much smaller vocabulary and see if it does better? Of course it'll do better, whether or not that smaller vocabulary contains consonants.

    What I'd worry about is whether these unarticulated sounds sound more like background noise than articulated speech; if so, then you've made the situation worse by making it harder for the computer to know when you're talking to it.

    On "uh oh": Dragon Dictate (discrete speech recognition from a few years ago) used "oops" for telling the SR system when it made a mistake; it was reasonably easy to distinguish from words that you actually wanted to put into your text with any frequency.
  • by Anonymous Coward on Thursday October 18, 2001 @09:02AM (#2446115)

    Seriously. I have colleagues that work on this type of thing:

    "Sound Symbolism in Conversational Grunts in English"
    "The Challenge of Non-lexical Speech Sounds"
    "Issues in the Transcription of English Conversational Grunts"

    http://www.sanpo.t.u-tokyo.ac.jp/~nigel/publications.html [u-tokyo.ac.jp]
  • by wowbagger ( 69688 ) on Thursday October 18, 2001 @09:04AM (#2446120) Homepage Journal
    Of course, many have said that the GUI is a "caveman interface" - point and grunt, err, click.

    This really strikes me as the verbal equivalent of Palm's Graffiti - if the normal interaction (printing/speaking) is too hard, make a simplified interface (Graffiti/grunting) that isn't.

    I don't know, but I already learned one interface (typing) to make my computer's life easier. Why should I do all the work?
  • Asking people to use another language when dealing with machines -- especially one that's more visceral -- is just asking for trouble. Already computers are seriously affecting the ability of humans to communicate orally: by concentrating the language into short bursts used during chats, we lose the particles of sentences that help establish context in speech (yes, there is a reason for "the" and "a"). Besides, here's an opportunity to alleviate a lot of the bad habits that make dialectal English so tough to understand for those outside the dialect: set the machines to understand one sort of English, so that everybody has to speak at least that type along with their colloquial speech. Of course, there's always the possibility for eugenic practices with this, so my proposal is this: teach the computer the differences between the 8 vowel sounds used by people in Colorado, where pretty much every vowel approaches the schwa (the schwa being the neutral position for the human vocal system and therefore easiest to pronounce). After a while, people will realise that to be successful at using voice activated systems, they'll need to adjust their inflection, and eventually they will adjust it automatically when dealing with people who don't understand them, either.

    But voice activated systems are stupid, anyway...speech is one of the slowest forms of human interaction, and is one of the few we have to actively concentrate on to perform. You know when people say, "Think before you speak?" That's because once you start speaking a large portion of your brain activity is devoted to doing so...it actually becomes harder to think about what to say next. Pressing a button or turning a dial takes practically no thought...which is another reason why a speech written in spontaneous draft still sounds better than one that is spoken aloud. If we convert machines to speech recognition, we're effectively asking people to interact with them in dumber ways. And can you imagine the logic involved with processing a fairly simple statement like "This check in my hand should be processed by you and in return I'd like fifty bucks in tens and ten one dollar bills." Since the command isn't linear, the machine not only has to recognize what each word means, but also has to interpret them in sequence. And if humans can't use complicated sentences like the one above -- which any human over the age of about 4 can understand; before that, kids can't identify the subject and object in complex sentences -- they'll be inconvenienced by speaking machines. Oh and for a simpler example, try this: "My pin number? 376 uhhhhhh...Forty-two thirteen...aaaaaaaaaaaand...is it six? no. Eight?...oh! oh! sixty eight!" A human can understand that...we'd be annoyed, but we'd get it.
  • by glebite ( 206150 ) on Thursday October 18, 2001 @09:47AM (#2446281)

    How selective would the speech recognition be? If I was playing music on that computer, would the computer pick up the tones coming in and start "doing stuff(tm)" on my computer? What about background noises? My friend's Jello Biafra spoken word CDs?

    I won't even go there with my Saturday Morning Cartoon CD - Eep Opp Ork Ah-Ah (This means mail all of my friends a copy of my resume)...

  • Re:Typing vs. speech (Score:3, Interesting)

    by Asic Eng ( 193332 ) on Thursday October 18, 2001 @09:59AM (#2446351)
    Any new interface requires some accommodation from the user.

    Ok, that sounds fair, but I guess you'd want to have some sort of benefit after you invest your time?

    I just don't see this sort of interface catching on for standard applications. I mean - imagine you are in an office with 20 people grunting at their computers; the noise they make is just going to be unbearable. That's got to be worse than that annoying guy who's checking his voicemail via speaker phone. *shudder*

    From the article:

    By increasing the pitch of your voice, the scrolling speed increases. When you stop speaking, the scrolling ends.

    Can you imagine sitting next to a guy who uses this, and not having a headache after 10 mins?

  • Sheep (Score:2, Interesting)

    by Kpechtunx ( 529353 ) on Thursday October 18, 2001 @01:01PM (#2447333)
    Sounds kind of like how a farmer controls a sheepdog ... - !K
  • Undo (Score:3, Interesting)

    by Fjord ( 99230 ) on Thursday October 18, 2001 @02:11PM (#2447744) Homepage Journal
    Great. I'm almost finished my ultra-long /. post and someone ICQs me.

    "Uh oh"

    On another note, I knew a guy who worked with voice rec software where the delete-word command was "oops". Whenever he would watch another person typing and they would typo, he would instinctively say "oops". I'm guessing it's kind of how my writing went bad when I was using Graffiti a lot. You get used to these quirky mannerisms you use to control the machines. Then you end up looking like a dork and annoying the people around you.
  • Re:It's cute, but... (Score:2, Interesting)

    by dollargonzo ( 519030 ) on Thursday October 18, 2001 @04:28PM (#2448717) Homepage
    well, you are actually not quite correct on the consonant thing. ever try doing an FFT on some sound, and keeping only the major frequencies? we humans hear consonants, but for example p and b are essentially the same thing. and in the case of, say, an S, it sounds like noise to the computer, making it harder to distinguish than an AAA, which makes one distinct frequency. So, although you are correct in saying that a smaller vocabulary would help, it would not help as much as removing consonants.

  • Re:It's cute, but... (Score:3, Interesting)

    by plastik55 ( 218435 ) on Thursday October 18, 2001 @11:28PM (#2450281) Homepage
    FFT is exactly the wrong technique for resolving transient or plosive sounds. Wavelets work better. Take the CWT of a person speaking, and you can *see* the shape of all the consonants.

    When people speak, it is the consonants that matter. Ever try listening closely to someone with a pronounced regional accent? The vowels are all jumbled up but the speech is still intelligible. IIRC, people tried to teach gorillas to communicate using different grunts, and gave up in favor of sign language. Reason being that you *can't* string two different vowels together without a consonant in between and have it be intelligible.
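
The FFT-versus-wavelet disagreement in the last two comments can be checked with a small sketch. It is an illustration only, under loud assumptions: a single-sample spike stands in for a plosive, a pure sine stands in for a sustained "AAA" (real vowels have several formants), and a one-level Haar transform is swapped in for the CWT that plastik55 mentions, simply because it fits in a few lines of standard-library Python. The helper names (`dft_magnitudes`, `haar_details`, `click`, `peak_index`) are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for every bin."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def haar_details(signal):
    """One level of the Haar wavelet transform: scaled differences of adjacent pairs."""
    return [
        (signal[i] - signal[i + 1]) / math.sqrt(2)
        for i in range(0, len(signal) - 1, 2)
    ]


def click(n, position):
    """Assumed stand-in for a plosive: silence with a single-sample spike."""
    samples = [0.0] * n
    samples[position] = 1.0
    return samples


def peak_index(values):
    """Index of the entry with the largest absolute value."""
    return max(range(len(values)), key=lambda i: abs(values[i]))


N = 64

# A steady tone (assumed stand-in for a vowel) concentrates in one frequency bin.
tone = [math.sin(2 * math.pi * 8 * t / N) for t in range(N)]
tone_peak = peak_index(dft_magnitudes(tone)[: N // 2])

# Two clicks at different times have identical magnitude spectra, so keeping
# only the "major frequencies" says nothing about when the transient happened.
early, late = click(N, 10), click(N, 50)
spectra_match = all(
    abs(a - b) < 1e-9 for a, b in zip(dft_magnitudes(early), dft_magnitudes(late))
)

# The Haar detail coefficients are zero except at the pair containing the click.
early_peak = peak_index(haar_details(early))  # pair index, i.e. sample // 2
late_peak = peak_index(haar_details(late))

print(tone_peak, spectra_match, early_peak, late_peak)
```

In this toy setup the tone shows up as one clear bin, while the two clicks are indistinguishable by magnitude spectrum alone (their timing lives in the phase, which the "keep only the major frequencies" approach discards) and the wavelet coefficients point directly at the time of each click. That is consistent with both comments: sustained sounds are easy in the frequency domain, and transients need a time-localized analysis.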
