Google Opens Access To Its Speech Recognition API, Going Head To Head With Nuance (techcrunch.com) 46
An anonymous reader quotes a report from TechCrunch: Google is planning to compete with Nuance and other voice recognition companies head on by opening up its speech recognition API to third-party developers. To attract developers, the app will be free at launch with pricing to be introduced at a later date. The company formally announced the service today during its NEXT cloud user conference, where it also unveiled a raft of other machine learning developments and updates, most significantly a new machine learning platform. The Google Cloud Speech API, which will cover over 80 languages and will work with any application in real-time streaming or batch mode, will offer a full set of APIs for applications to "see, hear and translate," Google says. It is based on the same neural network tech that powers Google's voice search in the Google app and voice typing in Google's Keyboard. Google's move will have a large impact on the industry as a whole -- and particularly on Nuance, the company long thought of as offering the best voice recognition capabilities in the business, and most certainly the biggest offering such services.
Nuance the Biggest (Score:4, Interesting)
Re: (Score:2)
It's not so much that Nuance has long been known as the best; it's more that they've bought out all their competitors and have pretty much controlled the market.
It's mostly that they were afraid of losing market share to Alexa Voice Service, which was opened up to developers a while ago.
Local STT is the optimum end game (Score:2)
The world needs high quality STT that works when the net is down and isn't vulnerable to arbitrary changes in API, availability, and legal impediments.
It's clearly one of the harder software problems, but I expect it to be solved in fairly short order; years, not decades.
Re: (Score:2)
Not to mention, security issues (e.g. of the "sending all your private speech to the NSA" variety).
Re: (Score:2)
Likely your phone would be doing that anyway -- if the NSA cared even in the slightest about you in particular. They're doing it on every phone call anyway. Government is long out of control on privacy issues. Then there's the "smart TV" issue...
Orwell was an optimist [fyngyrz.com]
Re: (Score:3, Interesting)
we work in the transcription business. that is exactly what nuance did, and does, especially in the medical transcription segment.
american-based, native english-speaking transcriptionists are essentially just training nuance's computers to do the transcriptionists' jobs. once the voice recognition accuracy hits a certain mark, they outsource to india or some other piss-poor country with lower wages and more favorable-to-them contract and labor laws, the editing of their now-trained and automated output
and we do that
Re:Nuance the Biggest (Score:4)
1) Transcription doesn't require the level of skill that practising medicine does, but it's skilled work and there is a lot more to it than typing.
2) It's one thing to be replaced by a computer that genuinely replaces the work you do. It's another to lose your livelihood or have your income reduced by software that is terrible at the work. People using transcription software generally are getting less value for their money even though they might be paying less for the first draft, while the talents of transcriptionists who want the work are under-used.
Re:Nuance the Biggest (Score:4, Interesting)
The nerds at Ma Bell used to provide very high quality telephony; they were shocked and appalled when the market chose low quality, low cost telephony. The medical transcription market has gone through the same change.
The documents, especially the ones used clinically, can suffer from lower-quality ASR and/or offshoring. Also, in the old days, light editing was usually part of the process. This happens less in today's price-obsessed market and sadly results in less readable reports.
On the other hand, today it's possible to get turnaround times of 0 with document issues identified in real time by NLP. That is a really big improvement. (I don't know if Nuance has that, but if they don't, they will soon.)
Unrealistic expectations (Score:3)
Any idea you might have that the market will do what you think is optimum is based upon a complete misunderstanding of markets.
Markets often choose inferior performance options. High quality solutions often fail to gain, or keep, traction. No undertaking that doesn't have significant lobbying impact (which of course means high $) with the relevant legislature can reasonably expect its business model to be protected in the face of any particular eroding force. Once a particular solution to a problem has been
Privacy (Score:5, Interesting)
>" Google says. It is based on the same neural network tech that powers Google's voice search in the Google app and voice typing in Google's Keyboard."
Indeed. So does this mean Google will store and mine and analyze and profitize the spoken text data too?
Re: (Score:2)
The speech goes into retraining the machine. They profit from the transcribed data as well.
Re: (Score:2)
Pics, text, sound and any other environmental sensor data found networked will feed the ads
Google looks to patent tech that listens to calls to promote ads (23 March 2012)
http://www.cnet.com/news/googl... [cnet.com] "...the patent application also looks into placing onto people's computers online ads that are influenced by data from environmental sensors--such as temperature, humidity, light, and sound."
Re: (Score:2)
does this mean Google will store and mine and analyze and profitize
No, it doesn't mean that.
Though only because Google is already doing it.
Re: (Score:2)
Homsar [hrwiki.org], is that you?
Do I have to say it? (Score:3)
To attract developers, the app will be free at launch with pricing to be introduced at a later date.
The first one's always free...
Re: (Score:2)
Re:Do I have to say it? (Score:5, Informative)
if they were to announce the future pricing now it might even be worth trying.
Keep in mind that the VR API used to be open, then they closed it, screwing anyone using it. Now they are opening it up again "for free", but it will supposedly be yanked away yet again, when/if they finally decide on the pricing. Google has a terrible record of supporting their products. You would be foolish to rely on this API if you have any alternative.
Re: (Score:3)
That's what Google does, though. Create something amazing ...
Sometimes they do create it... but more often they buy it, run with it for a while, and then shut it down.
Re: (Score:1)
I couldn't agree more. Google has established a pattern of either buying or creating something cool and then shutting it down when some new whim takes hold. They are like a little spoiled kid in a toy store. TBH, either one is annoying as hell.
2018 Headline (Score:2)
Guffaw. (Score:2)
...the app will be free at launch with pricing to be introduced at a later date.
/insert metaphor about drug dealers here
Pebble Time has been waiting for this (Score:5, Interesting)
I'm waiting to see if/how this affects Pebble Time. We've been wanting access to the Google Voice API for ages now. Personally I want it mostly for Google Now integration, which may or may not be separate.
Re: (Score:2)
The current approach seems to be to do speech recognition in the cloud
There's a reason for this. They use a neural network and an absolutely massive dataset. They seeded this data set with GOOG-411 a few years before Google Now came out. Microsoft did the same thing with BING-411 when GOOG-411 shut down and now we have Cortana.