Speech to Text in c++

gfxx

Are there any libraries for speech to text (speech recognition) that are easy to integrate into a Qt application and that also work well? As far as I understand, QtSpeech only does text to speech ...

If anyone knows a good library for C++, I would love to hear about it.

Thanks
Giorgio

SGaist

Hi,

Do you mean an equivalent of Dragon NaturallySpeaking?

VRonin

That's very complex stuff and it is normally not run locally (it would be too demanding). All the major players in the space (bar Apple's Siri) have APIs that you can use:

Google's is probably the easiest to use: it works over REST, so the stateless request/response exchange is easily handled with QNetworkAccessManager + QJson* + QAudioInput
Amazon's comes with a C++-specific API
Microsoft's is still in free beta
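
To make the REST route a bit more concrete, here is a minimal sketch of the JSON body such a request might carry. It is plain standard C++ so it can be tried without Qt; in a Qt application QJsonDocument and QByteArray::toBase64() would do the same job. The field names (`config`, `encoding`, `sampleRateHertz`, `languageCode`, `audio.content`) are how I understand Google's v1 `speech:recognize` request, so please check them against the current API reference. The function names `base64Encode` and `buildRecognizeBody` are made up for this example, and the language code is assumed to be a plain tag such as "en-US" that needs no JSON escaping.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard Base64 (RFC 4648) encoding with '=' padding.
std::string base64Encode(const std::vector<std::uint8_t>& data) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    for (; i + 2 < data.size(); i += 3) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8) |
                                static_cast<std::uint32_t>(data[i + 2]);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }
    // One or two trailing bytes are padded with '='.
    if (i + 1 == data.size()) {
        const std::uint32_t n = static_cast<std::uint32_t>(data[i]) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    } else if (i + 2 == data.size()) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: wraps raw 16-bit PCM audio in the JSON body
// that (as assumed above) the recognize endpoint expects.
std::string buildRecognizeBody(const std::vector<std::uint8_t>& pcm,
                               int sampleRateHertz,
                               const std::string& languageCode) {
    return std::string("{\"config\":{\"encoding\":\"LINEAR16\",")
         + "\"sampleRateHertz\":" + std::to_string(sampleRateHertz) + ","
         + "\"languageCode\":\"" + languageCode + "\"},"
         + "\"audio\":{\"content\":\"" + base64Encode(pcm) + "\"}}";
}
```

The resulting string would be sent as the body of a POST with content type application/json, for example through QNetworkAccessManager::post(), and the reply parsed with QJsonDocument.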

gfxx

@SGaist said in Speech to Text in c++:

Dragon NaturallySpeaking

No, an equivalent of the Android voice recognition SDK ... I have played around with it and it works OK ... but it is practically impossible to use a custom hotword ... so instead of developing my app in Android Java with Android Studio (AS), I am looking for another solution with C++ and Qt.

@VRonin
Actually, Google Now works offline, so my app does too ... but it is not nice to see people saying "ok google" or "ok you go" or "ok hugo" every time they need to use my custom voice command!
I think that with Voce, or with Sphinx4/5 directly, it is possible to get a similar result ... but other users/developers do not share this opinion. So I cannot decide which way to go ... and I am asking for opinions in this post.
Do you know whether Amazon's works fine offline? I only need simple words such as: start, stop, pause, alarm, give me the report ... nothing else ... and obviously a nice custom hotword ... Until February 2017, with Google Now you could use "hey jarvis" ... not really professional, but better than "ok google".

The target device is Linux or Android with a good 4-core or 8-core CPU/GPU ...

regards
giorgio

VRonin

I don't know about the offline versions; I guess they work using the proprietary software included in the OS (Jarvis, Alexa, Cortana or Siri). Regarding the wake-up word, however: using Google's API you can just listen to everything, send a POST to https://speech.googleapis.com/v1/speech:recognize to get the text of what was said, and only process it if your chosen keyword was at the front (I don't know if that is 100% ethical, as you are sending a lot of recordings to Google and it's not great for privacy). For limited-resource devices (mobile phones), using the native OS API is probably the best way to go.
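
The "only process it if your chosen keyword was at the front" step could look roughly like the sketch below. It is plain standard C++ and only deals with the text that comes back from the recognizer, not with the network call. The function names `toLower` and `extractCommand` are made up for this example, and "hey jarvis" is just the hotword mentioned earlier in the thread. The comparison is ASCII-only and case-insensitive, and it is an exact match, so a mis-heard hotword would simply be ignored.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// ASCII-only lower-casing; real transcripts may need locale-aware handling.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical helper: returns true if the transcript starts with the
// hotword (as a whole word) and stores the remaining text, lower-cased,
// in `command`. Returns false otherwise and leaves `command` untouched.
bool extractCommand(const std::string& transcript,
                    const std::string& hotword,
                    std::string& command) {
    const std::string text = toLower(transcript);
    const std::string key = toLower(hotword);
    if (key.empty()) return false;

    // Skip leading whitespace in the transcript.
    const std::size_t pos = text.find_first_not_of(" \t");
    if (pos == std::string::npos) return false;

    // The hotword must be the very first thing that was said.
    if (text.compare(pos, key.size(), key) != 0) return false;

    // It must end on a word boundary ("hey jarvisson" does not count).
    const std::size_t end = pos + key.size();
    if (end < text.size() &&
        std::isalnum(static_cast<unsigned char>(text[end]))) {
        return false;
    }

    // Skip separators between the hotword and the command.
    const std::size_t start = text.find_first_not_of(" \t,.!?", end);
    command = (start == std::string::npos) ? std::string() : text.substr(start);
    return true;
}
```

With the hotword "hey jarvis", the transcript "Hey Jarvis, give me the report" yields the command "give me the report", while "ok google start" is rejected.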

gfxx

@VRonin said in Speech to Text in c++:

(don't know if 100% ethical as you are sending a lot of recordings to google and it's not great for privacy)

Really, you can start the recognition with a GUI button, together with some notice about privacy, and stop it with the same button, so the mic is not on all the time. Anyhow, my app works offline (no Bluetooth, no Wi-Fi, no data connection), so it is quite difficult for the Google servers to receive the user's voice file. Anyhow, your remark gives me a new idea: find and delete that file every 30 sec ... so that if the phone or tablet goes online, most of those files cannot easily be picked up by the Google servers.

Anyhow, if you are suggesting that I perform continuous recognition: the Google API stops it after some time (60 sec), and it is not possible to perform, for example, 6 hours of continuous recognition without some resource-expensive workaround ... Yes, you can press the "start recognition" button every time you need it, but that is not a good solution for a hands-free app. Another solution is to start the app with a shake gesture ... but that is not very precise ...
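
To illustrate the kind of workaround I mean, here is a minimal sketch that cuts a long recording into pieces that each stay within a time limit, so every piece could be sent as its own request. It is plain standard C++ and only computes sample ranges; the function name `splitIntoSegments` is made up for this example, and the 60 sec limit is simply the figure mentioned above, not something I have checked in the documentation. A word that falls across a cut would be split in two, so a real version would probably need overlapping pieces or silence detection.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns [begin, end) sample index ranges that cover
// totalSamples, each range being at most maxSeconds long.
std::vector<std::pair<std::size_t, std::size_t>>
splitIntoSegments(std::size_t totalSamples,
                  std::size_t sampleRateHertz,
                  std::size_t maxSeconds) {
    std::vector<std::pair<std::size_t, std::size_t>> segments;
    const std::size_t maxSamples = sampleRateHertz * maxSeconds;
    if (maxSamples == 0) return segments;  // nothing sensible to do

    for (std::size_t begin = 0; begin < totalSamples; begin += maxSamples) {
        // The last segment may be shorter than the limit.
        segments.emplace_back(begin, std::min(begin + maxSamples, totalSamples));
    }
    return segments;
}
```

For 150 seconds of audio at 16000 samples per second and a 60 sec limit, this gives three ranges: two full 60 sec pieces and one 30 sec remainder. Over 6 hours that means hundreds of separate requests, which is why I call it expensive.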

Anyhow, thanks for the suggestion.

regards
Giorgio