Voice Recognition Implementation
I wanted to know whether it is possible to implement voice recognition using Qt, and what the procedure is.
Thanks in advance.
@Naveen_D You must use an external voice recognizer or write your own (I think that's not feasible...).
As @QtAndrew says, you will need an external library, and which one also depends on which platforms you want to support. Which ones do you need?
For Text to voice, Qt has
@QtAndrew OK, I will use an external library such as pocketsphinx.
How can I do it with this library? Since I'm new to this, I'm asking... if a small example is available, that would help me a lot.
For any external library, you should first look at its build instructions for the platform you want.
Let's see, for Windows:
It seems that to use this library on Windows, you must have Visual Studio 2010 or newer installed.
So for it to work for you, you should have the Qt build for that exact Visual Studio version installed (2010, 2012, 2013 or 2015).
Once you have gotten the library to build, it produces .dll or .lib files.
Then you include them in your own Qt project.
Check all paths in the .pro file. If you get errors, search the forum; there are plenty of posts about using a lib/dll.
Now you are ready to try using the functions it provides. :)
A key thing to understand is that the external library and the Qt version you are using must be built with the same compiler.
So if you are using the MinGW compiler, a Visual Studio DLL won't work, and vice versa.
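A quick way to confirm which compiler your Qt kit actually uses is to build a tiny file with that kit and look at what the compiler predefines. This is a minimal sketch using only the standard predefined macros (_MSC_VER, __MINGW32__, __clang__, __GNUC__); compilerFamily and visualStudioYear are hypothetical helper names, and the year table only covers the Visual Studio versions mentioned above:

```cpp
#include <string>

// Describes the compiler that built this file. Compile it with the same Qt kit
// you use for your project, then compare with the toolchain that built the library.
std::string compilerFamily() {
#if defined(_MSC_VER)
    return "MSVC " + std::to_string(_MSC_VER);
#elif defined(__MINGW32__)
    return "MinGW";
#elif defined(__clang__)
    return "Clang " + std::to_string(__clang_major__);
#elif defined(__GNUC__)
    return "GCC " + std::to_string(__GNUC__);
#else
    return "unknown";
#endif
}

// Maps an MSVC _MSC_VER value to its Visual Studio release.
// Only the versions discussed in this thread; newer releases report 1910 and up.
int visualStudioYear(int mscVer) {
    switch (mscVer) {
    case 1600: return 2010;
    case 1700: return 2012;
    case 1800: return 2013;
    case 1900: return 2015;
    default:   return 0;  // unknown / not covered
    }
}
```

If the kit reports MinGW, you need a MinGW build of the library; if it reports MSVC 1800, you need a library built with Visual Studio 2013.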
@mrjj Thanks a lot.
If you mean pocketsphinx-master.zip, it's the source code.
On Windows, you use pocketsphinx.sln to compile it.
Then you get the DLLs etc. (the resulting library).
There seem to be no precompiled binaries, so it's up to you to build them.
Note: you must compile it separately on Windows and on Linux. You cannot
use libraries built on Windows in Linux, or vice versa.
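If you are unsure where a prebuilt library came from, its first bytes tell you: Linux shared libraries (.so) are ELF files starting with 0x7F 'E' 'L' 'F', while Windows DLLs are PE files starting with "MZ". A small sketch of that check (binaryFormat is a hypothetical helper, and it only distinguishes these two formats):

```cpp
#include <fstream>
#include <string>

// Reports the binary format of a shared library file from its magic bytes.
// An ELF file only loads on Linux/Unix; a PE file (.dll) only on Windows.
std::string binaryFormat(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return "unreadable";
    char magic[4] = {0, 0, 0, 0};
    in.read(magic, sizeof magic);
    const std::streamsize n = in.gcount();  // the file may be shorter than 4 bytes
    if (n == 4 && magic[0] == '\x7f' && magic[1] == 'E' && magic[2] == 'L' && magic[3] == 'F')
        return "ELF (Linux/Unix)";
    if (n >= 2 && magic[0] == 'M' && magic[1] == 'Z')
        return "PE (Windows)";
    return "unknown";
}
```

On Linux, the `file` command reports the same information without writing any code.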
@mrjj That means I should build the complete source code in Qt Creator to get the binaries, and then use the required libraries?
@mrjj For Linux, do I also need to use pocketsphinx.sln?
Yes, the first step is to get it to compile, either in Creator or in Visual Studio.
That will produce the DLL/LIB files.
Then you make a new project (your project) and add this DLL/LIB to it.
- For Linux, do I also need to use pocketsphinx.sln?
No. For Linux, there are different build instructions:
$ make clean all
$ make check
$ sudo make install
SLN files are for Visual Studio,
and there is no Visual Studio on Linux.
@mrjj Okay, thanks. I wanted to know: is it possible to develop a desktop app with this that accepts voice input? Or do we need a POS device with a microphone?
Well, the PC or device must have a sound card and a microphone, but other than that,
nothing should stop you from running it as a desktop app.
I used this on a PC.
@mrjj Yes, but Nuance is not open source, right? So I'm trying pocketsphinx.
Oh, no. It's as commercial as it gets.
It was just an example of voice recognition on the desktop :)
The best I've ever tried. It worked flawlessly even with multiple people speaking!
They do allow others to use it,
but it's neither open source nor gratis.
So I just mentioned it as an example of VR that truly works :)
@mrjj Is there any voice recognition algorithm that I can use?
@jsulm No, I will use only pocketsphinx.
Just in case, there's a speech recognition branch in the QtSpeech module that is currently a work in progress, but it might be interesting for you.
My question is: do I need to install CMU Sphinx first and then pocketsphinx? Or is there another way to install pocketsphinx on Ubuntu?
You can try this:
Running pocketsphinx Speech Recognition on Ubuntu
@mrjj I have installed CMU Sphinx and pocketsphinx using the instructions given in this link: http://jrmeyer.github.io/installation/2016/01/09/Installing-CMU-Sphinx-on-Ubuntu.html
Is that correct? If so, where do I get the library, and how do I use it for voice recognition?
It seems like a good tutorial.
When you compile, you get the library,
and that library is what you use in the real project.
@mrjj Do I need to compile it in Qt Creator?
One more question: since I'm new, I want to know what sudo make install does.
I'm not sure it comes with a .pro file.
The build instructions should make clear what to do
on each platform.
sudo make install
That copies the "result" (executables, .a files, etc.) to a system location so it's installed.
The tutorial you found is not Qt-related, so it uses the normal Linux toolchain.
The author starts with:
"When I installed SPHINX for the first time in September 2015, it was not a fun experience."
So be prepared to carefully read what he does, and the docs too. It's not trivial to build.
@mrjj I am totally confused about this. What do I need to do with pocketsphinx so that I can use it in Qt for voice recognition on Ubuntu?
First you get it to build, then
you use it in Qt. It's two steps.
You really MUST read the docs slowly and carefully,
or you will miss a step and it won't work.
So the first step is to get it to build using the tutorials.
@mrjj Anyway, I have now installed pocketsphinx... Whether I can use it in Qt or not is what I'm not clear about. It's an external library, right? So I can use the .so file of pocketsphinx?
@mrjj Are there any links to a tutorial on how to get it built?
Didn't the tutorial you found seem good?
So if you have built the .so files, you are ready to use them.
This part mostly fails because of wrong paths, so make sure you check them.
Open the .pro file, right-click inside it and select
Add Library. Fill it out and make sure it's correct!!
(Linux library, shared, etc.)
Then you should be able to link against it.
Same story with pocketsphinx.
Also, MAKE 100% sure you did as it says:
"You must have SphinxBase, which you can download from http://cmusphinx.sourceforge.net. Download and unpack it to the same parent directory as PocketSphinx, so that the configure script and project files can find it. On Windows, you will need to rename 'sphinxbase-X.Y' (where X.Y is the SphinxBase version number) to simply 'sphinxbase' for this to work."
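Since the build depends on that directory layout, here is a sketch of checking it before running configure or opening the .sln. It assumes C++17 `<filesystem>`; checkSphinxbaseLayout is a hypothetical helper, and its return strings are just illustrative messages:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Checks for a "sphinxbase" directory next to the PocketSphinx source directory.
// Returns "ok", a rename hint for a versioned "sphinxbase-X.Y" folder, or "sphinxbase not found".
std::string checkSphinxbaseLayout(const fs::path& pocketsphinxDir) {
    std::error_code ec;
    fs::path dir = fs::absolute(pocketsphinxDir, ec).lexically_normal();
    if (!dir.has_filename())
        dir = dir.parent_path();  // drop a trailing separator such as "pocketsphinx/"
    const fs::path parent = dir.parent_path();

    if (fs::is_directory(parent / "sphinxbase", ec))
        return "ok";
    // Look for an unrenamed versioned folder (e.g. what unpacking the archive produces).
    for (const auto& entry : fs::directory_iterator(parent, ec)) {
        const std::string name = entry.path().filename().string();
        if (entry.is_directory(ec) && name.rfind("sphinxbase-", 0) == 0)
            return "rename " + name + " to sphinxbase";
    }
    return "sphinxbase not found";
}
```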
@mrjj I want to attach a screenshot of the files I got after running the steps in that link, http://jrmeyer.github.io/installation/2016/01/09/Installing-CMU-Sphinx-on-Ubuntu.html, for both sphinxbase and pocketsphinx, but I am not able to. How can I attach screenshots here?
Use an external site like postimage.org and paste the link here, or use
![](direct link here)
@mrjj After running these commands
$ make clean all
$ make check
$ sudo make install
for both pocketsphinx and sphinxbase, what do I need to do next?
Well, if everything works and reports no "error" of any kind,
I would try one of the existing examples
and see if it works.
Then I would start thinking about how to use it in Qt.
@mrjj After running the command pocketsphinx_continuous, I get the following result:
ERROR: "cmd_ln.c", line 682: No arguments given, available options are:
Arguments list definition:
[NAME] [DEFLT] [DESCR]
-adcdev Name of audio device to use for input.
-agc none Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh 2.0 Initial threshold for automatic gain control
-allphone Perform phoneme decoding with phonetic lm
-allphone_ci no Perform phoneme decoding with phonetic lm and context-independent units only
-alpha 0.97 Preemphasis parameter
-argfile Argument file giving extra arguments.
-ascale 20.0 Inverse of acoustic model scale for confidence score calculation
-aw 1 Inverse weight applied to acoustic scores.
-backtrace no Print results and backtraces to log.
-beam 1e-48 Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
-bestpath yes Run bestpath (Dijkstra) search over word lattice (3rd pass)
-bestpathlw 9.5 Language model probability weight for bestpath search
-ceplen 13 Number of components in the input feature vector
-cmn live Cepstral mean normalization scheme ('live', 'batch', or 'none')
-cmninit 40,3,-1 Initial values (comma-separated) for cepstral mean when 'live' is used
-compallsen no Compute all senone scores in every frame (can be faster when there are many senones)
-debug Verbosity level for debugging messages
-dict Main pronunciation dictionary (lexicon) input file
-dictcase no Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
-dither no Add 1/2-bit noise
-doublebw no Use double bandwidth filters (same center freq)
-ds 1 Frame GMM computation downsampling ratio
-fdict Noise word pronunciation dictionary input file
-feat 1s_c_d_dd Feature stream type, depends on the acoustic model
-featparams File containing feature extraction parameters.
-fillprob 1e-8 Filler word transition probability
-frate 100 Frame rate
-fsg Sphinx format finite state grammar file
-fsgusealtpron yes Add alternate pronunciations to FSG
-fsgusefiller yes Insert filler words at each state.
-fwdflat yes Run forward flat-lexicon search over word lattice (2nd pass)
-fwdflatbeam 1e-64 Beam width applied to every frame in second-pass flat search
-fwdflatefwid 4 Minimum number of end frames for a word to be searched in fwdflat search
-fwdflatlw 8.5 Language model probability weight for flat lexicon (2nd pass) decoding
-fwdflatsfwin 25 Window of frames in lattice to search for successor words in fwdflat search
-fwdflatwbeam 7e-29 Beam width applied to word exits in second-pass flat search
-fwdtree yes Run forward lexicon-tree search (1st pass)
-hmm Directory containing acoustic model files.
-infile Audio file to transcribe.
-inmic no Transcribe audio from microphone.
-input_endian little Endianness of input data, big or little, ignored if NIST or MS Wav
-jsgf JSGF grammar file
-keyphrase Keyphrase to spot
-kws A file with keyphrases to spot, one per line
-kws_delay 10 Delay to wait for best detection score
-kws_plp 1e-1 Phone loop probability for keyphrase spotting
-kws_threshold 1 Threshold for p(hyp)/p(alternatives) ratio
-latsize 5000 Initial backpointer table size
-lda File containing transformation matrix to be applied to features (single-stream features only)
-ldadim 0 Dimensionality of output of feature transformation (0 to use entire matrix)
-lifter 0 Length of sin-curve for liftering, or 0 for no liftering.
-lm Word trigram language model input file
-lmctl Specify a set of language model
-lmname Which language model in -lmctl to use by default
-logbase 1.0001 Base in which all log-likelihoods calculated
-logfn File to write log messages in
-logspec no Write out logspectral files instead of cepstra
-lowerf 133.33334 Lower edge of filters
-lpbeam 1e-40 Beam width applied to last phone in words
-lponlybeam 7e-29 Beam width applied to last phone in single-phone words
-lw 6.5 Language model probability weight
-maxhmmpf 30000 Maximum number of active HMMs to maintain at each frame (or -1 for no pruning)
-maxwpf -1 Maximum number of distinct word exits at each frame (or -1 for no pruning)
-mdef Model definition input file
-mean Mixture gaussian means input file
-mfclogdir Directory to log feature files to
-min_endfr 0 Nodes ignored in lattice construction if they persist for fewer than N frames
-mixw Senone mixture weights input file (uncompressed)
-mixwfloor 0.0000001 Senone mixture weights floor (applied to data from -mixw file)
-mllr MLLR transformation to apply to means and variances
-mmap yes Use memory-mapped I/O (if possible) for model files
-ncep 13 Number of cep coefficients
-nfft 512 Size of FFT
-nfilt 40 Number of filter banks
-nwpen 1.0 New word transition penalty
-pbeam 1e-48 Beam width applied to phone transitions
-pip 1.0 Phone insertion penalty
-pl_beam 1e-10 Beam width applied to phone loop search for lookahead
-pl_pbeam 1e-10 Beam width applied to phone loop transitions for lookahead
-pl_pip 1.0 Phone insertion penalty for phone loop
-pl_weight 3.0 Weight for phoneme lookahead penalties
-pl_window 5 Phoneme lookahead window size, in frames
-rawlogdir Directory to log raw audio files to
-remove_dc no Remove DC offset from each frame
-remove_noise yes Remove noise with spectral subtraction in mel-energies
-remove_silence yes Enables VAD, removes silence frames from processing
-round_filters yes Round mel filter frequencies to DFT points
-samprate 16000 Sampling rate
-seed -1 Seed for random number generator; if less than zero, pick our own
-sendump Senone dump (compressed mixture weights) input file
-senlogdir Directory to log senone score files to
-senmgau Senone to codebook mapping input file (usually not needed)
-silprob 0.005 Silence word transition probability
-smoothspec no Write out cepstral-smoothed logspectral files
-svspec Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
-time no Print word times in file transcription.
-tmat HMM state transition matrix input file
-tmatfloor 0.0001 HMM state transition probability floor (applied to -tmat file)
-topn 4 Maximum number of top Gaussians to use in scoring.
-topn_beam 0 Beam width used to determine top-N Gaussians (or a list, per-feature)
-toprule Start rule for JSGF (first public rule is default)
-transform legacy Which type of transform to use to calculate cepstra (legacy, dct, or htk)
-unit_area yes Normalize mel filters to unit area
-upperf 6855.4976 Upper edge of filters
-uw 1.0 Unigram weight
-vad_postspeech 50 Num of silence frames to keep after from speech to silence.
-vad_prespeech 20 Num of speech frames to keep before silence to speech.
-vad_startspeech 10 Num of speech frames to trigger vad from silence to speech.
-vad_threshold 2.0 Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
-var Mixture gaussian variances input file
-varfloor 0.0001 Mixture gaussian variance floor (applied to data from -var file)
-varnorm no Variance normalize each utterance (only if CMN == current)
-verbose no Show input filenames
-warp_params Parameters defining the warping function
-warp_type inverse_linear Warping function type (or shape)
-wbeam 7e-29 Beam width applied to word exits
-wip 0.65 Word insertion penalty
-wlen 0.025625 Hamming window length
INFO: continuous.c(295): Specify '-infile <file.wav>' to recognize from file or '-inmic yes' to recognize from microphone.
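The last INFO line is the key hint: pocketsphinx_continuous needs either `-infile <file.wav>` or `-inmic yes`. Below is a small sketch of building that argument list, for example to hand it to a process launcher from a Qt app later. continuousArgs and toCommandLine are hypothetical helpers, and the sketch assumes the default acoustic model was installed along with pocketsphinx (otherwise you would also pass -hmm, -lm and -dict from the option list above):

```cpp
#include <string>
#include <vector>

// Builds the argv for pocketsphinx_continuous following the hint it prints:
// '-infile <file.wav>' to recognize from a file, '-inmic yes' for the microphone.
// An empty wavFile means "use the microphone".
std::vector<std::string> continuousArgs(const std::string& wavFile) {
    if (wavFile.empty())
        return {"pocketsphinx_continuous", "-inmic", "yes"};
    return {"pocketsphinx_continuous", "-infile", wavFile};
}

// Joins the arguments with spaces for display or a shell. No quoting is done,
// so this is only safe for paths without spaces or shell metacharacters.
std::string toCommandLine(const std::vector<std::string>& args) {
    std::string line;
    for (const auto& arg : args) {
        if (!line.empty())
            line += ' ';
        line += arg;
    }
    return line;
}
```

From Qt, one option is QProcess: pass the first element as the program and the rest as a QStringList to `QProcess::start(program, arguments)`, then read what the tool prints. Linking against the library directly is the other route, once the .pro file is set up.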
That does look like the tutorials, so I think it's working :)
\o/ Good work!
@mrjj You said "I would try one of the existing examples
and see if it works." Where can I get the existing examples?
@mrjj Thanks. How do I use these existing examples in pocketsphinx?
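One common stumbling block when trying the examples with your own recordings is the audio format. The option list above shows `-samprate 16000` as the default; assuming the default US English model, input is generally expected to be 16 kHz, 16-bit, mono PCM. Here is a sketch that reads a WAV header and checks exactly that (WavFormat, readWavFormat and matchesDefaultModel are hypothetical helpers, not pocketsphinx API):

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Fields of a WAV file's "fmt " chunk that matter for recognition.
struct WavFormat {
    std::uint16_t audioFormat = 0;  // 1 = uncompressed PCM
    std::uint16_t channels = 0;
    std::uint32_t sampleRate = 0;
    std::uint16_t bitsPerSample = 0;
};

// Decodes a little-endian unsigned integer of `bytes` bytes (WAV is little-endian).
static std::uint32_t readLE(const unsigned char* p, int bytes) {
    std::uint32_t v = 0;
    for (int i = bytes - 1; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

// Walks the RIFF chunks until it finds "fmt ". Returns false if the file is not a WAV file.
bool readWavFormat(const std::string& path, WavFormat& out) {
    std::ifstream in(path, std::ios::binary);
    char riff[12];
    if (!in.read(riff, sizeof riff)) return false;
    if (std::string(riff, 4) != "RIFF" || std::string(riff + 8, 4) != "WAVE") return false;

    unsigned char hdr[8];
    while (in.read(reinterpret_cast<char*>(hdr), sizeof hdr)) {
        const std::string id(reinterpret_cast<const char*>(hdr), 4);
        const std::uint32_t size = readLE(hdr + 4, 4);
        if (id == "fmt ") {
            unsigned char fmt[16];
            if (size < 16 || !in.read(reinterpret_cast<char*>(fmt), sizeof fmt)) return false;
            out.audioFormat = static_cast<std::uint16_t>(readLE(fmt, 2));
            out.channels = static_cast<std::uint16_t>(readLE(fmt + 2, 2));
            out.sampleRate = readLE(fmt + 4, 4);
            out.bitsPerSample = static_cast<std::uint16_t>(readLE(fmt + 14, 2));
            return true;
        }
        // Skip this chunk; chunk bodies are padded to an even number of bytes.
        in.seekg(static_cast<std::streamoff>(size) + (size & 1), std::ios::cur);
    }
    return false;
}

// True for 16 kHz, 16-bit, mono PCM (the assumed match for the default model).
bool matchesDefaultModel(const WavFormat& f) {
    return f.audioFormat == 1 && f.channels == 1 && f.sampleRate == 16000 && f.bitsPerSample == 16;
}
```

If a recording fails this check, converting it to 16 kHz mono 16-bit first is usually easier than changing the decoder options.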