Speech Recognition

  • Hello everybody, I'm trying to create a speech-to-text application with Qt.

    Unfortunately, I cannot find anything on the internet about speech-to-text, only about text-to-speech, and that's not what I want.

    I found this library for Qt, but I think it does not work as speech-to-text, only as text-to-speech.


    Does anyone know where I can find a library written in C/C++ to do that? Or how to do this using Qt-Speech?


  • Moderators

    There are a number of items tagged with "speech recognition" here on Devnet which might help. http://qt-project.org/search/tag/speech~recognition

  • [quote author="mlong" date="1349932294"]There are a number of items tagged with "speech recognition" here on Devnet which might help. http://qt-project.org/search/tag/speech~recognition[/quote]

    That did not help me much.

    The question remains: how do I build speech-to-text with Qt? Is there any library written in C/C++?

  • Moderators

    There are no Qt-specific ways to do it. Various external libraries exist (some are mentioned in the links above).

  • Yes, of course, but that is not the question. I know there are several libraries for voice recognition, but there is no tutorial, and nobody shows how to build speech-to-text rather than text-to-speech.
    There are several libraries, but I do not know which of them support speech-to-text.

  • Moderators

    Try CMUSphinx: http://cmusphinx.sourceforge.net/wiki/ They have an open-source C library, their documentation looks pretty good, and they have a Hello World speech recognition tutorial.

    Keep in mind that you need to create (or download) a language model and an acoustic model. They only provide an acoustic model for American voices, which may produce inaccurate text for non-American speakers. You'll probably also need an understanding of statistics and linguistics.
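
    As a rough sketch of what "a language model and an acoustic model" means in practice: the `pocketsphinx_continuous` command-line tool takes the acoustic model directory as `-hmm`, the language model as `-lm`, and a pronunciation dictionary as `-dict`. The struct, the function name, and the file paths below are hypothetical and only illustrate how those pieces fit together; the dictionary requirement is not mentioned above, so treat it as an added assumption.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Files a PocketSphinx-style decoder needs before it can recognize anything.
// All names here are illustrative, not part of any library.
struct RecognizerModels {
    std::string acousticModelDir;   // how phonemes sound (passed as -hmm)
    std::string languageModelFile;  // which word sequences are likely (-lm)
    std::string dictionaryFile;     // how each word is pronounced (-dict)
};

// Build the argument list for the pocketsphinx_continuous tool.
// Throws if any of the three model files is missing from the configuration.
std::vector<std::string> buildDecoderArgs(const RecognizerModels& m) {
    if (m.acousticModelDir.empty() || m.languageModelFile.empty() ||
        m.dictionaryFile.empty()) {
        throw std::invalid_argument(
            "acoustic model, language model and dictionary are all required");
    }
    return {"-hmm", m.acousticModelDir,
            "-lm", m.languageModelFile,
            "-dict", m.dictionaryFile};
}
```

    Calling `buildDecoderArgs` with, say, `{"models/en-us", "models/en-us.lm", "models/en-us.dict"}` (made-up paths) yields the six arguments you would pass to the tool, for example through `QProcess`.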

  • I wonder if all speech recognition libraries support speech-to-text.

  • Moderators

    That's the definition of "Speech Recognition": Converting spoken words into text. :)

  • I found one that supports several languages. It seems like a good library for us to discuss; what do you think?



    en-us American English

    en-sc English with a Scottish accent.

    af Afrikaans

    bs Bosnian

    ca Catalan

    cs Czech

    de German

    el Greek

    eo Esperanto

    es Spanish

    es-la Spanish - Latin America

    fr French

    hr Croatian

    hu Hungarian

    it Italian

    kn Kannada

    ku Kurdish

    lv Latvian

    nl Dutch

    pl Polish

    pt Portuguese (Brazil)

    pt-pt Portuguese (European)

    ro Romanian

    sk Slovak

    sr Serbian

    sv Swedish

    sw Swahili

    ta Tamil

    tr Turkish

    zh Mandarin Chinese

    cy Welsh

    grc Ancient Greek

    hi Hindi

    hy Armenian

    id Indonesian

    is Icelandic

    jbo Lojban

    ka Georgian

    la Latin

    mk Macedonian

    no Norwegian

    ru Russian

    sq Albanian

    vi Vietnamese

    zh-yue Cantonese Chinese

  • Moderators

    eSpeak is a Speech SYNTHESIS program.

    You wanted a Speech RECOGNITION library, right?

    Speech Synthesis = Text-to-Speech
    Speech Recognition = Speech-to-Text

  • Is there any speech recognition library that supports the same languages that eSpeak supports? Or one with a good number of languages?


  • Moderators

    Which language(s) do you want?

    Google can help you

  • I've looked through almost to the last page of Google results and found no library that supports a variety of languages.

    I want almost all languages, if possible. (Like eSpeak, which supports many languages.)

  • Moderators

    [quote author="l3e0wulf" date="1350170935"]I want almost all languages, if possible. (Like eSpeak, that supports many languages)[/quote]That's just not possible. Even though CMUSphinx supports base models for several languages, many developers still choose to "train" their software with their own audio recordings, to improve accuracy.

    Understand that speech recognition is extremely complex. You need to support more than just the vocabulary: You also need to support the speaking style. Think of England, Scotland, USA, Australia, New Zealand, South Africa, Brazil, Germany, Singapore, India -- people in these countries can speak English, but they sound very different. Speech recognition software designed for one country will be very inaccurate in another country. And that's only for one language!

    So, when you develop speech recognition software, you need to "train" it to match your users. I think it's too expensive and time-consuming for library-makers to support a large variety of languages and styles -- their priority is to spend time on designing algorithms, not on collecting speech recordings.

    What are your plans for your software, and which users do you want to target?
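
    To make the "match your users" point concrete, here is a minimal sketch of picking an acoustic model by language tag. It assumes you keep your own catalogue of trained models keyed by tags such as `en-us` or `pt-br`; the catalogue, the type names, and the paths are all hypothetical. A fallback to the base language is reported as a non-exact match, since a model trained on one accent may be inaccurate for another.

```cpp
#include <map>
#include <string>

// Hypothetical catalogue: language tag -> acoustic model directory.
typedef std::map<std::string, std::string> ModelCatalogue;

struct ModelChoice {
    bool found;        // false: no model at all for this language
    bool exactMatch;   // false: generic base-language model, expect lower accuracy
    std::string path;  // model directory, empty when nothing was found
};

// Prefer a model trained for the exact tag ("en-in"); otherwise fall back
// to the base language ("en"); otherwise report that nothing is available.
ModelChoice chooseAcousticModel(const ModelCatalogue& models,
                                const std::string& tag) {
    ModelCatalogue::const_iterator it = models.find(tag);
    if (it != models.end()) {
        return ModelChoice{true, true, it->second};
    }
    const std::string::size_type dash = tag.find('-');
    if (dash != std::string::npos) {
        it = models.find(tag.substr(0, dash));
        if (it != models.end()) {
            return ModelChoice{true, false, it->second};
        }
    }
    return ModelChoice{false, false, ""};
}
```

    When `exactMatch` is false, that is the case where training or adapting a model with recordings from your own users is most likely to pay off.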

  • Can someone please let me know whether Qt 5 has support for speech commands?


  • Hello there,

    I've been looking lately for a complete toolkit for a research project in which we have to use both speech recognition and speech synthesis. The main library and API should offer speech recognition in C++ with an integrated acoustic model, language model and, of course, a decoder.

    During this project I'll have to use both techniques, in Portuguese (Brazilian) and in English.

    I plan using the native Qt library for Speech Synthesis:
    "Qt TTS":https://gitorious.org/qt-speech/qt-speech/source/6ddec1ee6dbcaa4ca74a625bdc87c8ace08bb045:

    But I know that speech recognition is much harder. I tried using CMU Sphinx ("CMU Sphinx":http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx) but didn't find examples or a concrete tutorial for C++ development. I also tried VOCE ("VOCE Sourceforge":http://voce.sourceforge.net/), but it can't handle Portuguese.

    I'm open-minded; if you are developing something like this, please contact me so we can share ideas (vitorsgobbi@hotmail.com).

  • Hi,

    After a lot of googling and reading about speech recognition, I realized CMU Sphinx is the best option for me.

    For those who need to train an acoustic model, here is the tutorial.
    Be aware that it is complex and requires a lot of work and time.

    "Acoustic model":http://cmusphinx.sourceforge.net/wiki/tutorialam

    This is a hello world example in C:

    "Hello world sample":http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx

    And this is for Brazilians who, like me, are developing their own speech recognition application. You can find a Portuguese acoustic model and language model here:

    "LAPS UFPA":http://www.laps.ufpa.br/falabrasil/downloads.php
