Realtime Audio Playback in Qt: QAudioOutput seems to wait unnecessarily

Mr.FreshDachs

Hello Qt community,

First off, this is my first time posting here, so I apologize for the wall of text and code that follows. I don't yet know which parts are important and which I should have left out.

I am working on a realtime audio project in which I read audio data from the microphone with the QAudioInput class, process it and play it with the QAudioOutput class.
I implemented my own QIODevice with its own read and write functions. To minimize latency I set the input and output buffer sizes as low as possible while playback still works (3000 bytes for now, though I think it could go lower) and opened the QIODevice with the "Unbuffered" flag. So far it's working really well.

The problem is that when I start the input and output simultaneously, the output device reads from a still-empty buffer. That works if I catch this special case in the read function of my QIODevice, but it doesn't seem like a good solution overall.
So I waited until the audio input had collected as much data as the QAudioOutput buffer size and only then started the audio output. My expectation was that QAudioOutput would fill its internal buffer as soon as I start it. But here is where it gets strange: when I start the QAudioOutput, it still waits about 5 to 9 input cycles before it begins to read data, even though enough data is already available! Interestingly, if I start both simultaneously, the QAudioOutput immediately reads 3000 (nonexistent) bytes. This behaviour seems quite illogical to me.
Here is some code from my implementation:

Audio Format

    QAudioFormat format;
    QAudioDeviceInfo deviceInfo_in = QAudioDeviceInfo::defaultInputDevice();
    QAudioDeviceInfo deviceInfo_out = QAudioDeviceInfo::defaultOutputDevice();
    format.setSampleRate(48000);
    format.setChannelCount(1);
    format.setSampleSize(16);
    format.setSampleType(QAudioFormat::SignedInt);
    format.setByteOrder(QAudioFormat::LittleEndian);
    format.setCodec("audio/pcm");
    
    format_in = format;
    format_out = format;

    audiobuffer.resize(16*3000);    // buffer to store data (QByteArray)

Audio Devices: I pass a pointer to the audiobuffer and the position 'write_pos' to the constructor of the QIODevice (called "AudioDevice"). write_pos is the position in audiobuffer where the next chunk of data has to be written (or read).

    audiodevice_in = new AudioDevice(this,format_in,0,&audiobuffer);
    audioinput = new QAudioInput(deviceInfo_in,format_in,this);
    audioinput->setBufferSize(2000);
    audiodevice_in->open(QIODevice::WriteOnly | QIODevice::Unbuffered);

    audiodevice_out = new AudioDevice(this,format_out,audiodevice_in->write_pos,&audiobuffer);
    audiooutput = new QAudioOutput(deviceInfo_out,format_out,this);
    audiooutput->setBufferSize(3000);
    audiodevice_out->open(QIODevice::ReadOnly | QIODevice::Unbuffered);

    audioinput->start(audiodevice_in);
    qWarning() << "Input Period Size =" << audioinput->periodSize();

my AudioDevice.cpp:

AudioDevice::AudioDevice(QObject* parent,QAudioFormat form,qint64 writepos,QByteArray* opb) : QIODevice(parent)
{
    audiobuffer = opb;
    format = form;
    prefill = 8;
    x = 1;
    channel1.resize(1920);
    write_pos = writepos;
}

qint64 AudioDevice::writeData(const char* data, qint64 len)
{
    Q_ASSERT(format.sampleSize() % 8 == 0);
    const int channelBytes = format.sampleSize() / 8;
    const int sampleBytes = format.channelCount() * channelBytes;
    Q_ASSERT(len % sampleBytes == 0);
    const int numSamples = len / sampleBytes;
    const unsigned char *ptr = reinterpret_cast<const unsigned char *>(data);
    // pointer "data" points to the data which has to be written

    for (int samp=0; samp<numSamples; samp++)
    {
        for (int chan=0; chan<format.channelCount(); chan++)
        {
            qint16 value = qFromLittleEndian<qint16>(ptr);
            if (value>32760) value = 32760;
            if (value<-32760) value = -32760;
            channel1.replace(samp,value);// for channelCount>1 more have to be added
            ptr += channelBytes;
        }
    }
    // result: QVector "channel1" is filled with samples, now I could do my audio processing

    //------------write data in audiobuffer----------------
    signed char* ptr2 = reinterpret_cast<signed char *>(audiobuffer->data()+write_pos);
    // ptr2 points to audiobuffer + write_pos (where the next data has to be written)
    // Only the numSamples samples of this chunk are written; looping over
    // channel1.length() would also write stale samples past the chunk.
    for (int samp=0; samp<numSamples; samp++)
    {
        for (int chan=0; chan<format.channelCount(); chan++)
        {
            qToLittleEndian<qint16>(channel1.value(samp), ptr2);
            ptr2 += channelBytes;
        }
    }
    // result: chunk of data is added at position write_pos in audiobuffer

    write_pos += len;                 //both instances of AudioDevice have their own
    emit writeposchanged(write_pos);  //'write_pos' -> value is "shared" with signals and slots

    qWarning() << "Successfully written" << len << "Bytes, write_pos =" << write_pos;

    if (x == prefill) emit startoutput();  // after 'prefill' write calls, startoutput() is emitted
    if (x < prefill + 1) x++;    // do not endlessly increment x

    return len;
}

qint64 AudioDevice::readData(char *data, qint64 len) 
{                          // *data = pointer to where the data has to be written
    qWarning() << "Reading Data";

    qint64 total = 0;
    qint64 m_pos = 0;

    if (!audiobuffer->isEmpty()) 
    {
        while (len - total > 0)
        {
            const qint64 chunk = qMin((audiobuffer->length() - m_pos), len - total);
            memcpy(data + total, audiobuffer->constData() + m_pos, chunk);
            m_pos = (m_pos + chunk);
            total += chunk;
        }
        // result: all bytes have been written: total==len
    }
    // Now I shift all bytes in audiobuffer 'total' places to the left.
    // memmove handles the overlapping source and destination in one call.
    char* buf = audiobuffer->data();
    memmove(buf, buf + total, audiobuffer->size() - total);
  
    write_pos -= total;
    if (write_pos < 0) {qWarning() << "Buffer Underrun occurred"; write_pos = 0;}
    emit writeposchanged(write_pos);
   
    qWarning() << "Successfully read" << total << "Bytes, write_pos =" << write_pos;
    return total;
}

As you may have gathered from the code, I always read from the very beginning of audiobuffer and afterwards shift the remaining bytes to the left, so the next bytes to be read are at the beginning again. "write_pos" stores the position where the input has to write the next chunk of data. Its value is shared between the two instances of AudioDevice via signals and slots (yes, I connected them; I just didn't post those lines here).
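A note on the shifting approach described above: a common alternative is a ring buffer, which keeps a read position and a fill count and wraps around instead of moving every remaining byte after each read. The following is only a minimal sketch in plain C++, not the code from this post. The class name `ByteRing` and its members are hypothetical, and it assumes that reads and writes happen on the same thread (no locking).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical fixed-capacity byte ring buffer (single-threaded sketch).
class ByteRing {
public:
    explicit ByteRing(std::size_t capacity) : buf_(capacity) { assert(capacity > 0); }

    // Number of unread bytes currently stored.
    std::size_t available() const { return size_; }

    // Append up to len bytes; returns the number actually stored.
    std::size_t write(const char* src, std::size_t len) {
        const std::size_t n = std::min(len, buf_.size() - size_);
        const std::size_t tail = (head_ + size_) % buf_.size();
        const std::size_t first = std::min(n, buf_.size() - tail);
        std::memcpy(buf_.data() + tail, src, first);       // part up to the end
        std::memcpy(buf_.data(), src + first, n - first);  // wrapped remainder
        size_ += n;
        return n;
    }

    // Copy up to len bytes out; returns the number actually read.
    std::size_t read(char* dst, std::size_t len) {
        const std::size_t n = std::min(len, size_);
        const std::size_t first = std::min(n, buf_.size() - head_);
        std::memcpy(dst, buf_.data() + head_, first);      // part up to the end
        std::memcpy(dst + first, buf_.data(), n - first);  // wrapped remainder
        head_ = (head_ + n) % buf_.size();
        size_ -= n;
        return n;
    }

private:
    std::vector<char> buf_;
    std::size_t head_ = 0;  // index of the oldest unread byte
    std::size_t size_ = 0;  // number of unread bytes
};
```

In such a design, `available()` would play the role that write_pos plays here (the number of bytes waiting to be played), and a read that returns fewer bytes than requested corresponds to the underrun case.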

Finally, after 'prefill' writes of input data, the slot on_startoutput() in the main class is called, which just starts the audio output:

void AudioProcessing::on_startoutput()
{
    audiooutput->start(audiodevice_out);
    qWarning() << "Output Period Size:" << audiooutput->periodSize();
    qWarning() << "Output Status:" << audiooutput->state();
}

I would expect the output to now read 3000 bytes of data (its buffer size) from audiobuffer (which are already available because of the prefill mechanism), BUT the console prints the following:

Input Period Size = 400
Successfully written 400 Bytes, write_pos = 400
Successfully written 400 Bytes, write_pos = 800
Successfully written 400 Bytes, write_pos = 1200
Successfully written 400 Bytes, write_pos = 1600
Successfully written 400 Bytes, write_pos = 2000
Successfully written 400 Bytes, write_pos = 2400
Successfully written 400 Bytes, write_pos = 2800
Successfully written 400 Bytes, write_pos = 3200
Output Period Size: 600
Output Status: ActiveState  *Why don't you read data?!?!*
Successfully written 400 Bytes, write_pos = 3600
Successfully written 400 Bytes, write_pos = 4000
Successfully written 400 Bytes, write_pos = 4400
Successfully written 400 Bytes, write_pos = 4800
Successfully written 400 Bytes, write_pos = 5200
Successfully written 400 Bytes, write_pos = 5600
Successfully written 400 Bytes, write_pos = 6000
Successfully written 400 Bytes, write_pos = 6400
Successfully written 400 Bytes, write_pos = 6800
Reading Data
Successfully read 3000 Bytes, write_pos = 3800
Successfully written 400 Bytes, write_pos = 4200
Successfully written 400 Bytes, write_pos = 4600
Successfully written 400 Bytes, write_pos = 5000
Successfully written 400 Bytes, write_pos = 5400
Successfully written 400 Bytes, write_pos = 5800
Successfully written 400 Bytes, write_pos = 6200
Successfully written 400 Bytes, write_pos = 6600
Successfully written 400 Bytes, write_pos = 7000
Successfully written 400 Bytes, write_pos = 7400
Reading Data
Successfully read 600 Bytes, write_pos = 6800
Successfully written 400 Bytes, write_pos = 7200
Successfully written 400 Bytes, write_pos = 7600
Successfully written 400 Bytes, write_pos = 8000
Reading Data
Successfully read 1200 Bytes, write_pos = 6800
Successfully written 400 Bytes, write_pos = 7200
Successfully written 400 Bytes, write_pos = 7600
Reading Data
Successfully read 600 Bytes, write_pos = 7000
Successfully written 400 Bytes, write_pos = 7400
Successfully written 400 Bytes, write_pos = 7800
Reading Data
Successfully read 1200 Bytes, write_pos = 6600
Successfully written 400 Bytes, write_pos = 7000

As you can see, the output is started after 400 bytes have been written prefill=8 times (I chose 8 so that at least 3000 bytes are available when the output starts), but the output still waits for 9 more writes before it reads for the first time.
It's exactly this behaviour which ruins the latency.
The value of write_pos is a good measure of the latency, because it represents the distance between where the current audio is written and where it is read (at write_pos=0, the beginning of the array). With ~7000 bytes = 3500 samples, that's about 73 ms at 48 kHz. That's more than enough for a noticeable, annoying delay in realtime audio playback.
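The arithmetic behind these numbers can be sketched as two small helpers. This is only an illustration in plain C++: the function names `bytesToMs` and `prefillPeriods` are hypothetical and not part of Qt, and the format values (48 kHz, mono, 2 bytes per sample) are the ones from the audio format set up earlier in this post.

```cpp
#include <cassert>
#include <cstdint>

// Convert a byte count of PCM audio into milliseconds of audio.
double bytesToMs(std::int64_t bytes, int sampleRate, int channels, int bytesPerSample) {
    assert(sampleRate > 0 && channels > 0 && bytesPerSample > 0);
    const double frames = static_cast<double>(bytes) / (channels * bytesPerSample);
    return 1000.0 * frames / sampleRate;
}

// Number of input periods needed until at least bufferBytes are available
// (integer division rounded up).
int prefillPeriods(int bufferBytes, int periodBytes) {
    assert(periodBytes > 0);
    return (bufferBytes + periodBytes - 1) / periodBytes;
}
```

With the values from the log, `bytesToMs(7000, 48000, 1, 2)` gives about 72.9 ms, `bytesToMs(3000, 48000, 1, 2)` gives 31.25 ms, and `prefillPeriods(3000, 400)` gives 8, which matches the prefill value used above.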

Don't get me wrong, I am perfectly aware that Qt is not the best tool for realtime audio and that it was not designed with that in mind, but this seems to me like such a simple fix! Just tell the output to start reading data immediately when it's started, and voilà: write_pos will be at around 3000 bytes, which translates to 31 ms and would be perfectly fine for my application.

I know, I know, 70 ms or 30 ms, it's kind of unnecessary trouble I'm bringing on myself (and you guys), but it just bothers me because it seems so trivial.

Maybe you guys could help me with this one; I would be really grateful.
Feel free to criticize my overall concept of handling the data. I'm really new to Qt and C++ in general, and I'm really happy that my code works at all at this point.

Thanks for your time,
Kind Regards.

SGaist

Hi,

That's something you should look into at the backend level. Depending on the OS you are on, it might not be possible.

Since you want real-time, you should consider PortAudio which is likely better suited.

maucher

Hi,
yes, the output buffer cycle depends on your backend. In every audio backend, Qt sets the internal buffer size for audio output to the preferred output buffer size of the audio system actually in use (for example Core Audio, ALSA, etc.). Qt 5 and Qt 6 check this size and ignore requested buffer sizes below it. So with QAudioSink::setBufferSize you can't set a buffer size smaller than, for example, 8192 bytes on iOS, which means Qt audio applications for iOS can't achieve an audio latency lower than about 30 to 40 ms. Many audio app developers can't accept this, because iOS itself can achieve latencies of 3 ms. In Qt 6 the audio backends are private, so there is no way to optimize the audio plugin. In Qt 5 I did exactly that and was able to optimize latency for all audio backends (Windows, Unix, macOS, iOS, Android): I changed setBufferSize so that very low buffer sizes are accepted by all backends. The result is that minimal internal audio output cycles become possible, which means that QAudioSink can achieve realtime audio output.