QtConcurrent vs QThread CPU Usage



  • Hey everyone,

    I've been using QtConcurrent::run to implement a live video playback / processing application with a typical thread implemented this way:

    QFutureWatcher<void>* inProcessWatcher = new QFutureWatcher<void>;

    QFuture<void> inProcess = QtConcurrent::run(this, &Mixer::processInput, vid);

    QObject::connect(inProcessWatcher, &QFutureWatcher<void>::finished,
                     this, [=]() {
        inputsProcessed(vid);
        inProcessWatcher->deleteLater(); // safer than delete inside the watcher's own signal
    });

    inProcessWatcher->setFuture(inProcess);
    

    Everything seems to run normally. I have 5 of these running constantly, but not in parallel: each of my two video inputs has a thread that just grabs an input frame and then kicks off the next thread to process it, so at most one thread is running per video.

    There is an output-stage thread which constantly checks whether there are processed frames in each video's buffer so it can do some more work on them and send them out to be displayed. So overall there are 5 threads, but only 3 running at the same time / in parallel. My CPU usage is through the roof (~40-60% of an Intel i7) and its threads are contemplating a strike.

    Could it be that the implementation above (a few short-lived threads every cycle) is itself more costly than having a couple of longer-lived worker QThreads performing the same function? Or is this related to the priority of a QtConcurrent thread not being set the same way as a QThread's?

    Cheers!


  • Lifetime Qt Champion

    Hi,

    How do you know that you only have 5 of these?

    For a video with a classic 25-frames-per-second frame rate, I would expect 50 of them to be created every second, since you have two videos.

    Unless they can complete their processing in less than the duration of a frame, I wouldn't be surprised by your result.



  • @SGaist Each cycle or frame does have 5 created, and the number you mentioned sounds right once adjusted for the frame rate; at 30 fps it would be 150 of them per second. Still, there are only 3 threads per frame, since frame reading / processing for each video uses 2 threads working in series and output processing has its own thread.

    So is the large number of threads created every second what drives CPU usage so high? If that's the case, it would make sense to use the worker-QThread model and have 3 of them run start to finish.


  • Lifetime Qt Champion

    I don't know how exactly your pipeline works so I can't really comment on that.

    Did you try to use a profiler to see what is happening where in your application?

    Did you try to measure the performance of the methods you apply to video frame processing?



  • @SGaist I've done some benchmarking using std::chrono to see how much time each section of the process takes, but I'm not familiar with profiling tools. Any tools you would recommend? I came across this page:

    https://doc.qt.io/qtcreator/creator-cache-profiler.html



  • Maybe you need to provide more detailed content and data.





  • Thanks guys, I'll check out the cache profiler. To give you a bit more detail about the overall process:

    void Mixer::mix(unsigned int vid)
    {
        videoProcessStart[vid] = std::chrono::high_resolution_clock::now();

        QFutureWatcher<void>* inProcessWatcher = new QFutureWatcher<void>;

        QFuture<void> inProcess = QtConcurrent::run(this, &Mixer::processInput, vid);

        QObject::connect(inProcessWatcher, &QFutureWatcher<void>::finished,
                         this, [=]() {
            videoEffectsFinished(vid);
            inProcessWatcher->deleteLater(); // safer than delete inside the watcher's own signal
        });

        inProcessWatcher->setFuture(inProcess);
    }

    void Mixer::processInput(unsigned int vid)
    {
        videoFrameRead[vid] = readInput(vid);
        addVideoEffectsMultiChannel(vid);
    }

    void Mixer::videoEffectsFinished(unsigned int vid)
    {
        mix(vid);
    }
    

    In Mixer::addVideoEffectsMultiChannel, I process each frame using OpenCV functions (the main operations here are cv::split, cv::addWeighted and cv::merge, to allow RGB processing of a 3-channel cv::Mat). This function was time-consuming when called in series for both frames, so I decided to split the processing into two parallel threads.

    After each frame is processed, it's pushed onto a "buffer" (an std::deque<cv::Mat>) so the output stage doesn't need to wait for both frames to be finished. The output-stage functions essentially create a constantly spinning thread, in the same way as above, that takes the oldest frame from each buffer, mixes them using cv::addWeighted and emits the resulting data array to be displayed.

    So I have 3 threads running in parallel at this point. I've confirmed that each one is unique by comparing the thread IDs, and changing the priority of each thread does not affect CPU usage.

    At this point I think all the per-element OpenCV operations are causing the high load: with, say, two 1920x1080 frames being processed constantly, that would be pretty CPU-heavy. If anyone has any tips / ideas I would definitely appreciate them.

    Cheers!


  • Lifetime Qt Champion

    Did you consider moving the image processing stuff to the GPU ?



  • @SGaist Yes, I think that's most likely the route I have to take. I'm building OpenCV with CUDA today. I think I totally took the wrong approach from the start. I'll let you know how it turns out.



  • I CUDA not build it, unfortunately. CUDA is only supported with Visual Studio on Windows, so I guess it's back to good old uncle GL...