Slow application with low CPU usage

Linhares

I'm working on a program that processes a biological signal with an input rate of 256 samples per second.
I've created several classes of objects. Some, such as graphs and oscilloscopes, have a graphical interface, whereas others have no UI. Now I'm making connections between the objects to create recording designs.
Some of my designs run smoothly, but others get glitchy and make the program freeze.
It looks like the glitches happen in designs that have more objects and more connections. For example: a design that has 30 objects and 45 connections runs fine, but one that has 55 objects and 100 connections gets very slow.
However, in either case the CPU usage is about 20%.

I have a processor that is good enough, with 8 cores.

I've already used other programs that have the same objects with the same functionality as my application, but they are usually able to handle hundreds (or even thousands) of objects with no problem.

Here are some questions:

Is it possible that Qt is so heavy that it just can't handle a large number of objects? I don't think so.
What would explain the low CPU usage, even when the program is clearly demanding more processing?
All my classes include a subclass that runs on a separate thread. This way, an object receives the signal as an input and sends it to the threaded processing object. The processing object then sends the output to the overall class and this result is emitted as a signal that can be connected to other objects on the design. Am I doing it the right way?

Pl45m4

@Linhares said in Slow application with low CPU usage:

Is it possible that Qt is so heavy that it just can't handle a large number of objects? I don't think so.

Unless you are talking about hundreds of thousands of objects that need to be redrawn at short intervals, no.

For example: a design that has 30 objects and 45 connections runs fine, but one that has 55 objects and 100 connections gets very slow.

What is a "design"? A QWidget?!
What classes are you using? QGraphicsView - Framework?

What would explain the low CPU usage, even when the program is clearly demanding more processing?

CPU not at its limit, could be a code/design/structure issue.

All my classes include a subclass that runs on a separate thread. This way, an object receives the signal as an input and sends it to the threaded processing object. The processing object then sends the output to the overall class and this result is emitted as a signal that can be connected to other objects on the design. Am I doing it the right way?

ALL classes?! We don't know about the actual code, but I don't think every class of your program needs its own thread.

Linhares

@Pl45m4 said in Slow application with low CPU usage:

What is a "design"? A QWidget?!
What classes are you using? QGraphicsView - Framework?

Sorry that I haven't explained. In signal processing, a design means a bunch of objects connected among themselves.
In my application, the design is a QWidget. The objects that have a graphic interface are QWidget's as well. Other objects are QObject's.
I'm not using QGraphicsView (should I be using that?)

@Pl45m4 said in Slow application with low CPU usage:

CPU not at its limit, could be a code/design/structure issue.

I wonder what could be causing that.

@Pl45m4 said in Slow application with low CPU usage:

ALL classes?! We don't know about the actual code, but I don't think every class of your program needs its own thread.

I have 15 different classes that use the structure I described. Some examples of objects are:

With UI: oscilloscope, trend graph, bar graph, numeric display. The object thread processes data and sends a signal to update the UI from time to time.
No UI: filter, expression evaluator, time transform.

Since the input rate is high, I decided to have each of these objects working on its own thread so the main thread doesn't get clogged.

SimonSchroeder

@Linhares You can only call UI functions from the main thread. Otherwise your program might crash. Are you drawing the oscilloscope, trend graph, etc. yourself or are you using a library?

From my experience, once the graphs have a lot of data to display, it takes a long time to draw everything. If updates are requested too frequently, this might stall your app.

J.Hilk

@Linhares how frequently do you emit those updates from the threads?

You may be overloading the event system with tons of signals each second!

Linhares

@SimonSchroeder and @J-Hilk

Are you drawing the oscilloscope, trend graph, etc. yourself or are you using a library?

I'm drawing the objects myself. However, update is not called very frequently, I think. The interval between updates for each object ranges from 15 to 50 milliseconds. Is this a very low interval?

SimonSchroeder

@Linhares said in Slow application with low CPU usage:

Is this a very low interval?

It depends. If you collect more and more data over time and your x-axis is the time, you have to draw more and more pixels during every update (if you don't do anything smart). At a certain point (I don't know how many million data points) a single draw call can take more than one second. In that case an update every 50ms is far too frequent.

There have been several posts discussing how to merge several data points for display, because the resolution of your monitor cannot show that much detail anyway. This would speed up your drawing routine.
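As a rough illustration of merging data points for display, the sketch below reduces a long sample buffer to one min/max pair per pixel column, so the drawing cost depends on the widget width rather than on the recording length. It is plain standard C++, not code from any of the posts, and `MinMax` and `decimate` are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest and largest sample that fall into one pixel column.
struct MinMax {
    double min;
    double max;
};

// Reduce `samples` to at most `columns` min/max pairs (hypothetical helper).
// Each pair covers a contiguous run of samples, so short spikes stay visible.
std::vector<MinMax> decimate(const std::vector<double>& samples, std::size_t columns) {
    assert(columns > 0);
    std::vector<MinMax> out;
    if (samples.empty()) return out;

    const std::size_t n = samples.size();
    const std::size_t buckets = std::min(columns, n);
    out.reserve(buckets);
    for (std::size_t b = 0; b < buckets; ++b) {
        // Integer arithmetic splits n samples into `buckets` nearly equal runs.
        const std::size_t begin = b * n / buckets;
        const std::size_t end = (b + 1) * n / buckets;
        const auto mm = std::minmax_element(samples.begin() + begin,
                                            samples.begin() + end);
        out.push_back({*mm.first, *mm.second});
    }
    return out;
}
```

A paint routine could then draw one vertical line per pair, from `min` to `max`, instead of one segment per sample.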

You should measure how long drawing takes and then make an informed decision if 50ms is too often.
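One way to take that measurement is sketched below with `std::chrono` from the standard library; in a Qt program, `QElapsedTimer` serves the same purpose. The helper names and the default threshold of half the update interval are assumptions made for this example, not values from the thread.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsedMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// True if one draw call uses more than `maxShare` of the update interval.
// The 0.5 default is an arbitrary assumption; pick a value that suits the app.
bool drawTooSlow(double drawMs, double intervalMs, double maxShare = 0.5) {
    assert(intervalMs > 0.0);
    return drawMs > intervalMs * maxShare;
}
```

Wrapping the body of the drawing routine in `elapsedMs` and logging the result for a few seconds of a slow design should show whether painting is the bottleneck at all.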

Linhares

I've played around with some adapted versions of the objects and I found the application gets slow even if there's no GUI.

I've been reading about how signals and slots work and now I suspect the problem is that I'm calling emit too often across different threads -- thus making the connection a Qt::QueuedConnection.

For example, I've found this post that shows a huge (100x) difference in performance when direct connections are used rather than queued connections.

Also, here someone proposed some simple solutions to an issue like mine:

1. emit the signal not every time but only every 100th time (or so)
2. just open the file in a normal function call, update the gui and call QCoreApplication::processEvents() now and then
3. do not use a thread
4. use a plain callback (do not forget to synchronize access! and don't forget you must not call gui code from another thread!)

#1 doesn't apply to my program as I need to process every single piece of data that comes in.
I don't think I understand #2.
Regarding #3, I had a prior version in which I was not using threads and the processing was too slow (for example, it was taking 4 minutes to read and process a 3-minute recording).
And, if I use #4, my understanding is that the function is going to run in the main thread instead of a separate thread, and this would result in the same issue as #3.

Is there a different solution to make the connections more efficient?

JonB

@Linhares said in Slow application with low CPU usage:

use a plain callback (do not forget to synchronize access! and don't forget you must not call gui code from another thread!)

And, if I use #4, my understanding is that the function is going to run in the main thread instead of a separate thread, and this would result in the same issue as #3.

No, the opposite. If you invoke a function directly from your thread then it runs in that thread, regardless of where the function is in code. That is why point #4 warns you not to invoke any UI code from the "callback". It won't help your situation because you do want to draw.

I would want to know where the "bottleneck" appears to be, by trying some tests.

I am unclear how many threads you are using? It may not be productive to be spawning 100 threads on an 8 core processor.
Take out the "drawing". Have the main UI thread simply count the signals received, and perhaps debug out the count. Does that keep up in real time? If not, don't send signals from threads just have them count, do they keep up? Create a shared counter protected by a mutex, have them increment that one counter, does that keep up?
I'm not sure what your "connections" are. Why do 55 objects have 100 connections? Separate connections for different graphs you want drawn?? If you have object count low but connection count high and compare against object count high but connection count low any difference? If a thread receives data does it just emit one signal or multiple ones? If the latter maybe it could emit just one cross-thread signal and have the slot in UI thread handle the multiplexing instead?
You might "buffer" the signal emission somewhere. For example although you need to collect all data to the main thread you might emit a signal from a given thread with multiple data points as parameter by "delaying" when you do the emit for a short time, thereby reducing how many cross-thread calls you are making.
Similarly maybe the UI side could buffer its UI changes, e.g. plot multiple points at a time.
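The buffering idea from the list above can be sketched as a small batching helper: every sample is still kept, but it crosses to the receiver in blocks rather than one at a time. This is standard C++ with a `std::function` standing in for the cross-thread signal; `SampleBatcher` and its `Sink` type are hypothetical names, and the block size is an assumption to be tuned.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Collects samples and hands them on in blocks (hypothetical helper).
class SampleBatcher {
public:
    using Sink = std::function<void(const std::vector<double>&)>;

    SampleBatcher(std::size_t batchSize, Sink sink)
        : batchSize_(batchSize), sink_(std::move(sink)) {
        assert(batchSize_ > 0);
        pending_.reserve(batchSize_);
    }

    // Called for every incoming sample; forwards a block once it is full.
    void push(double sample) {
        pending_.push_back(sample);
        if (pending_.size() >= batchSize_) flush();
    }

    // Forward whatever is pending, e.g. when the recording stops.
    void flush() {
        if (pending_.empty()) return;
        sink_(pending_);
        pending_.clear();
    }

private:
    std::size_t batchSize_;
    Sink sink_;
    std::vector<double> pending_;
};
```

With the 256 samples per second mentioned in the question, a block size of 16 would reduce the traffic to 16 calls per second on each connection while still delivering every sample.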

And so on!

SimonSchroeder

There is a common approach in Qt, but I can't really find the right words to put into Google.

You know how QWidget::update() works? If update() is called quickly in succession Qt will consolidate the multiple calls into a single call to repaint(). Unfortunately, Qt does not provide any functionality to us to use this kind of approach for other slots as well. But, you can "easily" rebuild this approach yourself using two timers.

Basically, instead of connecting your signal directly to the slot, you put another slot and signal in between. The new slot that is connected to your original signal will use a timer with a short interval, e.g. 10ms, and its timeout is connected to fire your new signal, which in turn is connected to the original slot. Every time your new slot is called it will restart this timer. If you call the slot quicker than this timeout, you will delay firing this new signal. The only problem is that this might delay indefinitely. This is why you need a second timer, e.g. with an interval of 50ms. This timer is single shot and only started if it is not running (this is managed in the same slot as the other timer). On its timeout it triggers the same functionality as the other timer. This makes sure that it does not take too long to send out the signal (but only once). Typically, the two timers are called minTimer and maxTimer.

Maybe someone recognizes this concept and can provide a link.

If your slot takes arguments, it might be a little harder. We use this for a slot that takes a bool. In our case if any of the original signals set the bool to true, the new signal will also be emitted with true. In your case you might have objects to be appended to an array or something. So, you could have this slot first collect multiple objects and then provide functionality to add more than one object at a time instead of your original slot.
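The decision logic of the two-timer scheme described above can be sketched without any Qt classes by passing the current time in explicitly. In a real Qt program the two deadlines would be two single-shot `QTimer` objects; here `Coalescer`, `request` and `shouldFire` are hypothetical names, and the sketch covers only the timing, not the collection of slot arguments.

```cpp
#include <cassert>
#include <cstdint>

// Decides when a burst of requests should be forwarded as one call.
// Times are plain millisecond counters supplied by the caller.
class Coalescer {
public:
    Coalescer(std::int64_t minDelayMs, std::int64_t maxDelayMs)
        : minDelayMs_(minDelayMs), maxDelayMs_(maxDelayMs) {
        assert(minDelayMs_ > 0 && maxDelayMs_ >= minDelayMs_);
    }

    // The intermediate slot: restart the min timer on every request and
    // start the max timer only if it is not already running.
    void request(std::int64_t nowMs) {
        minDeadline_ = nowMs + minDelayMs_;
        if (!pending_) {
            maxDeadline_ = nowMs + maxDelayMs_;
            pending_ = true;
        }
    }

    // Returns true exactly once per burst, as soon as either timer has expired.
    bool shouldFire(std::int64_t nowMs) {
        if (!pending_) return false;
        if (nowMs < minDeadline_ && nowMs < maxDeadline_) return false;
        pending_ = false;  // stops both timers
        return true;
    }

private:
    std::int64_t minDelayMs_;
    std::int64_t maxDelayMs_;
    std::int64_t minDeadline_ = 0;
    std::int64_t maxDeadline_ = 0;
    bool pending_ = false;
};
```

With delays of 10ms and 50ms, requests arriving every 5ms keep postponing the min deadline, so the max deadline forces one forwarded call after 50ms; an isolated request is forwarded after 10ms.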

J.Hilk

@Linhares said in Slow application with low CPU usage:

I suspect the problem is that I'm calling emit too often across different threads

Seems like my initial suggestion returns true :P

#1 doesn't apply to my program as I need to process every single piece of data that comes in.

Process? Yes. Display? Probably not.

I don't think I understand #2.

Ignore it, it's bad advice to begin with.

Regarding #3, I had a prior version in which I was not using threads

Good, many people jump to multithreading without testing a single thread first.

use a plain callback (do not forget to synchronize access! and don't forget you must not call gui code from another thread!)

This is probably the solution. Qt::QueuedConnection, the default for cross-thread connections, ALWAYS copies the arguments, and that's potentially what's costing you time/performance.

I would suggest emitting a signal, signalling data is processed, and then a mutex guarded getter function, that allows the other thread to access the required data.

For the GUI, I would suggest a timer with a reasonable interval (5ms maybe?) that calls update/repaint on your widgets. During that custom repaint function, also access the mutex-guarded getters to update your UI.
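A minimal sketch of such a mutex-guarded getter, in standard C++ rather than Qt (a Qt version could use `QMutex` and `QMutexLocker` in the same way). The class name `SharedResult` and its members are hypothetical. The idea is that the cross-thread signal carries no payload and only announces that data is ready, while the reader fetches the data when it is about to draw.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Holds the latest processed block; safe to use from two threads.
class SharedResult {
public:
    // Called by the processing thread when a block is ready.
    void publish(std::vector<double> block) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(block);
        ++version_;
    }

    // Called by the GUI side, e.g. from a timer-driven paint routine.
    // Copies the data out under the lock so drawing happens without holding it.
    std::vector<double> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return latest_;
    }

    // Lets the reader skip a redraw when nothing new was published.
    unsigned long version() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return version_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<double> latest_;
    unsigned long version_ = 0;
};
```

The getter still copies the block, but only once per redraw instead of once per emitted signal, and the lock is held only for the duration of that copy.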

Linhares

Thanks for your suggestions, @J-Hilk

@J-Hilk said in Slow application with low CPU usage:

I would suggest emitting a signal, signalling data is processed, and then a mutex guarded getter function, that allows the other thread to access the required data.

I'm not sure I understand what you mean. Should each object have a mutex-guarded getter function? Should the objects be in different threads? In this case, wouldn't the emitted signal get queued?
I'd appreciate it if you could provide me an example and/or some material for further reading.

Another idea that has occurred to me is that I could use two threads, one for processing and one for GUI (this one updated just from time to time). Do you think it would make sense? My concern is that the processing thread must be fast enough to complete its cycle before a new piece of data comes in.
If this is the case, would it be necessary to use a mutex anyway?