Qt algorithm benchmark app, weird result
-
wrote 4 days ago last edited by
I am creating an app to visualize algorithms and benchmark them, and I am having trouble figuring out why I am getting weird results. The problem is that when I start a benchmark on the BFS algorithm, the results look like this.
source code : https://github.com/krystian5642/Algorithms.git
The time complexity should be O(V+E), and it is, but not always. The third run is O(V+E), and I am not sure what is happening in the first and second runs; the result is always the same (even after the third run, it is not always O(V+E)). The whole benchmark runs in a separate thread (I got the same results when it ran on the main thread).
-
wrote 4 days ago last edited by
Multi-processing, timesharing OSs are not good for benchmarking events with tight timing windows. The schedulers cannot be considered deterministic in how they schedule processes or in how much CPU time each thread gets.
This is generally why simulations on Linux and Windows platforms use a pseudo-clock/counter, so that the sim produces reproducible data.
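For illustration, a minimal sketch of that kind of pseudo-counter applied to BFS (the Graph struct and adjacency layout below are made up for the example, not taken from the linked repository): count the basic operations the traversal performs instead of timing it, so the measurement is reproducible regardless of what the scheduler does.

```cpp
// Deterministic "pseudo-clock": count vertex dequeues and edge inspections
// instead of measuring wall time. The count grows as O(V + E) by construction.
#include <cstdint>
#include <queue>
#include <vector>

struct Graph {
    std::vector<std::vector<int>> adjacency; // adjacency[v] = neighbours of v
};

std::uint64_t bfsOperationCount(const Graph& g, int start)
{
    std::uint64_t ops = 0;
    std::vector<bool> visited(g.adjacency.size(), false);
    std::queue<int> queue;

    visited[start] = true;
    queue.push(start);
    while (!queue.empty()) {
        const int v = queue.front();
        queue.pop();
        ++ops; // one dequeue
        for (int w : g.adjacency[v]) {
            ++ops; // one edge inspection
            if (!visited[w]) {
                visited[w] = true;
                queue.push(w);
            }
        }
    }
    return ops;
}
```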
-
wrote 4 days ago last edited by cronos4455
@Kent-Dorfman Is there something I can do to mitigate this, or does it have to be like this? As far as I remember, when I tested the algorithms in a console app, I didn't have this problem, so I assumed it could be something Qt-related.
-
wrote 4 days ago last edited by
@cronos4455 For a start, it appears that you are using QElapsedTimer to measure algorithm execution time, and that is a poor choice for your use case. The main reason is that what it measures is elapsed wall-clock time, which has the highest variance, as it is the most sensitive to whatever else is happening on the system at the same time (it is very possible that for a good chunk of the elapsed time your application is not even running, as pointed out by @Kent-Dorfman).
If nothing else, what you should measure is elapsed user time: just the time the thread running the algorithm actually spent on-CPU executing application (and not kernel-level) code. To my knowledge Qt has no API for that, so you'll need to use the OS API directly (e.g. on Windows you'd use the GetThreadTimes API) or take a dependency on a C++ benchmarking library that has the requisite APIs.
Note that even using user time will still have some noticeable variance; there are effects related to the content of CPU cache lines and branch-prediction misses (which are most visible the first time the code runs). The best way to deal with it is to do each run multiple times (as many as required so the total running time is at least a few seconds) and take the average time for each input size, possibly excluding the first one or two repetitions of each run. By the law of large numbers, this will be much closer to the "true" run time.
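If it helps, here is a minimal Windows-only sketch of measuring per-thread user time around an arbitrary piece of work (the helper names below are made up for illustration; this is not a Qt API):

```cpp
// Windows-only sketch: measure the user-mode CPU time of the current thread
// around the code under test. FILETIME values are in 100 ns units.
#include <windows.h>
#include <cstdint>

static std::uint64_t toTicks(const FILETIME& ft)
{
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
}

// Returns the user-mode CPU time consumed by `work`, in nanoseconds,
// or 0 if GetThreadTimes fails.
template <typename Fn>
std::uint64_t measureUserTimeNs(Fn&& work)
{
    FILETIME creationTime, exitTime, kernelBefore, userBefore, kernelAfter, userAfter;
    HANDLE thread = GetCurrentThread(); // pseudo-handle, no need to close

    if (!GetThreadTimes(thread, &creationTime, &exitTime, &kernelBefore, &userBefore))
        return 0;
    work();
    if (!GetThreadTimes(thread, &creationTime, &exitTime, &kernelAfter, &userAfter))
        return 0;

    return (toTicks(userAfter) - toTicks(userBefore)) * 100; // 100 ns ticks -> ns
}
```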
-
wrote 3 days ago last edited by
@IgKh I could not use GetThreadTimes because it has low precision, but I used QueryThreadCycleTime and got the same results as with QElapsedTimer.
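(For reference, a wrapper along these lines is roughly what that measurement looks like; this is a simplified sketch, not the actual code from the repository.)

```cpp
// Windows-only sketch: QueryThreadCycleTime returns CPU cycles charged to the
// thread, which excludes time the thread spends scheduled out.
#include <windows.h>

template <typename Fn>
ULONG64 measureThreadCycles(Fn&& work)
{
    ULONG64 before = 0, after = 0;
    HANDLE thread = GetCurrentThread(); // pseudo-handle for the calling thread

    QueryThreadCycleTime(thread, &before);
    work();
    QueryThreadCycleTime(thread, &after);

    return after - before;
}
```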
-
wrote 3 days ago last edited by
@cronos4455 said in Qt algorithm benchmark app, weird result:
I could not use GetThreadTimes because it has low precision
The documentation says that it has 100 ns precision, no? If that is too low, that's a strong indicator that the runtime of what you are measuring isn't long enough and needs to be inflated to get a stable reading. Micro-benchmarking is really hard... In my experience, measuring things that take less than around 10 ms is hard to do reliably, but that is very architecture- and environment-dependent.
@cronos4455 said in Qt algorithm benchmark app, weird result:
I used QueryThreadCycleTime and got the same results as with QElapsedTimer
That's surprising; I'd expect the effect to be mitigated at least somewhat. I'd hazard a guess that the main effect would then be that at some input-size ranges the generated code for the algorithm being tested runs into CPU pipeline stalls or the like. It is difficult to pinpoint why, but as I said, the best way to get through this is to run the algorithm with each given input size multiple times and take the average minus outliers.
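Something along these lines is what I mean by repeating and averaging (a simplified sketch with a made-up helper; it uses QElapsedTimer only to keep it self-contained, but the same structure works with the thread-time APIs discussed above):

```cpp
// Run the algorithm several times, drop the fastest and slowest samples,
// and average the rest to smooth out scheduler and cache-warming noise.
#include <algorithm>
#include <functional>
#include <vector>
#include <QElapsedTimer>

double averageRunTimeNs(const std::function<void()>& algorithm,
                        int repetitions = 20, int outliersPerSide = 2)
{
    std::vector<qint64> samples;
    samples.reserve(repetitions);

    for (int i = 0; i < repetitions; ++i) {
        QElapsedTimer timer;
        timer.start();
        algorithm();
        samples.push_back(timer.nsecsElapsed());
    }

    std::sort(samples.begin(), samples.end());

    double sum = 0.0;
    int kept = 0;
    for (int i = outliersPerSide; i < repetitions - outliersPerSide; ++i) {
        sum += static_cast<double>(samples[i]);
        ++kept;
    }
    return kept > 0 ? sum / kept : 0.0;
}
```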
-
wrote 3 days ago last edited by cronos4455
@IgKh I mean I had this issue with GetThreadTimes:
https://stackoverflow.com/questions/30432982/getthreadtimes-returning-0-zero-as-the-kernel-and-user-time-for-all-the-thr
The returned values were either 0 or 15.625 ms. I was surprised too, especially since the first run takes up to 1 min, the second 40 s and the third less than 20 s. But when I tested the algorithms in a console app and then created a chart in Excel from the results, I waited 2-3 min for the first run and the results were normal (linear). I mean a console app without Qt stuff, just a C++ console app in Visual Studio.
Now I will just run it multiple times.
-
wrote 3 days ago last edited by
@cronos4455 said in Qt algorithm benchmark app, weird result:
Is there something I can do to mitigate this, or does it have to be like this? As far as I remember, when I tested the algorithms in a console app, I didn't have this problem, so I assumed it could be something Qt-related
What sort of an answer are you expecting here, given that nobody knows anything about what your code does? People won't know what is specific about the first, second or third runs.
Qt does not mean anything has to be graphical or UI. Why do you suspect a Qt issue in your case? You can write Qt console programs just as much as UI ones, so have you tested your algorithm that way?
You seem to run your test three times. What happens when you run it a hundred times? Does it settle down into something consistent between runs? There is, for example, a lot of "cache warming" which can go on in runs of code, especially the first few times.
-
wrote 2 days ago last edited by
@JonB I managed to solve it. It was a problem with QSet: its insert function took a lot of time even though I used reserve. I didn't have this problem in my console app because I used STL containers there.
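For anyone who finds this later, the change amounts to something like the following (a simplified sketch, not the actual repository code): keep the BFS visited set in a std::unordered_set (or a plain std::vector<bool> flag array when vertex IDs are dense integers) instead of a QSet.

```cpp
// BFS with the visited set backed by std::unordered_set instead of QSet.
// The container is reserved up front; the traversal itself is unchanged.
#include <queue>
#include <unordered_set>
#include <vector>

void bfsWithStlVisitedSet(const std::vector<std::vector<int>>& adjacency, int start)
{
    std::unordered_set<int> visited;
    visited.reserve(adjacency.size()); // pre-size the hash table

    std::queue<int> queue;
    visited.insert(start);
    queue.push(start);

    while (!queue.empty()) {
        const int v = queue.front();
        queue.pop();
        for (int w : adjacency[v]) {
            if (visited.insert(w).second) // true if w was not yet visited
                queue.push(w);
        }
    }
}
```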
-