QMutex big performance differences windows vs linux os

Q139 · 2 Mar 2017, 10:57

Hi,
For some reason I am seeing very large QMutex performance differences between Windows and Linux. I think it may be related to mutex locking speed or some other aspect.

The code organizes data in RAM between vectors, and it is identical for the Linux and Windows versions.
But on Windows something slows the program down so that it uses only 2-3 threads efficiently, while on Linux it can use 10+ threads efficiently.
I still start 10+ threads, but on Windows at most 3 of them work efficiently. With 5-6 threads it already bottlenecks to about 12% CPU (2 threads) in Task Manager, whereas on Linux all threads run at 100% usage and it is much faster.

The problem began after I started using mutexes, but the mutexes are needed to prevent crashes when the same vectors are written and read by multiple threads.

For loading, one vector is used as a reference to organize the new data loaded by all the other threads, and all the data is loaded into another, multidimensional vector.
All threads use mutexes at 7-8 code positions inside loops, and on each thread 3-4 of those mutexes are hit more often than the rest.

Could the problem be in the Windows kernel and not user-configurable, or is there some way to optimize bottlenecks like this on Windows?

jsulm · 3 Feb 2017, 10:44

@Q139 Well, without code it is hard to say.

bnogal · 3 Feb 2017, 12:35

I don't remember well... but Windows 7 had some limitations with mutexes that are solved in newer versions, or on Linux.

I don't remember the specific term, nor why it happens.

Q139 · 2 Mar 2017, 19:28

@bnogal
I got better performance on Windows 7 by putting bigger segments of code under 1 mutex instead of using 2 mutexes for smaller, specific regions.
It is probably due to locking the mutexes too many times and to overly repetitive code.
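
The observation above (fewer, larger critical sections doing better than many small ones) can be sketched roughly as follows. This is only an illustration, not the code from this thread: it uses std::mutex and std::thread as stand-ins for QMutex and QThread so that it compiles without Qt, and the names appendBatched, shared, and itemsPerThread are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: each worker builds its results in a private buffer
// and takes the lock once per batch instead of once per element.
std::vector<int> appendBatched(unsigned threadCount, int itemsPerThread)
{
    std::vector<int> shared;   // protected by sharedMutex
    std::mutex sharedMutex;    // stand-in for a QMutex

    auto worker = [&](int id) {
        std::vector<int> local;   // private to this thread, no lock needed
        for (int i = 0; i < itemsPerThread; ++i)
            local.push_back(id * itemsPerThread + i);

        // One lock acquisition for the whole batch.
        std::lock_guard<std::mutex> guard(sharedMutex);
        shared.insert(shared.end(), local.begin(), local.end());
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t)
        threads.emplace_back(worker, static_cast<int>(t));
    for (auto &th : threads)
        th.join();
    return shared;
}
```

With this shape the number of lock acquisitions equals the number of threads rather than the number of elements. Whether that helps in a real program depends on how long each thread would otherwise hold the lock.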

kshegunov · 2 Apr 2017, 15:01

As @jsulm said you should first provide the actual code. Additionally you have to say which version of Qt this is.

@Q139 said in QMutex big performance differences windows vs linux os:

It is probably due to locking the mutexes too many times and to overly repetitive code.

I really doubt it, but it'll depend on the actual use case. The QMutex implementation uses futexes on Linux and (if memory serves me) mutex handles on Windows; however, the Qt implementation does some special handling before passing the responsibility to the underlying OS functions. So, for example, uncontended mutex locks are very fast, which isn't exactly the standard behaviour. I have to dig up the exact article, though, so take this with a grain of salt ...

Q139 · 2 Apr 2017, 18:35

I am not great at multithreaded design, so it may well be due to some coding or implementation mistakes.

The Qt version is the latest, 5.7.1, and I tested both compilers, msvc2015 and mingw.
I used 2 extern mutexes and multiple threads for sorting data in RAM or loading data from files.
The UI thread also locks extraLoadMutex about 5-20 times per second when updating the UI.

// In the .h file
#include <QMutex>
extern QMutex extraLoadMutex;
extern QMutex dataLoadMutex;

// Simplified function from the .cpp file. It mostly takes mutex locks: there is
// little CPU-heavy code between the lines, and a single execution usually runs
// only 1-2 of the lines that touch extraDataVec.
dataLoadMutex.lock();
data_struct previousData = dataVec[at]; // all threads read this single vector to sort the data correctly
data_struct NextData = dataVec[at+1];
dataLoadMutex.unlock();
//...
extraLoadMutex.lock(); // lock the mutex before touching the multidimensional vector
//...
data_struct dataToModify = extraDataVec[indexToMod][modAt];
//...
extraDataVec[indexToMod][modAt] = dataToModify;
//...
extraDataVec[index][extraDataVec[index].size()-1] = ver;
//...
extraDataVec[index].push_back(ver);
extraLoadMutex.unlock();

I also tried using more mutexes near the code where extraDataVec is manipulated, but it did not help on Windows 7.
Maybe it would be good to recode it to use separate vectors and pointers to vectors; then I could drop 1 mutex for the loading threads, provided it does not crash.
Using the multidimensional vector without mutexes caused a crash with high probability.
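
One possible reading of the "separate vectors" idea is sketched below, under stated assumptions: every loading thread writes into its own vector, so no mutex is needed while loading, and the results are only read after all threads have finished. std::thread stands in for QThread so the sketch compiles without Qt, and loadIntoPrivateVectors and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: one output vector per thread, so writers never share
// a container and no mutex is needed while loading.
std::vector<std::vector<int>> loadIntoPrivateVectors(std::size_t threadCount,
                                                     int itemsPerThread)
{
    // The outer vector is sized before any thread starts and is never resized
    // afterwards, so references to its elements stay valid.
    std::vector<std::vector<int>> perThread(threadCount);

    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < threadCount; ++t) {
        threads.emplace_back([&perThread, t, itemsPerThread] {
            std::vector<int> &mine = perThread[t];   // only thread t touches this
            for (int i = 0; i < itemsPerThread; ++i)
                mine.push_back(static_cast<int>(t) * itemsPerThread + i);
        });
    }
    for (auto &th : threads)
        th.join();   // after join, the caller may read every vector safely
    return perThread;
}
```

The likely reason the unlocked multidimensional vector crashed is that push_back can reallocate a vector's storage while another thread is reading it. Giving each thread its own inner vector, and fixing the size of the outer one up front, avoids that particular race.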

kshegunov · 2 May 2017, 00:17

Looking at this code ... well, are mutexes really needed? You can read one part of the array and write another part of the array safely without locking at all, as long as nothing resizes the array in the meantime. Without the full code it isn't clear if this is the case, but as a rule of thumb the code will perform much better if you split the array in chunks between the threads and each thread operates on its own part instead of locking the whole array.
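
A minimal sketch of the chunking idea, under stated assumptions: the array has a fixed size while the threads run, and each thread touches only its own index range. std::thread is used in place of QThread so the example compiles without Qt. The function name doubleInChunks and the "multiply by 2" work are hypothetical placeholders for the real per-element processing.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: split one array into disjoint index ranges and let
// each thread modify only its own range, so no locking is required.
void doubleInChunks(std::vector<int> &data, std::size_t threadCount)
{
    if (data.empty() || threadCount == 0)
        return;
    // Chunk size rounded up so that every element is covered.
    const std::size_t chunk = (data.size() + threadCount - 1) / threadCount;

    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        threads.emplace_back([&data, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                data[i] *= 2;   // touches only indices in [begin, end)
        });
    }
    for (auto &th : threads)
        th.join();
}
```

This only works because the vector is never resized while the threads are running. Any push_back or erase on the shared vector would still need synchronization.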