Why doesn't QThread speed up my program on an Intel Core 2 Quad Q6700?
-
I wrote a simple program that computes pi by numerically integrating 4 / (1 + x * x) for x from 0 to 1.
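Written out, the serial loop below is a left Riemann sum of that integral with step h = 1/n (`step` in the code). In the multithreaded version, thread k (`index` in the code) sums one contiguous slice of m = n/p terms:

$$\pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx h\sum_{i=0}^{n-1}\frac{4}{1+(ih)^2}, \qquad h = \frac{1}{n}$$

$$S_k = h\sum_{i=km}^{(k+1)m-1}\frac{4}{1+(ih)^2}, \qquad m = \left\lfloor \frac{n}{p} \right\rfloor, \qquad k = 0,\dots,p-1$$

The program reports $\sum_k S_k$. This equals the serial sum when p divides n, as it does for n = 2000000000 and p = 4, 32 or 64.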
@// The serial code:
double computePi()
{
    int n = 2000000000;          // number of integration steps
    double step = 1.0 / n;       // width of each step
    double result = 0.0;
    for (int i = 0; i < n; i++) {
        double temp = step * i;  // left end of the i-th step
        result += 4 / (1 + temp * temp);
    }
    result *= step;
    return result;
} // it took 23.618s on Q6700
// Multithread code:
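#include <QThread>

class ThreadPi : public QThread
{
    Q_OBJECT
public:
    void setData(int index, int p, int n);
    void addResult(double &adding);
protected:
    void run();
private:
    int index;      // which slice of [0, n) this thread handles
    int p;          // total number of threads
    int n;          // total number of integration steps
    double result;  // partial sum for this slice
};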
void ThreadPi::setData(int index, int p, int n)
{
    this->index = index;
    this->p = p;
    this->n = n;
}

void ThreadPi::addResult(double &adding)
{
    adding += result;
}

void ThreadPi::run()
{
    result = 0;
    int interval = n / p;   // if n is not a multiple of p, the remainder is skipped
    int startpoint = index * interval;
    double step = 1.0 / n;
    for (int i = 0; i < interval; i++) {
        double temp = step * (i + startpoint);
        result += 4 / (1 + temp * temp);   // accumulates directly into the member
    }
    result *= step;
}
class QMulThreadPi : public QObject
{
    Q_OBJECT
public:
    QMulThreadPi();
    ~QMulThreadPi();
    double Result();
    void Compute();
private:
    int p;
    int n;
    ThreadPi *TP;
    double result;
};
QMulThreadPi::QMulThreadPi()
{
    p = 4;
    n = 2000000000;
    TP = new ThreadPi[p];   // the p thread objects sit next to each other in memory
    result = 0;
}

QMulThreadPi::~QMulThreadPi()
{
    delete [] TP;
}

double QMulThreadPi::Result()
{
    return result;
}

void QMulThreadPi::Compute()
{
    // Start all workers first...
    for (int i = 0; i < p; i++) {
        TP[i].setData(i, p, n);
        TP[i].start();
    }
    // ...then wait for each one and add up the partial results.
    for (int i = 0; i < p; i++) {
        TP[i].wait();
        TP[i].addResult(result);
    }
}
// it took 32.324s on Q6700@
However, on my laptop (Intel Core 2 Duo T9300) the serial code took 16.532s, while the multithreaded code took only 10.137s.
-
[quote author="Franzk" date="1282279430"]Please update your code with code tags (@) so it becomes readable.[/quote]
Thanks for the reminder; it's my first time starting a discussion.
I have just tried it on another computer with an i7 920. It works well there: four threads give a 2.82x speedup. So I think the problem is specific to the Q6700.
-
And what are the times on the Q6700? I can't believe that something that parallelizes fine on the T9300 and the i7 strangely breaks on the Q6700.
-
[quote author="Denis Kormalev" date="1282287379"]And what are the times on the Q6700? I can't believe that something that parallelizes fine on the T9300 and the i7 strangely breaks on the Q6700.[/quote]
The serial code takes 23s, while the parallel code takes 32s. I tried it many times.
But when I increase p from 4 to 32 or 64 on the Q6700, it speeds up 3.7x...
-
How do you actually run the multithreaded code? Are you doing something stupid there that forces serialization again?
-
@PigiRoN: Little OT, but just out of curiosity...
Would it be possible to use "QtConcurrent::mappedReduced":http://doc.qt.nokia.com/4.6/qtconcurrentmap.html#mappedReduced instead of almost all the code you have written?
I think that the "QtConcurrent":http://doc.qt.nokia.com/4.6/threads-qtconcurrent.html module contains a lot of goodies for doing this kind of stuff.
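`mappedReduced` applies a map function to each item of a sequence and combines the results with a reduce function, which fits this problem: map each chunk of the integration range to its partial sum, then reduce by adding. Trying QtConcurrent requires a Qt build, so here is a sketch of the same map-reduce shape in standard C++11 with `std::async`. This is a swapped-in technique, not the QtConcurrent API, and the names `partialPi` and `parallelPi` are hypothetical. Each task sums into a local variable and returns it, so the tasks share no writable state. The caller's `+=` plays the role that the reduce function plays in `mappedReduced`.

```cpp
#include <cassert>
#include <cmath>
#include <future>
#include <vector>

// Map step (hypothetical helper): left Riemann sum of 4 / (1 + x^2) over
// steps [begin, end) of an n-step grid on [0, 1].
// Accumulates into a local and returns it, so no state is shared between tasks.
double partialPi(long long begin, long long end, long long n)
{
    const double step = 1.0 / static_cast<double>(n);
    double sum = 0.0;
    for (long long i = begin; i < end; ++i) {
        const double x = step * static_cast<double>(i);
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}

// Split [0, n) into `chunks` pieces (the last piece also takes any remainder),
// run the map step for each piece on its own thread, then reduce by adding.
double parallelPi(long long n, int chunks)
{
    std::vector<std::future<double>> parts;
    const long long size = n / chunks;
    for (int c = 0; c < chunks; ++c) {
        const long long begin = c * size;
        const long long end = (c == chunks - 1) ? n : begin + size;
        parts.push_back(std::async(std::launch::async, partialPi, begin, end, n));
    }
    double total = 0.0;
    for (auto &f : parts)
        total += f.get();   // reduce step
    return total;
}
```

For example, `parallelPi(2000000000LL, 4)` evaluates the original sum with four tasks, up to floating-point rounding. `std::launch::async` starts one thread per chunk. With many small chunks, a thread pool (QtConcurrent uses `QThreadPool` for this) avoids the per-thread startup cost.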
-
mario: Yep, that is what I would have tried, too. Let's try to help with the current solution though.
-
There can be many causes for this behaviour, and you will have to investigate which one you have hit. It may be related to cache issues (misses, races, etc.), or it may simply be the overhead of using threads. Thorough profiling of your code will give you the best answer.
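One cache-related effect worth ruling out here is false sharing, offered as a hypothesis that the thread does not confirm. `new ThreadPi[p]` places the `ThreadPi` objects next to each other, so the `result` members of different threads can share a cache line. If the compiler writes `result` back to memory on every iteration (for example, in an unoptimized debug build), the cores keep invalidating each other's copy of that line. This would also fit the CPU dependence. The Q6700 is two dual-core dies in one package, so a line bouncing between dies goes over the front-side bus, whereas the T9300's two cores share one L2 cache. The sketch below uses standard C++17 `std::thread` rather than QThread. `PaddedSum`, `computePiThreads` and the 64-byte line size are assumptions for illustration. It keeps the same slicing as `ThreadPi::run()` but gives each thread its own cache-line-sized slot and accumulates into a local variable, storing to the slot only once.

```cpp
#include <cassert>
#include <cmath>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (the line size on Core 2 and Core i7).
// alignas(64) makes every slot start on its own line, so two threads never
// write to the same line.
struct alignas(64) PaddedSum {
    double value = 0.0;
};

// Same slicing as ThreadPi::run(); like the original, the remainder is
// skipped if p does not divide n.
double computePiThreads(int n, int p)
{
    std::vector<PaddedSum> slots(p);   // over-aligned allocation needs C++17
    std::vector<std::thread> workers;
    const int interval = n / p;
    const double step = 1.0 / n;
    for (int index = 0; index < p; ++index) {
        workers.emplace_back([&slots, index, interval, step] {
            const int startpoint = index * interval;
            double local = 0.0;   // can stay in a register: no shared writes in the loop
            for (int i = 0; i < interval; ++i) {
                const double temp = step * (i + startpoint);
                local += 4 / (1 + temp * temp);
            }
            slots[index].value = local * step;   // a single store per thread
        });
    }
    double result = 0.0;
    for (int index = 0; index < p; ++index) {
        workers[index].join();
        result += slots[index].value;
    }
    return result;
}
```

A smaller experiment on the original Qt code tests the same idea. Accumulate into a local `double` inside `ThreadPi::run()`, assign it to `result` once after the loop, and compare the Q6700 timings again.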
-
[quote author="ixSci" date="1282324950"]There can be many causes for this behaviour, and you will have to investigate which one you have hit. It may be related to cache issues (misses, races, etc.), or it may simply be the overhead of using threads. Thorough profiling of your code will give you the best answer.
[/quote]What's stranger is that it works fine for the author on other multicore processors. So architectural problems in the source and possible race conditions are hardly the reason.
-
[quote author="Denis Kormalev" date="1282326556"][quote author="ixSci" date="1282324950"]There can be many causes for this behaviour, and you will have to investigate which one you have hit. It may be related to cache issues (misses, races, etc.), or it may simply be the overhead of using threads. Thorough profiling of your code will give you the best answer.
[/quote]What's stranger is that it works fine for the author on other multicore processors. So architectural problems in the source and possible race conditions are hardly the reason. [/quote]
Once I increase the number of threads to 64, it consistently speeds up 1.98x on the dual-core CPU and 3.79x on the quad-core CPU.