Simple big-looped program break down

JonB

@J-Hilk said in Simple big-looped program break down:

Maybe the compiler optimizes a lot and the loop is not executed at all 🤔

If he had been filling the elements with a constant value, that would have been possible. But because he is filling with a changing i each time, I can't see that any optimization of that is possible; it will have to loop! You could look at the disassembly... :)

J.Hilk

@JonB I did, even with only level 1 optimization, the loop is removed 🤷‍♂️

smart things these compilers 😉

JonB

@J-Hilk
Noooooo!!!

xor eax, eax
ret

Those two lines just load eax with 0 (xor-ing a register with itself is quicker than a literal "load with 0") and return it as the result: it's the implicit return 0 at the end of main()!

Believe me, somewhere up above that is the code for the loop :) Otherwise, if the compiler really has decided there are no side effects because pa is never referenced, and so has gone on strike and done nothing, put a return pa[10] or something after your loop.

J.Hilk

@JonB actually, when I change the compiler to MSVC, the loop will never be removed, even with O3
🤨 Windows, am I right!?

JonB

@J-Hilk
Yep, that's more like it, good old MSVC! As I said, retry gcc with return pa[10] after the loop and then see?

J.Hilk

@JonB said in Simple big-looped program break down:

return pa[10]

You're right: by returning pa[10] from main, the release timings look much more like what I would have expected:

211 84 52 52
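
For anyone repeating this, here is a minimal sketch of the idea suggested above: make the loop's result observable so the optimizer has a reason to keep it. The function name fillAndSample and its parameters are made up for illustration; in the thread the equivalent was simply return pa[10] at the end of main(). A sufficiently clever compiler may still simplify this, so checking the disassembly remains the only real proof.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: fills n ints with their own index, then returns one
// of them so the stores feed into a value the caller can observe.
int fillAndSample(std::size_t n, std::size_t sampleIndex)
{
    assert(sampleIndex < n);                 // the sample must lie inside the array
    std::unique_ptr<int[]> pa(new int[n]);   // freed automatically on return
    for (std::size_t i = 0; i < n; ++i)
        pa[i] = static_cast<int>(i);
    return pa[sampleIndex];
}
```

Calling it with a sampleIndex that is only known at run time (for example, parsed from argv) gives the compiler less room to fold the whole thing away.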

Please_Help_me_D

@JonB I'm trying to understand how to use QVector::iterator to assign values to an array. I did the following:

    QVector<int> a = {10, 20, 30, 40, 50};
    QVector<int>::iterator it = a.begin();

    clock_t start_time =  clock();
    while (it != a.end())
    {
        cout << *it << endl;
        it++;
    }

but that only shows how to read values through the iterator. How do I modify the code to fill an empty QVector with numbers?

JonB

@Please_Help_me_D said in Simple big-looped program break down:

    int i = 0;
    while (it != a.end())
    {
        *it = i++;
        it++;
    }
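
The same fill can be sketched outside Qt as well. The block below uses std::vector as a stand-in for QVector (an assumption, so that it compiles without Qt), and both function names are made up for illustration. The first function is the manual iterator loop from above; the second does the same job with std::iota from <numeric>, which should work on QVector's STL-style iterators in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Manual iterator loop: writes 0, 1, 2, ... through the iterator.
std::vector<int> makeSequence(int count)
{
    assert(count >= 0);
    std::vector<int> a(static_cast<std::size_t>(count));
    int i = 0;
    for (std::vector<int>::iterator it = a.begin(); it != a.end(); ++it)
        *it = i++;
    return a;
}

// Same result using the standard algorithm instead of a hand-written loop.
std::vector<int> makeSequenceIota(int count)
{
    assert(count >= 0);
    std::vector<int> a(static_cast<std::size_t>(count));
    std::iota(a.begin(), a.end(), 0);
    return a;
}
```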

Please_Help_me_D

@JonB So the execution time for the code:

    QVector<int> a(100000000);
    QVector<int>::iterator it = a.begin();
    int i = 0;

    clock_t start_time =  clock(); // start time
    while (it != a.end())
    {
        *it = i++;
        it++;
    }
    clock_t end_time = clock(); // end time
    clock_t d_time = end_time - start_time;
    cout << d_time << endl;

is 8-9 seconds.
By the way, the same operation in Matlab with single (float) precision takes 0.3 seconds. That means Matlab loops are not as slow as I thought:

% Matlab code
a = single(zeros(100000000,1)); % preallocate the memory
tic % start time
for n = 1:100000000
    a(n) = single(n); % assign a value to each position
end
toc % end time

J.Hilk

@Please_Help_me_D
If there's anything you should take away from my test, it's that there's a difference between release and debug builds in C++.

Build and run your test in release; it will drop way below 1 sec.
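
One more note on the measurement itself: clock() returns a count of clock ticks, so the printed number only becomes seconds after dividing by CLOCKS_PER_SEC. A minimal sketch of an alternative using std::chrono::steady_clock follows. The helper names secondsTaken and fillAndSum are hypothetical, and std::vector stands in for QVector (an assumption, so that the block builds without Qt).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Runs work() once and returns the elapsed wall-clock time in seconds.
template <typename Work>
double secondsTaken(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical workload: fills a vector with 0..n-1 and returns the sum of
// its elements, so the result is used and the loop is not dead code.
long long fillAndSum(std::size_t n)
{
    assert(n <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    std::vector<int> a(n);
    for (std::size_t i = 0; i < n; ++i)
        a[i] = static_cast<int>(i);
    long long sum = 0;
    for (int value : a)
        sum += value;
    return sum;
}
```

Something like secondsTaken([&] { sum = fillAndSum(100000000); }) then gives the time directly in seconds, and printing sum afterwards keeps the work observable.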

JonB

@Please_Help_me_D
Please do as @J-Hilk said before: if you're compiling or running for debug, there will be a vast difference from release/optimized.

Also, separately, check what MATLAB does in your code with

for n = 1:100000001

Please_Help_me_D

@J-Hilk sorry I forgot that!

    int *pa = new int [100000000];
    clock_t start_time =  clock(); 
    for (int i = 0; i < 100000000; i++)
    {
        pa[i] = i;
    }
    clock_t end_time = clock(); 
    clock_t d_time = end_time - start_time;
    cout << d_time << endl;

takes 0.14 seconds

    QVector<int> a(100000000);

    clock_t start_time =  clock();
    for (int i = 0; i < 100000000; i++)
    {
        a[i] = i;
    }
    clock_t end_time = clock(); 
    clock_t d_time = end_time - start_time;
    cout << d_time << endl;

also takes about 0.14 seconds

    QVector<int> a(100000000);
    QVector<int>::iterator it = a.begin();
    int i = 0;

    clock_t start_time =  clock(); 
    while (it != a.end())
    {
        *it = i++;
        it++;
    }
    clock_t end_time = clock(); 
    clock_t d_time = end_time - start_time; 
    cout << d_time << endl;

also takes about 0.14 seconds
So Matlab loops are about twice as slow :)
Thank you very much! It's good to know about such a difference in performance between Debug and Release mode.

Please_Help_me_D

@JonB Matlab automatically increases the vector size by 1. But if I don't preallocate the vector and its size increases with each iteration, then it takes much more time. Here I do the iteration without preallocation, and the variable "a" is created when the first iteration is performed:

% Matlab code
tic
for n = 1:100000000
       a(n) = single(n);
end
toc

35 seconds to perform

JonB

@Please_Help_me_D
For presumably similar slowness with QVector, try creating it empty with just QVector<int> a and use a.append(i) in the loop. I anticipate some slowness compared to pre-sizing, but (hopefully) not as bad as it could be:

This operation is relatively fast, because QVector typically allocates more memory than necessary, so it can grow without reallocating the entire vector each time.
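
A minimal sketch of that comparison, with std::vector and push_back standing in for QVector and append (an assumption, so that the block builds without Qt). The function name appendAll is made up. QVector has a reserve() of its own, which is the middle ground between appending blindly and pre-sizing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends 0..n-1 one element at a time, optionally reserving capacity first.
std::vector<int> appendAll(int n, bool reserveFirst)
{
    assert(n >= 0);
    std::vector<int> a;
    if (reserveFirst)
        a.reserve(static_cast<std::size_t>(n));   // one allocation up front
    for (int i = 0; i < n; ++i)
        a.push_back(i);                           // otherwise grows on demand
    return a;
}
```

Timing appendAll(n, false) against appendAll(n, true) in a release build would show how much the repeated growth actually costs on a given toolchain.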

Please_Help_me_D

@JonB a very interesting result for me:

     QVector<int> a;

     clock_t start_time =  clock();
     for (int i = 0; i < 100000000; i++)
     {
         a.append(i);
     }
     clock_t end_time = clock();
     clock_t d_time = end_time - start_time;
     cout << d_time << endl;

in Debug mode it takes 5.5-6 seconds
in Release mode it takes 0.7 seconds (while in Matlab, as I said previously, it takes 35 seconds)
So QVector is really fast even when it works in <appending mode>
Thank you for this example!

jsulm

@Please_Help_me_D said in Simple big-looped program break down:

So QVector is really fast even when it works in <appending mode>

As far as I know, QVector allocates more memory than is currently needed, and on resizing it again allocates more than "current_size + 1".

Please_Help_me_D

@jsulm yes it does.
I tested how much memory my program uses, according to Windows Task Manager, for different numbers of loop iterations (say n) and two modes of QVector (normal and <append>). Here are the results (Debug mode):

    QVector<int> a(100000000);

    clock_t start_time =  clock();
    for (int i = 0; i < 100000000; i++)
    {
        a[i] = i;
    }
    clock_t end_time = clock();
    clock_t d_time = end_time - start_time; 
    cout << d_time << endl;

n = 100000000 -> 392464 kilobytes
n = 100000000/2 -> 196780 kilobytes
n = 100000000/4 -> 98936 kilobytes
n = 100000000/8 -> 50008 kilobytes
!_____________________________!

     QVector<int> a;

     clock_t start_time =  clock();
     for (int i = 0; i < 100000000; i++)
     {
         a.append(i);
     }
     clock_t end_time = clock(); 
     clock_t d_time = end_time - start_time; 
     cout << d_time << endl;

n = 100000000 -> 527456 kilobytes
n = 100000000/2 -> 264804 kilobytes
n = 100000000/4 -> 133468 kilobytes
n = 100000000/8 -> 67804 kilobytes

If I use a simple C++ array instead of QVector, then for n = 100000000 -> 391780 kilobytes (almost the same as the simple QVector).

So the conclusion I see is that QVector in <append> mode weighs 1.35 times as much as a simple QVector or C++ array
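
As a rough cross-check of those numbers, here is a small sketch of the arithmetic. It assumes a 4-byte int and, for the append case, a capacity that ends up at the next power of two (pure doubling, which is an assumption about the growth policy, not something measured here). The names kibForInts and doubledCapacity are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: int is 4 bytes, as on the common desktop toolchains.
constexpr std::int64_t kBytesPerInt = 4;

// Memory needed for `count` ints, in units of 1024 bytes.
constexpr std::int64_t kibForInts(std::int64_t count)
{
    return count * kBytesPerInt / 1024;
}

// Smallest power of two that is >= count: the capacity a container would
// reach if it started at 1 and doubled on every reallocation (assumed model).
std::int64_t doubledCapacity(std::int64_t count)
{
    assert(count >= 1 && count <= (std::int64_t{1} << 40));
    std::int64_t capacity = 1;
    while (capacity < count)
        capacity *= 2;
    return capacity;
}
```

With n = 100000000 this gives 390625 KiB for the pre-sized case and 524288 KiB for a doubled capacity of 134217728 elements. Both sit a few megabytes below the Task Manager readings (392464 and 527456 kilobytes), which would be consistent with the remainder being the rest of the process.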

J.Hilk

@Please_Help_me_D said in Simple big-looped program break down:

So the conclusion I see is that QVector in <append> mode weighs 1.35 times as much as a simple QVector or C++ array

IIRC, then it's 1.5, same as std::vector

JonB

@jsulm said in Simple big-looped program break down:

@Please_Help_me_D said in Simple big-looped program break down:

So QVector is really fast even when it works in <appending mode>

As far as I know, QVector allocates more memory than is currently needed, and on resizing it again allocates more than "current_size + 1".

Which is why I originally quoted from https://doc.qt.io/qt-5/qvector.html#append:

This operation is relatively fast, because QVector typically allocates more memory than necessary, so it can grow without reallocating the entire vector each time.

As for the exact timing: if, say, it re-allocates by powers of 2, I make that something like log2(100000000), rounded up, == 27 reallocations? :)

kshegunov

@JonB said in Simple big-looped program break down:

As for the exact timing: if, say, it re-allocates by powers of 2, I make that something like log2(100000000), rounded up, == 27 reallocations?

It doesn't. If memory serves me it has a special progression for the first 128 elements or so, and after that it doubles its size on each realloc.
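
Both the estimate and the actual behaviour can be probed with a short sketch. doublingSteps counts how many doublings take a capacity of 1 up to n, which is the idealized model behind the log2 estimate; countReallocations watches capacity() change while appending. Both names are made up, and std::vector stands in for QVector (an assumption), so the measured count reflects the standard library in use, not Qt's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Idealized model: number of doublings needed to grow a capacity of 1 to >= n.
int doublingSteps(std::uint64_t n)
{
    assert(n >= 1 && n <= (std::uint64_t{1} << 63));   // keeps capacity *= 2 from wrapping
    int steps = 0;
    for (std::uint64_t capacity = 1; capacity < n; capacity *= 2)
        ++steps;
    return steps;
}

// Measured: how often the capacity changes while n elements are appended.
int countReallocations(std::size_t n)
{
    assert(n <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    std::vector<int> a;
    std::size_t lastCapacity = a.capacity();
    int reallocations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a.push_back(static_cast<int>(i));
        if (a.capacity() != lastCapacity) {   // capacity changed: storage was reallocated
            lastCapacity = a.capacity();
            ++reallocations;
        }
    }
    return reallocations;
}
```

doublingSteps(100000000) comes out at 27, matching the estimate quoted above; the result of countReallocations depends on the growth factor of the implementation being used.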