Returning QList as reference and value has difference in performance

Sivan

Hello everyone,

I am currently working on improving my application's performance and stumbled upon this issue which I don't understand why it is affecting the performance.

According to Qt's documentation, it is stated that QList is implicitly shared for as long as NO non-const function is called on that list. However, I do see a significant difference in the following case. The below example is an abstraction of my code, in which I can see the performance difference as well

#include <QApplication>
#include <QElapsedTimer>
#include <QDebug>

//#define USE_REF
//#define USE_FOREACH
class MyList
{
public:
    MyList()
    {
        for (int i = 0; i < 10; ++i)
        {
            topList.append(i);
        }

        for (int i = 0; i < 2000; ++i)
        {
            midList.append(i);
        }

        for (int i = 0; i < 10; ++i)
        {
            bottomList.append(i);
        }
    }

#ifdef USE_REF
    const QList<int>& GetTopLevelList() const { return topList; }
    const QList<int>& GetMidLevelList() const { return midList; }
    const QList<int>& GetBottomLevelList() const { return bottomList; }
#else
    QList<int> GetTopLevelList() const { return topList; }
    QList<int> GetMidLevelList() const { return midList; }
    QList<int> GetBottomLevelList() const { return bottomList; }
#endif

private:
    QList<int> topList;
    QList<int> midList;
    QList<int> bottomList;
};

int main(int argc, char *argv[])
{
        MyList myList;
        QElapsedTimer timer;
        timer.start();
#ifdef USE_FOREACH
        Q_FOREACH(int i, myList.GetTopLevelList())
        {
            Q_FOREACH(int j, myList.GetMidLevelList())
            {
                Q_FOREACH(int k, myList.GetBottomLevelList())
                {
                    int w = i + k + j;
                }
            }
        }
#else
        const QList<int>& topLevel = myList.GetTopLevelList();
        for (int i = 0; i < topLevel.size(); ++i)
        {
            const QList<int>& midLevel = myList.GetMidLevelList();
            for (int j = 0; j < midLevel.size(); ++j)
            {
                const QList<int>& lowLevel = myList.GetBottomLevelList();
                for (int k = 0; k < lowLevel.size(); ++k)
                {
                    int w =  i + k + j;
                }
            }
        }
#endif
        qDebug() << "<<<<< Elapsed = " << timer.elapsed();
}

Here is the table for the elapsed time for the combination of USE_REF and USE_FOREACH

USE_REF = 0, USE_FOREACH = 0 ==> 1ms
USE_REF = 0, USE_FOREACH = 1 ==> 3ms
USE_REF = 1, USE_FOREACH = 0 ==> 0ms
USE_REF = 1, USE_FOREACH = 1 ==> 3ms

I am specifically interested in the case of USE_REF = 0 and USE_FOREACH = 0, where I am getting the elapsed time as 1ms. Why is this different from USE_REF = 1, and USE_FOREACH = 0, when I am not calling any non-const function (which means there should not be any deep copy)?

I have also included the case USE_FOREACH and would like to know on why it is impacting the performance so much.

Tested on: Qt-Version: Qt 5.4.1 MacOS 10.12.2

Thank you in advance :-)

jsulm

@Sivan Well, even if QList is implicitly shared it is still an object. If you return it by value then new QList is created and copy constructor of the old one is called, this is way slower than returning a reference (which is technically same as returning a pointer).

Sivan

@jsulm, Hmm, I just find that 1ms for the above case is too long since Qt's documentation mentioned that this contructor should be very fast. Below is the note from QList Copy Constructor,

QList::QList(const QList<T> &other)
Constructs a copy of other.

This operation takes constant time, because QList is implicitly shared. This makes returning a QList from a function very fast. If a shared instance is modified, it will be copied (copy-on-write), and that takes linear time.

But anyway, let me convince myself that 1ms is due to the accumulation of the number of time the object is created and the copy constructor is called.

What about the performance of Q_FOREACH? Why is it different from the normal for loop?

JKSH

@Sivan said in Returning QList as reference and value has difference in performance:

USE_REF = 0, USE_FOREACH = 0 ==> 1ms

USE_REF = 0, USE_FOREACH = 1 ==> 3ms

USE_REF = 1, USE_FOREACH = 0 ==> 0ms

USE_REF = 1, USE_FOREACH = 1 ==> 3ms

These time differences are very very small. Are you sure they're not caused by jitter in your computer? (For example, the OS might have decided to do some background maintenance just as you started the test, which could add a few ms to your code execution)

To do a proper test, do both of the following:

Use a large dataset.
- 2 000 integers is nothing to a modern CPU. Try 200 000 instead.
Do repeated measurements and find the average:

QElapsedTimer timer;
QVector<qint64> measurements;
for (int x = 0; x < 10000; ++x) {
    timer.start();

#ifdef USE_FOREACH
    ...
#endif

    measurements << timer.elapsed();
}

double totalTime = 0;
for (time : measurements)
    totalTime += time;

qDebug() << "Average time =" << totalTime / 10000;

// TODO: Export the measurement values to a file.
// Use a spreadsheet program to plot a histogram

Note: It is possible that the first value in measurement is always larger than the other 9999. If this is the case, remove the 1st value before calculating the average.

J.Hilk

@Sivan said in Returning QList as reference and value has difference in performance:

I am specifically interested in the case of USE_REF = 0 and USE_FOREACH = 0, where I am getting the elapsed time as 1ms. Why is this different from USE_REF = 1, and USE_FOREACH = 0, when I am not calling any non-const function (which means there should not be any deep copy)?

I have also included the case USE_FOREACH on why it is impacting the performance so much.

Tested on: Qt-Version: Qt 5.4.1 MacOS 10.12.2

Thank you in advance :-)

Were your tests in debug or release mode?

Also qt's foreach function is a bad idea. It creates by definition a copy of each element to work with. That's going to impact your performance.

You should rather sue the "new" for syntax

per reference to modfy the entries.

//all ints to 0
for(int &i : myList)
     i = 0;

or const refence to work with the values

int sum(0);
for(const int &i : myList)
     sum +=i;

VRonin

I think you are focusing on the wrong problem tbh.

changing for (int i = 0; i < topLevel.size(); ++i) into for (int i = 0, maxVal=topLevel.size(); i < maxVal; ++i) probably has a bigger impact on performance than the allocation of a shared data pointer.

Sivan

@JKSH, yes the time difference in this case is very small but it helps in improving the performance of my application to achieve 60fps for example.

I tried the test as you mentioned. FYI, my initial code had 3 nested "for loops" with the second loop as 2000. I changed it to 200, 000 as you said.
I ran the test for 1000 times instead of 10, 000 times as it was taking too long. Removing the first loop's case and taking the average of 999, here is what I have got:

USE_REF = 1, USE_FOREACH = 0 ==> 31ms
USE_REF = 0, USE_FOREACH = 0 ==> 85ms

Sivan

@J.Hilk I am running my test in debug mode. Yes, going forward, I am avoiding Q_FOREACH.

Sivan

@VRonin That DOES help in improving the performance. But there are still significant difference between the two functions. Applying this method on the test requested by @JKSH, I got the following results:

list itemUSE_REF = 1, USE_FOREACH = 0 ==> 14ms
list itemUSE_REF = 0, USE_FOREACH = 0 ==> 66ms

VRonin

The difference you are seeing is the burden of calling a copy constructor and destructor inside the nested loops (even if it's cheap it still has to go on the stack, increase an integer and decrease it again)

The below should show no significant difference for return vale and ref.

const QList<int>& topLevel = myList.GetTopLevelList();
const QList<int>& midLevel = myList.GetMidLevelList();
const QList<int>& lowLevel = myList.GetBottomLevelList();
        for (int i = 0, maxi=topLevel.size(); i < maxi; ++i)
        {

            for (int j = 0, maxj=midLevel.size(); j < maxj; ++j)
            {
                for (int k = 0, maxk=lowLevel.size(); k < maxk; ++k)
                {
                    int w =  i + k + j;
                }
            }
        }

Sivan

@VRonin , Yes, agree that the above code would show NO significant difference between the return value and ref. I intentionally wrote that code in such a manner because in the original issue that I am trying to solve, each loop depends on its outer loop (i.e the second loop list depends on the first loop list).

I will take your's and @jsulm answer on why there is a performance difference between calling the function which return by value and ref .
Thanks @VRonin