Performance of QVector append

Kanndalf

Hi,

I have to add many instances of a data class to a QVector (Qt 5.14.2) and am experiencing performance problems.

The data class consists of multiple members (see code below).

The code runs on an iMX6 processor (ARM Cortex-A9, quad-core) and is much slower than on a desktop computer (by a factor of 10).

My question: how can I speed up the append, either by using another container or by optimizing the data class (reordering the members would be possible)?

I have already tried QList and QLinkedList, but neither makes a big difference.

Thanks for any ideas.

#ifndef ITEM_H
#define ITEM_H

#include <QtCore>

class Item
{
public:
	typedef QByteArray Path;
	typedef QByteArray Name;
	typedef QByteArray Value;
	typedef QByteArray Time;
	typedef bool GoodQuality;

	explicit Item() {}
	explicit Item(const Path &p, const Name &n, const Value &v, const Time &t, const GoodQuality gq)
	{
		m_path = p; m_name = n; m_value = v; m_time = t; m_goodQuality = gq;
	}

private:
	Path m_path;
	Name m_name;
	Value m_value;
	Time m_time;
	GoodQuality m_goodQuality = false;
};

typedef QList<Item> ItemList;
// typedef QVector<Item> ItemList;

#endif // ITEM_H

// main.cpp
#include "main.h"

#include <QDebug>
#include <QElapsedTimer>

#include "item.h"

ItemList Example::generateSubItems(const QByteArray &path)
{
	static int j = 0;
	const ItemList result = {
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true),
		Item(path + ":" + QByteArray::number(++j), "Name", "Value", "Time", true)
	};

	return result;
}

ItemList Example::generateItems(const QByteArray &path)
{
	const Item foo("Path", "Name", "Value", "Time", true);

	const ItemList bar = {foo};
	return generateSubItems(path) + generateSubItems(path) + bar;
}

int Example::run()
{
	QElapsedTimer timer;
	ItemList result;

	timer.start();
	result << Item();
	for (int i = 0; i < 1000; ++i) {
		result << generateItems(QByteArray::number(i));
	}
	result << generateItems("end");
	const int time = timer.nsecsElapsed() / 1000;
	qDebug() << "Needed" << time << "us for" << result.size() << "elements.";

	return 0;
}

int main()
{
	Example example;
	example.run();
}

// main.h
#ifndef MAIN_H
#define MAIN_H

#include <QCoreApplication>
#include <QObject>

#include "item.h"

class Example : public QObject
{
	Q_OBJECT
public:
	explicit Example(QObject *parent = nullptr) : QObject(parent) {}

	int run();
private:
	ItemList generateSubItems(const QByteArray &path);
	ItemList generateItems(const QByteArray &path);
};

#endif // MAIN_H

JonB

@Kanndalf said in Performance of QVector append:

The code is running on an iMX6 processor (ARM Cortex A9 Quad), and is much slower than on a desktop computer (factor 10).

With a difference as large as a factor of 10, are you sure you are not comparing a Debug build against a Release build? Make sure you are using the same compiler and Release builds on both.
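One quick way to confirm from inside the binary that optimization really was enabled is a compile-time check like the one below. This is a sketch under an assumption: GCC and Clang predefine `__OPTIMIZE__` when compiling with -O1 or higher, while other compilers (MSVC, for example) do not. `builtWithOptimization()` is a hypothetical helper, not a Qt function.

```cpp
#include <cassert>

// Hypothetical helper: reports whether this translation unit was compiled with
// optimization. Assumes GCC or Clang, which predefine __OPTIMIZE__ for -O1 and
// above; compilers that do not define the macro (e.g. MSVC) always report false.
constexpr bool builtWithOptimization()
{
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}
```

Printing its result next to the timing output (for example with `qDebug()`) on both machines would rule out a mismatched build configuration.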

Kanndalf

Hello,
thanks for your idea. I have now double-checked: both builds are set to "release" and compiled with the compiler option -O2. A cross-compiler is used for the imx6.

x64:
Needed 13816 us for 33034 elements.

arm:
Needed 275473 us for 33034 elements.

SimonSchroeder

I am not sure the append is your problem. A general optimization is to call reserve() to avoid repeated reallocations when adding items one by one. However, you are constructing intermediate ItemLists and appending each of them in one go, which should not be a huge problem.
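A minimal sketch of reserving once and appending directly into the result, instead of building and concatenating temporary lists. It uses `std::vector` and `std::string` as stand-ins so it compiles without Qt; QVector offers the same `reserve()`/`append()` pattern. The trimmed `Item` struct, `appendSubItems()` and `buildAll()` are hypothetical, and the group size of 2 × 16 + 1 items mirrors the example in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for Item, trimmed to two string members.
struct Item {
    std::string path;
    std::string name;
};

// Appends 16 sub-items directly to `out` instead of returning a temporary list.
void appendSubItems(std::vector<Item> &out, const std::string &path, int &counter)
{
    for (int k = 0; k < 16; ++k)
        out.push_back(Item{path + ":" + std::to_string(++counter), "Name"});
}

// Builds `groups` groups of 2 * 16 + 1 items into one pre-reserved vector.
std::vector<Item> buildAll(int groups)
{
    std::vector<Item> result;
    // One allocation for the vector's buffer; the strings may still allocate.
    result.reserve(static_cast<std::size_t>(groups) * 33);
    int counter = 0;
    for (int i = 0; i < groups; ++i) {
        const std::string path = std::to_string(i);
        appendSubItems(result, path, counter);
        appendSubItems(result, path, counter);
        result.push_back(Item{"Path", "Name"});
    }
    return result;
}
```

Passing the output container by reference keeps every element in its final buffer from the start, so no intermediate list has to be built and then copied.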

However, you are constructing a lot of QByteArray objects. I don't know whether QByteArray has a small string optimization, but in general you have to expect one allocation per QByteArray. Furthermore, each argument you pass is first turned into a QByteArray object, which is then copied (instead of moved) into the member. (And you do the copying inside the constructor's body instead of in the member initializer list, which means all the QByteArrays are first default-constructed.) My bet is that the allocator on your embedded platform is slow, so avoid allocations wherever possible. And learn to use a profiler to find out where all that time is actually spent.
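The copy-versus-move point can be made visible with a small counting type. This is a sketch under assumptions: `Tracked` is a hypothetical stand-in for QByteArray that counts copies but not moves, and the two item types are trimmed to two members. With a real QByteArray a copy shares the data through a reference count instead of copying it deeply, but it is still work that a move avoids.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for QByteArray that counts copies (moves are not counted).
struct Tracked {
    static int copies;
    std::string data;

    Tracked() = default;
    Tracked(const char *s) : data(s) {}  // implicit, like QByteArray(const char *)
    Tracked(const Tracked &other) : data(other.data) { ++copies; }
    Tracked(Tracked &&other) noexcept : data(std::move(other.data)) {}
    Tracked &operator=(const Tracked &other) { data = other.data; ++copies; return *this; }
    Tracked &operator=(Tracked &&other) noexcept { data = std::move(other.data); return *this; }
};
int Tracked::copies = 0;

// Style of the original Item: members are default-constructed, then copy-assigned in the body.
struct AssignInBody {
    Tracked path, name;
    AssignInBody(const Tracked &p, const Tracked &n) { path = p; name = n; }
};

// Alternative: take the arguments by value and move them into the members
// in the member initializer list.
struct MoveInInitList {
    Tracked path, name;
    MoveInInitList(Tracked p, Tracked n) : path(std::move(p)), name(std::move(n)) {}
};
```

Constructing `AssignInBody("Path", "Name")` performs two copies, while `MoveInInitList("Path", "Name")` performs none. Passing lvalues to the by-value version costs one copy per argument, the same as the `const &` version, so it is never worse.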

Kanndalf

Thank you for your thoughts.

First of all: the example code here is greatly simplified; in my production code, an Item consists of several different classes. I took your comment as an opportunity to improve the constructor and copy constructor, and the production code now runs much better. Thank you for that, @SimonSchroeder.

As a next step, I would like to use move constructors to assemble the result vector. As I understand it, QByteArray already implements one, and QVector can use it when appending individual elements. I have now also provided one for the Item class, hopefully correctly. However, it does not seem to be used when two QVectors are concatenated. I would appreciate further tips on this optimization.
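One way to get element-wise moves when concatenating is to move each element explicitly. A minimal sketch with `std::vector`, assuming a hypothetical helper `appendMoved()`: `std::make_move_iterator` turns the copy-insert into a move-insert. As far as I can tell from the Qt 5 documentation, the QVector overloads of `append()`, `operator+=` and `operator<<` that take another vector only accept a `const` reference, so wrapping the argument in `std::move` still copies. The Qt-side equivalent would presumably be a loop that calls `append(std::move(item))` for each element (the `T &&` overload of `QVector::append` exists since Qt 5.6).

```cpp
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical helper: moves every element of `src` to the end of `dst`.
// std::make_move_iterator makes insert() move-construct the new elements
// instead of copy-constructing them.
template <typename T>
void appendMoved(std::vector<T> &dst, std::vector<T> &&src)
{
    dst.insert(dst.end(),
               std::make_move_iterator(src.begin()),
               std::make_move_iterator(src.end()));
    src.clear();  // moved-from elements are valid but unspecified; drop them
}

// Because only moves are used, move-only element types work too, e.g.:
//   std::vector<std::unique_ptr<int>> a, b;
//   appendMoved(a, std::move(b));
```

Testing it with a move-only type such as `std::unique_ptr` is an easy way to confirm that no copies happen, because the code would not compile if it tried to copy.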

Below is the modified code:

//item.h
#ifndef ITEM_H
#define ITEM_H

#include <QtCore>

class Item
{
public:
	typedef QByteArray Path;
	typedef QByteArray Name;
	typedef QByteArray Value;
	typedef QByteArray Time;
	typedef bool GoodQuality;

	explicit Item() {}
	explicit Item(const Path &p, const Name &n, const Value &v, const Time &t, const GoodQuality gq) :
		m_path(p), m_name(n), m_value(v), m_time(t), m_goodQuality(gq) {}
	Item (const Item &i) noexcept :
		m_path(i.m_path),
		m_name(i.m_name),
		m_value(i.m_value),
		m_time(i.m_time),
		m_goodQuality(i.m_goodQuality) {}
	Item (Item &&i) noexcept:
		m_path(std::move(i.m_path)),
		m_name(std::move(i.m_name)),
		m_value(std::move(i.m_value)),
		m_time(std::move(i.m_time)),
		m_goodQuality(std::move(i.m_goodQuality)) {}

	// Item &operator=(const Item &i) = default;
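	// The user-declared copy assignment below suppresses the implicit move
	// assignment operator, so request it explicitly.
	Item &operator=(Item &&i) = default;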
	Item &operator=(const Item &i) noexcept
	{
		m_path = i.m_path;
		m_name = i.m_name;
		m_value = i.m_value;
		m_time = i.m_time;
		m_goodQuality = i.m_goodQuality;
		return *this;
	}

private:
	Path m_path;
	Name m_name;
	Value m_value;
	Time m_time;
	GoodQuality m_goodQuality = false;
};

// typedef QList<Item> ItemList;
typedef QVector<Item> ItemList;

#endif // ITEM_H

//main.cpp
[...]

ItemList Example::generateItems(const QByteArray &path)
{
	const Item foo("Path", "Name", "Value", "Time", true);

	ItemList bar = {foo};
	return std::move(generateSubItems(path)) << std::move(generateSubItems(path)) << std::move(bar);
}

int Example::run()
{
	QElapsedTimer timer;
	ItemList result;
	result.reserve(35000);

	timer.start();
	result << Item();
	for (int i = 0; i < 1000; ++i) {
		result.append(std::move(generateItems(QByteArray::number(i))));
	}
	result << generateItems("end");
	const int time = timer.nsecsElapsed() / 1000;
	qDebug() << "Needed" << time << "us for" << result.size() << "elements.";

	return 0;
}

[...]

x64:
Needed 12071 us for 33034 elements.

arm:
Needed 217584 us for 33034 elements. <- Better, but I had hoped for more.
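One detail in the revised Item class: once a class user-declares a copy constructor, copy assignment operator or move constructor, the compiler no longer generates a move assignment operator. Unless one is declared explicitly, `a = std::move(b)` quietly calls the copy assignment. The sketch below shows the effect, using `std::shared_ptr` as a stand-in for QByteArray (copying it updates a reference count, moving it does not). `HandWritten`, `RuleOfZero` and `useCountAfterMoveAssign()` are hypothetical names. Declaring no special members at all (the Rule of Zero) lets the compiler generate all of them, and the generated moves are `noexcept` when every member's moves are.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// std::shared_ptr stands in for QByteArray: copying it updates a reference
// count (roughly like copying implicitly shared data), moving it does not.
using Payload = std::shared_ptr<int>;

// Same shape as the revised Item: copy/move constructors and a copy assignment
// are user-declared, the move assignment operator is not.
struct HandWritten {
    Payload data;
    HandWritten() = default;
    HandWritten(const HandWritten &other) noexcept : data(other.data) {}
    HandWritten(HandWritten &&other) noexcept : data(std::move(other.data)) {}
    HandWritten &operator=(const HandWritten &other) noexcept
    {
        data = other.data;
        return *this;
    }
    // No implicit operator=(HandWritten &&) is generated, so an rvalue
    // assignment binds to the copy assignment above.
};

// Rule of Zero: no user-declared special members; the compiler generates
// all of them, including the move assignment operator.
struct RuleOfZero {
    Payload data;
};

// Move-assigns one object to another and returns the resulting use count:
// 1 means the payload was moved, 2 means it was copied (the source still shares it).
template <typename T>
long useCountAfterMoveAssign()
{
    T source;
    T target;
    source.data = std::make_shared<int>(42);
    target = std::move(source);
    return target.data.use_count();
}
```

`useCountAfterMoveAssign<HandWritten>()` returns 2 and `useCountAfterMoveAssign<RuleOfZero>()` returns 1.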

SeDi

A QVector<Item*> is not feasible?

SimonSchroeder

@SeDi said in Performance of QVector append:

A QVector<Item*> is not feasible?

Pointers mean more allocations. And allocations are slow. If you don't have any specific reason for pointers, prefer QVector<Item> over QVector<Item*>.
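To put a number on "more allocations", here is a sketch that counts heap allocations by replacing the global `operator new`. This is a standard, if blunt, technique for small test programs. `Sample`, `allocationsByValue()` and `allocationsByPointer()` are hypothetical. `std::vector` stands in for QVector, which likewise keeps its elements in one contiguous buffer, and `std::unique_ptr` replaces raw `Item*` so the pointed-to objects are freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <vector>

// Global allocation counter, incremented by the replacement operator new below.
static std::size_t g_allocations = 0;

void *operator new(std::size_t size)
{
    ++g_allocations;
    if (void *p = std::malloc(size ? size : 1))
        return p;
    throw std::bad_alloc();
}

void operator delete(void *p) noexcept
{
    std::free(p);
}

// Hypothetical small value type without heap-owning members.
struct Sample {
    int id;
    double value;
};

// Stores n Samples by value: one allocation for the reserved buffer.
std::size_t allocationsByValue(int n)
{
    const std::size_t before = g_allocations;
    std::vector<Sample> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        v.push_back(Sample{i, i * 0.5});
    return g_allocations - before;
}

// Stores n Samples through pointers: the buffer plus one allocation per element.
std::size_t allocationsByPointer(int n)
{
    const std::size_t before = g_allocations;
    std::vector<std::unique_ptr<Sample>> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        v.push_back(std::unique_ptr<Sample>(new Sample{i, i * 0.5}));
    return g_allocations - before;
}
```

For 1000 elements, the by-value version should report 1 allocation (the reserved buffer) and the pointer version 1001 (the buffer plus one per element). Members that own heap memory themselves, such as QByteArray, add their own allocations in both cases.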