C++ fastest sequential containers

raven-worx wrote:

@Q139
you asked for a fast container, right?
In this case (many access requests, few inserts) a B-tree or a trie implementation would be the best for you.

Qt doesn't provide implementations for these directly, but there are some C++ implementations available.

But if you want to stick to Qt, give the Qt containers a try and check whether the performance is acceptable for you.

Q139 wrote (#6):

Are Qt containers faster in sequential or random access than std::vector?

Would:

    double arg[50];
    for (uint at = 0; at < 50; at++) {
        double get = arg[at];
    }

work significantly faster than:

    std::vector<double> vec(50);
    for (uint at = 0; at < 50; at++) {
        double get = vec[at];
    }

Or are the differences in performance minimal?

Konstantin Tokarev wrote (#7):

Inside std::vector<double> there is the same double[] array. Overhead occurs when you use at(), because it checks for out-of-bounds access, and the vector is more costly to construct, destruct, and copy, because it has additional data fields. There is std::experimental::dynarray, which is proposed for a future C++ standard version, but you can already start using it in some implementations.
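A minimal sketch of that difference (illustrative, not from the original post): operator[] is unchecked, while at() validates the index and throws std::out_of_range on a bad access.

    #include <iostream>
    #include <stdexcept>
    #include <vector>

    int main() {
        std::vector<double> vec(50, 1.0);

        double a = vec[10];     // unchecked, same cost as indexing a raw double[50]
        double b = vec.at(10);  // checked: the index is compared against size() first

        try {
            vec.at(50);         // one past the end -> throws std::out_of_range
        } catch (const std::out_of_range &e) {
            std::cout << "caught: " << e.what() << '\n';
        }
        std::cout << a + b << '\n';
        return 0;
    }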

SGaist (Lifetime Qt Champion) wrote (#8):

Hi,

Here you have a very good article about Qt vs STL containers.

A Former User wrote (#9):

Most important for "fast" is that it fits into the processor's cache and is accessed linearly.

Q139 wrote (#10), in reply to A Former User:

Thanks for the internal info about vector and the long material about containers.
Not sure yet how the compiler optimizes the containers for cache, or what coding techniques to use to control what goes into the cache.

Konstantin Tokarev wrote (#11):

If you are not running some weird microcontroller, you don't have control over what goes into the cache and what does not. All you can do is make efficient caching possible; e.g. a sequential data array small enough to fit in the cache has a good chance of staying there.
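As an illustration (not part of the original post), a sketch contrasting two traversal orders over the same contiguous data: both loops touch every element exactly once, but the row-major one walks memory sequentially and is far friendlier to the cache and the hardware prefetcher. The function names are my own; the vector is treated as a rows x cols matrix stored row by row.

    #include <cstddef>
    #include <vector>

    double sumRowMajor(const std::vector<double> &m, std::size_t rows, std::size_t cols) {
        double sum = 0.0;
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                sum += m[r * cols + c];   // consecutive addresses
        return sum;
    }

    double sumColumnMajor(const std::vector<double> &m, std::size_t rows, std::size_t cols) {
        double sum = 0.0;
        for (std::size_t c = 0; c < cols; ++c)
            for (std::size_t r = 0; r < rows; ++r)
                sum += m[r * cols + c];   // jumps cols * sizeof(double) bytes every step
        return sum;
    }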

A Former User wrote (#12):

YouTube has a nice talk on this: Scott Meyers, "CPU Caches and Why You Care".

Q139 wrote (#13):

Good info on caches. It discourages the use of pointers and gives wise tips for parallel threads.
Is there also a profiler that can simulate cache/RAM access times?

Konstantin Tokarev wrote (#14):

cachegrind

A Former User wrote (#15):

RightMark Memory Analyzer

Chris Kawa (Lifetime Qt Champion) wrote (#16):

Some have suggested lists or trees - I would highly discourage you from using these. Traversal of them is sloooow because of constant pointer chasing and the cache misses it causes. Also, each insertion is a memory allocation in many implementations. Terrible for performance.

I'd say any linear structure is fine, be it std::vector, std::array, QVector or anything like that. I'd avoid naked C arrays just because it's easier to make out-of-bounds access mistakes with them.
With vectors it's important that you don't allocate each time you insert. Since you know your cap (50 items), reserve the memory up front (reserve() or resize()). With std::array just make it big enough to hold those 50 items.
If you use std::vector, use a custom allocator so you don't request memory directly from the OS each time you create a vector. You can find some cool allocators that use pooling or the stack.
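A minimal sketch of that idea using only the C++17 standard library (std::pmr is my example choice here, not necessarily what was meant in the post): a small stack buffer backs the vector's allocations, so filling it never has to touch the general-purpose heap.

    #include <cstddef>
    #include <memory_resource>
    #include <vector>

    int main() {
        std::byte buffer[1024];                  // storage on the stack
        std::pmr::monotonic_buffer_resource pool(buffer, sizeof(buffer));

        std::pmr::vector<double> vec(&pool);     // vector that allocates from the pool
        vec.reserve(50);                         // known cap: one allocation, served from the buffer
        for (int i = 0; i < 50; ++i)
            vec.push_back(i * 0.5);
        return 0;
    }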

It's also very important to know your access patterns. Design your algorithms so that they access the data in a linear fashion, i.e. items close to each other (ideally consecutive). If you reserve the space, insertion is almost as cheap as in the array. If you use std::vector, prefer emplace_back over push_back, as it creates the item in place instead of copying it into the vector.

Also be aware of your const usage, e.g. double get = vec[at];. If you're not modifying get later, make it const. Since your vector is not const itself, you're using the non-const operator[]. Use std::as_const or qAsConst in these cases. This is especially important with Qt containers like QVector, because they use implicit sharing and you might cause an accidental detach (copy) by using a non-const member, and that would be very costly.
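For illustration (not from the original post), a minimal sketch of that idiom with QVector, assuming Qt 5.7 or later for qAsConst; the function name is made up for the example.

    #include <QVector>
    #include <QtGlobal>   // qAsConst

    double sumValues(QVector<double> &vec) {
        double sum = 0.0;
        // qAsConst yields a const reference, so the const begin()/end() overloads
        // are used and the implicitly shared QVector is never detached (copied).
        for (const double v : qAsConst(vec))
            sum += v;
        return sum;
    }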

Q139 wrote (#17):

Started testing with Linux profilers and found that VMware and a native Linux boot perform almost the same.
More interestingly, I found that running the software under openSUSE or Ubuntu in VMware on Windows 7 is significantly faster than running it directly on Win7 x64.
What could be the cause? Is it a Linux kernel benefit, and does VMware bypass the Windows kernel in VT-x virtualization mode?
Are there other Linux compilers to try besides gcc?

(benchmark results screenshot)
Some tests were repeated multiple times and the results vary very little between runs.
The benchmarked software runs for 1 minute per test and the result is the number of calculations made.

Q139 wrote (#18):

With valgrind, when using callgrind, is it possible to also see overall times for functions instead of just call counts?
All logging features are enabled in the settings.

mjsurette wrote (#19):

@Q139
The clang compiler is available for Linux; I believe there are others. It also has some neat tools for formatting and doing static analysis of your code. It generally performs quite well and produces good code.

As for which container, benchmarking different ones will be the true test. @Chris-Kawa gave good advice though.

Mike

Q139 wrote (#20):

@Chris-Kawa If emplace_back is faster, is there any scenario where push_back would be needed instead?
If I have the right info, emplace_back constructs the new item directly in the array, while push_back makes a new object, then writes it to the array and probably destructs the temporary afterwards.

Chris Kawa (Lifetime Qt Champion) wrote (#21):

@Q139 emplace_back is not always faster. It can be faster in certain scenarios ;)

If you have a vector of basic types (ints, floats, pointers etc.) you won't see a difference, because no matter which one you use there's just gonna be a value written to a memory location (probably straight from a CPU register, so you can't go faster than that).

Now for the cases where it does matter. Let's say you have a big structure that is expensive to construct or copy:

    struct BigStruct {
        BigStruct(int) { qDebug() << "Construction"; }
        BigStruct(const BigStruct&) { qDebug() << "Copy construction"; }
        BigStruct& operator=(const BigStruct&) { qDebug() << "Copy assignment"; return *this; }
    };

Consider you push_back an element:

    std::vector<BigStruct> foo;
    foo.reserve(100);

    foo.push_back(BigStruct(42));

    // prints:
    // Construction
    // Copy construction

so a temporary instance is created, copied into the vector and then destroyed.
You might be smart and provide move semantics for your type to make it less expensive:

    struct BigStruct {
       ...
       BigStruct(BigStruct&&) { qDebug() << "Move construction"; }
       BigStruct& operator=(BigStruct&&) { qDebug() << "Move assignment"; return *this; }
    };

and now you have:

    std::vector<BigStruct> foo;
    foo.reserve(100);

    foo.push_back(BigStruct(42));

    // prints:
    // Construction
    // Move construction

This is better, but not all types are movable, and you're still doing two things while you could be doing just one:

    std::vector<BigStruct> foo;
    foo.reserve(100);

    foo.emplace_back(42);

    // prints:
    // Construction

This doesn't copy or move anything into the vector. It creates the thing already inside (usually via placement new). If the constructor takes parameters (like 42 in this example) they are passed using perfect forwarding, so everything is nice and optimizable.

As for a scenario where push_back is needed - that's whenever you have an item already created and you want to put it into a vector. For example, a usual case:

    void SomeClass::addItem(const Item& item) {
        some_items_container.push_back(item);
    }

which makes a copy of the item via copy construction.
