How stupid is QImage implemented??

QT-static-prgm

I thought i use Qt and not my own implementation to load an tga file. Now i notice that:

from 5 loadings of the same file:
QImage: ~428 ms
My implementation: ~116,4 ms

So i wonder that my way is more efficient than Qt.

IDK who made those implementation, but if Qt is interested in my one you can use it.

? Offline

Hi! I'm interested, where can I get the source code?

QT-static-prgm

here you go : https://git.rwth-aachen.de/carstenf/OpenGL/blob/master/QtMeshViewer/Header/tga.h

Chris Kawa

Hi, thanks for sharing.

I just skimmed through your code but by the looks of it you completely omit any kind of even basic error checking. I'd wager that's where the speedup comes from. It might be somewhat useful if you control the data but it makes your implementation unusable in an environment that can't verify the source or integrity of the loaded data. Your code is wide open to all sorts of out of bounds memory access so I don't see it used by Qt or anyone who doesn't fully control the data.

Also you do mix a lot of Qt and std code, which further complicates things. As a basic example this filePath.toStdString().c_str() makes it unusable with paths containing unicode characters on some platforms (e.g. Windows). I'd consider switching to QFile and sticking to QString paths which don't have this problem.
You also seem to assume the system is little endian in couple of places (when using reinterpret_cast), which is not always the case.
There's also no gamma correction when the file specifies the value for it.

As for the implementation details just a couple notes that caught my eye:

don't pass the filePath by value. It's always better to pass strings as const reference if you're only going to read.
You're creating a 32bit QImage even if the file is 24 bit, which can be wasteful for larger images.
you're duplicating a lot of the same calculations, e.g.

int valr = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8 + 2);
int valg = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8 + 1);
int valb = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8);

You're reading color components as chars, converting to ints, converting to QColor and then using setPixel, which is the slowest possible way to transfer the values. You should be using QImage::scanline and copying the bytes directly (taking care of endianness if needed).
Just for the sake of consistency I'd stick to one library i.e if you're using Qt anyway you might as well use it everywhere. For example instead of using std::uint8_t you could use quint8. Instead of std::fstream you can use QFile. Why make your code depend on two libraries if you can get by with one.
You should put your code in a .cpp file if you want it used in more than one place, otherwise you've got one definition rule violation.

jsulm

@QT-static-prgm You shouldn't use words like "stupid" - you insinuate that Qt developers are stupid, this is rude. Especially without knowing why QImage is implemented in the way it is working. Don't you think there might be good reasons for that ( @Chris-Kawa pointed out some of them)?

BjornW

The burn ^_^

kshegunov

As a side question, where's TARGA loading implemented in Qt? I peeked at the source yesterday but couldn't find neither a implementation in the QImageReader class nor a relevant plugin.

mtrch

As a side question, where's TARGA loading implemented in Qt?

It is in QtImageFormats module/subrepo (not qtbase): http://code.qt.io/cgit/qt/qtimageformats.git/tree/src/plugins/imageformats/tga

kshegunov

I wanted to see what's so different, but I guess I didn't look quite far enough to find it. Thanks for the link!

? Offline

@QT-static-prgm Thx!

QT-static-prgm

@Chris-Kawa yes I know the code is not perfect. I used it in an older version, before I used qt. When I switched to qt I used qimage but it does not support all my tga files, so the fastest solution was to copy over my old code in case that qimage failed. And then I noticed that my code was much faster. I'll look into your comments about my code and gonna rework it, since it is the bottleneck of my program.

@jsulm I was just surprised that my imperfect code was that faster than the qt solution. You are right that stupid is the wrong word for it and I'm sorry for that choose. I hope that no one feels hurt because of that. It was not my intention. You guys from qt are doing a great job and I'm always impressed that for every problem I get, the solution is an already implemented qt function (calculating transform matrix for normal is in qt so much easier then in glm.)

@kshegunov I'd be interested in the different, too. If you find it, please let me know. As I mentioned above there are some tga files even not loaded by qimage, but my code did.

@Wieland you're welcome. I'm pretty sure I'm gonna improve this further, so you may wanna stay tuned on that link. You can get the future updates that way.

Chris Kawa

Looking at the Qt's code it seems it plays it just a little too safe. Apart from the necessary bounds and header integrity checking it reads the data byte by byte and verifies each and every one of those reads. That's most likely what's killing it. That implementation could use some love too.

QT-static-prgm

i could improve the code using quint, qfile,.. and by handling the unsuccessful tries an other way. This decreases the time from ~142,5 to ~102,4.
Now i used scanline (thanks for the tipp) and now i'm somewhere around 89. But the images are no longer loaded correctly.

Maybe someone of you find the problem:

The Colors are wrong and it looks a bit wrong positioned, too.

Here the old code:

for (unsigned int y = 0; y < ui32Height; y++)
{
	for (unsigned int x = 0; x < ui32Width; x++)
	{
		int valr = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8 + 2);
		int valg = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8 + 1);
		int valb = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8);
		int vala = 255;
		if (ui32BpP == 32)
			vala = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8 + 3);

		QColor value(valr, valg, valb, vala);

		img.setPixel(x, ui32Width - 1 - y, value.rgba());
	}
}

and here the new one

for (unsigned int y = 0; y < ui32Height; y++)
{
	QRgb* imgLine = (QRgb*)img.scanLine(y);
	for (unsigned int x = 0; x < ui32Width; x++)
	{
		int valr = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8 + 2);
		int valg = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8 + 1);
		int valb = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8);
		int vala = 255;
		if (ui32BpP == 32)
			vala = vui8Pixels.at(y * ui32Width * ui32BpP / 8 + x * ui32BpP / 8 + 3);

		QColor value(valr, valg, valb, vala);

		imgLine[x] = value.rgba();
	}
}

==EDIT==

i fixed it. For some reason the picture was mirrored and R and B was swapped. now it just takes ~78MS vs Qt 428MS o.O
https://git.rwth-aachen.de/carstenf/OpenGL/blob/master/QtMeshViewer/Header/tga.h

if you have further ideas about how to improve, let me know.

kshegunov

@QT-static-prgm said in How stupid is QImage implemented??:

QRgb* imgLine = (QRgb*)img.scanLine(y);

This is still a reinterpret cast, so you may run into problems on OSX vs Linux/Windows (see Chris's 3rd or 4th point on byte ordering).

QColor value(valr, valg, valb, vala);

Don't convert into color (which is much more complex), rather use the qRgba() function.

Chris Kawa

If you want to go full speed ahead I would do a couple things:

further specialize the algorithm for 24 and 32 bit
explicitly handle the endianness so you don't have to go through it on a byte by byte level
avoid additional allocations and buffers. Do everything in-place.

Here's a sample for the uncompressed 32bit little endian variant:

if (ui32BpP == 32) {
    int lineWidth = ui32Width * ui32BpP / 8;
    QImage img(ui32Width, ui32Height, QImage::Format_RGBA8888);
    for (int i = ui32Height - 1; i >= 0; --i)
        file.read(reinterpret_cast<char*>(img.scanLine(i)), lineWidth);

#ifdef Q_LITTLE_ENDIAN
       return qMove(img).rgbSwapped(); //qMove makes it use in-place swap instead of a copy
#elif Q_BIG_ENDIAN
       swapTheBytesInOtherWay(img);
       return img;
#endif
}

For 24 bit image you can use QImage::Format_RGB888 internal format so you can do it as fast.
I got another 4x speedup compared to your version. You could probably optimize the rgb swap further with some clever SSE intrinsics.

QT-static-prgm

Gonna try this tomorrow. But I think swapping r and b won't be the problem. But tga holds the data bgra and qrgb is defined argb so it's not done by swapping just r and b. A needs to be moved from back to front

Chris Kawa

@QT-static-prgm said in How stupid is QImage implemented??:

But tga holds the data bgra and qrgb is defined argb so it's not done by swapping just r and b. A needs to be moved from back to front

It doesn't matter how QRgb stores the data. I'm not using it.
tga stores the bytes in little endian ARGB so the actual order of bytes in the file is [0]BB [1]GG [2]RR [3]AA.
I'm using internal format Format_RGBA8888 and on little endian platform this means the actual bytes in memory are stored as [0]RR [1]GG [2]BB [3]AA. As you can see to convert one to the other swapping R and B is enough.
If you want to go through QRgb then yes, you need extra work to put the bytes in right order, but it's unnecessary here and makes it confusing.

QT-static-prgm

wow, thank you. i'm always confused by those format things.

This is the final code. it can decide whether it is 32 or 24 and chooses the correct internal format.
i don't really wanna touch the compressed part, since it is not relevant for me, but maybe i'll speed it up in the future.

if (ui32PicType == 2 && (ui32BpP == 24 || ui32BpP == 32))
{
	img = QImage(ui32Width, ui32Height, ui32BpP == 32 ? QImage::Format_RGBA8888 : QImage::Format_RGB888);
	int lineWidth = ui32Width * ui32BpP / 8;
	for (int i = ui32Height - 1; i >= 0; --i)
		file.read(reinterpret_cast<char*>(img.scanLine(i)), lineWidth);
}

This code makes ~3-4ms!!!!

QT-static-prgm

@Chris-Kawa Any ideas how to read directly to QString??

Before i had something like this:

struct{
    char name[5];
    ...
}

...
m_file.read(F2V(tmp_header->name[0]), sizeof(tmp_header->name) - 1);

so i have my struct with an array of the size 5 and read in a string of the size 4 (last need to be the \0 character)

Now i'd like to change it to this:

struct{
    QString name;
    ...
}

...
tmp_header->name.reserve(4);
m_file.read(F2V(tmp_header->name), sizeof(tmp_header->name) );

not sure if it needs to be 4 or 5, but non of them do their work. Already tried QBytearray, but this didn't worked either.

Chris Kawa

QString uses variable length Unicode representation with 16bit codepoints. Since you've got an 8bit string you can't read directly into a QString. You'll need to convert.
The easiest way to do it is:

tmp_header->name = QString::fromUtf8(m_file.read(4));     //if your string is Utf-8 or ASCII
//or
tmp_header->name = QString::fromLatin1(m_file.read(4));   //if your string is Latin-1 or ASCII

This variant of QFile::read will allocate new QByteArray every time so if you want to avoid that you can create one yourself and reuse it:

QByteArray arr(4, 0);

m_file.read(arr.data(), 4);
tmp_header->name = QString::fromUtf8(arr);

Btw. Many of Qt data classes like QByteArray or QString are non-trivial and use implicit sharing so just directly casting them to the data type pointer is an error (I assume your F2V does something like that?). You'll get a pointer that points to some internal implementation detail not to the data. Qt's storage classes have access methods (usually called data() and constData()) that give you pointers to the underlying storage.