Speed comparison QtQuick 2.0 against QtQuick 1.1

borut123

I have a simple QML app (one screen, about 20 elements on the screen, images, text, listview) written for Qt 4.8.
It runs quite nice.

Since Qt5 promises greater speed, especially for QML rendering, I have ported it to Qt5 (QtQuick 2.0).
The port was done without any problems.

But the app speed (repainting etc.) surprises me a lot. The speed degradation from Qt4.8 to Qt5 is really significant.
I can not find any good reason for that.

I'm using Qt5 RC on Ubuntu. Does anybody else have the same issue, or at least any clue what could be wrong (maybe graphics card drivers)?

sierdzio

[quote author="borut123" date="1355224213"]I'm using Qt5 RC on Ubuntu. Does anybody else have the same issue, or at least any clue what could be wrong (maybe graphics card drivers)?[/quote]

Very probable. Are you using Mesa or a "proper" graphics driver?

borut123

Yes, I use mesa.

sierdzio

That might be the issue, then. Mesa's software renderer executes OpenGL calls on the CPU.

borut123

But shouldn't performance be at least comparable to that of Qt 4.8, which also does not use the GPU?

sierdzio

It will be when you use the QtQuick1 module.

That said, I did notice QtQuick being slower, but that was just after Qt5 began taking shape. It has improved since then, and now I don't have any problems (last checked with beta1).

chrisadams

It's because of mesa.

With a real GPU/drivers, QtQuick2 is faster in every way: it instantiates fewer QObjects (e.g., for binding expressions and bound signal handler expressions), it renders faster, and it animates more smoothly with a more predictable clock.

Of course, it is dependent on your GPU: if certain GL calls are really, really slow on your chipset, for example, you can notice stalls in the pipeline during filling the command buffer, which can result in skipped frames and thus worse perceived performance. But these cases are pretty rare.

Cheers,
Chris.

lgeyer

Have you tried using a software rasterizer like llvmpipe instead?

feldifux

Are there any performance tests that measure the difference of Qt4 and Qt5 in terms of JS evaluation, bindings, signals&slots, property reads and writes from QObjects, etc.? So not rendering-related, but rather the QtCore and Declarative modules?

It would be interesting to see statements like "Qt5 is x% faster in evaluating this sample JS code"...

borut123

I have not tried llvmpipe yet. Thanks for the tip.

digital_aspirin

[quote author="feldifux" date="1355404129"]Are there any performance tests that measure the difference of Qt4 and Qt5 in terms of JS evaluation, bindings, signals&slots, property reads and writes from QObjects, etc.? So not rendering-related, but rather the QtCore and Declarative modules?

It would be interesting to see statements like "Qt5 is x% faster in evaluating this sample JS code"...[/quote]

You can use the QML Profiler in Qt Creator and compare QtQuick1 vs QtQuick2. The QML Profiler is a great tool.

chrisadams

@feldifux:

There used to be. I maintained a suite of benchmarks as part of an internal code coverage and performance analysis tool (a simple set of scripts using gcov for coverage, a preload lib to hook malloc/free/new/delete for memory usage analysis, and simple QElapsedTimer timings for rough perf estimates).

I don't think anyone has been running anything similar (for the declarative module, at least) since the Brisbane office was closed down.

The performance differences were interesting:

- instantiating the object hierarchy was a fair bit faster (as we delayed a lot of things)
- memory usage was MUCH lower (due to concerted effort to save bytes, everything from big changes like specialised and non-qobject-based expression classes, down to smaller changes like using bitfields and packing structs more effectively)
- blocks of JavaScript code ran faster... but property access and context lookups were basically the same speed as with JSC.

So, all in all, most JS evaluation was the same speed, tbh (including signal handler evaluation, property reads and writes to QObjects, and binding evaluation since that's basically a reflection of QObject property access speed). We did use specialised notifiers instead of normal QObject signal emission to trigger bindings, so it would have been a bit faster, but I couldn't tell you how much, exactly.

The major win in Qt5 over Qt4.x was that the compiled bindings evaluator got much, much better. In QtQuick1, the QDeclarativeCompiledBinding could handle a few different types of expressions; in QtQuick2, it was improved and called (rather tongue-in-cheek) v4, and it currently handles a lot more expressions. The vast majority of property bindings should now be evaluated by v4 instead of V8, thus avoiding entering a JS evaluation context and doing property lookups / writes from JS (which is, as explained earlier, the main bottleneck in evaluating expressions).

Depending on the situation, you can get order-of-magnitude performance improvements by tweaking a few bindings so that they're v4-able. One example is styling: if you do styling via a .pragma library js resource, shared by all of your QML objects, no bindings will be v4able. Replace it with a QObject singleton type, and 95% of the styling bindings (color: MyTheme.highlightColor; etc) will be v4able. From memory, in some benchmarks we saw as much as a 25% performance improvement in creation time (including first-time binding evaluation) of a complex, styled component, by enabling v4.

I guess, like most things, there are lots of factors which apply, so talking about "performance" can be misleading (eg, we did some things to save bytes in QtQuick 2 which, on paper, look like they might cause a performance degradation due to, for example, dereferencing members of structs which are allocated on first use rather than on object instantiation. But then you have to take improved caching performance into account, and secondary effects of delayed instantiation, and things quickly become very fuzzy and situation-dependent).

In short - the performance of QML2 is much better than QML1 in some ways. QObject property lookup in JS is still incredibly slow, however, as we were never able to integrate properly with V8's inline caching mechanism. We tried to get around it by writing V4 to do as much JS evaluation as possible (and in the future, with V4VM we won't need to use V8 at all).

Cheers,
Chris.

feldifux

Hi Chris,
thanks a lot for this in-depth answer - that was the kind of information I was hoping for!

Are the performance benchmarks you mentioned anywhere available open-source?

Could you provide further information about which bindings/expressions can be optimized in QML1? And which can be optimized in QML2 (i.e. v4-able)? With that information we could change our QML code to be most efficient. This would also be a great resource for the Qt docs for advanced performance optimization. The basic document about QML Performance is very high-level and rendering-related: http://doc.qt.digia.com/qt/qdeclarativeperformance.html

You mentioned in your post we should avoid property lookups/writes from JS, how should we access properties instead? Something like writing property lookups in C++ and expose a higher-level system and its API for QML?

Is there a general approximate factor how much slower QML & JS expressions are in QML1 / QML2 compared to C++? I think I did read something like 3 times slower on average, at least with QML1.

Cheers,
Christian

chrisadams

Hi,

No problems.

No, the benchmarks aren't available anywhere at the moment - although I don't see why they couldn't be, as they were all very simple. It certainly wasn't a complete set of benchmarks, merely a useful one. I don't know if I still have them any more; I can take a look.

The "compilability" of bindings in QML1 was fairly non-systematic (basically, Aaron wrote the code for the big-win cases for which he could implement the code without too much effort, as there were bigger fish to fry). I couldn't tell you exactly which sorts of expressions work and which don't, but inspection of the qdeclarativecompiledbindings code should tell you everything you need to know.

In QML2 it was far more systematic; basically there are only a few things which, for various reasons, aren't compilable (unfortunately, an if statement without an else is one of those, currently ;-). Roberto and Aaron spent a lot of time and effort on the IR and the compilation, so it's pretty good. Lars and Simon are improving it further into a full JS engine, v4vm, but that's not finished yet.

For information about what makes something compilable in QML2, see http://doc-snapshot.qt-project.org/5.0/qtquick/qtquick-performance.html - it's pretty in-depth. In general, the Qt5 docs for QtQml/QtQuick are far and away much better than those from Qt4, but you need to be aware that some of the perf docs in particular don't apply to QtQuick1/QML1.

As far as avoiding property lookups and writes in JS, in my opinion your best bet is to simply minimise the number of them that you do, in JS expressions, rather than avoiding JS expressions altogether. For example, by caching object resolutions and property accesses.
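A minimal sketch of the caching idea, in plain JavaScript rather than QML. The `makeItem` helper and its `width` getter are hypothetical stand-ins for a QObject property (they are not Qt API); the getter only counts reads so that the difference becomes visible. In a real QML function or signal handler, the same pattern would presumably apply to an expression such as `someItem.width`.

```javascript
// Hypothetical stand-in for an object exposed to JS: every read of
// `width` goes through a getter, so the number of lookups can be counted.
function makeItem(width) {
    var reads = 0;
    return {
        get width() { reads += 1; return width; },
        readCount: function () { return reads; }
    };
}

// Uncached: resolves item.width on every loop iteration.
function sumUncached(item, n) {
    var total = 0;
    for (var i = 0; i < n; i++) {
        total += item.width;
    }
    return total;
}

// Cached: resolves item.width once and reuses the local value.
function sumCached(item, n) {
    var w = item.width;
    var total = 0;
    for (var i = 0; i < n; i++) {
        total += w;
    }
    return total;
}

var uncachedItem = makeItem(10);
var cachedItem = makeItem(10);
var uncachedTotal = sumUncached(uncachedItem, 100);
var cachedTotal = sumCached(cachedItem, 100);

console.log(uncachedTotal + " after " + uncachedItem.readCount() + " reads"); // 1000 after 100 reads
console.log(cachedTotal + " after " + cachedItem.readCount() + " reads");     // 1000 after 1 reads
```

Both versions compute the same result; only the number of property reads differs, which is the cost that matters when each read crosses from JS into a QObject.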

I don't have any approximation of how much slower JS is than C++, in QML (either 1 or 2). It entirely depends on the situation. In QML2, with V8, big blocks of JS which don't interact with any Qt C++ classes are actually optimised aggressively at runtime, for example. Most small bindings are v4'd anyway, so it's only signal handlers (called bound signal expressions, in the code) and dynamic functions which are evaluated by the JS engine.

In general I'd probably say: JS is fast enough, that you don't have to implement "glue logic" or "UI logic" in C++, but for anything more (ie, business logic, heavy calculations etc) I'd use C++. After all, that's what QML is designed for: thin UI layer, separated from your application logic which gets implemented in C++ and exposed to QML.

Cheers,
Chris.

feldifux

Hi Chris,
yes, that's what I experienced in C++ & QML development as well: most of the code can be written in QML, except when it gets really performance-sensitive. E.g. something that needs to be calculated every frame per game object in a game is not ideal to be done in QML, and is also not what it was designed/created for.

As we understand better how to write v4able code in QML, we can use that information to create high-performance applications. Thanks for your hints about how to do that and for the performance doc of Qt5 - it indeed is much better than the Qt4 one and is a great resource.

Cheers,
Christian

elpuri

[quote author="chrisadams" date="1355702511"]Depending on the situation, you can get order-of-magnitude performance improvements by tweaking a few bindings so that they're v4-able.[/quote]

I'd be very interested in getting debug prints on bindings that are not evaluated by v4.

Since you probably know qtdeclarative like your own pockets, could you give me a hint roughly where in the code it is determined whether a binding is a-ok for v4?

chrisadams

Hi,

In QV4IRBuilder, look for calls to discard(). Basically, we attempt to v4 every binding expression, but if we hit something we can't resolve at compile time (or something which is deemed too complex to build an appropriate intermediate representation for) we call discard() which basically says "use v8 for this binding instead."

Setting the QML_COMPILER_STATS environment variable will give you detailed information on the way in which different bindings are optimised. There are three different types of bindings currently in the QML engine:

1. v4 bindings - resolved at compile time, very fast. We parse the JS, convert it to an IR, and emit machine code to perform the operations.
2. shared-context bindings - run with v8, from a shared context. For every QML file with sharable (more on this later) non-v4able bindings, we build a js file at runtime (in memory) which contains just an array of functions which are the (rewritten) binding expressions. Eg, "property int d: if (b) 20" has the "if (b) 20" rewritten to something like: "function q_d() { if (b) return 20 }". Each binding function is accessed by index, and there is a single context shared by all of the binding expressions. These sort of bindings are slow because they use V8, but there isn't much per-binding overhead.
3. non-shared-context bindings - run with v8, from a binding-specific context. These are called "QScript bindings" for historical reasons. Basically, while we're rewriting any of the binding expressions for (2), if we hit something like "eval()" we abort, because we can't use a shared context (since the binding might modify the evaluation context). These sort of bindings are extremely slow, have a lot of overhead, and should be avoided.
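The shared-context rewrite described above can be pictured with a plain JavaScript sketch. This is only an illustration and not the engine's actual generated code: the `context` object, the `sharedBindings` array, and `evaluateBinding` are made-up names, and the real engine resolves names like `b` through the QML scope rather than through an explicit object.

```javascript
// Hypothetical stand-in for the single context shared by all bindings.
var context = { b: true, d: 60, e: 20 };

// One array of rewritten binding expressions, accessed by index.
var sharedBindings = [
    // property int g: if (b) 10
    function q_g() { if (context.b) return 10; },
    // property int h: if (b) d + e
    function q_h() { if (context.b) return context.d + context.e; }
];

// Look up a binding function by its index and evaluate it.
function evaluateBinding(index) {
    return sharedBindings[index]();
}

var g = evaluateBinding(0);
var h = evaluateBinding(1);
console.log("g = " + g); // g = 10
console.log("h = " + h); // h = 80

// With b false, an "if" without "else" falls through and yields undefined.
context.b = false;
var gWhenFalse = evaluateBinding(0);
console.log("g = " + gWhenFalse); // g = undefined
```

The last case hints at why an `if` without an `else` is awkward for a compiler: the expression has no value on one of its paths.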

Take the following example:

@
$ cat test2.qml
import QtQuick 2.0

Item {
    property bool b: true
    property int c: b ? e : 50
    property int d: b ? 40 + e : 50
    property int e: if (b)
                        20
                    else
                        30
    property int f: if (b)
                        20 + c
                    else
                        30
    property int g: if (b) 10
    property int h: if (b) d + e
    property int i: pfunc()
    property int j: eval("pfunc()")
    function pfunc() { return 120 }

    Component.onCompleted: {
        console.log("b = " + b)
        console.log("c = " + c)
        console.log("d = " + d)
        console.log("e = " + e)
        console.log("f = " + f)
        console.log("g = " + g)
        console.log("h = " + h)
        console.log("i = " + i)
        console.log("j = " + j)
    }
}
@

Run with compiler stats, we can see the sorts of things which are optimizable and which aren't. It's easy to see that v4 can still be improved a long way. The v4vm project in playground is making good progress on this front, but it's not in QtDeclarative yet.

@
$ QML_COMPILER_STATS=1 ~/Code/qt/qt5/qtbase/bin/qmlscene test2.qml
QML Document: "file:///home/chriadam/Code/test/conditionalbinding/test2.qml"
Component Line 3
Total Objects: 1
IDs Used: 0
Optimized Bindings: 2
(5:21) (7:21)
Shared Bindings: 5
(6:21) (11:21) (15:21) (16:21) (17:21)
QScript Bindings: 1
(18:21)
b = true
c = 20
d = 60
e = 20
f = 40
g = 10
h = 80
i = 120
j = 120
@

Also note that using Qt.binding() will result in a non-shared-context binding function being generated, and so should be avoided if possible.

Cheers,
Chris.

elpuri

Chris,

thanks a billion for this exceptionally informative answer! If you ever find the perfect idle moment, you should definitely write a series of blog posts about the guts of the QML engine.

feldifux

I absolutely agree with elpuri!

Cheers & merry christmas,
Christian

chrisadams

Thomas McGuire has already started doing that - see http://www.kdab.com/category/blogs/qmlengineseries/ - and he recently gave a talk at DevDays about the internals (although I haven't found the time to watch his presentation yet).

The problem with writing blogs about how things work, is that the implementation of the language is constantly changing. Granted, the rate of change is much slower now than it was a few months ago, but it's still changing. For example: when Lars and Simon integrate v4vm in 5.1 or 5.2, the 3 different binding types will reduce down to just 1. In 5.3 or 5.4 when v4vm+engine is improved so that the generated dynamic metaobjects and bindings IR can be converted via AOTC into C++ code and compiled directly, everything will be different again. The QML typesystem is in a state of flux - Alan is currently making some changes which had been on my todo list for 5.1, which will mean that composite types (QML-document-defined types with dynamic metaobjects) will be able to be resolvable by typename from JS, with all that that implies.

I completely agree that knowing how the internals work is vital to be able to write maximally performant code... but to be honest, I don't know how much value there is in documenting some of this stuff, given how much it's likely to change. And some things are just horrible and shouldn't ever be documented (like the way value types work) - they're voodoo magic, and just work, ok? ;-P

The real problem is that the internal code is so badly commented / documented that it is almost impossible for outsiders to understand the nuances of the code from reading it. In general though, it's all pretty simple - if you read Thomas' blogs, and then take a look at the engine classes, you'll get a pretty clear idea of how it all fits together: the parser, the typeloader, the compiler (which generates dynamic metaobjects and a series of QML VME instructions to generate the object hierarchy and perform property initialisation), the VME (which executes the instructions to instantiate the hierarchy, initialise properties, and perform first-time binding assignment).

From that point on, it's all just run-time interaction between objects, primarily via signals and signal handlers, and binding re-evaluation.

There are lots of areas of QML which can be greatly improved; I guess we'll just have to wait and see how it evolves over the 5.x series.

Cheers,
Chris.
