creating QOpenGLWidget performance issues
-
wrote on 23 Dec 2023, 03:28 last edited by
Hi there,
I'm trying to investigate why it is so slow to open a QOpenGLWidget. My application uses multiple of these for rendering onto various independent "windows", and I've noticed opening one of them can take a very long time.
I did some stack sampling to see where the time goes, and it is always in QOpenGLContext::create (under a resize event, for example).
This surprised me, since I'm using a global shared context:
QSurfaceFormat myFormat;
myFormat.setDepthBufferSize(24);
myFormat.setSwapInterval(0);
QSurfaceFormat::setDefaultFormat(myFormat);
QCoreApplication::setAttribute(Qt::AA_ShareOpenGLContexts);
My issue is on Windows 10. I tried on macOS and it is a lot faster there, so my guess is that the Windows implementation for creating a context is very slow.
My questions are: how can I optimise this? Why is it creating a context at all when I'm trying to share one globally? Am I misunderstanding what this should be doing, or am I doing it wrong?
Thanks for any help.
-
wrote on 23 Dec 2023, 09:29 last edited by
Using Qt::AA_ShareOpenGLContexts doesn't mean that your QOpenGLWidgets are sharing a single context.
It means they use a share group, which is a mechanism whereby several OpenGL contexts can share (some) resources, such as textures, with each other.
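To make the distinction concrete, here is a hedged sketch (not from the thread) of roughly what each QOpenGLWidget does internally when the attribute is set: a brand new context is still created (the expensive step seen in the profile); it merely joins the share group rooted at the global share context. The `main` scaffolding is illustrative only and needs a working display to run.

```cpp
#include <QGuiApplication>
#include <QOpenGLContext>

int main(int argc, char* argv[]) {
    // Must be set before the application object is constructed.
    QCoreApplication::setAttribute(Qt::AA_ShareOpenGLContexts);
    QGuiApplication app(argc, argv);

    // The root of the share group that every widget context attaches to.
    QOpenGLContext* global = QOpenGLContext::globalShareContext();

    // A new context is still created per widget; it only shares resources.
    QOpenGLContext mine;
    mine.setShareContext(global);
    if (mine.create())  // this is where the per-widget WGL work happens
        Q_ASSERT(QOpenGLContext::areSharing(&mine, global));
    return 0;
}
```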
-
Hi,
You should also share some more information such as:
- Qt version
- Windows version
- Hardware used on that machine
Providing a minimal compilable example that shows that behaviour would also help.
-
wrote on 23 Dec 2023, 10:52 last edited by
@SamiV123 thanks, that certainly answers why I am seeing contexts being created. So now it's just a matter of working out why it takes so long to make a new context with shared resources.
What I currently have is part of a massive application, SGaist, but I'll have a crack at preparing a minimal reproducible example soon.
-
wrote on 23 Dec 2023, 11:07 last edited by
It would not surprise me to learn that context creation is slow on Windows. After all, you first must create a context in order to create a context, so depending on how smart/stupid the Qt implementation is, it could be doing a lot of work.
(The backstory of this stupidity is that creating a modern OpenGL context requires extensions which are only available through a context, so you must first create a dummy context, query it for the WGL extensions, and then use those extensions to create the modern context.)
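For readers unfamiliar with that dance, here is a hedged, Windows-only sketch of it (error handling omitted; the attribute constants and `wglCreateContextAttribsARB` come from the WGL_ARB_create_context extension, and the requested 3.3 version is just an example):

```cpp
#include <windows.h>
#include <GL/gl.h>

HGLRC createModernContext(HDC dc) {
    // 1. Create a legacy context first -- wglGetProcAddress only returns
    //    extension entry points while *some* context is current.
    HGLRC legacy = wglCreateContext(dc);
    wglMakeCurrent(dc, legacy);

    // 2. Query the extension entry point through the legacy context.
    typedef HGLRC (WINAPI *PFNWGLCREATECONTEXTATTRIBSARB)(HDC, HGLRC, const int*);
    auto wglCreateContextAttribsARB = (PFNWGLCREATECONTEXTATTRIBSARB)
        wglGetProcAddress("wglCreateContextAttribsARB");

    // 3. Use it to create the real (modern) context, then drop the legacy one.
    const int attribs[] = {
        0x2091 /* WGL_CONTEXT_MAJOR_VERSION_ARB */, 3,
        0x2092 /* WGL_CONTEXT_MINOR_VERSION_ARB */, 3,
        0
    };
    HGLRC modern = wglCreateContextAttribsARB(dc, nullptr, attribs);
    wglMakeCurrent(nullptr, nullptr);
    wglDeleteContext(legacy);
    return modern;
}
```

So every context creation potentially pays for a throwaway context (and a throwaway window to host it), which is the overhead being discussed here.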
-
wrote on 23 Dec 2023, 12:48 last edited by
@SamiV123 That really does sound inefficient, and it sounds like a lot of work could be saved on the Windows side, not that I know enough on the topic to judge.
A long shot if you know, but you suggested the Qt implementation might be able to handle this in a smarter way. I.e. are you aware of possible implementations Qt could use to get around this Windows-specific behaviour? Or are we out of luck and stuck within the confines of the provided WGL APIs?
FWIW you are right, I can see from stack traces that a lot of the time it's stuck in WGL calls.
-
wrote on 23 Dec 2023, 12:54 last edited by NightShadeI
@SGaist Hi SGaist, I've got a minimal example I'll provide here; some notes first though:
- Some parts of this example have no production purpose (e.g. the QOpenGLWidget doesn't do anything with OpenGL). The point is to illustrate the slow initialisation.
- I create 50 widgets in one go. In my application it's more like 4-5 max, but this still gets the point across.
- Qt version is 6.6.0
- OS is Windows 10
- I can see from the context-creation stack trace that it is using the onboard Intel graphics, as opposed to my NVIDIA RTX 2060; I suppose this should be fine.
- Please change
std::list<ExampleWidget<QOpenGLWidget>> theChildren;
to
std::list<ExampleWidget<QWidget>> theChildren;
to compare the performance against a regular widget. The difference should certainly be noticeable.
#include <QApplication>
#include <QKeyEvent>
#include <QMainWindow>
#include <QOpenGLWidget>
#include <QPainter>
#include <QSurfaceFormat>
#include <cstdlib>
#include <list>

template<typename ParentT>
class ExampleWidget : public ParentT {
public:
    ExampleWidget(QWidget* aParent) : ParentT{aParent} {}
private:
    void onPaint() {
        QPainter myPainter{this};
        myPainter.setBrush(Qt::red);
        myPainter.drawRect(0, 0, 5, 5);
    }
    void paintGL() { onPaint(); }
    void paintEvent(QPaintEvent*) { onPaint(); }
};

class ExampleRoot : public QWidget {
    std::list<ExampleWidget<QOpenGLWidget>> theChildren;
public:
    explicit ExampleRoot(QWidget* aParent) : QWidget{aParent} {
        this->setFocusPolicy(Qt::StrongFocus);
    }
    void keyReleaseEvent(QKeyEvent* aEvent) final {
        if (aEvent->isAutoRepeat()) return;
        switch (aEvent->key()) {
        case Qt::Key_Space: {
            theChildren.clear();
            auto& myRootChild = theChildren.emplace_front(this);
            myRootChild.setGeometry(rand() % 1000, rand() % 500, 500, 500);
            for (int i = 0; i < 50; ++i) {
                auto& myChild = theChildren.emplace_front(&myRootChild);
                myChild.setGeometry(i * 5, i * 5, 5, 5);
            }
            myRootChild.show();
        } break;
        default:
            break;
        }
    }
};

int main(int aArgc, char* aArgv[]) {
    QSurfaceFormat myFormat;
    myFormat.setDepthBufferSize(24);
    myFormat.setSwapInterval(0);
    QSurfaceFormat::setDefaultFormat(myFormat);
    QCoreApplication::setAttribute(Qt::AA_ShareOpenGLContexts);
    QApplication* myApplication = new QApplication{aArgc, aArgv};

    QMainWindow myMasterWindow;
    // We create a dummy root GL widget since the main window is destroyed
    // and recreated on first QOpenGLWidget setup. This hides that behaviour.
    QOpenGLWidget myRootGlWidget{&myMasterWindow};
    myRootGlWidget.hide();

    ExampleRoot myExampleRoot{nullptr};
    myExampleRoot.setGeometry(0, 0, 1000, 500);
    myExampleRoot.show();

    myMasterWindow.setCentralWidget(&myExampleRoot);
    myMasterWindow.setWindowTitle("Qt forum example");
    myMasterWindow.showMaximized();
    return myApplication->exec();
}
-
@NightShadeI On my machine creating those 50 contexts takes about 500ms (in a release build), of which only about 1.5% of the time is spent in WGL. About 15% of the time is spent creating and destroying the dummy window needed for the dummy context, and about 20% is spent logging to qCDebug, which is just plain silly. I guess that implementation indeed could use some love.
Anyway, 500ms all in all doesn't sound too bad for 50 contexts, so if it takes significantly longer on your machine and most of it is really spent in WGL then I'd blame the driver. Integrated Intel GPU drivers aren't exactly known for their great OpenGL implementations. Make sure you have the latest driver, and if updating doesn't help you might want to look into how to force the selection of the NVIDIA adapter and see whether that is any better for you.
See here for how you might be able to do this.
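One commonly documented way to hint hybrid-GPU laptops (NVIDIA Optimus / AMD switchable graphics) to prefer the discrete adapter is to export these well-known symbols from the executable itself; this is a sketch of that technique, Windows-only, and whether the driver honours it depends on the driver version:

```cpp
#include <windows.h>

// Exported from the .exe (not a DLL); hybrid-graphics drivers look these up
// at process start and, if present, route the process to the discrete GPU.
extern "C" {
    __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
    __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
}
```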
-
wrote on 25 Dec 2023, 07:28 last edited by
@Chris-Kawa Merry Christmas. Thanks for responding on this, I always find your responses very helpful :)
Very interesting discoveries, and I am wondering if the way I am currently tracing the stack might be misleading, given I am using the poor man's method of finding performance issues as discussed here. No flame graphs or anything fancy. Stopping GDB execution five-ish times always seems to land on the GL context creation.
I took your advice to heart and tried using the NVIDIA adapter; unfortunately it didn't change anything for me. I did verify it was used, via Task Manager and the stack-tracing method I shared above. I did notice NVIDIA was calling into some other Windows-specific DLL which might be the bottleneck; I didn't look too closely into it.
In the end what I did was refactor "some" of my QOpenGLWidgets to just be regular QWidgets, given initialisation time is important to me for UX. Now that I've done that, the response time to open a widget is essentially instant.
Perhaps someday I'll stumble on the need to improve the performance of the GL widgets again; however, this solves it in the meantime.
Again, thanks for your insights!
-
Merry Christmas
Thanks, same to you.
Stopping GDB execution 5ish times always seems to land on the GL context creation.
Oh, that's absolutely the wrong way to do it. You might as well roll a dice. Use a profiler; that's what they're for. Since you're on Windows, Visual Studio has a basic profiling tool that's more than enough for this kind of task.
In the end what I did was refactor "some" of my QOpenGLWidgets to just be regular QWidgets
I'm glad you found a solution that works for you. If you ever get into it again make sure to measure it properly. Debugging GPU related performance issues is tricky. Intuition goes out the window pretty quickly and reliable numbers are the best way to deal with it.