Efficient Video Playback with Qt

Nase wrote on 18 Jun 2022, 13:09
#1

I would like to know how I can efficiently play back video streams. Specifically, assume that I already have the individual video frames in memory and I wish to present them to the user.

    Right now, I'm only considering Linux. Qt may behave differently on other platforms.

From the looks of it, QVideoWidget together with QVideoFrame seemed like a good idea. Here's what I've done so far:

    • I created a main window with a QVideoWidget
    • I created a couple of QVideoFrames. In a round-robin fashion I do the following with each video frame:
      • map it write-only
      • memset its contents to a specific value
      • unmap it
      • call present() on the QVideoWidget's video surface (a rough sketch of this loop follows below)

    This works for the most part, when I am using RGB24, for example.
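
    In code, the loop looks roughly like this (simplified sketch; the resolution, timer interval, and dummy memset are just placeholders for my real frame source):

    ```cpp
    #include <QApplication>
    #include <QAbstractVideoSurface>
    #include <QVideoFrame>
    #include <QVideoSurfaceFormat>
    #include <QVideoWidget>
    #include <QTimer>
    #include <QVector>
    #include <cstring>

    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);

        QVideoWidget widget;
        widget.resize(1280, 720);
        widget.show();

        // The surface that the QVideoWidget paints from.
        QAbstractVideoSurface *surface = widget.videoSurface();

        const QSize frameSize(1920, 1080);
        surface->start(QVideoSurfaceFormat(frameSize, QVideoFrame::Format_RGB24));

        // A small ring of pre-allocated frames, used round-robin.
        const int bytesPerLine = frameSize.width() * 3;
        QVector<QVideoFrame> frames;
        for (int i = 0; i < 3; ++i)
            frames.append(QVideoFrame(bytesPerLine * frameSize.height(),
                                      frameSize, bytesPerLine,
                                      QVideoFrame::Format_RGB24));

        int index = 0;
        QTimer timer;
        QObject::connect(&timer, &QTimer::timeout, [&]() {
            QVideoFrame &frame = frames[index];
            index = (index + 1) % frames.size();

            // Map write-only, fill with dummy data, unmap, present.
            if (frame.map(QAbstractVideoBuffer::WriteOnly)) {
                std::memset(frame.bits(), 0x80, frame.mappedBytes());
                frame.unmap();
                surface->present(frame);
            }
        });
        timer.start(40); // roughly 25 fps

        return app.exec();
    }
    ```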

    I noticed that the QVideoSurfaceFormat allows me to specify the QAbstractVideoBuffer::HandleType (GL, EGL, Pixmap, etc.). So I queried the QVideoWidget's video surface for the supported pixel formats. Here are the results for each handle type:

    • NoHandle: RGB32, RGB24, ARGB32, RGB565
    • GLTextureHandle: none
    • XvShmImageHandle: none
    • CoreImageHandle: none
    • QPixmapHandle: RGB32, RGB24, ARGB32, RGB565
    • EGLImageHandle: none
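
    For completeness, the query was essentially this (assuming `surface` is the pointer returned by QVideoWidget::videoSurface()):

    ```cpp
    #include <QAbstractVideoSurface>
    #include <QDebug>

    void dumpSupportedFormats(QAbstractVideoSurface *surface)
    {
        const QVector<QAbstractVideoBuffer::HandleType> handleTypes = {
            QAbstractVideoBuffer::NoHandle,
            QAbstractVideoBuffer::GLTextureHandle,
            QAbstractVideoBuffer::XvShmImageHandle,
            QAbstractVideoBuffer::CoreImageHandle,
            QAbstractVideoBuffer::QPixmapHandle,
            QAbstractVideoBuffer::EGLImageHandle,
        };
        for (QAbstractVideoBuffer::HandleType type : handleTypes) {
            // An empty list means no pixel format is supported for that handle type.
            qDebug() << type << surface->supportedPixelFormats(type);
        }
    }
    ```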

    As far as I can tell, hardware accelerated playback does not seem to be available. AFAIK, pixmaps are rendered in software. RGB24 is fine for me, but it stands out that all YUV formats are missing. Typical video codecs decode to YUV rather than RGB, so for the general case of video playback QVideoWidget might already be inefficient. But in my case, I really do have RGB data to start with.

    Furthermore, I observe that the CPU load changes proportionally to the size of the QVideoWidget, i.e., when the widget is small, the CPU load is low; when the widget fills almost my entire (4k) screen, the CPU load is high. Note that the size of the QVideoFrame stays the same at all times, so certainly some software scaling is going on here. If the rendering were hardware accelerated, the scaling would not influence the CPU load.

    Is QVideoWidget accelerated on other platforms but not on Linux? The API looked promising in the sense that it could have offered GL-accelerated video rendering, but I don't see how that can work if the list of supported pixel formats is empty. Maybe I have to enable OpenGL first?

    What is the best way to efficiently play back videos with Qt? Is there something built-in, or would I have to resort to something else, maybe the QtGStreamer bindings?

    Thanks in advance!

SGaist | Lifetime Qt Champion
wrote on 18 Jun 2022, 17:55
#2

      Hi and welcome to devnet,

      Which version of Qt are you using ?

      Interested in AI ? www.idiap.ch
      Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

wrosecrans wrote on 18 Jun 2022, 21:09
#3

        Basically, "efficient video playback" and "I already have the frames decoded in memory" are opposite goals. And you pretty much have to ignore stuff like QVideoWidget if you aren't going to use the whole QtMultimedia decode and presentation stack.

        Basically, QVideoWidget and all the complexity of QVideoFrame are built around the assumption that what you really want is just to make a player that takes advantage of hardware decode. Hardware video decode often happens on the GPU and decodes directly into video memory. Or, on platforms like Android, the GPU is on the same chip as the CPU and there is no separate video memory. In either case, you can present the decoded video frame with zero copies. That's what all of those "handle" types mean under the hood: a handle potentially references a texture/buffer/surface/image (different hardware and APIs have slightly different jargon, but it's all the same idea) that is already local to the GPU, so it doesn't need to be copied for display.

        If you are decoding into CPU memory and you need to display it, it basically doesn't matter exactly how you upload it to the GPU. It's the same sort of "distant" copy operation on any platform with dedicated GPU memory. (On a system with an integrated GPU it's still at least a copy into some region that the MMU handles very differently, which is potentially pretty expensive, though not as bad as copying over PCIe.)

        So, what does this get us? IMO, if your internal API depends on decoding to CPU memory, you can just ignore the complexity of the Qt video APIs. They basically just aren't built for your use case; you have already opted out of the fast path at that point. If you really want OpenGL to handle stuff like the scaling and presentation, and to have multiple images "buffered" onto the GPU ahead of time, do it yourself:

        Wrap your data with a QImage. Make a QOpenGLTexture from your QImage. Uploading the OpenGL texture puts it in GPU memory.
        Make a QOpenGLWidget. Draw your texture. QOpenGLTextureBlitter is super useful for this. Annoyingly, while QPainter can do accelerated drawing on a QOpenGLWidget, it won't draw a QOpenGLTexture like it will a QImage, despite the fact that it might be using a QOpenGLTexture under the hood. Shrug.
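
        Something along these lines (untested sketch; class and member names are just placeholders, and texture lifetime / hi-dpi handling is glossed over):

        ```cpp
        #include <QOpenGLWidget>
        #include <QOpenGLTexture>
        #include <QOpenGLTextureBlitter>
        #include <QMatrix4x4>
        #include <QImage>
        #include <memory>

        class VideoGLWidget : public QOpenGLWidget
        {
        public:
            // Hand over the next frame; the texture upload happens in paintGL(),
            // where a GL context is guaranteed to be current.
            void setFrame(const QImage &image)
            {
                m_pending = image;
                update();
            }

        protected:
            void initializeGL() override
            {
                m_blitter.create();
            }

            void paintGL() override
            {
                if (!m_pending.isNull()) {
                    // Uploads the image data into a GPU-side texture.
                    m_texture.reset(new QOpenGLTexture(m_pending));
                    m_texture->setMinMagFilters(QOpenGLTexture::Linear,
                                                QOpenGLTexture::Linear);
                    m_pending = QImage();
                }
                if (!m_texture)
                    return;

                m_blitter.bind();
                const QMatrix4x4 target = QOpenGLTextureBlitter::targetTransform(
                    QRectF(QPointF(0, 0), size()), QRect(QPoint(0, 0), size()));
                // OriginTopLeft because the texture holds ordinary image data.
                m_blitter.blit(m_texture->textureId(), target,
                               QOpenGLTextureBlitter::OriginTopLeft);
                m_blitter.release();
            }

        private:
            QOpenGLTextureBlitter m_blitter;
            std::unique_ptr<QOpenGLTexture> m_texture;
            QImage m_pending;
        };
        ```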

        If you don't want to deal with that, use a QPainter to draw a QImage onto your widget. This won't be strictly optimal, because it will have to upload the QImage at paint time instead of ahead of time, so you lose some opportunity for concurrency. But on modern hardware this often has no problem keeping up with real time, even with large images. This is the easiest way to present your images from memory, and you may find it's perfectly adequate and that other performance problems in your video app are a higher priority.
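
        That simpler path is basically just this (sketch, names are placeholders):

        ```cpp
        #include <QWidget>
        #include <QPainter>
        #include <QImage>

        class FrameWidget : public QWidget
        {
        public:
            void setFrame(const QImage &image)
            {
                m_frame = image;
                update();               // schedule a repaint
            }

        protected:
            void paintEvent(QPaintEvent *) override
            {
                if (m_frame.isNull())
                    return;
                QPainter painter(this);
                // Smooth scaling, since the widget size usually differs
                // from the frame size.
                painter.setRenderHint(QPainter::SmoothPixmapTransform);
                painter.drawImage(rect(), m_frame);
            }

        private:
            QImage m_frame;
        };
        ```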

        And obviously, think real hard about whether or not decoding into CPU memory is the right move. I'm not saying you are wrong to take that approach. If you are doing anything "interesting" with the video data, you may need to pass it to other APIs before presentation. Something like a VFX compositing program probably requires this approach, and you may be able to infer from my answer here that I have some experience doing it this way because I find it useful. But if you can do what you need without decoding into CPU/host memory, there may be a more efficient path that fits the QtMultimedia "easy path" better, even if it is a bit counterintuitive.

Nase wrote on 19 Jun 2022, 22:09
#4

          @SGaist said in Efficient Video Playback with Qt:

          Hi and welcome to devnet,

          Which version of Qt are you using ?

          Sorry, I forgot to mention the Qt version. I'm using the latest Qt 5 on Arch Linux (5.15.4 at the time of writing).

Nase wrote on 19 Jun 2022, 23:06
#5

            @wrosecrans said in Efficient Video Playback with Qt:

             Basically, "efficient video playback" and "I already have the frames decoded in memory" are opposite goals. And you pretty much have to ignore stuff like QVideoWidget if you aren't going to use the whole QtMultimedia decode and presentation stack.

             I did not anticipate that. Looking at the QVideoFrame API, it pretty much seemed to cover the case where the video frame has to be uploaded to the GPU. It's even mentioned in the documentation.

             Though I'm surprised that I should ignore QVideoWidget and friends, I appreciate your help.

             Basically, QVideoWidget and all the complexity of QVideoFrame are built around the assumption that what you really want is just to make a player that takes advantage of hardware decode. Hardware video decode often happens on the GPU and decodes directly into video memory. Or, on platforms like Android, the GPU is on the same chip as the CPU and there is no separate video memory. In either case, you can present the decoded video frame with zero copies. That's what all of those "handle" types mean under the hood: a handle potentially references a texture/buffer/surface/image (different hardware and APIs have slightly different jargon, but it's all the same idea) that is already local to the GPU, so it doesn't need to be copied for display.

             Yes, I understand that. But depending on the available hardware decoders, even the QtMultimedia framework would have to resort to software decoding and would end up with a video frame in RAM. This would then have to be uploaded to the GPU for faster presentation. Are you saying that this functionality is under the hood somewhere, but not exposed?

             If you are decoding into CPU memory and you need to display it, it basically doesn't matter exactly how you upload it to the GPU. It's the same sort of "distant" copy operation on any platform with dedicated GPU memory. (On a system with an integrated GPU it's still at least a copy into some region that the MMU handles very differently, which is potentially pretty expensive, though not as bad as copying over PCIe.)

            OK

             So, what does this get us? IMO, if your internal API depends on decoding to CPU memory, you can just ignore the complexity of the Qt video APIs. They basically just aren't built for your use case; you have already opted out of the fast path at that point. If you really want OpenGL to handle stuff like the scaling and presentation, and to have multiple images "buffered" onto the GPU ahead of time, do it yourself:

             I agree with you that peak efficiency would mean hardware decoding of a video. But my input is not a video file.

            Having multiple frames buffered inside the GPU would be my ultimate goal though.

             Wrap your data with a QImage. Make a QOpenGLTexture from your QImage. Uploading the OpenGL texture puts it in GPU memory.
             Make a QOpenGLWidget. Draw your texture. QOpenGLTextureBlitter is super useful for this. Annoyingly, while QPainter can do accelerated drawing on a QOpenGLWidget, it won't draw a QOpenGLTexture like it will a QImage, despite the fact that it might be using a QOpenGLTexture under the hood. Shrug.

             I was hoping to use some abstraction that would take away that complexity. Drawing a texture may result in terrible quality, and I guess I have to be careful to set the right interpolation options at least. I'm not sure how it all works under the hood these days; I last did OpenGL about 15 years ago.

             If you don't want to deal with that, use a QPainter to draw a QImage onto your widget. This won't be strictly optimal, because it will have to upload the QImage at paint time instead of ahead of time, so you lose some opportunity for concurrency. But on modern hardware this often has no problem keeping up with real time, even with large images. This is the easiest way to present your images from memory, and you may find it's perfectly adequate and that other performance problems in your video app are a higher priority.

            I might do that for a start, just to see how it performs. Are you saying that QPainter can work with a QOpenGLWidget? Or are you saying that I should paint on a regular QWidget? Would that be accelerated or unaccelerated again?

             And obviously, think real hard about whether or not decoding into CPU memory is the right move. I'm not saying you are wrong to take that approach. If you are doing anything "interesting" with the video data, you may need to pass it to other APIs before presentation. Something like a VFX compositing program probably requires this approach, and you may be able to infer from my answer here that I have some experience doing it this way because I find it useful. But if you can do what you need without decoding into CPU/host memory, there may be a more efficient path that fits the QtMultimedia "easy path" better, even if it is a bit counterintuitive.

             My input is not a compressed video file. I only have uncompressed video frames, and they sit in RAM already. To be honest, this should be a thousand times simpler than having a compressed video, because I don't need to set up the whole video decompression, color space conversion, or whatnot.

            Thanks for the pointers! I will let you know how it goes.

Nase wrote on 19 Jun 2022, 23:39
#6

              I have found an example for efficient 2D painting with QOpenGLWidget. I will study it.
              https://doc.qt.io/qt-5/qtopengl-2dpainting-example.html
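
               As far as I can tell, the gist of that example is overriding paintEvent() on a QOpenGLWidget and painting with a plain QPainter, which then goes through the OpenGL paint engine, so the scaling and composition should happen on the GPU (the image data still gets uploaded at paint time). Adapted to my frame-presentation case it would look roughly like this (my own sketch, not the example's code):

               ```cpp
               #include <QOpenGLWidget>
               #include <QPainter>
               #include <QImage>

               // QPainter on a QOpenGLWidget uses the OpenGL paint engine,
               // so drawImage() below is scaled and composited by the GPU.
               class GLFrameWidget : public QOpenGLWidget
               {
               public:
                   void setFrame(const QImage &image)
                   {
                       m_frame = image;
                       update();
                   }

               protected:
                   void paintEvent(QPaintEvent *) override
                   {
                       if (m_frame.isNull())
                           return;
                       QPainter painter(this);
                       painter.setRenderHint(QPainter::SmoothPixmapTransform);
                       painter.drawImage(rect(), m_frame);
                   }

               private:
                   QImage m_frame;
               };
               ```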
