Performance issues with multiple visible Windows

Zmote89

Hello qt forum,

I'm currently working on my pre-study at the university and am required to program a logging tool, which has following significant specifications:

being able to process logs in 20ms intervals (lower bar)
being able to visually show up to 20 Clients logging in tabbed view
being able to visually show up to 20 Clients in Floating Windows

Now, I've successfully covered the first two parts with Threads processing the logs and signaling the appropriate slots in the GUI Thread for adding the log entry to the appropriate table (QTreeView with Custom Table Model, inherited from QAbstractTableModel). This works fine for tabbed views, as only one tab is visible at all times, so the update of the view doesn't cause the GUI Thread to freeze.

But as soon as I have 20 Windows open, the GUI isn't responsive at all anymore.

I can't use buffering for the logs, because the logs may arrive at varying intervals and it needs to show arrived logs "instantly". What I considered trying was an updater thread, that would tell the views to draw themselves every 200ms, which for the enduser would still be immediate, but would allow the GUI to have more processing time for responding to UI inputs. This approach improved the responsiveness of the tabs, but didn't much improve the floating windows scenario. I stretched the updating time to 1s and had a significant improvement in the responsiveness of the floating windows, so I think this is related to the view update when beginInsertRows /endInsertRows is called.

So my question is: Is there a way you could suggest I could try to improve my responsiveness for 20 Windows being updated almost simultanously? The GUI Thread being the only thread able to update the Views is kind of a bottleneck I'm having a hard time to figure out a way out from my current situation. I tried updating the windows from seperate window threads, but I think because the QTreeView's were created in the GUI Thread, they are connected to the main event loop of the QApplication and I don't really know how I would go about defining an own event loop for the floating windows and their threads if I wanted the event handling to be done in the window threads themselves (ie. when beginInsertRows is called, it would notify only the floating windows UI to redraw).

Now, following, are code excerpts of relevant parts of my app:
This Code segment is executed in the Client Threads, preparing the log entry, after the log entry is ready, it's signaled to the GUI
Thread via client.signalReady(...):

QList<QString> CommonLogUtils::prepareLogEntry(RT_Client& client, const REP::PROTO::LogMsg& msg)
{
	QList<QString> row{};
	row.append(QString::number(client.itemCount()));
	//TODO abklären ob wirklich dateLocal hier
	auto dateWrapper = msg.timestamp().dateLocal();
	row.append(CommonStringUtils::prepareTimeString(1900 + dateWrapper.tm_year, dateWrapper.tm_mon, dateWrapper.tm_mday, "-"));
	row.append(CommonStringUtils::prepareTimeString(dateWrapper.tm_hour, dateWrapper.tm_min, dateWrapper.tm_sec, ":"));
	row.append(client.getGroupName());
	row.append(client.getClientName());
	row.append(resolveMessageKind(msg.kind()));
	row.append(msg.message().c_str());
	return row;
}

void CommonLogUtils::routeOutput(RT_Client& client, const REP::PROTO::LogMsg& msg)
{
	if (client.live())
	{
		client.increaseItemCount();
		client.signalReady(prepareLogEntry(client, msg));
	}

	if (client.toParent())
	{
		client.signalParentReady(prepareLogEntry(client, msg));
	}

	if (client.toFile())
	{
		QString filename = client.getClientName() + ".txt";
		QFile file{filename};
		if (file.open(QIODevice::ReadWrite | QIODevice::Append | QIODevice::Text))
		{
			//TODO adjust output
			QTextStream stream(&file);
			stream << msg.timestamp().str().c_str() << " " << client.getGroupName() << " " << client.getClientName() << " " << msg.message().c_str() << endl;
		}
	}

	if (client.toRingFile())
	{

	}
}

Within the client structure, singalReady looks like this:

void RT_Client::signalReady(QList<QString> row)
{
	emit logProcessed(clientName, row);
}

When the clients first arrive, they send out a register message, before logging, via this register message, the client structures are created and the connections made:

	if (!modell.getClientsList().contains(msg.client().c_str()))
	{
		modell.updateClientList(msg.client().c_str());
		modell.getClientsModel()->appendRow(new QStandardItem(msg.client().c_str()));

		//Add client
		modell.addClient(msg.id(), msg.client().c_str(), msg.group().c_str());
		connect(&modell.getClient(msg.client().c_str()), &RT_Client::logProcessed, this, &RT_Main::handleProcessedLogs, Qt::ConnectionType::QueuedConnection);
		connect(&modell.getClient(msg.client().c_str()), &RT_Client::logForParentProcessed, this, &RT_Main::handleProcessedLogsForParent, Qt::ConnectionType::QueuedConnection);
	}

So when a client signals "signalReady", it emits a "logProcessed" signal, which is connected to the GUI's handleProcessedLogs slot, which looks like this:

void RT_Main::handleProcessedLogs(QString clientName, QList<QString> row)
{
	RT_Client& client = modell.getClient(clientName);
	client.addLog(row);
}

Within the addLog function, the entry is added to the model:

void RT_Client::addLog(QList<QString> row) const
{
	CustomTableModel * model = static_cast<CustomTableModel *>(table->model());
	if (isNewestOnTop) {
		model->insertOnTop(row);
	}
	else {
		model->appendRow(row);
	}
	if (isScrollToBottom) {
		scroll();
	}
}

When creating a floating window, I switch the connections, as such:

void RT_Main::createLogWindow(QString tableName, QPoint startPosition)
{
	if (!subWindowExists(tableName))
	{
		RT_Client& client = modell.getClient(tableName);
		if (client.getClientName() == RT_Constants::UNSPECIFIED)
		{
			client = modell.getGroupClient(tableName);
		}
		QThread* thread = new QThread();
		RT_LogWindowWorker* worker = new RT_LogWindowWorker(client, false, startPosition);
		worker->moveToThread(thread);
		connect(static_cast<CustomQTabWidget*>(modell.getTabWidget()), &CustomQTabWidget::dragFinishedWithSuccess, worker->logWindow, &RT_LogWindow::closeWindow);
		connect(worker->logWindow, &RT_LogWindow::logWindowClosed, this, &RT_Main::logWindowClosed);
		connect(this, &RT_Main::windowCloseRequested, worker->logWindow, &RT_LogWindow::closeWindow);
		disconnect(&client, &RT_Client::logProcessed, this, &RT_Main::handleProcessedLogs);
		connect(&client, &RT_Client::logProcessed, worker, &RT_LogWindowWorker::handleProcessedLogs);
		thread->start();
		subWindows.insert(tableName, worker);
	}
}

The other connection are for drag'n'drop functionality. So what happens is the prepared logs are now sent directly to the worker thread and basically are handled the same way:

void RT_LogWindowWorker::handleProcessedLogs(QString clientName, QList<QString> row) const
{
	client.addLog(row);
}

But because the client and it's QTreeView were created in the GUI Thread, the beginInsertRows signal emitted is processed by the GUI Thread and I'm at square zero. Even if they weren't, I'd need an own event loop in the threads, and I don't know if that's the way to go or how to do that:

ie. How to reconnect the beginInsertRows signals
How to draw the QTreeView from within the threads

If you know of a better solution, I'm all ears for it. I'd really appreciate the help!

This is a picture from the problematic situation, even with about 8-10 windows, it laggs horribly(ie. mouse events, click events take a long time to be registered, when dragging windows they don't follow smoothly etc.)

alt text

SGaist

Hi and welcome to devnet,

From a quick look at your code, you are doing many useless copies. Use const references for your non trivial function parameters if you don't modify them e.g. all QString and QList related parameter. Both are implicitly shared class so you would gain some CPU avoiding these copies.

Then why all the calls to c_str() ? That again makes useless conversions and copies.

As for the 20 windows, what about checking the visibility of the widget to avoid doing updates when they are not fully shown ?

Zmote89

I tried to make QList and QStrings' const but I keep getting compile errors that it can't reinterpret_cast const. I'll try once more , as I also believe it would improve performance by avoiding needless copying.

I thought about enabling/disabling updating for windows that aren't visible, but as per use case, they should be able to be seen all at the same time (several monitors) per requirement. I'll check again if all 20 have to be visible at the same time or if it's only meant that 20 windows should be able to be open and only visible ones should render. But I'm fairly certain they should be able to be visible at the same time as their old reporting tool is capable of doing so (which was written with a different framework than qt).

As for all the c_str()'s, I'm using those because the data arrives as std::string and I need QString's for the UI.

Thanks for the tipps, I'll run with it as far as I can.

SGaist

Where exactly do you get that error ?

I must say that if somehow has to actually watch 20 live feed widgets at the same time there's likely going to be problem in the workflow.

That the architecture allows to take a look at more then one log at a time doesn't sounds unreasonable but 20 in parallel is not something that a human can efficiently parse especially if there's no visual clue as to what he should be looking for (e.g. colour specific entries like blue for info, red for critical etc.) but even so the quantity of information will likely be too much.

Can you explain what is monitored like that ?

Zmote89

@SGaist

Hi and sorry for the late response!

The application is a real-time logging tool, where a logging library is used across the board in various other applications and sub-applications, which all log their logs via the network to a central computer, where they are managed. The ones using this software usually display the log windows on multiple monitors and keep them running 24/7. Sometimes the logging library is used by a life video feed, which can at times output a lot of logs in a very short amount of time (the 20ms lower bar requirement). In a real-time situation, there can be multiple video feeds running in parallel, thus the 20 Clients at the same time requirement. This tool will be used in the testing environement of a military surveying simulation, where the afore mentioned situation can be simulated. It is used to track down errors and other issues within the other applications they are developing.

When I declare const QList<const QString> as function parameters or signal/slots I get the error of being unable to cast QString to const QString (the same with QList), also the error usually says the reinterpret_cast would invalidate or something like that the constness.

This happens even if I use const QList<const QString> across the bar and usually the error is shown to be in qlist.h, where there is a node assignment like:

current->v = new T(reinterpret_cast<T>(src->v)); (I think that's in node_copy function)

Also, I sometimes get the reinterpret_cast errror like "can't cast to const QList<const QString> * ", eventhough I'm not using any pointers.

UPDATE 1:

Quick update on the const error:
Schweregrad Code Beschreibung Projekt Datei Zeile Unterdrückungszustand
Fehler C2440 "=": "const QString *" kann nicht in "void *" konvertiert werden rad-reportingtool-project-library C:\Qt\Qt5.6.0\5.6\msvc2015\include\QtCore\qlist.h 452

"const QString *" kann nicht in "void *" konvertiert werden says, "const QString *" can't be converted to "void *". Which is weird, as I'm not really using any pointers.

Zmote89

UPDATE2:

Ok now I have done several optimizations to the code, which have improved the performance considerable, but it's still not enough when I have multiple windows open. (I put const and pass by reference whereever it made sense, as for the signals/slots, using QueuedConnection(Threads) apparently it doesn't make any difference if passed by ref or value, as Qt will internally convert passed by ref to pass by value when ConnectionType is set to QueuedConnection))

void RT_Main::handleProcessedLogs(const QMap<QString, QList<QList<QString>>> rows) const
{
	QElapsedTimer timer;
	timer.start();
	for (auto it = rows.cbegin(); it != rows.cend(); ++it)
	{
		const RT_Client& client = modell.getConstClient(it.key());
		if (client.live())
		{
			client.addLog(it.value());
		}
		if (client.toParent())
		{
			const auto& groupClient = modell.getConstGroupClient(client.getGroupName());
			if (groupClient.live())
			{
				groupClient.addLog(it.value());
			}
			if (groupClient.toParent())
			{
				const auto& globalClient = modell.getConstClient(RT_Constants::GLOBAL);
				globalClient.addLog(it.value());
			}
		}
		if (client.toGlobal())
		{
			const auto& globalClient = modell.getConstClient(RT_Constants::GLOBAL);
			globalClient.addLog(it.value());
		}
	}
	qDebug() << "handleProcessedLogs: " << timer.nsecsElapsed() / 1000 << "micro seconds";
}

This is the main GUI Window's function, which processed the readied logs and adds them to the appropriate views.
The measurements show, with 20 clients visible and every 20ms 4 logs arriving yields a UI Time of 1-2 ms

The logs are prepared in a worker thread like this:

void RT_CaptureWorker::signalLogsReady()
{
	if(!entries.isEmpty())
	{
		emit logsProcessed(entries);
		entries.clear();
	}
}

void RT_CaptureWorker::handleLog(const QString clientName, const QString groupName, const REP::PROTO::LogMsg & msg)
{
	if (entries.contains(clientName))
	{
		entries[clientName].append(CommonLogUtils::prepareLogEntry(clientName, groupName, msg));
	}
	else
	{
		entries.insert(clientName, QList<QList<QString>>{CommonLogUtils::prepareLogEntry(clientName, groupName, msg)});
	}
}

The CaptureWorker stores logs its readied in an internal container and when a signal to update arrives, signals the UI function mentioned above.

The UpdateWorker looks like this:

inline void RT_UpdateWorker::startUpdater()
{
	while(true)
	{
		emit signalUpdate();
		QThread::msleep(200);
	}
}

Basically, it sends every 200ms a signal to update to the capture worker, which in turn, if it has something to update, signals the UI Thread handleProcessedLogs.

The Connections for the CaptureThread and UpdateWorker look like this (in the Constructor of the RApplication class):

	{
		thread = new QThread();
		worker = new RT_CaptureWorker();
		worker->moveToThread(thread);
		connect(this, &RApplication::logReceived, worker, &RT_CaptureWorker::handleLog);
		connect(this, &RApplication::binaryLogReceived, worker, &RT_CaptureWorker::handleBinaryLog);
		thread->start();
		updaterThread = new QThread();
		updateWorker = new RT_UpdateWorker();
		updateWorker->moveToThread(updaterThread);
		connect(this, &RApplication::startUpdater, updateWorker, &RT_UpdateWorker::startUpdater);
		connect(updateWorker, &RT_UpdateWorker::signalUpdate, worker, &RT_CaptureWorker::signalLogsReady, Qt::ConnectionType::QueuedConnection);
		updaterThread->start();
		emit startUpdater();
		m_timestamp = CL::monoClk().now();
	}

Also, I've changed the model to perform bulk row updates, like this:

bool CustomTableModel::appendRow(const QList<QList<QString>> & rows)
{
	if(listOfEntries.size() >= logHistoryLimit)
	{
		beginRemoveRows(QModelIndex(), 0, rows.size());
		listOfEntries.removeFirst();
		endRemoveRows();
	}
	beginInsertRows(QModelIndex(), listOfEntries.size(), listOfEntries.size() + rows.size());
	for(const auto& row : rows)
	{
		listOfEntries.append(row);
	}
	endInsertRows();
	return true;
}

The performance is pretty good from my measurements with QElapsedTimer, as mentioned, at 20 clients, with logs arriving at 20ms intervalls, the UI Time (ie. the itme handleProcessedLogs CPU Time) fluctuates between 1 and 2 ms.

The performance problem of multiple visible windows remains, albeit not anymore as terrible as it was before. Now I can comfortable switch between the Windows, access UI Elements etc., but when I move the window, the lag is still considerable,also when I try to scroll the QTreeView, it's lagging still.

I ran a profiling session with Visual Studio and it reports to me that the slowest path is within the Qt5GUI.dll and Qt5Widgets.dll. Thus, I arrived at the conclusion that the problem must be within the Qt Frameworks drawing engine.

The Question now is: How can I optimize View updates to solve this performance issue? Do I have to subclass paint() function to achieve that or is there a better solution? Would using QGraphicsView instead of QTreeView help in any way?

Thanks for your help in advance!

SGaist

When dealing with queued connection Qt will indeed do copies, however you should always code to minimise copies anyway.

On the small things that you should also take into account you should avoid calling the same methods several times. For example in appendRow you call listOfEntries.size() three times where two would be enough.

Another thing that may help: optimise your views to to only show what make sense e.g. only what is visible on screen or have a fixed number of entries visible at any time. This will lower the number of queries done to your models.

Zmote89

Another thing that may help: optimise your views to to only show what make sense e.g. only what is visible on screen or have a fixed number of entries visible at any time. This will lower the number of queries done to your models.

Could you elaborate on this? How would I go about doing that?

Furthermore, I know that copying is bad, but I don't see how I would go about using Qt's Threads any other way, using DirectConnection causes the function to run in the callers thread, not in the callees. I'll adjust some of the connection, where it doesn't make much difference (ie. quick operations), but yeah, could you elaborate on this as well, what alternatives I'd have besides QueuedConnection when using Threads.

Thanks for the help, it's very much appreciated!

SGaist

Sorry I wasn't clear, what I meant is to always code with minimal copy in mind Currently, AFAIK, you can't avoid this copy however that doesn't mean you shouldn't use e.g. const QString &myString as parameter signature there might be other optimisation going on that you might loose otherwise. The fact that Qt does a deep copy when using queued connections should rather be considered an implementation detail although you have to take it into account when using hight volume of data.

So by all mean, don't try to use direct connections by hand, you'll likely shoot yourself in the foot at some point without even noticing.

One other thing you can do since you are using a QTreeView is to set uniformRowHeights to true if it's the case, you might gain some milliseconds again.

I just thought of something: how much memory are you consuming ? With that many models being update so frequently you might also be hitting the swap at some point.

Zmote89

@SGaist

So I measured memory consumption over 1 minute with 20 Clients (Windowed), it starts about 50mb and htis ~ 400mb after 1 minute. There will be a logLimit, which per requirement, is to be 10'000 logs per Model, which should stabilize memory consumption at some point.

I took snapshots of the heap:
alt text

What exactly is that swaping issue? Could you elaborate?

Furthermore, I've already activated setUniformRowHeights, so I think I'm good there. I'm currently trying to optimize to const ref's where possible.

Is there any other way to increase view update speed? Like for threads, you can set QThread::currentThread()->setPriority(QThread::HighPriority) to prioritize a thread, is there a setting like that for views? Does Qt not provide ways to write your own Views or it's update cycle?

Thanks for all the help, I've achieved significant performance boost already, but this 20 Windows lagging thing is still present and it could be a deal breaker for my project, so I'm all ears for any further tipps, even unorthodox ones! :)

SGaist

The swap partition is used when application are starting to use up all amiable memory, the system then start to swap in/out memory pages between the RAM and the partition on the hard drive.

By the way, why a QTreeView ? Are you using any feature from it ?

Zmote89

@SGaist
Ah ok, now I get what you mean with swap.

I used QTreeView because 1) It seems to perform better, 2) the default look works in my favor (I don't want the borders of the table).

As my current system has active 12GB ram, that shouldn't be that much of a problem. Furthermore, I once ran the application to 100'000 log entries for each of the 20 Clients and it went up to 2GB~. With the logLimit of 10'000 logs per Model memory shouldn't become much of a problem.

SGaist

Wouldn't a QListView do what you want ?

Zmote89

@SGaist

As far as I know, QListView doesn't support columns and I need columns.

SGaist

For filtering or just for formatting ?

Zmote89

@SGaist

Filtering and formatting. Some columns need to be able to be turned off and on and the "log table" needs to be able to be filter according to columns.

SGaist

In your custom model, are you returning the minimal from your data method ?

Zmote89

@SGaist

Tha's how my current data function looks in my model:

QVariant CustomTableModel::data(const QModelIndex & index, int role) const
{
	if (!index.isValid())
		return QVariant();

	if (index.row() >= listOfEntries.size() || index.row() < 0)
		return QVariant();

	if (role == Qt::DisplayRole) {
		if (index.column() < 0 || index.column() > 6)
			return QVariant();
		if(index.column() == 0)
		{
			return QString::number(index.row());
		}
		return listOfEntries.at(index.row()).at(index.column()-1);
	}
	return QVariant();
}

SGaist

Looks good.

IIRC, you can avoid the QString::number call and just return index.row() or something along the lines of:

QVariant CustomTableModel::data(const QModelIndex & index, int role) const
{
    if (!index.isValid() || role != Qt::DisplayRole)
        return QVariant();

    const int column = index.column();
    const int row = index.row();
    if (column < 0 || column > 6)
        return QVariant();
    if(column == 0) {
        return row;
    } else {
        return listOfEntries.at(row).at(column-1);
    }
    return QVariant();
}

Zmote89

For those who are interested, I now how a working solution, thanks to the help of my advisors at the HSR.

As the main problem lay with the updating of multiple views, I first tried to change updates to bulk updates, with an updater thread signaling when to update the GUI.
Meaning:

A CaptureWorker in a Thread gathered logs until it was signalled by
A UpdateWorker, which in a set amount of time, sent an "update" signal to the CaptureWorker, which in turn signaled, with the logs,
RT_Main, ie., my MainWindow's handleProcessedLogs

This meant, that for example every 200ms multiple clients where updated at once.

The problem that issued from this solution, was that too many update view signals where created at the same time, which froze the GUI.

The new solution basically spreads the updates in the given interval on a per client basis, ie. there are still 20 Clients updated within the 200ms interval, but now they are progressively updated during the 200ms intervall and not at once every 200ms. The current solution looks like this:

Main Window:

void RT_Main::handleProcessedLogs(const QString & clientName, const QList<QList<QString>> & rows) const
{
	handleClientLog(modell.getConstClient(clientName), rows);
}

void RT_Main::handleClientLog(const RT_Client& client, const QList<QList<QString>>& logs) const
{
	if (client.live())
	{
		client.addLog(logs);
	}
	if (client.toFile() || client.toRingFile())
	{
		signalToFileLog(logs, client);
	}
	if (client.getClientId() != RT_Constants::GLOBAL_ID && client.toParent())
	{
		handleClientLog(modell.getConstGroupClient(client.getGroupName()), logs);
	}
	if (client.toGlobal())
	{
		modell.globalClient.addLog(logs);
	}
}

void RT_Main::adjustUpdateInterval()
{
	int windowCount = modell.getWindowedClientsCount();
	modell.updateInterval(windowCount > 1 ?
		RT_Constants::baseUpdateInterval + windowCount * (RT_Constants::baseUpdateInterval / 4)
		: RT_Constants::baseUpdateInterval);
}

The adjustUpdateIntervall is called whenever a window is created or closed, there is a base Interval, ie. the lowest update frequency, whcih is at 200ms when there are no additional log windows, and gets increased with every new window (ie. UpdateWorker's signal interval ranges from 200ms to 1250ms at 0 - 20 Windows). The modell.updateInterval is thread-safe, ie. Mutex-Locked for writing. Maybe I should do one for reading as well?

My CaptureWorker, where logs are processed and gathered until and update signal arrives:

void RT_CaptureWorker::signalLogsReady()
{
	if(!entries.isEmpty())
	{
		int sleepTime = modell.updateInterval() / entries.keys().count();
		for (const auto & key : entries.keys()) {
			emit logsProcessed(key, *entries.constFind(key));
			QThread::msleep(sleepTime);
		}
		entries.clear();
	}
}

void RT_CaptureWorker::handleLog(const QString & clientName, const QString & groupName, const REP::PROTO::LogMsg & msg)
{
	if (entries.contains(clientName))
	{
		entries[clientName].append(CommonLogUtils::prepareLogEntry(clientName, groupName, msg));
	}
	else
	{
		entries.insert(clientName, QList<QList<QString>>{CommonLogUtils::prepareLogEntry(clientName, groupName, msg)});
	}
}

The UpdateWorker, which in given intervals, signals the CaptureWorker to update:

inline void RT_UpdateWorker::startUpdater()
{
	while(true)
	{
		emit signalUpdate();
		QThread::msleep(modell.updateInterval());
	}
}

I hope this "approach" may be an inspiration for others who struggled with a similar problem. I'm open for any further improvements!