How to update a large model efficiently
-
QSortFilterProxyModel might be easier to implement since I am already familiar with it and have a working proxy model at hand, so I will first try to get it to work. I had a public API on my model class named edit() that basically modifies the model data; maybe I should move the editing functionality to the proxy model and update the model via the proxy. For now, even though I have a proxy model, the widget edit()s the wrapped model directly, not via the proxy model. I feel like what I am currently doing defeats the purpose of the proxy. I am still thinking about how I can acquire a QModelIndex after modifying the model via the proxy, though.
The model stores data as QVector<MyObject>. Suppose that the proxy got the wrapped model via sourceModel(), iterated over the data wrapped by the model, and found that the object we want to modify sits at index 300 (out of 1000, let's say). How does this vector index translate to a QModelIndex? I am going to need that to emit dataChanged().
@ozcanay
My theory is:
- When you edit() directly, it is passed an ID to find the row to edit.
- That requires a linear search of the base model, which you say is (too) slow.
- The QSortFilterProxyModel, assuming it is sorted by row ID, can service a binary-search lookup via its row numbers (not those of the source model), which should be "fast".
- Once you have found the row index in the proxy model, you can update the source model / make it emit dataChanged(), either by using mapToSource(), which maps proxy QModelIndexes to source QModelIndexes, or by calling the proxy's setData() directly, which will use that internally.
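The binary-search idea above can be sketched Qt-free, with a plain sorted std::vector standing in for the proxy's sorted rows (the data and names here are illustrative, not from the thread):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for proxy rows kept sorted by order ID, so a lookup can use
// std::lower_bound (binary search) instead of a linear scan.
std::vector<std::string> sorted_ids = {"A100", "B200", "C300", "D400"};

// Returns the row number of the given ID, or -1 if it is not present.
int findRow(const std::string& id) {
    auto it = std::lower_bound(sorted_ids.begin(), sorted_ids.end(), id);
    if (it != sorted_ids.end() && *it == id)
        return static_cast<int>(it - sorted_ids.begin());
    return -1;
}
```

In the real proxy the row number found this way would then be turned into a source index via mapToSource() before emitting dataChanged().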
-
As I mentioned before, I was storing "Order" objects in my model. Instead of the QSortFilterProxyModel approach, I went for the QPersistentModelIndex approach to keep the proxy model as thin as possible. I added the following data structure to the model (not the proxy):

```cpp
std::unordered_map<QString, QPair<Order, QPersistentModelIndex>> order_index_map_;
```

This is how data is stored in the model:

```cpp
QVector<Order> orders_;

void OrdersModel::process(const Order& order) {
    const auto& order_id = order.order_id_;
    if(order_index_map_.find(order_id) == order_index_map_.end()) {
        beginInsertRows({}, orders_.count(), orders_.count());
        orders_.push_back(order);
        endInsertRows();
        order_index_map_[order_id] = {order, std::move(QPersistentModelIndex(index(orders_.size() - 1, 0)))};
        emit orderEntryAdded(&order);
    } else {
        if(auto it = order_index_map_.find(order_id); it != order_index_map_.end()) {
            const auto p_model_index = it->second.second;
            const int row_index = p_model_index.row();
            orders_[row_index] = order;
            emit dataChanged(index(row_index, 0), index(row_index, column_count_));
        } else {
            qWarning() << "This should not have happened.";
        }
    }
}
```

This seems to be working so far. However, I am not sure how much faster this version is or even faster at all.
-
@ozcanay said in How to update a large model efficiently:
However, I am not sure how much faster this version is or even faster at all.
:) :(
```cpp
if(order_index_map_.find(order_id) == order_index_map_.end())
```

```cpp
} else {
    if(auto it = order_index_map_.find(order_id); it != order_index_map_.end()) {
```

Save one lookup here: assign auto it = order_index_map_.find(order_id) right at the start. Not that I think lookups will be slow.
Go back to basics and find out just what took so long in your original code. Did you even test anything? Might be a case of "premature optimization"....
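The single-lookup version suggested here can be sketched like this (a Qt-free toy, with std::unordered_map standing in for the real map; the names are illustrative):

```cpp
#include <string>
#include <unordered_map>

std::unordered_map<std::string, int> map_;

// One find() serves both the "does it exist?" test and the later access.
// Returns true if a new entry was inserted, false if an existing one was updated.
bool update_or_insert(const std::string& key, int value) {
    if (auto it = map_.find(key); it != map_.end()) {
        it->second = value;   // reuse the iterator instead of a second find()
        return false;
    }
    map_.emplace(key, value); // not found: insert a fresh entry
    return true;
}
```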
-
Yes, I realized the redundant lookups that you pointed out. Another approach I tried:

```cpp
void OrdersModel::process(const Order& order) {
    const auto& order_id = order.order_id_;
    if(auto it = order_index_map_.find(order_id); it == order_index_map_.end()) {
        auto order_ptr = std::make_shared<Order>(order);
        beginInsertRows({}, orders_.count(), orders_.count());
        orders_.push_back(order_ptr);
        endInsertRows();
        order_index_map_[order_id] = {order_ptr, std::move(QPersistentModelIndex(index(orders_.size() - 1, 0)))};
        emit orderEntryAdded(&order);
    } else {
        *((*it).second.first) = order;
        int row_index = (*it).second.second.row();
        emit dataChanged(index(row_index, 0), index(row_index, column_count_));
    }
}
```

with the data structures changed to:

```cpp
QVector<std::shared_ptr<Order>> orders_;
std::unordered_map<QString, QPair<std::shared_ptr<Order>, QPersistentModelIndex>> order_index_map_;
```

However, this is not fast either. The data structures hold pointers, so there is probably a performance hit from the indirection, and cache locality probably suffers as well. But I wanted to give it a try anyway.
I will probably profile the code with linux perf. Also, instead of std::unordered_map, I will go for absl::flat_hash_map to see if there are any performance improvements.
-
@ozcanay
In outline, I still wonder what evidence you have that any of this lookup is a performance bottleneck? Just how much looking up are you doing anyway, thousands/hundreds of thousands per second? What else does your code do which might take time, rather than assuming/blaming it on lookup?
I will probably profile the code with linux perf.
Good luck! That (perf) would actually be something you could teach me! :) I tried recently and could not make head nor tail of its output/statistics. In the past I used gprof and was very happy with that; I could understand its output fine. perf just gave me a headache, with lots of "graphs" and "hotspots" I couldn't fathom... :(
-
Simply holding a QVector<data> and a QHash<ID, int> in the model, where the value of the QHash<ID, int> is the index in the vector, would have been much easier... but it's good when only the programmer who did the work understands it - then no one else can replace him :D
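A minimal sketch of that vector-plus-index-hash idea, using std::vector and std::unordered_map in place of the Qt containers (types and names are illustrative):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct Order { std::string id; double price = 0.0; };

// orders_ owns the data; index_ maps an order ID to its row in orders_.
std::vector<Order> orders_;
std::unordered_map<std::string, std::size_t> index_;

// Returns the row that was inserted or updated.
std::size_t process(const Order& order) {
    // try_emplace does the lookup and (if needed) the insertion in one probe.
    auto [it, inserted] = index_.try_emplace(order.id, orders_.size());
    if (inserted) {
        orders_.push_back(order);      // new order: append a row at the back
    } else {
        orders_[it->second] = order;   // known order: update its row in place
    }
    return it->second;
}
```

In the real model, the returned row is exactly what beginInsertRows()/endInsertRows() or dataChanged() needs.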
-
"QHash<ID, int> is the index in the vector would have been much easier" -> would that even work? Then why is QPersistentModelIndex needed?
@ozcanay said in How to update a large model efficiently:
would that even work?
Why not? You're the owner of the model, so you know when you modify the order and can update the hash. I don't see why you need persistent indexes here at all. They're needed for QSFPM to make sure that they're properly mapped to the new index after sorting/filtering.
-
My worry was that Qt itself under the hood could be messing up with the model indices, changing them from time to time maybe. Especially considering that I wrap my model via QSortFilterProxyModel to filter and sort, I thought that I would need QPersistentModelIndex.
-
@ozcanay said in How to update a large model efficiently:
My worry was that Qt itself under the hood could be messing up with the model indices, changing them from time to time maybe.
Again: you are the owner, you create the indexes, you modify the data - how should Qt be able to modify something in your structures without your knowing?
And QSortFilterProxyModel is there so that you do NOT need to modify the source model.
-
Yes, you are right.
I made the following changes and it works:
```cpp
QVector<Order> orders_;
std::unordered_map<QString, QPair<Order, std::size_t>> order_index_map_;

void OrdersModel::process(const Order& order) {
    const auto& order_id = order.order_id_;
    if(auto it = order_index_map_.find(order_id); it == order_index_map_.end()) {
        beginInsertRows({}, orders_.count(), orders_.count());
        orders_.push_back(order);
        endInsertRows();
        order_index_map_[order_id] = {order, orders_.size() - 1};
        emit orderEntryAdded(&order);
    } else {
        const int row_index = (*it).second.second;
        orders_[row_index] = order;
        emit dataChanged(index(row_index, 0), index(row_index, column_count_));
    }
}
```

However, I aim to do this processing in a different thread other than the UI thread, so as not to freeze the UI when there are lots of updates. I want to use QtConcurrent::run for this purpose.
-
@ozcanay said in How to update a large model efficiently:
However, I aim to do this processing in a different thread other than UI
This will not work.
-
Can you elaborate on why that won't work? If there is heavy processing to do on the UI thread, what should I do then?
@ozcanay said in How to update a large model efficiently:
Can you elaborate on why that won't work?
You must not modify any UI stuff outside the UI (main) thread. Emitting dataChanged() is therefore not possible outside the main thread.
Do your calculations outside the main thread and modify the model in the main thread - if modifying the model (which is more or less just a simple copy in your case) locks your main thread, then you're doing something wrong. The only thing I can think of is a too high data rate for your incoming data - but then it's somewhat useless to update the model that frequently, because no one can see the changes on the UI at all.
-
@Christian-Ehrlicher
I think you are absolutely right, as users won't even be able to see updates processed at this rate. So I think I need a way to throttle view updates. I am planning to keep a std::set of processed indices, and I will use those indices to emit the dataChanged signal every X seconds (or milliseconds), resetting that set on every timer firing. In theory this should work, as sometimes a single order gets updated 100 times in a matter of seconds, and each update forces a repaint on the view via the dataChanged signal.
-
I would do it in a similar way. Collect the updated data in a separate thread and batch-update the model every second or so. It can even be done in your model, with a custom setter for the second thread to collect the data and put it in another container which gets read every second. Don't forget to use a mutex for this container access.
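That collect-then-batch pattern with a mutex-guarded container might be sketched like this (Qt-free; the names are illustrative and the timer/thread wiring is omitted):

```cpp
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

struct Order { std::string id; double price = 0.0; };

// Pending updates collected by the worker thread, keyed by order ID so
// repeated updates to the same order coalesce into the latest one.
std::mutex pending_mutex;
std::unordered_map<std::string, Order> pending;

// Called from the worker thread for every incoming update.
void acknowledge(const Order& order) {
    std::lock_guard<std::mutex> lock(pending_mutex);
    pending[order.id] = order;   // last write wins
}

// Called periodically from the main thread (e.g. by a timer): drain the
// batch under the lock, then apply it to the model without holding the lock.
std::vector<Order> drainBatch() {
    std::unordered_map<std::string, Order> batch;
    {
        std::lock_guard<std::mutex> lock(pending_mutex);
        batch.swap(pending);     // keep the critical section short
    }
    std::vector<Order> out;
    out.reserve(batch.size());
    for (auto& [id, order] : batch) out.push_back(std::move(order));
    return out;
}
```

The swap-under-lock keeps the mutex held only for a pointer exchange, so the worker thread is never blocked while the main thread walks the batch.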
-
I have overridden timerEvent of the model and used startTimer() to trigger the timer event every second. I created a public function named acknowledge(...) that just stores the order in a map of the form <order_id, order>. If a price update to an order occurs multiple times, only the last update will be taken into consideration. Let's call this map batched_orders. In the timer event callback there is a private method called batchProcess that basically iterates over batched_orders and processes them. This is not the most ideal solution to the problem (the best solution is what @Christian-Ehrlicher suggested); however, it seems to be doing the job for my application. CPU usage dropped dramatically. @JonB, is it possible to change the solution for this post?