Opening a large CSV file in a QTableWidget is really slow.



  • Hello everybody;

    I have a Qt application fully working. One of the tasks that the app does is opening and displaying CSV files in a QTableWidget. Although the app doesn't crash with large files, it becomes really slow.

    I've also tried opening these files with OpenOffice, and it takes a while too (though not as long as my app).

    I'm wondering whether this can be optimized, or whether large files simply require more time to be processed.

    The file I'm seeing this lag with has 10749 rows and 265 columns (4.2 MB).

    Here is the code that loads the file:

    @
    void CsvViewer::openCsvFile ()
    {
    QMessageBox msgBox ( this );
    msgBox.setStandardButtons( QMessageBox::NoButton);
    msgBox.setWindowTitle( "Loading..." );
    msgBox.setText( "Loading CSV file..." ); // This text doesn't appear when the app freezes...

    QObject *signalEmitter = sender();
    if(signalEmitter == ui->actionOpen )
    {
        _openedFile = QFileDialog::getOpenFileName (0, "Open CSV file",QDir::currentPath(),"CSV Files(*.csv)");
    
        if (!_openedFile.endsWith (".csv"))
        {
            QMessageBox::warning (0,"Error","Selected file is not a CSV File","Ok");
            return;
        }
    }
    
    QFile file (_openedFile);
    
    msgBox.show ();
    
    if (file.open(QIODevice::ReadOnly | QIODevice::Text))
    {
        QString data = file.readAll();
        file.close ();
        data.remove( QRegExp("\r") );
        QChar character;
        QTextStream textStream(&data);
    
        while (!textStream.atEnd())
        {
            textStream >> character;
    
            if (character == ';')
            {
                _cellData<<_readBuffer;               
                _readBuffer.clear();                        
                _countCells++;
            }
            else if (character == '\n')
            {
                _cellData<<_readBuffer;              
                _readBuffer.clear();
                _countCells++;
                _countRows++;
            }
            else if (textStream.atEnd())
            {
                _readBuffer.append(character);       
            }
            else
                _readBuffer.append(character);      
        }
    }
    else
        return;
    
    
    ui->tableWidget->setRowCount          ( _countRows );
    ui->tableWidget->setColumnCount    ( (_countCells/_countRows) );
    
    
    for(unsigned int r = 0; r < _countRows; r++)
        for(unsigned int c = 0; c < (_countCells/_countRows); c++)
            ui->tableWidget->setItem(r, c, new QTableWidgetItem(_cellData[c + (_countCells/_countRows) * r]));
    
    msgBox.setAttribute( Qt::WA_DeleteOnClose ); 
    msgBox.close ();
    

    }
    @

    If there's no solution to this, I'd at least like to show a QMessageBox asking the user to wait until the file is loaded (I tried QProgressDialog without success; it made the app even slower). But I get no text in the box, because the whole app freezes while the file loads. I only get the window title.

    Thanks for the help, it's really appreciated.


  • Moderators

    Hi,

    I haven't done any benchmarking, but I believe that reading 1 QChar at a time is inefficient.

    Try QString::split() (it also makes your code a lot simpler):

    @
    QString data = file.readAll();
    file.close ();

    QStringList rowData = data.split('\n');
    _countRows = rowData.count();

    for (const QString& row : rowData)
    {
        QStringList cells = row.split(',');
        _countCells += cells.count();

        // TODO: Store these strings in the table
    }
    @

    Note: data.remove( QRegExp("\r") ); is not necessary. Since you opened the QFile with QIODevice::Text, Qt translates the end-of-line terminators (\r\n) to \n when you read the file.



  • What JKSH says is absolutely true. Reading char by char (or byte by byte) is about the slowest way to do it. For best performance you should read in buffer-sized chunks that match your filesystem, or at least read line by line.

    Doing file.readAll() into a QString could be equally inefficient with a really large file. You have no idea what the file size could be: what if someone gave your program a 1 GB CSV file? file.readAll() would not be happy. And when you then added the data to your view, it would essentially double the memory used by your app.

    Doing a QString::split() on one huge QString would be very slow as well.

    Your best bet is reading line by line. It is not the most optimized for speed, but it is a good balance between performance and memory usage.

    Something like:

    @
    while (!file.atEnd())
    {
        QString line = file.readLine();
        // process csv line here, as JKSH says using split is great for this
        QStringList tokens = line.split(',');
        // now you have a list of each item in the csv, you will need to handle
    }
    @

    In the example above make sure to handle commas in the data. They are usually quoted or escaped. That is beyond the scope of the question though.
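
    As a rough illustration of the quoting issue, here is a minimal sketch in plain standard C++ (no Qt, so it stands on its own). The function name splitCsvLine and its separator parameter are made up for this example. It assumes the common convention that a field may be wrapped in double quotes, with "" standing for a literal quote, and it does not handle quoted fields that span several lines.

    ```cpp
    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical helper: splits one CSV line into fields.
    // A separator inside double quotes is kept as data, and a doubled
    // quote ("") inside a quoted field yields a single quote character.
    std::vector<std::string> splitCsvLine(const std::string &line, char separator = ',')
    {
        std::vector<std::string> fields;
        std::string current;
        bool inQuotes = false;

        for (std::size_t i = 0; i < line.size(); ++i) {
            const char ch = line[i];
            if (inQuotes) {
                if (ch == '"') {
                    if (i + 1 < line.size() && line[i + 1] == '"') {
                        current += '"';   // escaped quote inside a quoted field
                        ++i;              // skip the second quote
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current += ch;        // separators are plain data here
                }
            } else if (ch == '"') {
                inQuotes = true;          // opening quote
            } else if (ch == separator) {
                fields.push_back(current);
                current.clear();
            } else {
                current += ch;
            }
        }
        fields.push_back(current);        // the last field has no trailing separator
        return fields;
    }
    ```

    With this, a line such as 1,"x,y",3 comes out as three fields, the middle one being x,y, whereas a plain split on ',' would produce four.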

    Finally, it could be the view that is being slow. There are highly optimized ways to deal with views of large datasets. Adding a ton of data to a single GUI view will make things crawl. You can add "windows" into the data, so that only the data currently in the viewport is rendered and held by the view object. This will make it a ton faster. It can get complicated though.


  • Moderators

    Very good points. Thanks, ambershark!



  • Use QTableView and a custom model, not QTableWidget, for large data sets.
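
    To make that a bit more concrete: the point of a custom model is that the data sits in one plain container and the view only asks for the cells it needs to paint, so no item object is created per cell. Below is a minimal Qt-free sketch of such a store. CsvTable and its members are hypothetical names; in a real application a QAbstractTableModel subclass would presumably forward its rowCount(), columnCount() and data() to something like this.

    ```cpp
    #include <cstddef>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Hypothetical backing store for a custom table model.
    // All cells live in one flat vector, row after row.
    class CsvTable
    {
    public:
        explicit CsvTable(std::size_t columns) : m_columns(columns) {}

        // Appends one row. Short rows are padded with empty cells and
        // extra cells are dropped, so every row has exactly m_columns cells.
        void appendRow(const std::vector<std::string> &row)
        {
            for (std::size_t c = 0; c < m_columns; ++c)
                m_cells.push_back(c < row.size() ? row[c] : std::string());
        }

        std::size_t rowCount() const
        {
            return m_columns == 0 ? 0 : m_cells.size() / m_columns;
        }

        std::size_t columnCount() const { return m_columns; }

        // Looks up a single cell on demand; throws on an invalid index.
        const std::string &cell(std::size_t row, std::size_t column) const
        {
            if (column >= m_columns)
                throw std::out_of_range("column index out of range");
            return m_cells.at(row * m_columns + column);
        }

    private:
        std::size_t m_columns;
        std::vector<std::string> m_cells;
    };
    ```

    Filling such a store costs one string per cell and nothing else, which is a lot less work than allocating a separate item object for every cell.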



  • Ok, thank you all.

    I will make all the changes that you suggest, and let you know if I get better results...



  • Hi again;

    I've made all the changes without any success. Well, by reading line by line and splitting, the code is much cleaner and more elegant than before, that's for sure. But in terms of efficiency, I'm less than a second faster, which is hardly an improvement.

    Concerning the model-based table, this 5.7 MB CSV file takes 11 seconds with the QTableWidget, and 34 with the QTableView and QStandardItemModel, so I definitely don't see how this can solve the problem.

    I agree with ambershark that it is populating the view that is slow (commenting out the for loop eliminates the delay). I do not see how to optimize it.

    I also still have the issue with the QMessageBox: it doesn't show its text, just the window title.

    Here is the updated code for you to have a look at. Thanks again for your help and tips:

    In the constructor:
    @
    model = new QStandardItemModel();
    ui->tableView->setModel(model);
    @

    The function:
    @
    void CsvViewer::openCsvFile ()
    {
    QMessageBox msgBox ( this );
    msgBox.setStandardButtons( QMessageBox::NoButton);
    msgBox.setWindowTitle( "Loading..." );
    msgBox.setText( "Loading CSV file..." ); // This text doesn't appear when the app freezes...

        QObject *signalEmitter = sender();
        if(signalEmitter == ui->actionOpen )
        {
            _openedFile = QFileDialog::getOpenFileName (0, "Open CSV file",QDir::currentPath(),"CSV Files(*.csv)");
     
        if (!_openedFile.endsWith (".csv"))
            {
                QMessageBox::warning (0,"Error","Selected file is not a CSV File","Ok");
                return;
            }
        }
     
        QFile file (_openedFile);
     
        msgBox.show ();
     
    if (file.open(QIODevice::ReadOnly | QIODevice::Text))
    {
        while (!file.atEnd())
        {
            QString line = file.readLine();
            line.remove( QRegExp("\n") );
            _cellData.append (line.split(';'));
            _countRows++;
        }
        _countCols =  _cellData.count () / _countRows;
    }
    else
        return;
     
     
        model->setRowCount ( _countRows );
        model->setColumnCount ( _countCols );
     
     
        for(unsigned int r = 0; r < _countRows; r++)
            for(unsigned int c = 0; c < _countCols; c++)
                model->setItem(r, c, new QStandardItem(_cellData[c + (_countCols) * r]));
     
        msgBox.setAttribute( Qt::WA_DeleteOnClose );
        msgBox.close ();
    }
    

    @

    Thanks. Best regards.



  • @pepita how were _cellData, _countRows and _countCols declared?


  • Qt Champions 2016

    @pepita
    Hello, I'll pitch in with some basic suggestions:
    Firstly, do not use QStandardItemModel. Instead, subclass QAbstractItemModel and do your processing in a loading function of your own, similar to QSqlTableModel::select(), wrapping the insertions in beginInsertRows()/endInsertRows() (or the column equivalents) so that the view is updated efficiently.

    for(unsigned int r = 0; r < _countRows; r++)
        for(unsigned int c = 0; c < _countCols; c++)
            model->setItem(r, c, new QStandardItem(_cellData[c + (_countCols) * r]));
    

    I'm pretty sure that these two loops take most of the time, not the string splitting. Imagine the number of allocations you're doing for such a dataset: for the 10749 x 265 file mentioned above, that is roughly 2.8 million items. On each new allocation, free memory has to be found on the heap for your object, and the sheer number of objects makes this slow.

    Additionally, if you are really after speed, you can consider threading the processing. For example, you could start a single thread that reads the file and puts the lines in a thread-safe queue, and have 2-3 threads (depending on the number of cores) each process a chunk of, say, 100 rows at a time. If the order is important, you will still need a barrier so that the worker threads provide their output in the order of the input, but you will get better performance.

    Additionally, and probably most importantly, do not go through the data multiple times. You in fact don't need to know the number of rows beforehand, do you? Just go through the data set once and put all the data in the model.
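
    A single pass could look roughly like the sketch below. It is plain standard C++ rather than Qt so that it is self-contained; LoadedCsv and loadCsvSinglePass are names invented for the example, and the ';' default separator is taken from the code earlier in the thread. Rows are appended as they are read and the column count is simply the widest row seen, so nothing has to be counted in advance.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <istream>
    #include <string>
    #include <vector>

    // Hypothetical result type: the parsed rows plus the widest row seen.
    struct LoadedCsv {
        std::vector<std::vector<std::string>> rows;
        std::size_t columnCount = 0;
    };

    // Reads the stream exactly once, line by line, without knowing the
    // number of rows or columns beforehand. Quoting is not handled here.
    LoadedCsv loadCsvSinglePass(std::istream &in, char separator = ';')
    {
        LoadedCsv result;
        std::string line;

        while (std::getline(in, line)) {
            // Tolerate Windows line endings when the stream is binary
            if (!line.empty() && line.back() == '\r')
                line.pop_back();

            std::vector<std::string> cells;
            std::size_t start = 0;
            for (;;) {
                const std::size_t end = line.find(separator, start);
                if (end == std::string::npos) {
                    cells.push_back(line.substr(start)); // last cell of the line
                    break;
                }
                cells.push_back(line.substr(start, end - start));
                start = end + 1;
            }

            result.columnCount = std::max(result.columnCount, cells.size());
            result.rows.push_back(std::move(cells));
        }
        return result;
    }
    ```

    The row and column counts are then known as soon as the loop ends, and the data has been touched only once.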

    I hope these pointers help.
    Kind regards.

