Qt Forum

    • Login
    • Search
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Search
    • Unsolved

    Call for Presentations - Qt World Summit

    Large text file import into sqlite

    General and Desktop
    3
    4
    7973
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • D
      DaGeek247 last edited by

      So I have been working on a program which imports the contents of a text file into a sqlite database, divided by line. This is my first foray into the wonderful world of QT5, but not my first rodeo with C++. I have spent a goodly amount of time Googling this all, and looking through the documentation. I am not a stranger to working through my own problems and figuring out what is wrong.

      In fact, this program is working. I am not asking for help fixing an error, Google does a dang good job of answering those questions for me. I am asking for your experienced advice, simply because I am still pretty new at this.

      Back to my 'issue'. Like I said, i am using sqlite to store a lot of small amounts of text. <30GB sizes of text. (worst case scenario, but still). The program has one functioning button, which allows a user to import a text file into the database. I have got that to work. My problem is that it can import (on my pretty good computer) about 4 lines of text per second. If i have a a 1.2GB file with >200,000,000 lines of text, this becomes an issue.

      Here is my relevant code;
      @void MainWindow::on_pushButton_clicked() {
      //open personal sqlite database
      QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
      db.setDatabaseName("./data.sqlt");

      //make sure the database opened properly
      if(!db.open()) { //just in case
          qFatal("Failed to connect to the database");
      } else {
          qDebug("Succesfully Connected to the database");
      }
      QSqlQuery qry;
      
      //create database with table if it doesn't exist
      qry.prepare("CREATE TABLE IF NOT EXISTS pw (pass VARCHAR(63), value INT(256))");
      if(!qry.exec&#40;&#41;) {
          qDebug() << qry.lastError();
      } else {
          qDebug() << "Succesfully Connected to the table";
      }
      
      //open dialgue for finding new wordlist to import
      QString database = QFileDialog::getOpenFileName(this, tr("Open Database"),"",tr("Password Database (*.*)"));
      QFile fdatabase(database);
      
      if (!fdatabase.open(QIODevice::ReadOnly | QIODevice::Text)) { //if the file doesn't exist
          qDebug("User cancelled alphabetical wordlist import");
      } else {
          //show the dialogue for choosing the wordlist to import
          QProgressDialog progress("Gathering information...", "Abort Import", 0, 0, this);
              progress.show();
              progress.setWindowModality(Qt::WindowModal);
      
          //count the words in the wordlist
          int numo = 0;
          int numo2 = 100;
          while(!fdatabase.atEnd()) {
              //read the text file
              fdatabase.readLine();
      
              //count the lines of text
              numo++;
      
              //keep the user informed
              if(numo2 == numo) {
                  progress.setLabelText("Gathering information...\nFound " + QString::number(numo) + " new passwords");
                  numo2 += 100;
              }
      
              //ensure the program doesn't freeze
              qApp->processEvents(QEventLoop::ExcludeUserInputEvents);
      
              //the user can cancel importing wordlist if he wants
              if (progress.wasCanceled()) {
                  break; }
          }
      
          //restart the database because i don't know how to read the file again
          fdatabase.close();
          fdatabase.open(QIODevice::ReadOnly | QIODevice::Text);
      
          //Update the progress dialogue and set everything ready for actual import
          QString tnumPass;
          QString xnumPass;
          tnumPass = QString::number(numo);
          progress.setLabelText("Importing 1 of " + tnumPass + " new passwords...");
          progress.setMaximum(numo);
          qDebug() << "Counted" << numo << "passwords in selected file" <<  fdatabase.fileName();
          numo = 0;
      
          //read the wordlist into personal sqlite database
          while(!fdatabase.atEnd()) {
              QString line = fdatabase.readLine();
      
              //show progress
              numo++;
              xnumPass = QString::number(numo);
              progress.setValue(numo);
              progress.setLabelText("Importing " + xnumPass + " of " + tnumPass + " new passwords...\nImporting Password: " + line);
      
              //insert new password
              qry.prepare("INSERT INTO pw (pass, value) values ('" + line + "', '1')");
      
              //if the password was not inserted correctly
              if (!qry.exec&#40;&#41;) {
                  qDebug() << qry.lastError();
              }
      
              //ensure the program doesn't freeze
              qApp->processEvents(QEventLoop::ExcludeUserInputEvents);
      
              //the user can cancel importing wordlist if he wants
              if (progress.wasCanceled()) {
                  break;
              }
          }
      //close the sqlite database
      db.close();
      }
      

      }@

      My question, or plea for advice, then, is this. How do I speed up this process? If I import a 10GB text file, going line by line, at about 4 lines a second into my sqlite database, it will take for ever for it to finish. I can't load it all into memory and then push it to the sqlite database, because I can't guarantee that the end user will have the amount of RAM needed to fit it all.

      Is there a faster way to store this type of data that i am unaware of? Am I doing something, really really, wrong that is causing this to put the data into the sqlite file?

      If you want to compile this yourself, here is the source code (run in debian based linux, on a 3.2ghz dual core AMD processor with a 1TB 7200RPM hard drive) http://www.sendspace.com/file/7t89ta.

      I would love to hear your opinion about what is the best way to work with VERY large text files, and the best way to store them in a easy access format. Any and all contributions would be greatly appreciated.

      1 Reply Last reply Reply Quote 0
      • S
        Santosh Reddy last edited by

        Loading and saving multiple lines will help fast up the process. The question is what would be optimum size of the lines which should be loaded, now this cannot is answered completely, as the file on disk may be fragmented, and reading multiple line may not always help in reading, but loading multiple line in memory will defenetly help in writing to database faster.

        I would say do some trails on different computers and see what numer of lines/memory looks fairly good. As as already said, loading multiple line into memory will help, but again it depends on how fast you could read and write back to disk, as end of the operation each record(line) has to be saved to database file which again is on the same disk. So for large file (if using HDD) you disk head has to switch positions quite often. Also look if you could configure sqlite page/commit size, and I belive you could also manage using a manual commit instead of auto commit so that you have control when data is written to disk.

        SS

        1 Reply Last reply Reply Quote 0
        • D
          DaGeek247 last edited by

          Awesome, I will look into multi-threading with qt. It seems there are some good tutorials / examples already out there. for others googling this, here is one such tutorial. http://www.johanpaul.com/blog/2011/09/easy-threading-with-qtconcurrentrun/

          However, I do not know how I would set up write caching for sql within the qt framework. Do you have any links or search terms that I could use to get that started?

          1 Reply Last reply Reply Quote 0
          • P
            panosk last edited by

            Wrap your insert code inside a transaction and you will see huge speed increase. You should probably put
            @
            db.transaction()
            @

            before the while statement and

            @
            db.commit()
            @

            after the loop.

            I would also suggest to reorganize a bit your code into more functions.

            1 Reply Last reply Reply Quote 0
            • First post
              Last post