Parse CSV with comma inside fields

11 Posts 5 Posters 12.7k Views 1 Watching
    luca wrote (#1):

    Hi all,
    I have a CSV file that can contain lines such as the following:
    @
    "123","ciao"
    34,"ciao mondo"
    "12345","ciao, mondo"
    @

    It's easy to parse the first two lines.

    The problem is with the third one.
    If I split the string on the comma character, the comma inside "ciao, mondo" splits that field in two.

    Do you have any suggestions?

      goetz wrote (#2):

      Real world CSV files are very hard to parse, given all the exceptions and quirks. Maybe the "QT4 CSV file reader":http://sourceforge.net/projects/qcsv/ project can help you here.

      PS: a tag search on "CSV":/search/tag/csv could bring you some other hints too, I didn't check.

      http://www.catb.org/~esr/faqs/smart-questions.html

        luca wrote (#3):

        [quote author="Volker" date="1338284853"]Real world CSV files are very hard to parse, given all the exceptions and quirks. Maybe the "QT4 CSV file reader":http://sourceforge.net/projects/qcsv/ project can help you here.

        PS: a tag search on "CSV":/search/tag/csv could bring you some other hints too, I didn't check.[/quote]

        I already tried it, but it doesn't work as expected, I can't find any documentation, and I'm not very familiar with regular expressions.
        For example, if I try this:
        @
        QString str = "\"1234\",\"aswere\"";
        CSV csv(str);
        qDebug() << csv.parseLine();
        @
        I get an empty list...

          luca wrote (#4):

          I found this regular expression, which seems to work in some cases:
          @
          QString str = "\"1234\",\"asw,ere\"";
          QRegExp rx("(?:^|,)(\"(?:[^\"]+|\"\")\"|[^,])");

          int pos = 0;
          int count = 0;
          while ((pos = rx.indexIn(str, pos)) != -1) {
              pos += rx.matchedLength();
              qDebug() << rx.cap(count);
              ++count;
          }
          @
          This way I get:
          "1234"
          "asw,ere"

          so it works.

          Now the problem is with numeric fields as in this example:
          @
          "1234","asw,ere",23.34
          @

            DerManu wrote (#5):

            Why not just write a parser yourself? That is, a function that walks the line character by character and keeps track of the state.

            It has a bool variable called "inQuote", and whenever it encounters a quote character it flips the inQuote value. If it encounters a comma, it treats it as a field separator only if inQuote is false. That's it. It should be no more than 15 lines or so, and it will be way faster and more flexible than applying a regex.
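The idea described above can be sketched roughly as follows. This is only a minimal illustration, not code from the thread: it uses plain `std::string` instead of `QString` so it compiles without Qt, and the function name `splitCsvLine` is made up for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (name is an assumption, not from the thread).
// Splits one CSV line into fields; a comma separates fields only
// while inQuote is false.
std::vector<std::string> splitCsvLine(const std::string &line)
{
    std::vector<std::string> fields;
    std::string current;
    bool inQuote = false;

    for (char c : line) {
        if (c == '"') {
            inQuote = !inQuote;         // flip the state; the quote itself is dropped
        } else if (c == ',' && !inQuote) {
            fields.push_back(current);  // an unquoted comma ends the current field
            current.clear();
        } else {
            current += c;               // ordinary character, or a comma inside quotes
        }
    }
    fields.push_back(current);          // the last field has no trailing comma
    return fields;
}
```

Called on the line `"1234","asw,ere",23.34`, this returns the three fields `1234`, `asw,ere` and `23.34`, so unquoted numeric fields need no special handling. One known limitation of this simple version: a doubled quote inside a quoted field (`""`) is dropped instead of being turned into a single literal quote.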

              goetz wrote (#6):

              DerManu is right. Offhand, I doubt that a regex can catch all possible corner cases of parsing CSV, quite apart from being cumbersome to write and maintain.


                luca wrote (#7):

                I know I could parse my string "by hand", but since CSV is a kind of "standard" I hoped someone had already solved my problem with regular expressions... :-)

                  chrismit7 wrote (#8):

                  Since there doesn't really appear to be a standard delimiter in use, you could try the lowest-common-denominator approach: keep a count of the number of items resulting from a comma split. If you encounter a new line with fewer splits, iterate over the data container again so that the earlier entries have the same number of items, and then continue parsing.

                    Skyrim wrote (#9):

                    Hi Luca, try this.
                    I read the CSV into a QTableWidget:
                    @
                    QFile file("file_csv.csv");
                    QStringList listA;
                    int row = 0;
                    if (file.open(QIODevice::ReadOnly)) {
                        while (!file.atEnd()) {
                            QString line = file.readLine();
                            listA = line.split(",");
                            ui->listWidget->addItems(listA);
                            ui->spinBox_col->setValue(listA.size());
                            ui->tableWidget->setColumnCount(listA.size());
                            ui->tableWidget->insertRow(row);
                            for (int x = 0; x < listA.size(); x++) {
                                QTableWidgetItem *test = new QTableWidgetItem(listA.at(x));
                                ui->tableWidget->setItem(row, x, test);
                            }
                            row++;
                        }
                    }
                    file.close();
                    @

                      DerManu wrote (#10):

                      Luca: No, regular expressions are the wrong tool here; that's why nobody has done it (or been successful with it). CSV files with quoting/escaping have a Chomsky type 2 grammar, but regular expressions can only handle languages with Chomsky type 3 grammars. Hence it cannot work. (If you're not familiar with the terminology, it means that any conceivable regular expression will still be too dumb to parse CSV.) And even if you dumbed down your CSV quoting rules (e.g. quotes only allowed at field boundaries), your regular expression would become incredibly ugly and thus unreadable for others (or for yourself in six months). Do yourself a favor and write a small parser :).

                      Skyrim: The code you provided doesn't work. It will break on @"12345","ciao, mondo"@, for example.
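To make the failure concrete, here is a small sketch of what a plain comma split does to that line. It is an illustration under assumptions, not the code from the post: it uses `std::string` instead of `QString::split`, and `naiveSplit` is a made-up name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on every comma, ignoring quotes entirely,
// which is what a plain split(",") amounts to.
std::vector<std::string> naiveSplit(const std::string &line)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            parts.push_back(line.substr(start));   // remainder after the last comma
            break;
        }
        parts.push_back(line.substr(start, comma - start));
        start = comma + 1;                          // continue after the comma
    }
    return parts;
}
```

For the line `"12345","ciao, mondo"` this yields three pieces (`"12345"`, `"ciao` and ` mondo"`) instead of two fields, so the table would get an extra column and the quotes would stay in the cell text.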

                        luca wrote (#11):

                        [quote author="Skyrim" date="1338737081"]hi, Luca try this
                        I read csv in QTableWidget
                        [/quote]

                        Thanks, Skyrim, but as DerManu wrote, it will break on
                        @
                        "12345","ciao, mondo"
                        @

                        DerManu, I didn't know anything about "Chomsky grammars". Thanks for explaining this to me.

                        So, as you said, the only solution is to parse my CSV by hand?
