QRegExp to parse a CSV file

Solved · General and Desktop
    #1 Merlino wrote on 17 Sept 2020, 10:08 (last edited by Merlino)

    Hello,

    I'm trying to use a regular expression to parse a simple CSV file which has this form:

    01;3.6.1;A;C;HELLO;1: quit;UINT8;N.A.;0.7;4.5;"Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua."
    03;5.4.2;F;K;GOODBYE;0: stay;UINT8;N.A.;0.0;1.2;Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.
    

    I've found this regular expression:

    (\;|\n|^)(?:"([^"]*(?:""[^"]*)*)"|([^"\;\n]*))
    

    I have tested it on regexr.com and it does the job.

    const QRegExp regExp("(\\;|\\n|^)(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|([^\"\\;\\n]*))");
    
    if (!regExp.isValid())
      qDebug() << "Regular expression error " << regExp.errorString();
    
    QString line = csvFile.readLine();
    QStringList fields = line.split(regExp);
    

    But when I run it in my code, only a list of empty strings (and the wrong number of them) is returned.

    Can anybody tell me why?

        #2 Gojir4 wrote on 17 Sept 2020, 10:13 (last edited by Gojir4)

        @Merlino Hi, probably because QRegExp is not fully compatible with Perl regular expressions. I guess it should work with QRegularExpression. Another possibility is that your configuration differs from the one used on regexr.com (multiline, global, case sensitivity).

        Edit: see the note at https://doc.qt.io/qt-5/qregexp.html#details

        Note: In Qt 5, the new QRegularExpression class provides a Perl compatible implementation of regular expressions and is recommended in place of QRegExp.

          #3 Gojir4 wrote on 17 Sept 2020, 10:16

          @Merlino Can't you simply use line.split(";")?

            #4 Merlino wrote on 17 Sept 2020, 10:22

            @Gojir4 No, because the string fields can contain punctuation and quotation marks, so a simple split would be fooled.

              #5 aha_1980 (Lifetime Qt Champion) wrote on 17 Sept 2020, 10:24

              Hi @Merlino,

              Please use QRegularExpression. QRegExp has been deprecated since 2012 and will be removed from Qt 6.

              Regards

              Qt has to stay free or it will die.

                #6 Gojir4 wrote on 17 Sept 2020, 10:26

                @Merlino I see. Then do a global match and iterate over the results to fill your QStringList. Your regex is already doing the "splitting" job.

                  #7 Merlino wrote on 17 Sept 2020, 10:36

                  @aha_1980 I have changed my code to use QRegularExpression, but the problem is still there.

                    #8 aha_1980 (Lifetime Qt Champion) wrote on 17 Sept 2020, 10:51

                    Hi @Merlino,

                    For a start, try this:

                    #include <QDebug>
                    #include <QRegularExpression>
                    
                    int main(int argc, char *argv[])
                    {
                        const QString s = R"(
                    01;3.6.1;A;C;HELLO;1: quit;UINT8;N.A.;0.7;4.5;"Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.
                    03;5.4.2;F;K;GOODBYE;0: stay;UINT8;N.A.;0.0;1.2;Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.
                    )";
                        const QRegularExpression regExp(R"x((\;|\n|^)(?:"([^"]*(?:""[^"]*)*)"|([^"\;\n]*)))x");
                    
                        QRegularExpressionMatchIterator matchIt = regExp.globalMatch(s);
                        while (matchIt.hasNext()) {
                            const QRegularExpressionMatch match = matchIt.next();
                            qDebug() << match.capturedTexts();
                        }
                    
                        return  0;
                    }
                    

                    You will need to fine-tune it, but it is heading in the right direction.

                    Output:

                    ("\n01", "\n", "", "01")
                    (";3.6.1", ";", "", "3.6.1")
                    (";A", ";", "", "A")
                    (";C", ";", "", "C")
                    (";HELLO", ";", "", "HELLO")
                    (";1: quit", ";", "", "1: quit")
                    (";UINT8", ";", "", "UINT8")
                    (";N.A.", ";", "", "N.A.")
                    (";0.7", ";", "", "0.7")
                    (";4.5", ";", "", "4.5")
                    (";", ";", "", "")
                    ("\n03", "\n", "", "03")
                    (";5.4.2", ";", "", "5.4.2")
                    (";F", ";", "", "F")
                    (";K", ";", "", "K")
                    (";GOODBYE", ";", "", "GOODBYE")
                    (";0: stay", ";", "", "0: stay")
                    (";UINT8", ";", "", "UINT8")
                    (";N.A.", ";", "", "N.A.")
                    (";0.0", ";", "", "0.0")
                    (";1.2", ";", "", "1.2")
                    (";Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.", ";", "", "Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.")
                    ("\n", "\n", "", "")
                    

                    Regards

                      #9 JonB wrote on 17 Sept 2020, 11:26 (last edited by JonB)

                      I haven't tested what the regular expressions do, but you might want to extend your test case with a string value that itself contains " or ; characters, if you intend to support those.

                        #10 Pablo J. Rogina wrote on 17 Sept 2020, 12:24

                        @Merlino said in QRegExp to parse a CSV file:

                        simple CSV file

                        If the file conforms to that format, it should have one, and only one, marker as the field separator: originally a comma (hence the name), later on some other character (which obviously cannot be part of the field values...).

                        @Gojir4
                        Can't you simply use line.split(";") ?

                        Yes, you can. I think @Gojir4 provided the right answer. Judging from your data example, your file seems to be an SCSV file: semicolon-separated values.

                        @Merlino about 2 hours ago
                        no because the string fields can contain punctuation and quotation marks so the simple split would be fooled.

                        Yes, you'll have punctuation and quotation marks in the string fields, but I bet none of those characters will be a semicolon (;).

                        It looks like you're over-complicating your use case.

                        Upvote the answer(s) that helped you solve the issue
                        Use "Topic Tools" button to mark your post as Solved
                        Add screenshots via postimage.org
                        Don't ask support requests via chat/PM. Please use the forum so others can benefit from the solution in the future
