QRegExp to parse a CSV file

    Merlino wrote (#1):

    Hello,

    I'm trying to use a regular expression to parse a simple CSV file which has this form:

    01;3.6.1;A;C;HELLO;1: quit;UINT8;N.A.;0.7;4.5;"Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua."
    03;5.4.2;F;K;GOODBYE;0: stay;UINT8;N.A.;0.0;1.2;Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.
    

    I've found this regular expression:

    (\;|\n|^)(?:"([^"]*(?:""[^"]*)*)"|([^"\;\n]*))
    

    I have tested it on regexr.com and it does the job.

    const QRegExp regExp("(\\;|\\n|^)(?:""([^\"]*(?:\"\"[^\"]*)*)\"|([^\"\\;\\n]*))");
    
    if (!regExp.isValid())
      qDebug() << "Regular expression error " << regExp.errorString();
    
    QString line = csvFile.readLine();
    QStringList fields = line.split(regExp);
    

    But when I run it in my code, I only get a list of empty strings (and the wrong number of them).

    Can anybody tell me why?

        Gojir4 wrote (#2):

        @Merlino Hi, this is probably because QRegExp is not fully compatible with Perl regular expressions. I would expect it to work with QRegularExpression. Another possibility is that the online tester uses a different configuration (multiline, global, case sensitivity) than your code.

        edit: see the note at https://doc.qt.io/qt-5/qregexp.html#details

        Note: In Qt 5, the new QRegularExpression class provides a Perl compatible implementation of regular expressions and is recommended in place of QRegExp.

          Gojir4 wrote (#3):

          @Merlino Can't you simply use line.split(";") ?

            Merlino wrote (#4):

            @Gojir4 No, because the string fields can contain punctuation and quotation marks, so a simple split would be fooled.
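
To make the concern concrete, here is a minimal sketch in plain standard C++ (no Qt, so `std::string` stands in for `QString`). The helper `naiveSplit` and the sample row mentioned below are hypothetical; the function only mimics what splitting on every `;` does, without any notion of quoting.

```cpp
#include <string>
#include <vector>

// Split on every separator character, with no awareness of quoting.
// This is roughly what a plain split on ";" amounts to.
std::vector<std::string> naiveSplit(const std::string &line, char sep = ';')
{
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == sep) {
            fields.push_back(current);  // separator ends the current field
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // the last field has no trailing separator
    return fields;
}
```

For the made-up row `01;HELLO;"first part; second part"` this returns four pieces instead of three, because the quoted text is cut at its inner semicolon. For rows like the two in the first post, where no field contains a semicolon, it returns the expected number of fields, with the surrounding quotes still attached.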

              aha_1980 (Lifetime Qt Champion) wrote (#5):

              Hi @Merlino,

              Please use QRegularExpression. QRegExp has been deprecated since 2012 and will be removed from Qt 6.

              Regards

              Qt has to stay free or it will die.

                Gojir4 wrote (#6):

                @Merlino I see. In that case, do a global match and iterate over the results to fill your QStringList. Your regex is already doing the "splitting" job.
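
The suggestion above can be sketched as follows. A pattern that matches a separator plus the field after it covers the whole line, so using it as the separator for `split()` leaves only the empty strings between matches; iterating over the matches and reading the capture groups yields the fields instead. Note also that in the C++ literal from the first post, `(?:""(` ends one string literal and starts the next, so the opening quote never reaches the pattern. The sketch is Qt-free and uses `std::regex` as a stand-in for `QRegularExpression`; `fieldsFromLine` is a hypothetical name, the pattern is a single-line variant of the one in the thread (with `^` tried before `;`), and the input is assumed to be one well-formed line without a trailing newline.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect the fields of ONE line by iterating over all regex matches,
// instead of using the pattern as a separator for split().
std::vector<std::string> fieldsFromLine(const std::string &line)
{
    // Group 1: start of line or separator.
    // Group 2: content of a quoted field ("" stands for an embedded quote).
    // Group 3: content of an unquoted field.
    static const std::regex fieldRe(
        R"re((^|;)(?:"([^"]*(?:""[^"]*)*)"|([^";]*)))re");

    std::vector<std::string> fields;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(line.begin(), line.end(), fieldRe); it != end; ++it) {
        const std::smatch &m = *it;
        // Exactly one of the two content groups takes part in each match.
        fields.push_back(m[2].matched ? m[2].str() : m[3].str());
    }
    return fields;
}
```

Doubled quotes inside a quoted field are returned as they appear in the file; turning `""` into `"` would be a separate step.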

                  Merlino wrote (#7):

                  @aha_1980 I have changed my code to use QRegularExpression, but the problem is still present.

                    aha_1980 (Lifetime Qt Champion) wrote (#8):

                    Hi @Merlino,

                    for a start, try this:

                    #include <QDebug>
                    #include <QRegularExpression>
                    
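                    int main()
                    {
                        // Test input as a raw string literal. Note that the quoted field in the
                        // first row has no closing quote here, which is why it is reported as an
                        // empty field in the output below.
                        const QString s = R"(
                    01;3.6.1;A;C;HELLO;1: quit;UINT8;N.A.;0.7;4.5;"Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.
                    03;5.4.2;F;K;GOODBYE;0: stay;UINT8;N.A.;0.0;1.2;Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.
                    )";
                        // Group 1: separator (';', newline or start of input),
                        // group 2: content of a quoted field, group 3: content of an unquoted field.
                        const QRegularExpression regExp(R"x((\;|\n|^)(?:"([^"]*(?:""[^"]*)*)"|([^"\;\n]*)))x");

                        // Iterate over all matches instead of calling split().
                        QRegularExpressionMatchIterator matchIt = regExp.globalMatch(s);
                        while (matchIt.hasNext()) {
                            const QRegularExpressionMatch match = matchIt.next();
                            qDebug() << match.capturedTexts();
                        }

                        return 0;
                    }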
                    

                    You will need to fine-tune it, but it is a step in the right direction.

                    Output:

                    ("\n01", "\n", "", "01")
                    (";3.6.1", ";", "", "3.6.1")
                    (";A", ";", "", "A")
                    (";C", ";", "", "C")
                    (";HELLO", ";", "", "HELLO")
                    (";1: quit", ";", "", "1: quit")
                    (";UINT8", ";", "", "UINT8")
                    (";N.A.", ";", "", "N.A.")
                    (";0.7", ";", "", "0.7")
                    (";4.5", ";", "", "4.5")
                    (";", ";", "", "")
                    ("\n03", "\n", "", "03")
                    (";5.4.2", ";", "", "5.4.2")
                    (";F", ";", "", "F")
                    (";K", ";", "", "K")
                    (";GOODBYE", ";", "", "GOODBYE")
                    (";0: stay", ";", "", "0: stay")
                    (";UINT8", ";", "", "UINT8")
                    (";N.A.", ";", "", "N.A.")
                    (";0.0", ";", "", "0.0")
                    (";1.2", ";", "", "1.2")
                    (";Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.", ";", "", "Lorem ipsum dolor sit amet, consectetur adipisci elit, sed do eiusmod tempor incidunt ut labore et dolore magna aliqua.")
                    ("\n", "\n", "", "")
                    

                    Regards

                      JonB wrote (#9):

                      I haven't tested what the regular expressions do, but you might want to extend your test case with a string value that itself contains embedded " or ; characters, if you intend to support those.
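
A few checks of that kind could be run against something like the sketch below. It swaps the regular expression for a small hand-written splitter in standard C++ (again without Qt), which is an alternative technique and not what was proposed in this thread; `splitCsvLine` and the rows in the note that follows are made up for illustration. It assumes one record per line, so quoted fields that span several lines are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one line on `sep`, honouring double-quoted fields.
// Inside quotes, `sep` is ordinary text and "" stands for one literal quote.
std::vector<std::string> splitCsvLine(const std::string &line, char sep = ';')
{
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                current += '"';    // escaped quote: "" becomes "
                ++i;               // skip the second quote of the pair
            } else if (c == '"') {
                inQuotes = false;  // closing quote
            } else {
                current += c;      // separators are plain text in here
            }
        } else if (c == '"') {
            inQuotes = true;       // opening quote
        } else if (c == sep) {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // the last field has no trailing separator
    return fields;
}
```

With this sketch, the row `1;"x;y";2` gives the fields `1`, `x;y` and `2`, and the row `1;"say ""hi""";2` gives `1`, `say "hi"` and `2`.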

                        Pablo J. Rogina wrote (#10):

                        @Merlino said in QRegExp to parse a CSV file:

                        simple CSV file

                        If the file conforms to that format, there should be one and only one character acting as the field separator: originally a comma (hence the name), but later other characters came into use as well (characters that obviously cannot be part of the field values...).

                        @Gojir4
                        Can't you simply use line.split(";") ?

                        Yes, you can. I guess @Gojir4 provided the right answer. Judging from your data example, your file does seem to be an SCSV file: semicolon-separated values.

                        @Merlino about 2 hours ago
                        no because the string fields can contain punctuation and quotation marks so the simple split would be fooled.

                        Yes, you'll have punctuation and quotation marks in the string fields, but I bet none of those characters will be a semicolon (;).

                        It looks like you're over-complicating your use case.
