Split a QString with a regexp and keep the separators

9 Posts 3 Posters 12.1k Views 1 Watching
    panosk wrote (#1):

    Hello,

    Is there a way to use QString's split() method with multiple separators and keep each separator at the end of its piece in the resulting QStringList? For example:
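    @
    #include <QRegExp>
    #include <QString>
    #include <QStringList>

    QString aLongString("Is it? Yes. It is."); // the text to split
    // Backslashes must be doubled inside a C++ string literal.
    QRegExp sep("( |\\.|\\?)");
    QStringList stringList = aLongString.split(sep); // the separators are dropped
    @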

    Thanks in advance.

      tucnak wrote (#2):

      Hi, ~panosk!

      1. QString::split() has an overload that takes a QRegExp, so you can of course split the string using a regular expression.

      2. Appending the separator to the end of each substring is not possible with the default QString::split() functionality. You need a slightly more involved algorithm. I prefer this one:

      Find all occurrences of the separators (in your case, sep).

      -Insert one QChar with code MAX_INT after each separator.-

      -Split the QString on QChar(MAX_INT).-

      Push the pieces you get into a QStringList.

      Something like that.

      UPD: I won't give advice while I am tired. I won't give advice while I am tired. I won't give advice while I am tired. I won't give advice while I am tired. I won't give advice while I am tired.

        tobias.hunger wrote (#3):

        Tucnak, why don't you just put the individual parts into a QStringList while searching for all occurrences?

        Inserting into a string is a comparatively expensive operation: all the characters after the insertion point have to be moved.
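A minimal sketch of the approach described here: scan for each separator match and append the piece up to and including that separator, without inserting anything into the string. It uses `std::regex` instead of `QRegExp` so it compiles without Qt, and the function name `splitKeepSeparators` is hypothetical. In Qt 4 the same loop would be driven by `QRegExp::indexIn(str, from)` and `QRegExp::matchedLength()`, with `QString::mid()` to cut out each piece and `QStringList::append()` to collect it.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` at every match of `sep`, keeping each
// separator attached to the end of the piece it terminates. Any text after
// the last separator becomes the final piece.
std::vector<std::string> splitKeepSeparators(const std::string &text,
                                             const std::regex &sep)
{
    std::vector<std::string> parts;
    std::size_t start = 0;  // beginning of the piece currently being collected

    // Walk over all separator matches from left to right.
    for (std::sregex_iterator it(text.begin(), text.end(), sep), end;
         it != end; ++it) {
        // The piece ends right after the separator: match position + length.
        const std::size_t pieceEnd =
            static_cast<std::size_t>(it->position(0) + it->length(0));
        // Guard against zero-length separator matches producing empty pieces.
        if (pieceEnd > start) {
            parts.push_back(text.substr(start, pieceEnd - start));
            start = pieceEnd;
        }
    }
    if (start < text.size())
        parts.push_back(text.substr(start));  // trailing text without a separator
    return parts;
}
```

With the separator pattern from the first post, `( |\.|\?)`, the input "Is it? Yes." yields "Is ", "it?", " " and "Yes.". A separator that directly follows another one (here the space after the question mark) becomes a piece of its own.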

          tucnak wrote (#4):

          [quote author="Tobias Hunger" date="1352586191"]Tucnak, why don't you just put the individual parts into a QStringList while searching for all occurrences?

          Inserting into a string is a comparatively expensive operation: All the chars following after the insertion point need to be moved.[/quote]

          Thanks, ~Tobias. I am really tired, so I wrote stupid advice like that. Of course you are right.

            panosk wrote (#5):

            Thank you both for your replies. Tobias, can you elaborate a little? If I understand correctly, the general idea is to create an empty QStringList and keep appending substrings as the separators are found? In that case, I suppose I will have to work with indexes and ranges?

              tobias.hunger wrote (#6):

              panosk: Yep, you got that right.

                panosk wrote (#7):

                OK, thanks a lot. Still, it would be nice to have a KeepSeparator option in QString::split(). Such an option, along with the existing KeepEmptyParts, would make splitting and rejoining strings even more convenient in a non-destructive manner :)
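Since QString::split() offers no such option, one workaround is to turn the problem around and match the pieces instead of the separators. A pattern such as `[^ .?]*[ .?]|[^ .?]+` matches "non-separator text followed by one separator" or, failing that, a trailing run with no separator. Because every character then lands in exactly one match, concatenating the matches rebuilds the input, which gives the non-destructive round trip described above. This sketch uses `std::regex` rather than Qt; the helper names `matchPieces` and `joinPieces` are hypothetical. In Qt the loop would use `QRegExp::indexIn()` and `QRegExp::cap(0)`, and `QStringList::join()` with an empty separator for the rejoin.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every match of `piece` in `text`.
// `piece` is assumed to describe a whole token, e.g.
// "non-separators then one separator" | "trailing non-separators".
std::vector<std::string> matchPieces(const std::string &text,
                                     const std::regex &piece)
{
    std::vector<std::string> parts;
    for (std::sregex_iterator it(text.begin(), text.end(), piece), end;
         it != end; ++it)
        parts.push_back(it->str());  // the full match, separator included
    return parts;
}

// Concatenate the pieces. With a pattern of the shape above, the matches
// cover the input without gaps, so this returns the original string.
std::string joinPieces(const std::vector<std::string> &parts)
{
    std::string out;
    for (const std::string &p : parts)
        out += p;
    return out;
}
```

The design choice here is that the regex describes a token rather than a delimiter, so no index bookkeeping is needed; the trade-off is that the separator set appears twice in the pattern and must be kept in sync.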

                  tobias.hunger wrote (#8):

                  panosk: I really do not see the use case. You know the separator; otherwise you would not be able to split.

                  If you don't, you are better off parsing the string properly.

                  I assume you are still trying to parse class="whatever" out of HTML? That will go very wrong with regexps, so do not do it. There are many ways a regexp-based approach can break down here.

                    panosk wrote (#9):

                    @Tobias: My problem is that I have to use several separators, not just one. In the snippet from my first post (ignore the space; I included it only for variety's sake), the string is split as expected, but I cannot reconstruct it afterwards because the separators are lost. For example, I don't know which pieces ended in a full stop and which in a question mark.

                    I'm trying to achieve some sort of plain-text sentence tokenization. I would never use regexps to parse HTML or XML; I always prefer a parser in such cases.

                    Eventually I will have to build a proper tokenizer, but it's not a priority right now so I'm trying to find the most convenient way to do it.
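For sentence tokenization, another option is to store each separator in its own field instead of gluing it back onto the text. Then it is trivial to check which sentence ended in a full stop and which in a question mark, and the original can still be rebuilt exactly. A sketch under these assumptions: a sentence ends in a run of `.`, `?` or `!` plus any following whitespace (pattern `[.?!]+\s*`), abbreviations such as "e.g." are not handled, and the `Piece` struct and function names are hypothetical. It uses `std::regex` rather than `QRegExp` so it compiles without Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical token type: the text of a piece plus the separator that ended
// it (empty for trailing text that had no separator).
struct Piece {
    std::string body;
    std::string separator;
};

// Split `text` at every match of `sep`, keeping each matched separator
// alongside the text that precedes it.
std::vector<Piece> splitWithSeparators(const std::string &text,
                                       const std::regex &sep)
{
    std::vector<Piece> pieces;
    std::size_t start = 0;  // first character not yet assigned to a piece
    for (std::sregex_iterator it(text.begin(), text.end(), sep), end;
         it != end; ++it) {
        const auto pos = static_cast<std::size_t>(it->position(0));
        pieces.push_back({text.substr(start, pos - start), it->str(0)});
        start = pos + static_cast<std::size_t>(it->length(0));
    }
    if (start < text.size())
        pieces.push_back({text.substr(start), ""});  // unterminated tail
    return pieces;
}

// Rebuild the original text from the pieces.
std::string rejoin(const std::vector<Piece> &pieces)
{
    std::string out;
    for (const Piece &p : pieces)
        out += p.body + p.separator;
    return out;
}
```

For "Is it? Yes. Maybe" this yields the bodies "Is it", "Yes" and "Maybe" with the separators "? ", ". " and an empty one, so a non-empty separator starting with '?' marks a question.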
