Split a QString with a regexp and keep the separators
-
wrote on 10 Nov 2012, 11:45 last edited by
Hello,
Is there a way to use QString's split() method with multiple separators and keep each separator at the end of its string in the resulting QStringList? For example:
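@
QString aLongString; // the text to split
// Separators: space, full stop, question mark (backslashes must be doubled in a C++ string literal)
QRegExp sep("( |\\.|\\?)");
QStringList stringList = aLongString.split(sep); // the separators are dropped
@
Thanks in advance.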
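For reference, this is what a plain regex split does. The sketch uses the C++ standard library's std::regex instead of QRegExp so it compiles without Qt; it is a stand-in that behaves like QString::split(QRegExp), not Qt's implementation. Only the text between matches is returned:

```cpp
#include <regex>
#include <string>
#include <vector>

// Plain regex split, analogous to QString::split(QRegExp): only the text
// *between* separator matches is returned, the separators are discarded.
std::vector<std::string> plainSplit(const std::string& text, const std::regex& sep) {
    // Submatch index -1 makes the iterator yield the text between matches
    // (an empty remainder after the last match is not yielded).
    std::sregex_token_iterator it(text.begin(), text.end(), sep, -1);
    std::sregex_token_iterator end;
    return std::vector<std::string>(it, end);
}
```

Splitting "Hi.Ok?" with the pattern "[ .?]" this way yields "Hi" and "Ok"; nothing records that the first piece ended in '.' and the second in '?'.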
-
wrote on 10 Nov 2012, 21:13 last edited by
Hi, ~panosk!
QString::split(..) has an overload for QRegExp, so of course you can split the string using regular expressions.
Appending the separator to the end of each substring is impossible with the default QString::split(..) functionality; you need a more complicated algorithm for that. I prefer this one:
- Find all occurrences of the separators (in your case, sep).
- Insert after each separator one marker QChar that does not occur in the text, e.g. QChar(0xFFFF), the largest 16-bit code.
- Split the QString on that marker QChar.
- Just push_back the pieces you got into a QStringList.
Something like that.
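The marker-based steps can be sketched roughly as follows. This uses std::regex and std::string instead of QRegExp and QString so it is self-contained, and the marker character '\x1F' (ASCII unit separator) is an assumption standing in for the unused QChar; it must never appear in the input.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch of "insert a marker after each separator, then split on the marker".
// Assumption: '\x1F' never occurs in the input text.
std::vector<std::string> splitViaMarker(const std::string& text, const std::regex& sep) {
    const char marker = '\x1F';
    // "$&" re-inserts the whole match, so every separator is followed by the marker.
    const std::string marked = std::regex_replace(text, sep, std::string("$&") + marker);

    // Second pass: cut the marked text at each marker.
    std::vector<std::string> pieces;
    std::string current;
    for (char c : marked) {
        if (c == marker) {
            pieces.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty())  // text after the last separator, if any
        pieces.push_back(current);
    return pieces;
}
```

std::regex_replace builds a new string in one pass rather than inserting into the original in place, so characters are not shifted repeatedly, but the whole text is still copied and scanned twice.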
UPD: I won't give advice while I am tired.
-
wrote on 10 Nov 2012, 22:23 last edited by
Tucnak, why don't you just put the individual parts into a QStringList while searching for all occurrences?
Inserting into a string is a comparatively expensive operation: All the chars following after the insertion point need to be moved.
-
wrote on 10 Nov 2012, 22:26 last edited by
[quote author="Tobias Hunger" date="1352586191"]Tucnak, why don't you just put the individual parts into a QStringList while searching for all occurrences?
Inserting into a string is a comparatively expensive operation: All the chars following after the insertion point need to be moved.[/quote]
Thanks, ~Tobias. I am really tired, so I wrote stupid advice like this. Of course you are right.
-
wrote on 10 Nov 2012, 22:50 last edited by
Thank you both for your replies. Tobias, can you elaborate a little bit? If I get it right, the general idea is to create an empty QStringList and start appending substrings based on the separators? In that case, I suppose I will have to work with indexes and ranges?
-
wrote on 10 Nov 2012, 23:07 last edited by
panosk: Yeap, you got that right.
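A minimal sketch of that index-and-range approach, again with std::regex as a stand-in for Qt: for each separator match, the substring from the previous cut up to and including the separator is appended, and any trailing text is appended at the end. In Qt the same loop could be written with QRegExp::indexIn() and QRegExp::matchedLength() (or QRegularExpression::globalMatch() in Qt 5 and later), cutting the pieces with QString::mid(). The sketch assumes the separator pattern never matches an empty string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Single pass: walk over all separator matches and cut the text right after
// each one, so every piece keeps its trailing separator. No marker insertion.
std::vector<std::string> splitKeepSeparators(const std::string& text, const std::regex& sep) {
    std::vector<std::string> pieces;
    std::size_t start = 0;  // index where the current piece begins
    for (std::sregex_iterator it(text.begin(), text.end(), sep), end; it != end; ++it) {
        // Cut point: just past the end of this separator match.
        const std::size_t stop = static_cast<std::size_t>(it->position() + it->length());
        pieces.push_back(text.substr(start, stop - start));  // piece + its separator
        start = stop;
    }
    if (start < text.size())  // trailing text not followed by a separator
        pieces.push_back(text.substr(start));
    return pieces;
}
```

For "Is it? Yes" with the pattern "[ .?]", this gives "Is ", "it?", " ", "Yes"; concatenating the pieces gives back the original string.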
-
wrote on 10 Nov 2012, 23:22 last edited by
OK, thanks a lot. Still, it would be nice to have a KeepSeparator option in QString::split(). Such an option, along with the existing KeepEmptyParts, would make splitting and rejoining strings even more convenient in a non-destructive manner :)
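To illustrate what such an option might look like, here is a hypothetical splitWithMode() helper with a SeparatorMode enum. Neither exists in Qt; the names are invented for this sketch, and std::regex stands in for QRegExp. The remainder after the last separator is always appended, even when empty, mirroring KeepEmptyParts.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical option set; QString::split() has no such flag.
enum class SeparatorMode {
    Drop,          // behaves like QString::split(QRegExp) with KeepEmptyParts
    KeepAttached,  // separator stays at the end of the preceding piece
    KeepSeparate   // separator becomes its own list entry
};

std::vector<std::string> splitWithMode(const std::string& text, const std::regex& sep,
                                       SeparatorMode mode) {
    std::vector<std::string> out;
    std::size_t start = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), sep), end; it != end; ++it) {
        const std::size_t matchPos = static_cast<std::size_t>(it->position());
        const std::string piece = text.substr(start, matchPos - start);
        const std::string separator = it->str();
        switch (mode) {
        case SeparatorMode::Drop:
            out.push_back(piece);
            break;
        case SeparatorMode::KeepAttached:
            out.push_back(piece + separator);
            break;
        case SeparatorMode::KeepSeparate:
            out.push_back(piece);
            out.push_back(separator);
            break;
        }
        start = matchPos + separator.size();
    }
    // Like KeepEmptyParts: the remainder is always appended, even if empty.
    out.push_back(text.substr(start));
    return out;
}
```

With either Keep mode, concatenating the entries reproduces the input exactly, which is the non-destructive round trip described above.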
-
wrote on 11 Nov 2012, 08:17 last edited by
panosk: I really do not see the use case. You know the separator, otherwise you would not be able to split.
If you don't, then you are better off parsing the string properly.
I assume you are still trying to parse class="whatever" from HTML? That is something that will go very wrong using RegExps, so do not do that. There are lots of ways a regexp-based approach will break down here.
-
wrote on 11 Nov 2012, 11:02 last edited by
@Tobias. My problem is that I have to use many separators, not just one. With the snippet from my first post (ignore the space separator, I included it for variety's sake), the string is split as expected, but then I cannot reconstruct it because the separators are lost. For example, I don't know which strings ended in a full stop and which in a question mark.
I'm trying to achieve some sort of plain text sentence tokenization. I would never use regexps for parsing HTML or XML -- I always prefer a parser in such cases.
Eventually I will have to build a proper tokenizer, but it's not a priority right now so I'm trying to find the most convenient way to do it.
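For plain-text sentence tokenization specifically, it may be simpler to match sentences than to split on separators. The sketch below uses std::regex again and assumes '.', '?' and '!' are the only sentence terminators ('!' is an addition to the original separators), with no special handling for abbreviations, decimals or quotes. Each match holds a sentence together with its terminating punctuation and trailing whitespace, so joining the matches reconstructs the text. terminatorOf() is a hypothetical helper name used only for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Match-based tokenizer: each match is one sentence including its terminator(s)
// and trailing whitespace. Assumption: '.', '?', '!' are the only terminators.
std::vector<std::string> tokenizeSentences(const std::string& text) {
    // Alternative 1: text up to and including a run of terminators, plus spaces.
    // Alternative 2: trailing text with no terminator at all.
    static const std::regex sentence(R"([^.?!]*[.?!]+\s*|[^.?!]+)");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), sentence), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}

// Hypothetical helper: returns the terminator ending a sentence, or '\0' if none.
char terminatorOf(const std::string& sentence) {
    const std::size_t last = sentence.find_last_not_of(" \t\n\v\f\r");
    if (last == std::string::npos) return '\0';
    const char c = sentence[last];
    return (c == '.' || c == '?' || c == '!') ? c : '\0';
}
```

For "Hello there. How are you? Fine" this yields "Hello there. ", "How are you? " and "Fine", whose terminators are '.', '?' and none respectively.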