QRegularExpressionMatch



  • Hi all,

    I'm trying to search through a text file that contains some data embedded in a bunch of garbage I don't want, something like this:

    garbage garbage
    garbage [garbage] garbage
    garbage
    garbage garbage [more garbage] {the, variable, data, that, I, want}
    garbage...

    and so on. (Don't know if it matters, but note that nothing follows the closing } bracket on the lines that have data.)

    I've succeeded in using a QRegularExpression that keys on the curly brackets to find the lines that contain the data I want and appending the whole line to a QStringList, but how can I append just the data between the {} brackets?

    QRegularExpressionMatch isn't working for me (so far) because the data is variable and the match seems to be limited to an exact string - no wildcards allowed - or am I wrong?

    I've thought about iterating a second time through the captured string list and doing a split on the bracket, but that seems like double work when the regular expression has already found what I want.

    If anyone can guide me to something that might work, I'll be happy to do the homework!

    Thanks!



  • @MScottM said in QRegularExpressionMatch:

    the match seems to be limited to an exact string - no wildcards allowed - or am I wrong?

    If that was true then regex would be pretty useless ;-)

    Regex patterns are just that: patterns. You need to have some idea of what the data looks like (the format) before you can build a regex pattern that extracts it. Right now it seems like all you know is that the data you're after is between curly brackets and that the closing curly bracket is always at the end of a line. That is already a start. But what do you know about the data between the curly brackets?
    I'm really lacking the information about what the data between the garbage is to give a better answer. Is the format of said data known and you just need certain values of it or do you need to extract just the entire string within the curly brackets?

    As you mentioned yourself: Doing a second regex matching is a waste of resources. There's nothing the second match can do that the first one can't do.

    In general I can really recommend using a tool named RegexBuddy - it saved my butt many times: https://www.regexbuddy.com
    It allows you to assemble patterns even without knowing all of the regex syntax and it provides a great testing facility. It also comes with a very extensive set of regex patterns for "common" tasks.
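
    To make the point about patterns concrete, here is a minimal sketch of pulling out only the text between the curly brackets with a capture group. It uses std::regex from the C++ standard library instead of QRegularExpression so that it compiles without Qt; with Qt, the same pattern together with QRegularExpressionMatch::captured(1) should behave the same way. The helper name extractBraced is made up for illustration, and the pattern assumes the braces are not nested.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: if the line ends with "{...}", stores the text between
// the braces in `out` and returns true; otherwise returns false.
bool extractBraced(const std::string &line, std::string &out)
{
    // \{       literal opening brace
    // ([^}]*)  capture group 1: everything up to the next '}'
    // \}\s*$   literal closing brace, optional trailing whitespace, end of input
    static const std::regex braced(R"(\{([^}]*)\}\s*$)");
    std::smatch match;
    if (!std::regex_search(line, match, braced))
        return false;
    out = match[1].str();
    return true;
}
```

    Calling it on each line of the file and appending `out` to the list only when it returns true avoids a second pass over the matched lines.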



  • Hi Joel, thanks for your reply!

    The data between the curly brackets has known names, but appears in random locations between the brackets, with varying data, like this:

    {Instrument1=nn.nn, Instrument9=nnn.nnnn, Instrument3=n.n}
    {Instrument4=n.nnn, Instrument1=nn.nn, Instrument7=-nn.nnn, Instrument2=nn.nn}
    and so on...

    The number of reporting instruments runs to several dozen. The number of decimal places reported for any one instrument (n.n) can vary.

    I was vaguely thinking that if I could extract out everything between the curly brackets, I could maybe turn it into a JSON object...? Haven't gotten that far yet. My ultimate goal is to be able to pick an instrument and graph its data over time (part of the garbage in the file is a timestamp...so probably not garbage after all :)

    I'm obviously an inexperienced programmer, still learning the best ways of going about things - I appreciate any input!

    -Scott



  • Hi Scott,

    This is something that is definitely possible (using a regex). Regex supports loops and stuff (repeating capturing groups).
    I'm happy to help with this when I get a few minutes but it will be very difficult without a certain set of sample data. Is that something you can share/publish?

    I didn't really try this but something like this should already match one of the instruments: Instrument(\d+)=(-?)(\d+\.\d+) so it's a matter of putting together a capturing group of that and repeating it.
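
    As a rough illustration of repeating that pattern over one brace-delimited payload, the sketch below iterates over every match and collects the values. It uses std::regex from the C++ standard library rather than QRegularExpression (in Qt, QRegularExpression::globalMatch() plays the role of the iterator). The helper name parseInstruments and the numbers in the usage note are made up, and the sign is folded into the value group for convenience.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: maps instrument number -> reported value for a payload
// such as "Instrument4=1.25, Instrument7=-30.125".
std::map<int, double> parseInstruments(const std::string &payload)
{
    // Group 1: instrument number. Group 2: optionally negative decimal value.
    static const std::regex entry(R"(Instrument(\d+)=(-?\d+\.\d+))");
    std::map<int, double> values;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(payload.begin(), payload.end(), entry); it != end; ++it) {
        const std::smatch &m = *it;
        values[std::stoi(m[1].str())] = std::stod(m[2].str());
    }
    return values;
}
```

    For example, parseInstruments("Instrument4=1.25, Instrument1=20.5") yields a map with two entries. Entries that do not fit the pattern exactly, such as a name with no "=value" part, are simply skipped.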


  • Hi,

    To add to @Joel-Bodenmann, can you show an example of what you want to grab? That would help devise a proper expression.

    Note that there's the QRegularExpression tool example that you can use to validate that your expression is working.



  • Here are two example lines containing data. I had to sanitize it for posting - the instrument names are actually very specific and have mostly to do with pressures and temperatures - changing the pattern for the specific names I think I can handle. The amount of data in a line (number of reporting instruments) can actually be much longer - I shortened them for brevity:

    2017-10-19 14:08:58,325 TRACE [Publisher] C.DataPublisher [DataPublisher.java:159] Publishing {Instrument12=-900.0, Instrument0=82.4, Instrument11=-900.0, Instrument1=131.16875000000005, Instrument2=1.0, Instrument13=0.0, Instrument3=120.091247064, Instrument6=-0.9, Instrument22, Instrument4=83.91875000000005, Instrument12Status=1.0, Instrument13Status=1.0}
    2017-10-19 14:08:58,825 TRACE [Publisher] C.DataPublisher [DataPublisher.java:159] Publishing {Instrument0=82.4, Instrument11=-900.0, Instrument1=135.33125000000007, Instrument3=120.091247064, Instrument4=83.91875000000005, Instrument22, Instrument23=-900.0, Instrument31=-900.0}

    Each new line starts with a date and timestamp.

    -Scott



  • If they are not nested (i.e. you never have {Instrument0=82.4,{Instrument11=-900.0, Instrument1=135.33125000000007}, Instrument4=83.91875000000005}) then you can use:

    const QString input = "2017-10-19 14:08:58,325 TRACE [Publisher] C.DataPublisher [DataPublisher.java:159] Publishing {Instrument12=-900.0, Instrument0=82.4, Instrument11=-900.0, Instrument1=131.16875000000005, Instrument2=1.0, Instrument13=0.0, Instrument3=120.091247064, Instrument6=-0.9, Instrument22, Instrument4=83.91875000000005, Instrument12Status=1.0, Instrument13Status=1.0}\n"
        "2017-10-19 14:08:58,825 TRACE [Publisher] C.DataPublisher [DataPublisher.java:159] Publishing {Instrument0=82.4, Instrument11=-900.0, Instrument1=135.33125000000007, Instrument3=120.091247064, Instrument4=83.91875000000005, Instrument22, Instrument23=-900.0, Instrument31=-900.0}";
    // Step 1: capture everything between a '{' and the next '}' (non-greedy).
    // The braces are escaped so they are always treated as literals.
    const QRegularExpression filterJunk(QStringLiteral(R"(\{(.+?)\})"));
    // Step 2: inside that payload, match each "InstrumentN[Status][=value]" entry.
    // Group 1 is the instrument number, group 2 the optional numeric value.
    const QRegularExpression instrumentRegExp(QStringLiteral(R"**(,?\s*Instrument(\d+)(?:Status)?\s*(?:=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?))?)**"), QRegularExpression::CaseInsensitiveOption);
    auto i = filterJunk.globalMatch(input);
    while (i.hasNext()) {
        auto j = instrumentRegExp.globalMatch(i.next().capturedRef(1));
        while (j.hasNext()) {
            const auto match = j.next();
            QString result = QStringLiteral("Instrument: %1").arg(match.capturedRef(1).toInt());
            // Entries such as "Instrument22" carry no value, so group 2 is absent.
            if (match.lastCapturedIndex() > 1)
                result += QStringLiteral(" Value: %1").arg(match.capturedRef(2).toDouble());
            qDebug() << result;
        }
    }
    


  • VRonin strikes again :D



  • For sure! Thanks VRonin.

    A couple more questions - is it possible to pass in a variable as part of the regex matching line?

    Also I don't seem to have the QRegularExpression Tool example (mentioned above). I'm using: Qt Creator 4.2.1 Based on Qt 5.8.0 (MSVC 2015, 32 bit)

    Do I need to update my IDE to get it?

    -Scott



  • is it possible to pass in a variable as part of the regex matching line?

    Can you explain that better, maybe with an example?

    Also I don't seem to have the QRegularExpression Tool example (mentioned above).

    I use https://regex101.com/ instead: same principle, better colours, and it allows you to unit-test too.



  • Sorry, yes - I mean in the regular expression line:

    instrumentRegExp(R"**(,?\s*Instrument(\d+)(?:Status)?\s*(?:=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?))?)**"

    where it's looking for an Instrument, is there a way to do something like this:

    instrumentRegExp(R"**(,?\s*MyStringVariableHere(\d+)(?:Status)?\s*(?:=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?))?)**"

    Regarding regular expression testing, I have realized that my text editor supports regex searching. I think it would take just a few tweaks to move the expressions from there to Qt.

    Thanks again!



  • @MScottM said in QRegularExpressionMatch:

    where it's looking for an Instrument, is there a way to do something like this:
    instrumentRegExp(R"**(,?\s*MyStringVariableHere(\d+)(?:Status)?\s*(?:=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?))?)**"

    Just insert the variable into a QString before you pass that QString to QRegularExpression().

    QString myString = QString("Static text with %1 some %2 variables").arg("foo").arg("bar");
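
    One thing to watch when splicing a variable into a pattern: if the name contains regex metacharacters (a dot, brackets and so on), they need escaping first. Qt provides QRegularExpression::escape() for this. The sketch below shows the same idea with std::regex from the C++ standard library so that it compiles without Qt; escapeForRegex and makeInstrumentRegex are hypothetical helper names, and the value pattern is a simplified version of the one posted earlier in the thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escapes regex metacharacters so that the
// text is matched literally.
std::string escapeForRegex(const std::string &text)
{
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");
    // "$&" stands for the matched character; the backslash is emitted as-is.
    return std::regex_replace(text, special, R"(\$&)");
}

// Hypothetical helper: builds "<prefix><number> = <value>" for a given name
// prefix. Group 1 is the number, group 2 the value.
std::regex makeInstrumentRegex(const std::string &prefix)
{
    return std::regex(escapeForRegex(prefix) + R"((\d+)\s*=\s*([-+]?\d*\.?\d+))");
}
```

    With this, makeInstrumentRegex("Pressure") and makeInstrumentRegex("Temp.A") produce two independent patterns from the same template, and the dot in the second name matches only a literal dot.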
    


  • Dang...geniuses...all of you.

    Thanks! Will be testing tonight.



  • Side note:

    @Joel-Bodenmann Let's stare into the abyss of arg().arg(): http://doc.qt.io/qt-5/qstring.html#arg-12



  • @VRonin That was interesting - didn't know about that one :p


  • You're overplaying it a bit, aren't you? :)
    I mean I knew of that pitfall, but there's nothing wrong in using arg().arg() as long as you're aware of this peculiarity, right?



  • Warning: philosophical talk below.

    I'm actually surprised to hear this from you, @kshegunov. I used arg above so I'm not 100% dogmatic on it, but let's take a good look at it:

    arg is a search and replace, so it's slower than QString::operator+ (if you concatenate a lot you can use operator%(const QString&,const QString&) in QStringBuilder). This should be reason enough not to use it. If you add the arg().arg() pitfall, which applies to any string where you don't have complete control over every aspect of the input, it becomes almost evil.

    The only reason to use arg() that I can see is internationalisation. Even then, you have to stick to arg(const QString&), because numbers shown to the user should be processed by QLocale, and you have to be mindful that calling arg more than once on the same string opens you up to a translator introducing what looks like a logic bug in your code.



  • Success!!

    This piece of code is working now! Nearly exactly as VRonin posted...Plus I learned something from the discussion above: I have a lot to learn :)

    I wanted to ask about the capital R that starts off the RegEx string - I figured out that it lets you use single escape characters, but I couldn't find it documented anywhere...?

    Thanks to all and best regards!

    Scott



  • @MScottM said in QRegularExpressionMatch:

    I wanted to ask about the capital R that starts off the RegEx string - I figured out that it lets you use single escape characters, but I couldn't find it documented anywhere...?

    https://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals
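
    In short, the R prefix marks a C++11 raw string literal: nothing between the opening R"( and the closing )" is treated as an escape sequence, so regex backslashes do not need doubling. A small sketch (the variable names are made up for illustration):

```cpp
#include <string>

// The same regex pattern written as an ordinary literal and as a raw literal.
const std::string escaped = "Instrument(\\d+)=(-?\\d+\\.\\d+)";
const std::string raw     = R"(Instrument(\d+)=(-?\d+\.\d+))";

// A custom delimiter (here "**") is needed when the text itself contains )"
// which would otherwise end the raw literal early.
const std::string delimited = R"**(say "(hi)" twice)**";
```

    Here `escaped` and `raw` hold identical text. The `**` in the pattern posted earlier in the thread is such a delimiter, not part of the regex itself.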

