
Split string every two characters



  • Hi,

    I have a string that looks like this:
    6628286628443028289c8368294829282828282828733300

    What would be the best way to split this into a string list that looks like this?
    66 28 28 66 28 44 30 28 28 9c 83 68 29 48 29 28 28 28 28 28 28 73 33 00
    I need pairs of two characters.

    Thanks,
    Zgembo





  • This post is deleted!


  • @lelev I have found out this way:

    QRegularExpression rx("(..)");
    QRegularExpressionMatchIterator rxIterator = rx.globalMatch(datas.toHex());
    QStringList stringData;
    while (rxIterator.hasNext()) {
        QRegularExpressionMatch match = rxIterator.next();
        QString word = match.captured(1);
        stringData << word;
    }


  • Lifetime Qt Champion

    @zgembo is datas a QByteArray?

    Then simply use datas.toHex(' '); Fastest and easiest.

    Regards

    Edit: Ah, it should be in a string list. No problem, you use:

    const QString s = datas.toHex(' ');
    const QStringList list = s.split(' ');
    

  • Moderators

    Guys, I know regular expressions are sexy and all, but they are terrible, terrible performance and memory monsters. Don't use them for simple tasks like splitting arrays. It just hurts the eyes to see.

    If you really need the strings, a simple for loop is a lot better:

    QByteArray datas_hex = datas.toHex();
    QStringList result;
    result.reserve(datas_hex.size() / 2);
    
    for (int i = 0; i < datas_hex.size(); i += 2)
        result.push_back(QString::fromLatin1(datas_hex.data() + i, 2));
    

    If you can keep the original data around it's even better to not make any copies at all:

    QByteArray datas_hex = datas.toHex();
    QVector<QLatin1String> result;
    result.reserve(datas_hex.size() / 2);
    
    for (int i = 0; i < datas_hex.size(); i += 2)
        result.push_back(QLatin1String(datas_hex.data() + i, 2));
    

    And if you can use a more efficient container, it's even better:

    QByteArray datas_hex = datas.toHex();
    std::vector<QLatin1String> result;
    result.reserve(datas_hex.size() / 2);
    
    for (int i = 0; i < datas_hex.size(); i += 2)
        result.emplace_back(datas_hex.data() + i, 2);
    

    I did some timings for you. On a 10Mb data sample on my machine:
    regex: 6572ms
    string copy: 1021ms
    string ref: 155ms
    string ref + std::vector: 105ms
    aha_1980 solution: 1322ms

    Please, please, please mind our battery lives and electrical bills.
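
For anyone who wants to try a comparison like this without a Qt build, here is a minimal sketch in standard C++17. It is not the code that produced the timings above. It assumes std::string as a stand-in for QString and std::string_view as a stand-in for QLatin1String, and the names splitCopy, splitView and timeMs are made up for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Copying variant: every pair becomes its own std::string.
// A trailing single character (odd-length input) is dropped.
std::vector<std::string> splitCopy(const std::string& hex)
{
    std::vector<std::string> result;
    result.reserve(hex.size() / 2);
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        result.emplace_back(hex, i, 2);
    return result;
}

// Non-copying variant: every pair is a view into `hex`,
// so `hex` must outlive the returned vector.
std::vector<std::string_view> splitView(const std::string& hex)
{
    std::vector<std::string_view> result;
    result.reserve(hex.size() / 2);
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        result.emplace_back(hex.data() + i, 2);
    return result;
}

// Runs `fn` once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeMs(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Something like timeMs([&] { auto r = splitView(hex); }) gives a single sample. For a real measurement you would repeat the call, make sure the result is actually used so the optimizer cannot drop the work, and build in release mode.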


  • Lifetime Qt Champion

    @chris-kawa Looking at the OP's last code, datas contains the raw bytes, because they are converted with toHex() first. So you will need to adapt your code.

    Regards


  • Moderators

    @aha_1980 Thanks, I missed that. Still, what I said holds. I corrected my post.


  • Lifetime Qt Champion

    Hi.
    That's a pretty hefty difference between regex and string ref. Very interesting.


  • Moderators

    @mrjj I consider regexps good for validating short input data, like login forms and the like. For processing large amounts of data, a handcrafted solution is always going to be a lot faster, even if you need to add a couple of ifs or switches to match what the regex does. Regexps are just too generic to have good performance.
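
To illustrate the "couple of ifs" point, here is a small sketch in standard C++17 that validates and splits in one pass, roughly what a pattern such as ([0-9a-f]{2})+ anchored to the whole input would check. The names isLowerHexDigit and splitHexPairs are hypothetical, and std::string_view stands in for QLatin1String so the example builds without Qt.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// True for the characters a lowercase hex dump may contain.
constexpr bool isLowerHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

// Splits `hex` into two-character views, or returns std::nullopt if the
// input is empty, has odd length, or contains a non-hex character.
// The views point into `hex`; the caller keeps that buffer alive.
std::optional<std::vector<std::string_view>> splitHexPairs(std::string_view hex)
{
    if (hex.empty() || hex.size() % 2 != 0)
        return std::nullopt;

    std::vector<std::string_view> pairs;
    pairs.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        if (!isLowerHexDigit(hex[i]) || !isLowerHexDigit(hex[i + 1]))
            return std::nullopt;
        pairs.push_back(hex.substr(i, 2));
    }
    return pairs;
}
```

The checks are two comparisons per character, with no pattern compilation and no per-match allocation, which is the kind of saving described above. Whether it matters in practice depends on how much data goes through it.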


  • Lifetime Qt Champion

    I guess that's the normal trade-off between generality/flexibility and a hand-made, specific solution.
    It also explains why Qt syntax highlighting gets very heavy with huge files. :)

