Qt Forum


    Solved Get all urls in a text file

    • Y
      yodusow bardon last edited by yodusow bardon

      I have a text file containing text and URLs. I want to get all the URLs in that file. How can I do that using Qt?

      • mrjj
        mrjj Lifetime Qt Champion last edited by mrjj

        Hi,
        Is it just a list of URLs, or are the URLs mixed with other text?
        Are you asking how to parse them or how to read the text file?
        Can you show some lines from the file?
        You can read the file line by line this way:

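        // Requires <QFile> and <QTextStream>.
        QFile inputFile(fileName);
        if (inputFile.open(QIODevice::ReadOnly))
        {
           QTextStream in(&inputFile);
           while (!in.atEnd())
           {
              // readLine() returns one line without its trailing line break
              QString line = in.readLine();
              ...
           }
           inputFile.close();
        }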
        
        • Y
          yodusow bardon last edited by

          It's text mixed with URLs. I think the approach you pointed out might have a performance problem. Am I wrong?

          • mrjj
            mrjj Lifetime Qt Champion @yodusow bardon last edited by

            Well, it reads one line at a time, if that is what you mean.
            But it all depends on how your text file is structured.
            If the text is not neatly split into lines (\n), reading it line by line is pointless.

            • SGaist
              SGaist Lifetime Qt Champion last edited by

              Hi,

              You can also load the whole content of your file and then search through it using QRegularExpression.
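A minimal sketch of this whole-text approach. To keep it compilable without Qt, it uses std::regex from the C++ standard library rather than QRegularExpression (whose counterpart for iterating over all matches is globalMatch()). The function name findUrls and the simplified pattern are assumptions for illustration; the pattern is much looser than a full URL validator.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every URL-like token found in `text`.
// The pattern is a deliberately simple assumption: a scheme, "://", then
// everything up to the next whitespace character.
std::vector<std::string> findUrls(const std::string &text)
{
    static const std::regex urlPattern(R"((?:https?|ftp)://\S+)",
                                       std::regex::icase);
    std::vector<std::string> urls;
    // sregex_iterator walks over every non-overlapping match in the text,
    // not just the first one.
    for (std::sregex_iterator it(text.begin(), text.end(), urlPattern), end;
         it != end; ++it) {
        urls.push_back(it->str());
    }
    return urls;
}
```

Calling findUrls on the full file contents returns the URLs in order of appearance. Trailing punctuation such as a closing parenthesis is kept, because this pattern stops only at whitespace.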

              • Y
                yodusow bardon @mrjj last edited by yodusow bardon

                @mrjj Actually, the application doesn't need to know whether the text has lines or not; I just need to get all the links.

                I think that I will use regex: https://gist.github.com/dperini/729294

                @SGaist I saw your answer before posting, but yes, I think that in this case it's better to use regex.

                • mrjj
                  mrjj Lifetime Qt Champion @yodusow bardon last edited by

                  @yodusow-bardon
                  OK, so it's like a dump.
                  That is one nice regular expression ;)

                  • Y
                    yodusow bardon @mrjj last edited by

                    @mrjj I just realized that this one isn't working with Qt. I'm getting a warning:

                    QRegularExpressionPrivate::doMatch(): called on an invalid QRegularExpression object

                    I will try to find another one like this or make this one work. If you have one, I will accept that too, haha.
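The warning quoted above means the pattern failed to compile, so no matching took place. In Qt, QRegularExpression::isValid(), errorString() and patternErrorOffset() can report this before match() is called. One likely cause with patterns copied from JavaScript is the \uXXXX escape, which PCRE (the engine behind QRegularExpression) does not accept; it expects \x{XXXX} instead. As a Qt-free illustration of checking a pattern before using it, here is a minimal sketch with std::regex, where a bad pattern throws std::regex_error. The helper name isValidPattern is hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles as a std::regex.
// On failure the reason is written to stderr instead of being ignored.
bool isValidPattern(const std::string &pattern)
{
    try {
        std::regex compiled(pattern);  // throws std::regex_error if malformed
        return true;
    } catch (const std::regex_error &e) {
        std::cerr << "invalid pattern: " << e.what() << '\n';
        return false;
    }
}
```

Checking validity once, right after building the expression, makes a broken pattern visible immediately instead of surfacing later as a warning on every match attempt.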

                    • mrjj
                      mrjj Lifetime Qt Champion @yodusow bardon last edited by

                      @yodusow-bardon
                      Hi,
                      The expression itself should still work with the QRegularExpression class, shouldn't it?
                      The gist just seems to concatenate strings using + to make it more readable:
                      "(?:(?:https?|ftp)://)" + "(?:\S+(?::\S*)?@)?" ...
                      So I think you can easily convert it to Qt.
                      Or not?

                      • Y
                        yodusow bardon @mrjj last edited by

                        @mrjj That is how I'm doing it:

                        // Requires <QRegularExpression> and <QDebug>.
                        QRegularExpression re(
                          // "^" and "$" combined with MultilineOption: a URL matches
                          // only when it fills a whole line by itself
                          "^"
                          // protocol identifier
                          "(?:(?:https?|ftp)://)"
                          // user:pass authentication
                          "(?:\\S+(?::\\S*)?@)?"
                          "(?:"
                          // IP address exclusion
                          // private & local networks
                          "(?!(?:10|127)(?:\\.\\d{1,3}){3})"
                          "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})"
                          "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})"
                          // IP address dotted notation octets
                          // excludes loopback network 0.0.0.0
                          // excludes reserved space >= 224.0.0.0
                          // excludes network & broadcast addresses
                          // (first & last IP address of each class)
                          "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])"
                          "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}"
                          "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))"
                          "|"
                          // host name
                          // (PCRE, which QRegularExpression uses, rejects JavaScript-style
                          // \u escapes, so code points are written as \x{...})
                          "(?:(?:[a-z\\x{00a1}-\\x{ffff}0-9]-*)*[a-z\\x{00a1}-\\x{ffff}0-9]+)"
                          // domain name
                          "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}0-9]-*)*[a-z\\x{00a1}-\\x{ffff}0-9]+)*"
                          // TLD identifier
                          "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}]{2,}))"
                          // TLD may end with dot
                          "\\.?"
                          ")"
                          // port number
                          "(?::\\d{2,5})?"
                          // resource path
                          "(?:[/?#]\\S*)?"
                          "$"
                        );

                        re.setPatternOptions(QRegularExpression::MultilineOption |
                                           QRegularExpression::DotMatchesEverythingOption |
                                           QRegularExpression::CaseInsensitiveOption);

                        // Report a broken pattern up front instead of on every match attempt
                        if (!re.isValid()) {
                          qWarning() << re.errorString() << "at offset" << re.patternErrorOffset();
                        }

                        // globalMatch() iterates over every match; match() returns only the first
                        auto matches = re.globalMatch(text);
                        if (!matches.hasNext()) {
                          qDebug() << "Nothing found";
                        }
                        while (matches.hasNext()) {
                          qDebug() << matches.next().captured(0);
                        }
                        