Get all urls in a text file

  • yodusow bardon wrote (#1):

    I have a text file containing text and URLs. I want to extract all the URLs from that file. How can I do that with Qt?

    • mrjj (Lifetime Qt Champion) wrote (#2):

      Hi,
      Is it just a list of URLs, or are the URLs mixed with other kinds of text?
      Are you asking how to parse them or how to read the text file?
      Can you show some lines from the file?
      You can read the file line by line this way:

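      #include <QFile>
      #include <QTextStream>

      QFile inputFile(fileName);
      if (inputFile.open(QIODevice::ReadOnly))
      {
         // Wrap the file in a text stream so it can be read line by line.
         QTextStream in(&inputFile);
         while (!in.atEnd())
         {
            QString line = in.readLine();
            // ... process the line here
         }
         inputFile.close();
      }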
      
      • yodusow bardon wrote (#3):

        It's mixed text with URLs. I think the approach you pointed out might have a performance problem. Am I wrong?

        • mrjj (Lifetime Qt Champion) wrote (#4):

          Well, it reads one line at a time, if that is what you mean.
          But it all depends on how your text file is structured.
          If the text is not neatly split into lines (\n), reading it line by line is pointless.

          • SGaist (Lifetime Qt Champion) wrote (#5):

            Hi,

            You can also load the entire content of your file and then search it with QRegularExpression.

            Interested in AI ? www.idiap.ch
            Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

            • yodusow bardon wrote (#6):

              @mrjj Actually, the application doesn't need to know whether the text is split into lines; I just need to get all the links.

              I think I will use this regex: https://gist.github.com/dperini/729294

              @SGaist I saw your answer before posting, but yes, I think a regex is the better choice in this case.

              • mrjj (Lifetime Qt Champion) wrote (#7):

                @yodusow-bardon
                OK, so it's like a dump.
                That is one nice regular expression ;)

                • yodusow bardon wrote (#8):

                  @mrjj I just realized that this one doesn't work with Qt. I'm getting a warning:

                  QRegularExpressionPrivate::doMatch(): called on an invalid QRegularExpression object

                  I will try to find another one like it or make this one work. If you have one, I'll accept that too, haha.

                  • mrjj (Lifetime Qt Champion) wrote (#9):

                    @yodusow-bardon
                    Hi,
                    The actual expression should still work with the QRegularExpression class, shouldn't it?
                    The gist just seems to join the strings with + to make the pattern more readable:
                    "(?:(?:https?|ftp)://)" + "(?:\S+(?::\S*)?@)?" ...
                    So I think you can easily convert it to Qt.
                    Or not?

                    • yodusow bardon wrote (#10):

                      @mrjj This is how I'm doing it:

                      // Not anchored with ^ and $: the URLs are embedded in other
                      // text, so a match may start and end anywhere in a line.
                      QRegularExpression re(
                        // protocol identifier
                        "(?:(?:https?|ftp)://)"
                        // user:pass authentication
                        "(?:\\S+(?::\\S*)?@)?"
                        "(?:"
                        // IP address exclusion
                        // private & local networks
                        "(?!(?:10|127)(?:\\.\\d{1,3}){3})"
                        "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})"
                        "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})"
                        // IP address dotted notation octets
                        // excludes loopback network 0.0.0.0
                        // excludes reserved space >= 224.0.0.0
                        // excludes network & broadcast addresses
                        // (first & last IP address of each class)
                        "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])"
                        "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}"
                        "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))"
                        "|"
                        // host name
                        // (\x{...} instead of \uXXXX, which PCRE rejects by default)
                        "(?:(?:[a-z\\x{00a1}-\\x{ffff}0-9]-*)*[a-z\\x{00a1}-\\x{ffff}0-9]+)"
                        // domain name
                        "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}0-9]-*)*[a-z\\x{00a1}-\\x{ffff}0-9]+)*"
                        // TLD identifier
                        "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}]{2,}))"
                        // TLD may end with dot
                        "\\.?"
                        ")"
                        // port number
                        "(?::\\d{2,5})?"
                        // resource path
                        "(?:[/?#]\\S*)?"
                      );
                        
                      re.setPatternOptions(QRegularExpression::MultilineOption |
                                         QRegularExpression::DotMatchesEverythingOption |
                                         QRegularExpression::CaseInsensitiveOption);
                      
                      // globalMatch() iterates over every match; match() stops at the first.
                      auto it = re.globalMatch(text);
                      if (!it.hasNext()) {
                        qDebug() << "Nothing found";
                      }
                      while (it.hasNext()) {
                        qDebug() << it.next().captured(0);
                      }
                      