Get all urls in a text file

    yodusow bardon
    wrote on 19 Dec 2015, 20:35 last edited by yodusow bardon
    #1

    I have a text file that mixes plain text and URLs. I want to extract all the URLs from that file. How can I do that using Qt?

      mrjj
      Lifetime Qt Champion
      wrote on 19 Dec 2015, 20:58 last edited by mrjj
      #2

      Hi,
      Is it just a list of URLs, or are the URLs mixed with other text?
      Are you asking how to parse them or how to read the text file?
      Can you show some lines from the file?
      You can read the file line by line this way:

      QFile inputFile(fileName);
      if (inputFile.open(QIODevice::ReadOnly))
      {
         QTextStream in(&inputFile);
         while (!in.atEnd())
         {
            QString line = in.readLine();
            ...
         }
         inputFile.close();
      }
      
        yodusow bardon
        wrote on 19 Dec 2015, 21:44 last edited by
        #3

        It's mixed text with URLs. I think the approach you pointed out might have a performance problem. Am I wrong?

          mrjj
          Lifetime Qt Champion
          wrote on 19 Dec 2015, 21:58 last edited by
          #4

          Well, it reads one line at a time, if that is what you mean.
          But it all depends on how your text file is structured.
          If the text is not neatly split into lines (\n), reading it line by line is pointless.

            SGaist
            Lifetime Qt Champion
            wrote on 19 Dec 2015, 22:00 last edited by
            #5

            Hi,

            You can also load the entire content of your file and then search through it using QRegularExpression.
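
One way to picture the "load everything, then search" idea is the sketch below. It is written against the C++ standard library (`std::regex`) only so that it compiles without Qt; with Qt, the same loop would typically go through `QRegularExpression::globalMatch()`. The helper name `extractUrls` and the deliberately simple pattern are assumptions made for illustration, not something from this thread, and the pattern will happily swallow trailing punctuation such as a comma.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns every URL-like
// substring found anywhere in `text`, in order of appearance.
std::vector<std::string> extractUrls(const std::string &text)
{
    // Simplified pattern: a scheme, "://", then everything up to the next
    // whitespace, quote or angle bracket. There are no ^/$ anchors, so a
    // URL may sit in the middle of a sentence.
    static const std::regex urlPattern(
        R"re((?:https?|ftp)://[^\s<>"]+)re",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> urls;
    // sregex_iterator walks over all non-overlapping matches in the text.
    for (std::sregex_iterator it(text.begin(), text.end(), urlPattern), end;
         it != end; ++it) {
        urls.push_back(it->str());
    }
    return urls;
}
```

Calling `extractUrls(fileContents)` on the whole file content returns the URLs in order of appearance, whether or not the text is split into lines.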

              yodusow bardon
              wrote on 19 Dec 2015, 22:09 last edited by yodusow bardon
              #6

              @mrjj Actually, the application doesn't need to know whether the file has lines or not; I just need to get all the links.

              I think I will use this regex: https://gist.github.com/dperini/729294

              @SGaist I saw your answer before posting, but yes, I think that in this case it's better to use a regex.

                mrjj
                Lifetime Qt Champion
                wrote on 19 Dec 2015, 22:17 last edited by
                #7

                @yodusow-bardon
                OK, so it's like a dump.
                That is one nice regular expression ;)

                  yodusow bardon
                  wrote on 19 Dec 2015, 22:20 last edited by
                  #8

                  @mrjj I just realized that this one isn't working with Qt. I'm getting a warning:

                  QRegularExpressionPrivate::doMatch(): called on an invalid QRegularExpression object

                  I will try to find another one like this or make this one work. If you have one, I will accept it too, haha.

                    mrjj
                    Lifetime Qt Champion
                    wrote on 19 Dec 2015, 22:26 last edited by
                    #9

                    @yodusow-bardon
                    Hi,
                    The actual expression should still work with the QRegularExpression class, shouldn't it?
                    The gist just seems to concatenate strings with + to make it more readable:
                    "(?:(?:https?|ftp)://)" + "(?:\S+(?::\S*)?@)?" ...
                    So you can easily convert it to Qt, I think.
                    Or not?

                      yodusow bardon
                      wrote on 19 Dec 2015, 22:30 last edited by
                      #10

                      @mrjj This is how I'm doing it:

                      QRegularExpression re(
                        "^"
                        // protocol identifier
                        "(?:(?:https?|ftp)://)"
                        // user:pass authentication
                        "(?:\\S+(?::\\S*)?@)?"
                        "(?:"
                        // IP address exclusion
                        // private & local networks
                        "(?!(?:10|127)(?:\\.\\d{1,3}){3})"
                        "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})"
                        "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})"
                        // IP address dotted notation octets
                        // excludes loopback network 0.0.0.0
                        // excludes reserved space >= 224.0.0.0
                        // excludes network & broadcast addresses
                        // (first & last IP address of each class)
                        "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])"
                        "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}"
                        "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))"
                        "|"
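                        // host name
                        // (written with \x{....}: PCRE, which backs QRegularExpression,
                        // does not accept the JavaScript-style \uXXXX escapes)
                        "(?:(?:[a-z\\x{00a1}-\\x{ffff}0-9]-*)*[a-z\\x{00a1}-\\x{ffff}0-9]+)"
                        // domain name
                        "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}0-9]-*)*[a-z\\x{00a1}-\\x{ffff}0-9]+)*"
                        // TLD identifier
                        "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}]{2,}))"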
                        // TLD may end with dot
                        "\\.?"
                        ")"
                        // port number
                        "(?::\\d{2,5})?"
                        // resource path
                        "(?:[/?#]\\S*)?"
                        "$"
                      );
                        
                      re.setPatternOptions(QRegularExpression::MultilineOption |
                                         QRegularExpression::DotMatchesEverythingOption |
                                         QRegularExpression::CaseInsensitiveOption);
                      
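                      // Note: match() returns only the first match, and because of ^...$ with
                      // MultilineOption a URL is found only when it is alone on its line.
                      auto match = re.match(text);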
                      if ( match.hasMatch()) {
                        qDebug() << match.captured(0);
                      } else {
                        qDebug() << "Nothing found";
                      }
                      