Get all urls in a text file
-
I have a text file that contains text mixed with URLs. I want to get all the URLs in that file. How can I do that using Qt?
-
Hi,
Is it just a list of URLs, or are the URLs mixed with other kinds of text?
Are you asking how to parse them or how to read the text file?
Can you show some lines from the file?
You can read the file line by line this way:
    QFile inputFile(fileName);
    if (inputFile.open(QIODevice::ReadOnly)) {
        QTextStream in(&inputFile);
        while (!in.atEnd()) {
            QString line = in.readLine();
            // ...
        }
        inputFile.close();
    }
-
It's text mixed with URLs. I think the approach you pointed out might have a performance problem. Am I wrong?
-
Well, it reads one line at a time, if that is what you mean.
But it all depends on how your text file is structured.
If the text is not neatly split into lines (\n), reading it line by line is pointless.
-
Hi,
You can also load the complete content of your file and then search through it using QRegularExpression.
-
@mrjj Actually the application doesn't need to know if it has lines or not, I just need to get all the links.
I think that I will use regex: https://gist.github.com/dperini/729294
@SGaist I saw your answer before posting, but yes, I think that in this case it's better to use regex.
-
@yodusow-bardon
OK, so it's like a dump.
That is one nice regular expression ;)
-
@mrjj I just realized that this one isn't working with Qt. I'm getting a warning:
QRegularExpressionPrivate::doMatch(): called on an invalid QRegularExpression object
I will try to find another one like it, or make this one work. If you have one, I will accept it too, haha.
-
@yodusow-bardon
Hi
Shouldn't the actual expression still work with the QRegularExpression class?
It seems the gist just joins strings with + to make the pattern more readable:
"(?:(?:https?|ftp)://)" + "(?:\S+(?::\S*)?@)?" ...
So you can easily convert it to Qt, I think.
Or?
-
@mrjj That is how I'm doing it:
    QRegularExpression re(
        "^"
        // protocol identifier
        "(?:(?:https?|ftp)://)"
        // user:pass authentication
        "(?:\\S+(?::\\S*)?@)?"
        "(?:"
        // IP address exclusion
        // private & local networks
        "(?!(?:10|127)(?:\\.\\d{1,3}){3})"
        "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})"
        "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})"
        // IP address dotted notation octets
        // excludes loopback network 0.0.0.0
        // excludes reserved space >= 224.0.0.0
        // excludes network & broadcast addresses
        // (first & last IP address of each class)
        "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])"
        "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}"
        "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))"
        "|"
        // host name
        "(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)"
        // domain name
        "(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*"
        // TLD identifier
        "(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))"
        // TLD may end with dot
        "\\.?"
        ")"
        // port number
        "(?::\\d{2,5})?"
        // resource path
        "(?:[/?#]\\S*)?"
        "$"
    );
    re.setPatternOptions(QRegularExpression::MultilineOption
                         | QRegularExpression::DotMatchesEverythingOption
                         | QRegularExpression::CaseInsensitiveOption);
    auto match = re.match(text);
    if (match.hasMatch()) {
        qDebug() << match.captured(0);
    } else {
        qDebug() << "Nothing found";
    }