How to get html source code from a web page?
-
I want to get the HTML source code from a web page and then filter it for text. How would I do that?
-
Web pages always arrive as source code, so just download the HTML file.
You can use QNetworkAccessManager for this purpose, I think. Parsing the HTML code to "extract" the text will be the more difficult task here. I don't know how useful a QWebView would be for this task, but maybe you can construct a fairly simple regular expression with QRegExp to get the text block out of the HTML code.
For example, this would get everything enclosed by <body> ... </body> tags:
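@#include <QRegExp>
#include <QString>

QString getText(const QString &theHtmlCode)
{
    // Capture everything between <body> and </body>.
    // Note: this matches only a plain <body> tag without attributes.
    QRegExp filter("<body>(.+)</body>");
    int result = filter.indexIn(theHtmlCode);
    if(result != -1)
    {
        return filter.cap(1); // contents of the first capture group
    }
    return QString(); // no <body> block found
}@ -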
I don't understand what "theHtmlCode" is. Where do I put the URL, and how?
[quote author="MuldeR" date="1365379340"]Web pages always arrive as source code, so just download the HTML file. You can use QNetworkAccessManager for this purpose, I think. Parsing the HTML code to "extract" the text will be the more difficult task here. I don't know how useful a QWebView would be for this task, but maybe you can construct a fairly simple regular expression with QRegExp to get the text block out of the HTML code.
For example, this would get everything enclosed by <body> ... </body> tags:
@QString getText(const QString &theHtmlCode)
{
    QRegExp filter("<body>(.+)</body>");
    int result = filter.indexIn(theHtmlCode);
    if(result != -1)
    {
        return filter.cap(1);
    }
    return QString();
}@[/quote] -
You can either use "QWebView":http://qt-project.org/doc/qt-4.8/qwebview.html or "QNetworkAccessManager":http://qt-project.org/doc/qt-4.8/qnetworkaccessmanager.html, as already suggested by MuldeR. Have you taken a look at the documentation of these classes? It already contains examples of how to achieve what you are asking for (three lines of code are all it takes). Now all you need to do is read the documentation and decide which one you want to use.
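For the filtering step discussed earlier in the thread, here is a rough sketch of how the downloaded HTML might be reduced to plain text. It is written in plain standard C++ with std::regex instead of QRegExp so that it compiles without Qt. The function names extractBody and stripTags are made up for illustration, and the input is assumed to be the page source already held in a string. It does not decode entities such as &amp;amp; and does not skip script or style content, so treat it as a starting point rather than a real HTML parser.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the contents of the first
// <body ...> ... </body> element, or an empty string if none is found.
std::string extractBody(const std::string &html)
{
    // [^>]* allows attributes on the tag; [\s\S]*? is a non-greedy match
    // that also spans newlines; icase accepts <BODY> as well.
    static const std::regex bodyRe(R"(<body[^>]*>([\s\S]*?)</body>)",
                                   std::regex::ECMAScript | std::regex::icase);
    std::smatch match;
    if (std::regex_search(html, match, bodyRe))
        return match[1].str();
    return std::string();
}

// Hypothetical helper: replaces every tag with a space, collapses runs
// of whitespace into single spaces, and trims both ends.
std::string stripTags(const std::string &fragment)
{
    static const std::regex tagRe(R"(<[^>]*>)");
    static const std::regex spaceRe(R"(\s+)");

    std::string text = std::regex_replace(fragment, tagRe, " ");
    text = std::regex_replace(text, spaceRe, " ");

    const std::size_t first = text.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string(); // nothing but whitespace was left
    const std::size_t last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}
```

Calling stripTags(extractBody(html)) on the downloaded source would then give the visible text as one line. In Qt the same idea should carry over to QRegExp, using setMinimal(true) for the non-greedy match and setCaseSensitivity(Qt::CaseInsensitive) for the tag name, but that variant is not shown here.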