Solved Parsing hyperlinks from HTML using QTextDocument?
-
I'm having trouble with parsing simple things from a simple HTML page without using Webkit and DOM (I don't want a Webkit dependency).
This page says QTextDocument can parse stuff from HTML. I do the following and I can see that my HTML has been parsed nicely, but I only get the user-visible text, no markup:
for (QTextBlock block = doc.begin(), end = doc.end(); block != end; block = block.next())
{
    qDebug() << block.text();
}
I read some QTextBlock docs and tried this, but anchorHref and anchorNames are always empty, even for the blocks that I know are <a href>.
for (QTextBlock block = doc.begin(), end = doc.end(); block != end; block = block.next())
{
    qDebug() << block.text();
    qDebug() << block.charFormat().anchorNames();
    qDebug() << block.charFormat().anchorHref();
    qDebug() << "-------------------------------";
}
Is there any way to get the hyperlink URLs?
-
@Violet-Giraffe
This should work (untested though):
// Both overloads are assumed to be members of the same class (hence this->).
// Walks a frame: recurses into child frames, hands blocks to the overload below.
void searchLink(QTextFrame * parent)
{
    for( QTextFrame::iterator it = parent->begin(); !it.atEnd(); ++it )
    {
        QTextFrame *textFrame = it.currentFrame();
        QTextBlock textBlock = it.currentBlock();
        if( textFrame )
        {
            this->searchLink(textFrame);
        }
        else if( textBlock.isValid() )
        {
            this->searchLink(textBlock);
        }
    }
}
// Walks the fragments of one block; the anchor data lives on the fragment's char format.
void searchLink(const QTextBlock & parent)
{
    for( QTextBlock::iterator it = parent.begin(); !it.atEnd(); ++it )
    {
        QTextFragment textFragment = it.fragment();
        if( textFragment.isValid() )
        {
            QTextCharFormat textCharFormat = textFragment.charFormat();
            if( textCharFormat.isAnchor() )
            {
                textCharFormat.anchorHref(); // <-- URL
            }
        }
    }
}
The searchLink() method searches recursively. Start it from the document's root frame:
searchLink( textDocument->rootFrame() );
-
Aha! So my mistake was that I only looked at blocks and not fragments. Thank you.