Parsing hyperlinks from HTML using QTextDocument?



  • I'm having trouble parsing simple things from a simple HTML page without using WebKit and DOM (I don't want a WebKit dependency).

    1. This page says QTextDocument can parse HTML.
    2. I do the following and can see that my HTML has been parsed correctly, but I only get the user-visible text, not the markup:
    for (QTextBlock block = doc.begin(), end = doc.end(); block != end; block = block.next())
    {
    	qDebug() << block.text();
    }
    
    3. I read some of the QTextBlock docs and tried this, but anchorHref() and anchorNames() are always empty, even for the blocks that I know contain <a href>.
    for (QTextBlock block = doc.begin(), end = doc.end(); block != end; block = block.next())
    {
    	qDebug() << block.text();
    	qDebug() << block.charFormat().anchorNames();
    	qDebug() << block.charFormat().anchorHref();
    	qDebug() << "-------------------------------";
    }
    

    Is there any way to get the hyperlink URLs?


  • Moderators

    @Violet-Giraffe
    This should work (untested though)

    #include <QDebug>
    #include <QTextBlock>
    #include <QTextCharFormat>
    #include <QTextFrame>
    
    // Prints the href of every anchor fragment inside one block.
    // Defined first so the frame overload below can call it.
    void searchLink(const QTextBlock &parent)
    {
        for (QTextBlock::iterator it = parent.begin(); !it.atEnd(); ++it)
        {
            QTextFragment textFragment = it.fragment();
            if (textFragment.isValid())
            {
                QTextCharFormat textCharFormat = textFragment.charFormat();
                if (textCharFormat.isAnchor())
                {
                    qDebug() << textCharFormat.anchorHref();  // <-- URL
                }
            }
        }
    }
    
    // Walks a frame: recurses into child frames, scans plain blocks.
    void searchLink(QTextFrame *parent)
    {
        for (QTextFrame::iterator it = parent->begin(); !it.atEnd(); ++it)
        {
            QTextFrame *textFrame = it.currentFrame();
            QTextBlock textBlock = it.currentBlock();
    
            if (textFrame)
            {
                searchLink(textFrame);
            }
            else if (textBlock.isValid())
            {
                searchLink(textBlock);
            }
        }
    }
    

    searchLink() recurses into nested frames, so start it from the document's root frame:

    searchLink( textDocument->rootFrame() );
    


  • Aha! So my mistake was that I only looked at blocks and not fragments. Thank you.

