Qt Forum


    QRegExp for Searching HTML Files?

    General and Desktop
    • fargo

      I have a pool of HTML files and want to search through them for some target text. The search must cover only their text content and ignore the HTML tags. I tried QRegExp, but could not find a good pattern for this, so I'd appreciate any help.

      Thank you.

      • goetz

        Searching the raw source directly is almost impossible, because the markup gets in the way all the time. One possible solution is to load the HTML into a "QTextDocument":http://doc.qt.nokia.com/4.7/qtextdocument.html and use QTextDocument::find() on the document.

        The drawback is that the HTML may be completely rewritten if you need to modify the contents and save the file afterwards, since QTextDocument::toHtml() generates its own markup rather than returning the original source.

        http://www.catb.org/~esr/faqs/smart-questions.html

        • andre

          First of all, Qt has no direct support for running a QRegExp over files on disk; you would have to load the files one by one and then search their contents. Second, using a regexp to parse HTML or XML is a bad idea, and you really don't want to do that. I would recommend using an HTML tidy program to convert the files into valid XML, and then using Qt's XML classes to search for your text.
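To illustrate the "load the files one by one" part, here is a minimal C++17 sketch using only the standard library, not Qt's API. The helper names `readFile` and `findHtmlFiles` are hypothetical, and the actual matching is left to a caller-supplied predicate, since a plain substring or regexp match on the raw HTML is unreliable for the reasons given in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: reads a whole file into memory (binary mode, bytes kept as-is).
// Returns an empty string if the file cannot be opened.
std::string readFile(const fs::path& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: loads every *.html / *.htm file directly inside `dir`
// (no recursion), one file at a time, and returns the paths whose contents
// satisfy `matches`. Results are sorted because directory order is unspecified.
std::vector<fs::path> findHtmlFiles(
    const fs::path& dir,
    const std::function<bool(const std::string&)>& matches) {
    std::vector<fs::path> hits;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file()) continue;
        const std::string ext = entry.path().extension().string();
        if (ext != ".html" && ext != ".htm") continue;
        if (matches(readFile(entry.path()))) hits.push_back(entry.path());
    }
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

The predicate is where a real search would plug in, for example one that converts the file to plain text (or to a parsed document) before looking for the target string.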

          • goetz

            Even if it is valid XHTML, searching will not work out well with the raw XHTML source. Imagine you search for "foo bar" and have in your markup:

            @
            <em>foo</em> <span class='hugo'>ba</span><span class='superduper'>r</span>
            @

            You still want to match this (obviously silly) construct, because its plain-text representation is still "foo bar". IMO, the best approach is the text search of QTextDocument.
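To make the "plain-text representation" point concrete, here is a deliberately naive C++ sketch (standard library only, and not how QTextDocument works internally) that drops everything between `<` and `>` and collapses whitespace before searching. `htmlToPlainText` and `containsPlainText` are hypothetical names. It does not decode entities such as `&amp;`, skip `<script>`/`<style>` contents, handle a `>` inside a quoted attribute, or separate adjacent block elements like `<p>foo</p><p>bar</p>`, which is exactly why a real HTML parser such as QTextDocument is preferable.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical, deliberately naive converter: keeps only the text outside
// <...> tags and collapses every run of whitespace into one space
// (leading and trailing whitespace is dropped). No entity decoding,
// no <script>/<style> handling, no separators between block elements.
std::string htmlToPlainText(const std::string& html) {
    std::string text;
    bool inTag = false;
    bool pendingSpace = false;
    for (char c : html) {
        if (inTag) {
            if (c == '>') inTag = false;   // tag ends here
            continue;
        }
        if (c == '<') {                    // tag starts: skip it entirely
            inTag = true;
            continue;
        }
        if (std::isspace(static_cast<unsigned char>(c))) {
            pendingSpace = !text.empty();  // remember one space, emitted before the next char
            continue;
        }
        if (pendingSpace) {
            text += ' ';
            pendingSpace = false;
        }
        text += c;
    }
    return text;
}

// Hypothetical helper: case-sensitive search in the plain text instead of the source.
bool containsPlainText(const std::string& html, const std::string& needle) {
    return htmlToPlainText(html).find(needle) != std::string::npos;
}
```

With the markup above, `htmlToPlainText` yields `foo bar`, so `containsPlainText(markup, "foo bar")` is true even though the raw source does not contain that substring.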


            • andre

              Good point. I think you are right, and QTextDocument::find() is the way to go.
