QRegExp for Searching HTML Files?

    fargo wrote (#1):

    I have a pool of HTML files and want to search through them for some target text. The search must cover only their text content and ignore the HTML tags. I tried QRegExp, but could not find a good pattern for this, so I'd appreciate any help.

    Thank you.

      goetz wrote (#2):

      Searching the raw HTML directly is almost impossible, because the markup gets in the way all the time. One possible solution is to load the HTML into a QTextDocument (http://doc.qt.nokia.com/4.7/qtextdocument.html) and use find() on the document.

      The drawback is that the HTML might be completely altered if you need to manipulate the contents and save the file afterwards.

      http://www.catb.org/~esr/faqs/smart-questions.html

        andre wrote (#3):

        First of all, Qt does not directly support using a QRegExp to search through files on disk. You have to load the files one by one and then search their contents. Second, using a regexp to parse HTML or XML is a bad idea; you really don't want to do that. I would recommend running the files through an HTML tidy program to produce valid XML, and then using Qt's XML classes to search for your text.

          goetz wrote (#4):

          Even if it is valid XHTML, searching the raw XHTML source will not work well. Imagine you search for "foo bar" and your markup contains:

          @
          <em>foo</em> <span class='hugo'>ba</span><span class='superduper'>r</span>
          @

          You still want to match this (obviously silly) construct, because its plain-text representation is still "foo bar". The best approach, IMO, would be the text search of QTextDocument.
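To make the point concrete, here is a small illustration in plain standard C++ (no Qt). The function `stripTags` is a made-up, deliberately naive helper that drops everything between `<` and `>`. It is not an HTML parser and is not meant for real use: it ignores entities, comments, script and style blocks, and attribute values that contain `>`.

```cpp
#include <string>

// Naive illustration only: copies every character that is outside a
// '<' ... '>' pair. Enough to show what the "plain text" of the silly
// markup above looks like, nothing more.
std::string stripTags(const std::string &markup)
{
    std::string text;
    bool inTag = false;
    for (char c : markup) {
        if (c == '<')
            inTag = true;       // entering a tag: stop copying
        else if (c == '>')
            inTag = false;      // leaving a tag: resume copying
        else if (!inTag)
            text += c;          // ordinary text character
    }
    return text;
}
```

Searching the raw markup above for "foo bar" with std::string::find() finds nothing, because the tags split the phrase, while the stripped text is exactly "foo bar". QTextDocument does this conversion properly, which is why its find() matches.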

            andre wrote (#5):

            Good point. I think you are right, and QTextDocument::find() is the way to go.
