
What's the best way to parse a huge XML Document?

13 Posts 5 Posters 2.7k Views 2 Watching
  • mrjj wrote:

    Hi
    Well, that is what XPath does/was made for.
    https://www.w3schools.com/xml/xpath_intro.asp

    JohnSRV wrote (#4):

    @mrjj Thanks

    XPath seems to be what I need for this task. But how can I properly use it in my Qt application? I read the documentation on how to run XQuery from a Qt application, but I can't find anything on XPath. Is it used the same way as XQuery?

    Plus, the XQuery documentation describes how to parse an XML file and write the result into a new file. I need to replace values in the same document.

      JonB wrote (#5):

      @JohnSRV said in What's the best way to parse a huge XML Document?:

      and write the Result into a new File. I need to replace Values in the same document.

      I did post above.

      • JonB wrote:

        @JohnSRV
        Be aware that if you want to "change the Value of the Element", you will have to rewrite the whole document. If you don't want to use the low-level QXmlStreamReader/QXmlStreamWriter to do the searching yourself while reading/writing, you will have to read the whole document into memory for editing (and then save it back). So just how "huge" is your document? :)

        JohnSRV wrote (#6):

        @JonB The problem with the document is the huge amount of tags with the same tag name. I have over 5000 <FieldValue> tags. I just need to manipulate the one with the attribute browseName = "InputBuffer".

        Getting all of these tags in a QDomNodeList is absurd. XPath seems to make it possible to extract a tag with a specific attribute. But as you pointed out, I need to rewrite the whole document.

        PS: the XML document is 13 KB (Ko). The problem, as I mentioned, is that it uses the same tag name over and over again, so I need to select by attribute, not by tag name.

          JonB wrote (#7):

          @JohnSRV said in What's the best way to parse a huge XML Document?:

          But as you pointed out I need rewrite the whole Document.

          Yep, as long as you understand that it's fine.

          the XML Document has 13 Ko

          LOL, that's a joke, it's not "huge" at all!! I thought you meant it might be more like 13 GB, and then you might have memory issues... So what's the problem with reading it into memory? From a brief search, if you want to use XPath there is https://stackoverflow.com/questions/56062025/search-for-nodes-in-a-qdomdocument-using-xpath. But I think you're supposed to look at using https://doc.qt.io/qt-5/qxmlquery.html; maybe https://stackoverflow.com/questions/34852028/trying-to-get-a-string-out-of-an-xml-document-with-qxmlquery is worth a read.

          I admit I too am not sure how it fits together with the current Qt XML offerings, but hopefully these links will get you going.

          • JohnSRV wrote:

            Hello Everyone,

            I'm trying to process a huge XML document where I search for a specific tag and then change the value of the element. Here's an example:

            <FieldValue uuid="{fd660de7-c298-4a74-a69e-c05a597796c8}"
                        browseName="InputBuffer">
                2000
                <UnevaluatedInitialValue>
                </UnevaluatedInitialValue>
                <EvaluatedInitialValue>
                </EvaluatedInitialValue>
                <InitialValue>
                </InitialValue>
            </FieldValue>

            Let's say I want to set the value of the InputBuffer to 5000. The problem is that the document has literally tens of thousands of <FieldValue> tags. Is there a way to search for the tag I need based on the value of its attributes, like browseName = "InputBuffer", and extract that particular tag?

            The QDomDocument class doesn't seem to be able to achieve this. Neither does QXmlStreamReader.

            Thanks!

            Robert Hairgrove wrote (#8):

            @JohnSRV This sounds like more of a job for a database. Although some people misuse XML that way, it was not designed as a substitute for a database. Or are you forced to use XML for some reason?

              JohnSRV wrote (#9):

              @Robert-Hairgrove Hey, I'm actually forced to use XML. The file I'm trying to manipulate is a project in EPF format, which is used by some software for virtual commissioning. What I'm trying to do is build a GUI where the user enters the IP address of the control system he wants to connect to. This will be written to the project file, which can be handled as an XML file.

              @JonB I used DOM to go over the tags in the file and extract the ones I want to change. You're right, the file is not huge and doesn't cause any memory issues. What I'm still trying to find out is how to write the new file. I know I have to rewrite the whole thing, but with new attribute values for some tags. What's the better way to do it? Use DOM with a loop, or QXmlStreamWriter? The QXmlStreamWriter documentation wasn't helpful.

                JonB wrote (#10):

                @JohnSRV
                Once you have a QDomDocument and have made the desired changes to it, you can use QByteArray QDomDocument::toByteArray(int indent = 1) const or QString QDomDocument::toString(int indent = 1) const to get the content to save back to the file.

                Your alternative is to use QXmlStreamReader and, while you are reading, QXmlStreamWriter to copy to output as you go. But then you must do your searching/replacing "on the fly" as you are reading the nodes, incrementally.

                  JohnSRV wrote (#11):

                  @JonB Thanks. I'm actually still stuck at making the changes. I'm trying to change the value of a CDATA Section.

                  <FieldValue uuid="{360074f3-3779-4ba7-a407-e5c51ffb9093}"
                              browseName="ServerIp">
                      <![CDATA[127.0.0.1]]>
                  </FieldValue>

                  <FieldValue uuid="{4e79b8c6-d431-412f-bf04-24e3d320f7f3}"
                              browseName="ServerPort">
                      54323
                  </FieldValue>

                  I'm trying to change the value 127.0.0.1 in the CDATA section, and the port from 54323 to 562. In the QDom classes I can only find setAttribute, which changes the attributes of an element. Are there methods to manipulate a CDATA section and the actual content of the element, like 54323, and not only the attributes?

                  UPDATE
                  For the port value I can use the setNodeValue method. Pardon, that was a stupid question. I'm still puzzled by the CDATA section, though.

                    JonB wrote (#12):

                    @JohnSRV
                    You need to look at the methods of QDomNode. Get hold of (a reference to) the FieldValue node, then verify that its child is a CDATA node:

                    QDomNode node = getReferenceToFieldValueNodeByWhateverMeans();
                    // I *think* the CDATA node is actually the first child of the FieldValue node
                    // rather than the FieldValue node itself, but you'll have to check
                    QDomNode cdata = node.firstChild();
                    Q_ASSERT(cdata.isCDATASection());
                    cdata.setNodeValue("192.168.0.1");
                    

                    I do not promise this is just what you need, you'll have to read the docs and play around, but this is the approach.

                    And btw: QDomDocument inherits from QDomNode. Earlier I said that to save it you need to call toString() or toByteArray() on it. I now see there is https://doc.qt.io/qt-5/qdomnode.html#save, which you can use to save the whole QDomDocument to a QTextStream (which can be attached to a file).

                    Suliman123 wrote (#13):

                      Do you specifically need a Qt solution? You could check this sample: https://redata.dev/smartxml/docs/how-to-parse-xml-into-sql-or-json.html

                      It's not Qt, but it allows parsing big files.
