What's the best way to parse a huge XML Document?

13 Posts 5 Posters 2.1k Views
    JohnSRV
    wrote on 15 Jul 2020, 12:53 last edited by
    #1

    Hello Everyone,

    I'm trying to process a huge XML document where I search for a specific tag and then change the value of the element. Here's an example:

    <FieldValue      uuid="{fd660de7-c298-4a74-a69e-c05a597796c8}"
                                browseName="InputBuffer" >
              2000
          <UnevaluatedInitialValue>
                                        
          </UnevaluatedInitialValue>
           <EvaluatedInitialValue>
                                        
          </EvaluatedInitialValue>
           <InitialValue>
                                        
           </InitialValue>
    
    </FieldValue>
    
    

    Let's say I want to set the value of the InputBuffer to 5000. The problem is that the document has literally tens of thousands of <FieldValue> tags. Is there a way to find the tag I need based on the value of its attributes, like browseName="InputBuffer", and extract that particular tag?

    The QDomDocument class doesn't seem to be able to do this, and neither does QXmlStreamReader.

    Thanks !

      mrjj
      Lifetime Qt Champion
      wrote on 15 Jul 2020, 13:00 last edited by
      #2

      Hi
      Well, that is what XPath does/was made for.
      https://www.w3schools.com/xml/xpath_intro.asp

        JonB
        wrote on 15 Jul 2020, 13:57 last edited by
        #3

        @JohnSRV
        Be aware that if you are wanting to "change the Value of the Element", you are going to have to rewrite the whole document. If you don't want to use low-level QXmlStreamReader/Writer to do the searching yourself while reading/writing, you are going to have to read the whole document into memory for editing (and then save back). So just how "huge" is your document? :)

          JohnSRV
          wrote on 15 Jul 2020, 14:00 last edited by
          #4

          @mrjj Thanks

          XPath seems to be what I need for this task. But how can I properly use it in my Qt application? I read the documentation on how to run XQuery from a Qt application, but I can't find anything on XPath. Is it used the same way as XQuery?

          Plus, the XQuery documentation describes how to parse an XML file and write the result into a new file. I need to replace values in the same document.

            JonB
            wrote on 15 Jul 2020, 14:21 last edited by JonB
            #5

            @JohnSRV said in What's the best way to parse a huge XML Document?:

            and write the Result into a new File. I need to replace Values in the same document.

            I did post above.

              JohnSRV
              wrote on 15 Jul 2020, 14:26 last edited by
              #6

              @JonB The problem with the document is the huge amount of tags having the same tag name. I have over 5000 <FieldValue> tags. I just need to manipulate the one with the attribute browseName="InputBuffer".

              Getting all of these tags in a QDomNodeList is absurd. XPath seems to make it possible to extract a tag with a specific attribute. But as you pointed out, I need to rewrite the whole document.

              PS: the XML document is only 13 Ko (about 13 KB). The problem, as I mentioned, is that it uses the same tag name over and over again, so I need to select by attribute, not by tag name.

                JonB
                wrote on 15 Jul 2020, 14:34 last edited by
                #7

                @JohnSRV said in What's the best way to parse a huge XML Document?:

                But as you pointed out I need rewrite the whole Document.

                Yep, as long as you understand that it's fine.

                the XML Document has 13 Ko

                LOL, that's a joke, it's not "huge" at all!! I thought you meant it might be more like 13GB, and then you might have memory issues.... So what's the problem reading it into memory? From a brief search, if you want to use XPath there is https://stackoverflow.com/questions/56062025/search-for-nodes-in-a-qdomdocument-using-xpath. But I think you're supposed to look at using https://doc.qt.io/qt-5/qxmlquery.html, maybe https://stackoverflow.com/questions/34852028/trying-to-get-a-string-out-of-an-xml-document-with-qxmlquery is worth a read.

                I admit I too am not sure how it fits together with current Qt XML offerings, hopefully these are some links to get you going.

                  Robert Hairgrove
                  wrote on 16 Jul 2020, 09:09 last edited by
                  #8

                  @JohnSRV This sounds like more of a job for a database. Although some people misuse XML that way, it was not designed as a substitute for a database. Or are you forced to use XML for some reason?

                    JohnSRV
                    wrote on 20 Jul 2020, 16:46 last edited by JohnSRV
                    #9

                    @Robert-Hairgrove Hey, I'm actually forced to use XML. The file I'm trying to manipulate is a project in epf format, which is used by some software for virtual commissioning. What I'm trying to do is build a GUI where the user gives the IP address of the control system he wants to connect to. This will be written to the project file, which can be handled as an XML file.

                    @JonB I used DOM to go over the tags in the file and extract the ones I want to change. You're right, the file is not huge and doesn't cause any memory issues. What I'm still trying to find out is how to write the new file. I know I have to rewrite the whole thing, but with new attribute values for some tags. What's the better way to do it? Use DOM with a loop, or QXmlStreamWriter? The QXmlStreamWriter docs weren't helpful.

                      JonB
                      wrote on 20 Jul 2020, 18:29 last edited by JonB
                      #10

                      @JohnSRV
                      Once you have a QDomDocument, and you have made the desired changes to it, you have QByteArray QDomDocument::toByteArray(int indent = 1) const or QString QDomDocument::toString(int indent = 1) const to serialize it, and you then write the result back to the file yourself.

                      Your alternative is to use QXmlStreamReader and, while you are reading, QXmlStreamWriter to copy to output as you go. But then you must do your searching/replacing "on the fly" as you are reading the nodes, incrementally.

                        JohnSRV
                        wrote on 23 Jul 2020, 08:21 last edited by JohnSRV
                        #11

                        @JonB Thanks. I'm actually still stuck at making the changes. I'm trying to change the value of a CDATA Section.

                        <FieldValue
                                                    uuid="{360074f3-3779-4ba7-a407-e5c51ffb9093}"
                                                    browseName="ServerIp"
                        >
                        <![CDATA[127.0.0.1]]>
                        </FieldValue>
                        
                        <FieldValue
                                                    uuid="{4e79b8c6-d431-412f-bf04-24e3d320f7f3}"
                                                    browseName="ServerPort"
                        >
                                                        54323
                        
                         </FieldValue>
                        

                        I'm trying to change the value 127.0.0.1 in the CDATA section, and the port 54323 to 562. In QDom I can only find setAttribute, which changes the attributes of an element. Are there methods in QDom to manipulate a CDATA section and the actual content of the element, like 54323, and not only the attributes?

                        UPDATE
                        For the port value I can use the setNodeValue method. Pardon, that was a stupid question. I'm still puzzled by the CDATA section though.

                          JonB
                          wrote on 23 Jul 2020, 08:45 last edited by JonB
                          #12

                          @JohnSRV
                          You need to look at methods of QDomNode. Get a hold of (a reference to) the FieldValue node. Verify it's a CDATA node type:

                          // Placeholder: obtain the FieldValue node by whatever means you use
                          QDomNode node = getReferenceToFieldValueNodeByWhateverMeans();
                          // I *think* the `CDATA` node is actually the first child of the `FieldValue` node
                          // rather than the `FieldValue` node itself, but you'll have to check
                          QDomNode cdata = node.firstChild();
                          Q_ASSERT(cdata.isCDATASection());
                          cdata.setNodeValue("192.168.0.1");
                          

                          I do not promise this is just what you need, you'll have to read the docs and play around, but this is the approach.

                          And btw: QDomDocument inherits from QDomNode. Earlier I said that to save it you need to use a toString/ByteArray() method on it. I now see there is https://doc.qt.io/qt-5/qdomnode.html#save, you can use that to save the whole of the QDomDocument to a QTextStream (which can be attached to a file).

                            Suliman123
                            wrote on 15 Mar 2024, 11:10 last edited by
                            #13

                            Do you need a Qt solution specifically? You could check this sample: https://redata.dev/smartxml/docs/how-to-parse-xml-into-sql-or-json.html

                            It's not Qt, but it allows parsing big files.
