What's the best way to parse a huge XML Document?
-
Hello everyone,
I'm trying to process a huge XML document in which I search for a specific tag and then change the value of the element. Here's an example:
    <FieldValue uuid="{fd660de7-c298-4a74-a69e-c05a597796c8}" browseName="InputBuffer" >
        2000
        <UnevaluatedInitialValue> </UnevaluatedInitialValue>
        <EvaluatedInitialValue> </EvaluatedInitialValue>
        <InitialValue> </InitialValue>
    </FieldValue>
Let's say I want to set the value of the InputBuffer to 5000. The problem is that the document has literally tens of thousands of <FieldValue> tags. Is there a way to search for the tag I need based on the value of its attributes, like browseName="InputBuffer", and extract just that tag?
The QDomDocument class doesn't seem to be able to do this, and neither does QXmlStreamReader.
Thanks!
-
Hi,
well, that is exactly what XPath was made for:
https://www.w3schools.com/xml/xpath_intro.asp
-
@JohnSRV
Be aware that if you want to "change the value of the element", you will have to rewrite the whole document. Unless you use the low-level QXmlStreamReader/QXmlStreamWriter to do the searching yourself while reading/writing, you will have to read the whole document into memory, edit it there, and then save it back. So just how "huge" is your document? :)
-
@mrjj Thanks.
XPath seems to be what I need for this task. But how can I use it properly in my Qt application? I read the documentation on how to run XQuery from a Qt application, but I can't find anything on XPath. Is it used the same way as XQuery?
Plus, the XQuery documentation describes how to parse an XML file and write the result into a new file. I need to replace values in the same document.
-
@JonB The problem with the document is the huge number of tags with the same tag name. I have over 5000 <FieldValue> tags. I just need to manipulate the one with the attribute browseName = "InputBuffer".
Getting all of these tags in a QDomNodeList seems absurd. XPath seems to offer a way to extract a tag with a specific attribute. But as you pointed out, I need to rewrite the whole document.
PS: the XML document is 13 Ko (13 KB). The problem, as I mentioned, is that it uses the same tag name over and over again, so I need to select by attribute, not by tag name.
-
@JohnSRV said in What's the best way to parse a huge XML Document?:
But as you pointed out I need rewrite the whole Document.
Yep, as long as you understand that, it's fine.
the XML Document has 13 Ko
LOL, that's a joke, it's not "huge" at all!! I thought you meant it might be more like 13 GB, and then you might have memory issues... So what's the problem with reading it into memory? From a brief search, if you want to use XPath there is https://stackoverflow.com/questions/56062025/search-for-nodes-in-a-qdomdocument-using-xpath. But I think you're supposed to look at using https://doc.qt.io/qt-5/qxmlquery.html; maybe https://stackoverflow.com/questions/34852028/trying-to-get-a-string-out-of-an-xml-document-with-qxmlquery is worth a read.
I admit I'm not sure how it fits together with the current Qt XML offerings either; hopefully these links get you going.
-
@JohnSRV This sounds like more of a job for a database. Although some people misuse XML that way, it was not designed as a substitute for a database. Or are you forced to use XML for some reason?
-
@Robert-Hairgrove Hey, I'm actually forced to use XML. The file I'm trying to manipulate is a project in epf format, which is used by some software for virtual commissioning. What I'm trying to do is build a GUI where the user enters the IP address of the control system they want to connect to. This will be written into the project file, which can be handled as an XML file.
@JonB I used DOM to go over the tags in the file and extract the ones I want to change. You're right, the file is not huge and doesn't cause any memory issues. What I'm still trying to find out is how to write the new file. I know I have to rewrite the whole thing, but with new attribute values for some tags. What's the better way to do it: DOM with a loop, or QXmlStreamWriter? The QXmlStreamWriter docs weren't helpful.
-
@JohnSRV
Once you have a QDomDocument and have made the desired changes in it, you have
    QByteArray QDomDocument::toByteArray(int indent = 1) const
or
    QString QDomDocument::toString(int indent = 1) const
to save it back to file.
Your alternative is to use QXmlStreamReader and, while you are reading, QXmlStreamWriter to copy to the output as you go. But then you must do your searching/replacing "on the fly", incrementally, as you read the nodes.
-
@JonB Thanks. I'm actually still stuck at making the changes. I'm trying to change the value of a CDATA section:
    <FieldValue uuid="{360074f3-3779-4ba7-a407-e5c51ffb9093}" browseName="ServerIp" > <![CDATA[127.0.0.1]]> </FieldValue>
    <FieldValue uuid="{4e79b8c6-d431-412f-bf04-24e3d320f7f3}" browseName="ServerPort" > 54323 </FieldValue>
I'm trying to change the value 127.0.0.1 in the CDATA section, and the port 54323 to 562. In QDom I only find setAttribute(), which changes the attributes of an element. Are there methods in QDom to manipulate a CDATA section and the actual content of the element, like 54323, and not only the attributes?
UPDATE
For the port value I can use the setNodeValue() method. Pardon, that was a stupid question. I'm still puzzled by the CDATA section, though.
-
@JohnSRV
You need to look at the methods of QDomNode. Get hold of (a reference to) the FieldValue node. Verify it's a CDATA node type:
    QDomNode node = getReferenceToFieldValueNodeByWhateverMeans();  // placeholder for your own lookup
    // I *think* the `CDATA` node is actually the first child of the `FieldValue` node
    // rather than the `FieldValue` node itself, but you'll have to check
    QDomNode cdata = node.firstChild();
    Q_ASSERT(cdata.isCDATASection());
    cdata.setNodeValue("192.168.0.1");
I don't promise this is exactly what you need; you'll have to read the docs and play around, but this is the approach.
And btw: QDomDocument inherits from QDomNode. Earlier I said that to save it you need to use a toString()/toByteArray() method on it. I now see there is https://doc.qt.io/qt-5/qdomnode.html#save; you can use that to save the whole QDomDocument to a QTextStream (which can be attached to a file).
-
Do you need exactly a Qt solution? You could check this sample: https://redata.dev/smartxml/docs/how-to-parse-xml-into-sql-or-json.html
It's not Qt, but it allows parsing big files.