checking XML starting and ending tag with QDomDocument
-
When copy-pasting (Ctrl-C, Ctrl-V) in XML descriptor files, errors like <tag2> attrib="ABC" </tag1> happen a lot. When the file is read with QDomDocument, these elements are simply ignored. How can I check for and correct these errors? Or at least, how can I detect elements whose start and end tags do not match?
-
Well, you could read the file node-at-a-time using QXmlStreamReader if the input is supposed to be a known, reasonably straightforward format. Then you will know the first time it does something unexpected.
However, the example you gave cannot be corrected automatically without knowing precisely what rules it should have followed. Given this input,
<tag2> attrib="ABC" </tag1>
is the "correct" input supposed to be:
- <tag2> attrib="ABC" </tag2>, the close tag was supposed to match the open tag. The text is perfectly acceptable.
- <tag1> attrib="ABC" </tag1>, the open tag was supposed to match the close tag.
- <tag2 attrib="ABC"> </tag2>, the text that looks like an attribute actually should be an attribute, with or without the space in the text, and with either tag1 or tag2. Does either element, tag1 or tag2, allow an attribute called "attrib"?
- Other options I haven't guessed at.
If you have a schema, use a validating parser to determine all the ways your input is broken. Even without a schema, a well-formedness check will pick up some obvious errors. Better still, tell the source of the XML to fix their own output.
-
@ChrisW67 Thanks, Chris,
so before starting to process with QDomDocument, I must add a round of "syntax checking" with QXmlStreamReader. The correction is end tag <- start tag, because the creator usually modifies the start tag at the beginning of a line after a block copy, then forgets to modify the end tag accordingly.
Anyway, if Notepad++ is used for editing, it warns about such errors. I only want to add a further check for possible errors.
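A minimal sketch of that "end tag <- start tag" correction, written in plain C++ without Qt so it can be tried standalone. The function name repairEndTags is made up for this example. It assumes there is no '>' inside attribute values, no CDATA sections, and that the start tag is always the correct one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: rewrites every end tag so that it matches the most
// recently opened, still unclosed start tag ("end tag <- start tag").
std::string repairEndTags(const std::string& xml)
{
    std::string out;
    std::vector<std::string> open;   // names of the currently open elements
    std::size_t pos = 0;

    while (pos < xml.size()) {
        const std::size_t lt = xml.find('<', pos);
        if (lt == std::string::npos) {           // trailing text: copy and stop
            out.append(xml, pos, std::string::npos);
            break;
        }
        out.append(xml, pos, lt - pos);          // text before the markup

        // Comments may contain '>', so they need their own terminator.
        const bool isComment = xml.compare(lt, 4, "<!--") == 0;
        std::size_t gt = isComment ? xml.find("-->", lt + 4)
                                   : xml.find('>', lt);
        if (gt == std::string::npos) {           // unterminated markup: copy as-is
            out.append(xml, lt, std::string::npos);
            break;
        }
        if (isComment)
            gt += 2;                             // move to the final '>'
        const std::string tag = xml.substr(lt, gt - lt + 1);
        pos = gt + 1;

        if (tag.size() < 3 || tag[1] == '?' || tag[1] == '!') {
            out += tag;                          // declaration, comment, DOCTYPE
        } else if (tag[1] == '/') {              // end tag
            if (open.empty()) {
                out += tag;                      // nothing to match against
            } else {
                out += "</" + open.back() + ">"; // force it to match the start tag
                open.pop_back();
            }
        } else {                                 // start tag or self-closing tag
            out += tag;
            if (tag[tag.size() - 2] != '/') {
                // The element name ends at the first whitespace or '>'.
                const std::size_t end = tag.find_first_of(" \t\r\n>", 1);
                open.push_back(tag.substr(1, end - 1));
            }
        }
    }
    return out;
}
```

With the example from the question, repairEndTags("<tag2> attrib=\"ABC\" </tag1>") returns "<tag2> attrib=\"ABC\" </tag2>". The rule only covers a wrong name in an end tag. If an end tag is missing altogether, every later end tag gets renamed incorrectly, so the repaired text should still go through a well-formedness check afterwards.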