
Strange behavior of QDomDocument::toString()

  • Hello. I am using Qt 4.7.2 and have this code:
    #include <QtCore/QCoreApplication>
    #include <QtXml/QDomDocument>
    #include <QtCore/QDebug>

    int main(int argc, char *argv[])
    {
        QCoreApplication a(argc, argv);

        // Text content with both angle brackets written as entities.
        QString simpleXml = "<node1>&lt;node2/&gt;</node1>";
        QDomDocument doc("simple");
        doc.setContent(simpleXml);           // parse the string into DOM nodes
        QString simpleXml2 = doc.toString(); // rebuild a string from the nodes
        qDebug() << simpleXml2;
        return a.exec();
    }


    The result is "<node1>&lt;node2/></node1>".

    Instead of &gt;, a literal greater-than sign appears. How?

  • Why not? I guess it's perfectly valid XML to have a literal '>' if it can be parsed in the right way. Did you use xmllint on that?

  • I mean that ideally I should get back exactly the same line that I passed to QDomDocument::setContent(). Why does QDomDocument unescape the QDomText content instead of leaving that to me?

  • Sorry, but no. I think that QDomDocument constructs the text when you call toString(). It does not keep track of whether the document was modified, so it cannot tell whether it could simply return whatever was set on it.

  • Agreed, but I was referring to the example above. Where could the document have been modified, and what justifies this behavior of setContent() or toString()?

  • Your reply shows that you did not understand my reply. Let me try to rephrase. My idea is that QDomDocument does not keep a string representation of the document. It parses the document into its internal, node-based data structure when you set it using setContent(), and then discards the string. Now, when you request a string representation of the document, such a string is constructed from the internal representation.*

    The document is not modified at all. You are just getting a different representation of the same document than the one you expected. It is, as peppe pointed out, perfectly valid and represents the same document as the one you first put in. Similarly, the order of attributes in an XML document representation is undefined. That means that
    <node arg1="foo" arg2="bar"/>

    represents exactly the same document as
    <node arg2="bar" arg1="foo"/>

    even if the textual representations of the document differ. That's XML for you. Deal with it. Relying on these kinds of things to be stable or in a specific form is a bug, IMO.

    *) Note, this is my idea of how QDomDocument works, I did not verify this against the docs or the source code. You probably should do that yourself to be sure.
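
The attribute-order point above can be illustrated with a minimal sketch in plain C++. It is not Qt code and not a real XML parser: `parseAttributes` is a hypothetical helper that assumes a single tag with double-quoted values and no entities. It only shows that once attributes are stored by name, the order in which they were written no longer matters.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of Qt): collect name="value" pairs from one tag.
// Assumes double-quoted values without entities; a real parser does far more.
std::map<std::string, std::string> parseAttributes(const std::string &tag)
{
    std::map<std::string, std::string> attrs;
    std::size_t pos = 0;
    while ((pos = tag.find("=\"", pos)) != std::string::npos) {
        // The attribute name starts right after the last space before '='.
        const std::size_t nameStart = tag.rfind(' ', pos) + 1;
        const std::size_t valueStart = pos + 2;
        const std::size_t valueEnd = tag.find('"', valueStart);
        if (valueEnd == std::string::npos)
            break; // unterminated value: stop rather than guess
        attrs[tag.substr(nameStart, pos - nameStart)] =
            tag.substr(valueStart, valueEnd - valueStart);
        pos = valueEnd + 1; // continue after the closing quote
    }
    return attrs;
}
```

Parsing the two `<node .../>` lines from the post with this helper yields equal maps, even though the input strings differ.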

  • No, no, I did not mean the string representation of the document. I was trying to understand why the contents of a QDomText node change after parsing and then rebuilding the string. I wanted to know why, during one of these steps, only one character is converted from its entity to a literal character: the closing angle bracket. Why not the ampersand, the quotes, or the opening bracket, but only the > sign? It seemed strange to me, and I suspected that something was not working as it should. I started this discussion to find out whether I can expect that data I put between the tags, and which does not violate the validity of the XML document, will not change after the string is rebuilt.

  • Final attempt:
    I tried to explain above that the different representations you get represent the very same document. That is: the contents of your document were not changed. As for why one angle bracket is encoded and not the other: as peppe said, why not? It is valid XML, so why make the representation longer than needed by writing &gt; instead of > ? Again: I doubt QDomDocument (or any of the nodes, for that matter) keeps track of how these were originally encoded.
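
The parse-then-rebuild idea described in this thread can be sketched in plain C++. This is not how QDomDocument is implemented; `decodeEntities` and `encodeText` are hypothetical helpers. It assumes a serializer that escapes only what XML character data requires: `<` and `&` must be escaped in text, while `>` may stay literal (the `]]>` corner case is ignored here).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace the five predefined XML entities with characters.
// This stands in for what a parser keeps in memory after reading text content.
std::string decodeEntities(const std::string &in)
{
    static const struct { const char *name; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        for (const auto &e : table) {
            const std::string name(e.name);
            if (in.compare(i, name.size(), name) == 0) {
                out += e.ch;
                i += name.size();
                matched = true;
                break;
            }
        }
        if (!matched)
            out += in[i++]; // ordinary character: copy as-is
    }
    return out;
}

// Hypothetical helper: escape only '<' and '&' when writing text back out.
// The original spelling of each character is not known at this point.
std::string encodeText(const std::string &text)
{
    std::string out;
    for (char c : text) {
        if (c == '<')
            out += "&lt;";
        else if (c == '&')
            out += "&amp;";
        else
            out += c;
    }
    return out;
}
```

With these helpers, `encodeText(decodeEntities("&lt;node2/&gt;"))` gives `&lt;node2/>`, and decoding either spelling gives the same text, `<node2/>`. The in-memory content is identical; only the serialized form differs.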

  • Both string representations are semantically identical, though not literally identical. This is all that matters; everything else is subject to the inner workings of the libraries and classes used. As long as you get semantically identical XML out of what you put in, everything is OK.
