QDomDocument and namespaces
-
Using Qt 5.4.0 on OS X 10.10.
I was looking at the possibility of processing ODF spreadsheet files (ODS) as XML files. Using QDomDocument, I loaded the container file 'content.xml' and then wrote it back unchanged as a simple test.
What I found is that the entire document was 'revised' by doing this. For one thing, it appears that namespace declarations have been added to elements throughout the entire document.
For example, this is a snippet from the original:
<table:table-cell table:style-name="ce49" office:value-type="string" calcext:value-type="string"><text:p>Y Axis</text:p></table:table-cell>
This is what it looked like after:
<table:table-cell xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" table:style-name="ce49" calcext:value-type="string" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" office:value-type="string" xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"><text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">Y Axis</text:p></table:table-cell>
This does alter how the document is interpreted when loaded by LibreOffice. Many of the cells have visible differences in appearance that are obvious when viewing the rewritten file.
Is there any way to make QDomDocument a little more passive when reading these files?
-
@Rondog said:
Hi
I was wondering whether it would still insert the "extra" declarations when saved if you used
http://doc.qt.io/qt-5/qdomdocument.html#setContent
with namespaceProcessing set to false.
-
That worked. With 'namespaceProcessing' set to false the entries make more sense. Aside from rearranging the order of the attributes everything looks as it should. Kind of obvious now that I look at it.
There is still something fishy going on though. The document has changed somehow (more than appearance) but I am not sure where ...
-
Okay, I figured out the reason for the difference (even after turning off the namespace processing).
When QDomDocument writes the XML file it does a nice clean job where each 'node' is on a new line, nested, and all that.
The original XML file is a stream without line breaks of any kind.
The line breaks cause the text to be interpreted differently. For example, a cell might contain formatted text, something like "Cell text content" where each word has its own formatting independent of the formatting of the cell.
In the output from QDomDocument these end up on separate lines. When it is read in (by LibreOffice) each change in text style ends up as a new line inside the cell itself.
I didn't think XML files were sensitive to formatting but it looks like they can be in some cases.
-
OK, so inline text is handled strangely.
Do you have a sample with such text?
I would like to know why.
-
I believe the line breaks and indent spacing become part of the cell text. The text sub formatting is put onto different lines but it is within the scope of the original cell text.
I do have a sample. This is from one of the cells in an ODS spreadsheet that has "Uc (k=2):" for text. The first part of the text is using the default cell formatting, the letter 'k' has one set of formatting for whatever reason, and the rest of the text has a second formatting style (italic).
This is how it was originally written in the XML file:
<text:p>Uc (<text:span text:style-name="T1">k</text:span><text:span text:style-name="T2">=2):</text:span></text:p>
This next sample is from QDomDocument. Everything between <text:p> and </text:p> (which includes line breaks and an indent) becomes part of the final text that appears in the cell:
<text:p>Uc (<text:span text:style-name="T1">k</text:span>
 <text:span text:style-name="T2">=2):</text:span>
</text:p>
If I manually change the second version to put everything on the same line in the XML file it works fine so I know this is the reason for the odd behaviour.
The second version is a little easier on the eyes when looking at the document but it causes problems.
-
Thank you.
Hmm, I would not have guessed that would alter the reading.
It seems only cosmetic. I guess that text:span is special.