Special characters in XML files created with QDomDocument

    zester wrote (#4):

    The output you're getting, the one you say you don't want: use that as your string.

      ionwind wrote (#5):

      If I remember correctly, I tried that before, and the output was something like
      @<Cell><Data ss:Type="String">Foo&amp;amp;#10;Bar</Data></Cell>@
      But I'll try it again tomorrow just to make sure...

        zester wrote (#6):

        Hmmmm I am not sure how it would be done on Windows.

        But from reading I see that...

        In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). In Unix applications, a new line is normally stored as an LF character. Macintosh applications also use an LF to store a new line.

        XML stores a new line as LF: a conforming parser normalizes every CR LF pair (and every lone CR) to a single LF.
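As a plain C++ illustration of that rule (XML 1.0, section 2.11 "End-of-Line Handling"), here is a minimal sketch of the normalization a conforming parser applies before it looks at the markup. The function name is hypothetical and no Qt API is involved; it only shows why a Windows CR LF inside cell text reaches the application as a single LF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper sketching XML 1.0 end-of-line handling:
// every CR LF pair and every lone CR becomes a single LF.
std::string normalizeXmlLineEnds(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            out += '\n';                               // CR becomes LF...
            if (i + 1 < in.size() && in[i + 1] == '\n')
                ++i;                                   // ...and a directly following LF is dropped
        } else {
            out += in[i];
        }
    }
    return out;
}
```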

          goetz wrote (#7):

          This should do the trick:

          @
          QString contents = "Foo\nBar";
          QDomText cellContents = doc->createTextNode(contents);
          data.appendChild(cellContents);
          @

          Whether the newline is written as a numeric character reference (&#10;) or as a literal character is irrelevant. XML-wise, both are equivalent.

          [EDIT:] PS:
          createTextNode() stores your text unchanged; it is the serialization (toString() or save()) that escapes it, in such a way that when the document is parsed again, the retrieved string is exactly the same as your original one. So every single ampersand (&) is written as &amp;. The same holds for "<" and ">", and in some cases the single and double quotation marks (' and ").
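A minimal plain C++ sketch of the text-content escaping described above (the function name is hypothetical; Qt's own encodeText() in qdom.cpp may differ in detail, for example in exactly when it escapes '>'). It shows why passing the string "Foo&#10;Bar" to createTextNode() ends up in the file as Foo&amp;#10;Bar, while a literal '\n' is written unchanged.

```cpp
#include <string>

// Hypothetical helper sketching what a DOM serializer does to text content:
// '&' and '<' must be escaped, '>' is escaped here as well for safety.
// A literal '\n' is left alone, because it is already legal in element content.
std::string escapeXmlText(const std::string &text)
{
    std::string out;
    for (char c : text) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;";  break;
        case '>': out += "&gt;";  break;
        default:  out += c;       break;
        }
    }
    return out;
}
```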

          http://www.catb.org/~esr/faqs/smart-questions.html

            zester wrote (#8):

            I'm not sure that will work, Volker, but then again I could be wrong.

            What the OP was getting at is that Excel needs special characters that are illegal in XML as-is, and Qt converts them in such a way as to make them legal, which ends up not being the proper format for Excel.

            In my previous post I was backtracking, trying to figure out why that may be.

            Excel supports XML-based files, but the file must conform to some rules (the Excel XML Spreadsheet XSD schema).
            http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example

            Not sure if that will help ;)

              Chris Kawa (Lifetime Qt Champion) wrote (#9):

              Why not just put your text inside a CDATA section, i.e. use QDomCDATASection instead of QDomText?
              &, < and newlines are all valid as-is inside that markup.
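For reference, here is a plain C++ sketch of writing such a section by hand (the function name is hypothetical; in Qt you would call createCDATASection() instead). Inside CDATA, &, < and line breaks need no escaping; the one sequence that cannot appear is the terminator ]]>, so the usual trick is to split it across two adjacent sections.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps arbitrary text in a CDATA section by hand.
// '&', '<' and line breaks need no escaping inside CDATA, but the terminator
// "]]>" may not appear there, so each occurrence is split across two sections:
// "]]" stays in the first section and ">" starts the next one.
std::string toCdata(const std::string &text)
{
    std::string out = "<![CDATA[";
    std::size_t start = 0;
    std::size_t pos = text.find("]]>");
    while (pos != std::string::npos) {
        out += text.substr(start, pos + 2 - start); // up to and including "]]"
        out += "]]><![CDATA[";                      // close this section, open the next
        start = pos + 2;                            // the '>' goes into the next section
        pos = text.find("]]>", start);
    }
    out += text.substr(start);
    out += "]]>";
    return out;
}
```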

                ionwind wrote (#10):

                Thanks for all the suggestions, but unfortunately none of them seems to work so far :(

                zester has the right idea about what I'm up to here ;)

                I had already tried what Volker suggested. A \n in the string produces this in the output file:
                @<Cell><Data ss:Type="String">Foo
                Bar</Data></Cell>@
                While that is correct, it's not what I'm after, since Excel shows it as a space between Foo and Bar, not a newline.

                I was also wondering whether the CDATA section suggested by Krzysztof would work. I was rather hopeful when I saw this appear in the output file:
                @<Cell><Data ss:Type="String"><![CDATA[Foo&#10;Bar]]></Data></Cell>@
                But unfortunately, when opening the output file in Excel, the contents of the cell were still Foo&#10;Bar instead of Foo and Bar being on separate lines.

                If anyone has more suggestions I'd be happy to try them, but I guess it's simpler to just arrange the output file so that I don't need the damn newlines :P

                  zester wrote (#11):

                  It's a case of damned if you do and damned if you don't; thank Microsoft
                  for their non-standard ways, lol ;) I will keep looking. I have run into this problem
                  before, I just can't for the life of me remember how I solved it. Maybe when Volker
                  gets a free minute from the Qt Summit he can enlighten us.

                    jim_kaiser wrote (#12):

                    So it sounds like you need ASCII char 10 in your data. Did you try this?

                    @ QString contents = QString("Foo%1Bar").arg(QChar(10)); @

                    [Edit: No different from putting a \n, of course. That doesn't work in Excel?]

                    [Edit 2: Just use a string with \n while in Qt, and at the end, when you save to an Excel file, convert the characters to Excel's format. That should work, no? Okay, my bad, I see what you need: the special characters from Excel in Qt.]

                      Chris Kawa (Lifetime Qt Champion) wrote (#13):

                      When you use CDATA, everything you put there is used as-is, without any parsing, so no wonder it shows an untransformed &#10;.
                      Maybe you tried it already, but I think a mix of those two solutions, i.e. CDATA and \n, should work.

                      [Edit] Heh, a wild idea just occurred to me in case that doesn't work: you can combine CDATA and text like so:
                      @ <![CDATA[some text]]>&#10;<![CDATA[some more text]]> @
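The combined markup from the edit above can be generated with a few lines of plain C++. This is a sketch with a hypothetical function name: each line of the text goes into its own CDATA section, and the line breaks between them become &#10; outside CDATA. It has to be written as a raw string rather than through QDomText, which would escape the & of &#10;. ionwind later reports that no CDATA combination worked in Excel, so treat this as a way to experiment rather than a fix.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: puts each line of `text` into its own CDATA section and
// writes the line breaks between them as the character reference &#10;, which
// sits outside CDATA and is therefore resolved by the parser.
// Assumes the text contains no "]]>", which would end a section early.
std::string cdataWithCharRefNewlines(const std::string &text)
{
    std::string out;
    std::size_t start = 0;
    while (true) {
        const std::size_t nl = text.find('\n', start);
        const std::size_t len = (nl == std::string::npos) ? std::string::npos : nl - start;
        out += "<![CDATA[" + text.substr(start, len) + "]]>";
        if (nl == std::string::npos)
            break;          // last line written
        out += "&#10;";     // line break as a character reference
        start = nl + 1;
    }
    return out;
}
```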

                        goetz wrote (#14):

                        Guys! Get yourselves a book about basic XML!

                        XML-wise, all of this is equivalent:

                        @
                        &#10; == '\n' == QChar(10)
                        @

                        It is completely irrelevant whether you feed &#10; or a literal '\n' to an XML parser! If it makes a difference, the parser is not standards compliant.
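To make the equivalence concrete, here is a deliberately tiny decoder for element text in plain C++ (hypothetical names; a real parser, such as the one behind QDomDocument::setContent(), does far more). It resolves the five predefined entities and ASCII decimal character references, so "Foo&#10;Bar" and a literal "Foo\nBar" decode to the same string. It also shows that the double-escaped form &amp;#10; decodes to the five characters &#10;, not to a newline.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `ref` (the text between '&' and ';') is a decimal
// character reference such as "#10" whose value fits in 7-bit ASCII.
bool isAsciiDecimalRef(const std::string &ref)
{
    if (ref.size() < 2 || ref.size() > 4 || ref[0] != '#')
        return false;
    for (std::size_t k = 1; k < ref.size(); ++k)
        if (ref[k] < '0' || ref[k] > '9')
            return false;
    return std::stoi(ref.substr(1)) < 128;
}

// Deliberately small decoder for element text (not a full XML parser): it resolves
// the five predefined entities and ASCII decimal character references, and copies
// anything else unchanged.
std::string decodeXmlText(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '&') {
            out += in[i++];
            continue;
        }
        const std::size_t semi = in.find(';', i);
        if (semi == std::string::npos) {   // no terminating ';': copy the rest as-is
            out += in.substr(i);
            break;
        }
        const std::string ref = in.substr(i + 1, semi - i - 1);
        if (ref == "amp")       out += '&';
        else if (ref == "lt")   out += '<';
        else if (ref == "gt")   out += '>';
        else if (ref == "quot") out += '"';
        else if (ref == "apos") out += '\'';
        else if (isAsciiDecimalRef(ref))
            out += static_cast<char>(std::stoi(ref.substr(1)));
        else
            out += in.substr(i, semi - i + 1); // unknown reference: keep verbatim
        i = semi + 1;
    }
    return out;
}
```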

                        And just to prove it: make a hand-crafted xlsx file and just put a newline in it. At least in OpenOffice it is displayed with a line break. I didn't have a "genuine" Excel at hand to test.

                          ionwind wrote (#15):

                          jim_kaiser, just to mention that \n doesn't work in Excel. Anyway, the last solution you posted would work, of course. I take it you mean something like this:
                          @QFile exportFile(fileName);
                          if(exportFile.open(QIODevice::WriteOnly))
                          {
                          QTextStream TextStream(&exportFile);
                          QString docString = doc->toString().replace("&amp;", "&");
                          TextStream << docString;
                          exportFile.close();
                          }@
                          I could do it that way, yes. The exported XML files have thousands of lines, and even the largest ones have "only" 30 to 50 thousand, so it's not a problem, even though searching and replacing bits of such a large string just isn't programmatically good-looking ;P
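One caveat about the blanket replace("&amp;", "&"): it also un-escapes genuine ampersands in cell text (a cell containing R&D would be written as a bare R&D, which is not well-formed XML). A narrower sketch in plain C++, with hypothetical helper names, only undoes the escaping of the newline reference. In Qt the same narrowing would be a QString::replace("&amp;#10;", "&#10;") call on the serialized string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every occurrence of `from` in `s` with `to`.
std::string replaceAll(std::string s, const std::string &from, const std::string &to)
{
    if (from.empty())
        return s; // nothing to search for
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Undo the escaping only for the newline reference, so any other '&' in the
// cell text stays escaped as "&amp;" and the file remains well-formed.
std::string unescapeNewlineRefs(const std::string &xml)
{
    return replaceAll(xml, "&amp;#10;", "&#10;");
}
```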

                          Krzysztof, thanks for the suggestions, but so far no combination of CDATA and any newline sequence seems to do the trick :\

                            Chris Kawa (Lifetime Qt Champion) wrote (#16):

                            @Volker It doesn't make a difference if it's in a text node, but it does if it's in a CDATA node. There, &#10; is a 5-character string and not a newline mark.
                            Other than that, what you said is absolutely true. That's why I think \n (or QChar(10), which is not necessarily the same as Windows' 2-character newlines, is it?) should work just fine inside a CDATA node (used to hold those & and <), but if ionwind says it doesn't, then I don't really know.

                              zester wrote (#17):

                              @Volker: when Ubuntu finishes their Qt Unity interface and all those GTK developers turn into Qt developers,
                              you will have 40,000 American Ubuntu users swarming the Qt forums wanting help learning Qt.

                              They're going to chew you up and spit you out, my friend, lol. If I were you, I would work on my people skills.
                              How do I know this? Because I did GTK development support for Ubuntu. At one time I had a single thread
                              about GTK programming; it received 142,000 hits and 3,000 comments, and everyone wanted to be kindly spoon-fed.
                              If you want proof of my claims, just ask; I am sure I can dig that old thread up.

                              just saying ;)

                              Number of Ubuntu users = 12+ million :)
                              Ubuntu's current focus = Qt

                                ionwind wrote (#18):

                                In case someone wants to try it out, here's an XML file that can be opened in Excel. The first cell (which I couldn't produce with Qt so far) contains a newline that Excel understands; the other three are suggestions made here that Qt can produce but that don't work in Excel.
                                Thanks, everyone, anyway ;D
                                @<?xml version="1.0"?>
                                <?mso-application progid="Excel.Sheet"?>
                                <Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
                                <Worksheet ss:Name="Newline">
                                <Table>
                                <Row>
                                <Cell><Data ss:Type="String">Foo&#10;Bar</Data></Cell>
                                <Cell><Data ss:Type="String">Foo&amp;#10;Bar</Data></Cell>
                                <Cell><Data ss:Type="String">Foo<![CDATA[&#10;]]>Bar</Data></Cell>
                                <Cell><Data ss:Type="String">Foo\nBar</Data></Cell>
                                </Row>
                                </Table>
                                </Worksheet>
                                </Workbook>@

                                [Edit] The Workbook element should also have xmlns="urn:schemas-microsoft-com:office:spreadsheet", but for some reason it won't appear in the code :|

                                  mlong wrote (#19):

                                  You could always render your well-formed XML into a QString (or QByteArray, maybe?) and then use a QRegExp to malform it into something hideous that Excel can understand (i.e., replace all instances of &#10; with )

                                  It's a dirty hack, and not without its limitations, but it might work...
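A plain C++ sketch of one such post-processing step (hypothetical function name): every literal line feed inside the root element is rewritten as the character reference &#10;. The assumption is that the string comes from something like doc.toString(-1), where an indent of -1 adds no formatting whitespace, so line feeds inside the root element can only come from text content. The prolog (the xml declaration and any processing instructions, which Qt may follow with a line break) is copied unchanged, because a &#10; outside the root element would make the file ill-formed. Newlines inside comments or CDATA sections would also be rewritten; this sketch does not guard against that.

```cpp
#include <cstddef>
#include <string>

// Hypothetical post-processing step: inside the root element, replace each literal
// line feed with &#10;. Assumes the document was serialized without indentation,
// so every '\n' inside the root element belongs to text content.
std::string newlinesToCharRefs(const std::string &xml)
{
    // Find the root element: the first '<' that does not open "<?...?>" or "<!...>".
    std::size_t begin = xml.find('<');
    while (begin != std::string::npos && begin + 1 < xml.size()
           && (xml[begin + 1] == '?' || xml[begin + 1] == '!'))
        begin = xml.find('<', begin + 1);
    const std::size_t end = xml.rfind('>'); // last '>' closes the root element
    if (begin == std::string::npos || end == std::string::npos || end < begin)
        return xml; // no root element found: leave the input untouched

    std::string out = xml.substr(0, begin); // prolog, including its line break, as-is
    for (std::size_t i = begin; i <= end; ++i) {
        if (xml[i] == '\n')
            out += "&#10;";
        else
            out += xml[i];
    }
    out += xml.substr(end + 1); // anything after the root (e.g. a final newline) as-is
    return out;
}
```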

                                  Software Engineer
                                  My views and opinions do not necessarily reflect those of anyone -- living or dead, real or fictional -- in this universe or any other similar multiverse node. Void where prohibited. Your mileage may vary. Caveat emptor.

                                    goetz wrote (#20):

                                    Hi,

                                    the following example works for me.

                                    It creates an xlsx file with two cells. The first contains a text without line breaks, the second cell has a line break after each word.

                                    It's displayed correctly in OpenOffice here. As I'm on a train now and have no access to a Windows box, I cannot check with genuine Microsoft Office Excel; I'll do that on Tuesday. I'd be happy to get feedback, though.

                                    As you can see, it's all Qt DOM methods in use, no fancy search and replace magic.

                                    @
                                    #include <QApplication>
                                    #include <QDebug>
                                    #include <QtXml>
                                    #include <QFile>
                                    #include <QTextStream>

                                    // Creates an element called "name", appends it to "parent" and, if "content"
                                    // is given, adds it as a text node (escaping happens later, on save).
                                    QDomElement appendDomNode(QDomNode &parent, const QString &name, const QString &content = QString())
                                    {
                                        QDomDocument doc = parent.ownerDocument();
                                        QDomElement elem = doc.createElement(name);
                                        if (!content.isNull()) {
                                            QDomCharacterData pcdata = doc.createTextNode(content);
                                            elem.appendChild(pcdata);
                                        }
                                        parent.appendChild(elem);
                                        return elem;
                                    }

                                    int main(int argc, char *argv[])
                                    {
                                        QApplication a(argc, argv);

                                        QDomDocument doc;
                                        QDomProcessingInstruction xmlVers = doc.createProcessingInstruction("xml", "version=\"1.0\" encoding='utf-8'");
                                        doc.appendChild(xmlVers);

                                        QDomElement Workbook = doc.createElementNS("urn:schemas-microsoft-com:office:spreadsheet", "ss:Workbook");
                                        doc.appendChild(Workbook);

                                        QDomElement Styles = appendDomNode(Workbook, "ss:Styles");
                                        QDomElement Style = appendDomNode(Styles, "ss:Style");
                                        Style.setAttribute("ss:ID", "1");

                                        QDomElement Worksheet = appendDomNode(Workbook, "ss:Worksheet");
                                        Worksheet.setAttribute("ss:Name", "Sheet1");

                                        QDomElement Table = appendDomNode(Worksheet, "ss:Table");

                                        QDomElement Column_1 = appendDomNode(Table, "ss:Column");
                                        Column_1.setAttribute("ss:Width", "80");

                                        QDomElement Column_2 = appendDomNode(Table, "ss:Column");
                                        Column_2.setAttribute("ss:Width", "80");

                                        QDomElement Row = appendDomNode(Table, "ss:Row");
                                        Row.setAttribute("ss:StyleID", "1");

                                        // First cell: plain text without line breaks.
                                        QDomElement Cell_1 = appendDomNode(Row, "ss:Cell");
                                        QDomElement Data_1 = appendDomNode(Cell_1, "ss:Data", "A string without line breaks");
                                        Data_1.setAttribute("ss:Type", "String");

                                        // Second cell: a literal '\n' after each word.
                                        QDomElement Cell_2 = appendDomNode(Row, "ss:Cell");
                                        QDomElement Data_2 = appendDomNode(Cell_2, "ss:Data", "A\nstring\nwith\nline\nbreaks");
                                        Data_2.setAttribute("ss:Type", "String");

                                        qDebug() << doc.toString();

                                        QFile xlsxFile("/tmp/qttest.xlsx");
                                        if (!xlsxFile.open(QIODevice::WriteOnly))
                                            return 1; // could not create the output file
                                        QTextStream ts(&xlsxFile);
                                        doc.save(ts, 4);
                                        ts.flush(); // write buffered text before the file is closed
                                        xlsxFile.close();

                                        return 0;
                                    }
                                    @

                                      goetz wrote (#21):

                                      Here are the results of the checks with Microsoft's Office Excel:

                                      • one has to rename the file from .xlsx to .xml to be able to import it
                                      • using a literal newline character ('\n') in the strings does not work
                                      • replacing the literal newline with '&#10;' (aka entity) does work
                                      • one must put a suitable cell format into the xml to enable line breaks

                                      So the conclusion is that the XML parser Excel uses to handle the file is definitely broken and does not follow the standard!

                                      You can easily test this by using xmllint: save a file from Excel in XML format, have xmllint parse it and save the result (this replaces all '&#10;' with literal newlines), then try to open the file in Excel. It will fail.

                                      Unfortunately, I see no chance of getting easy support for this case into Qt. You would have to patch the function "static QString encodeText()" in qdom.cpp to escape newlines.
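A plain C++ sketch of what such an encoder would do for SpreadsheetML cells (hypothetical names; this is neither Qt code nor an actual qdom.cpp patch): the usual text escaping, plus writing each line feed as &#10;, which per the checks above is the form Excel accepts. The cell also needs a style that enables wrapping; as far as I know, in the SpreadsheetML 2003 format that is an <ss:Alignment ss:WrapText="1"/> child of the referenced <ss:Style>, but treat that attribute as an assumption to verify against Microsoft's XML Spreadsheet reference.

```cpp
#include <string>

// Hypothetical encoder in the spirit of the suggested qdom.cpp patch: besides the
// usual escaping, it writes each line feed as the character reference &#10;.
std::string encodeTextForExcel(const std::string &text)
{
    std::string out;
    for (char c : text) {
        switch (c) {
        case '&':  out += "&amp;"; break;
        case '<':  out += "&lt;";  break;
        case '>':  out += "&gt;";  break;
        case '\n': out += "&#10;"; break;
        default:   out += c;       break;
        }
    }
    return out;
}

// Builds one SpreadsheetML string cell. `styleId` must name an <ss:Style> that
// enables wrapping (assumed: <ss:Alignment ss:WrapText="1"/>), or Excel shows one line.
std::string stringCell(const std::string &text, const std::string &styleId)
{
    return "<ss:Cell ss:StyleID=\"" + styleId + "\"><ss:Data ss:Type=\"String\">"
           + encodeTextForExcel(text) + "</ss:Data></ss:Cell>";
}
```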
