[SOLVED] Encoding UTF-8, QStandardItem and cross-compiling



  • Hello all.

    I'm developing a desktop application with Qt 5.5. I'm on an Ubuntu 14.04 environment, but I need the application to work on both Ubuntu and Windows. For that, I am cross-compiling with MinGW, which works fine.
    My application accesses some web services that return XML files, encoded in UTF-8, with a lot of non-English characters. I parse the XML using rapidxml and (typically) store some information in a QMap<QString, QString> structure. The code I use for that is something like this:

    mapaOrgaos[oName.item(i)->text().toUtf8()] = oID.item(i)->text().toUtf8();
    

    Here both oName and oID are instances of QStandardItemModel.
    After that, I add the map keys to a combo box, like this:

    ui->ComboOrgaos->addItems(mapaOrgaos.keys());
    

    On Ubuntu, everything works fine. But when I cross-compile for Windows, the text shown in the combo box is not correctly encoded (non-English characters show up as quotation marks over a black background).
    I've been searching the web for a week trying to find out what the problem might be. One important thing to add is that the encoding works fine for strings that do not pass through a QStandardItemModel object. For example, literal strings (used in labels) or XML results passed directly to a text box show correctly encoded text.
    Any ideas what could be causing this?



  • I don't understand why you are calling toUtf8(). QString always stores its data as Unicode (UTF-16 internally). Once you've parsed your XML, you shouldn't care about its original encoding anymore.



  • Leonardo,

    yes, you're right. Actually, I added the toUtf8() calls hoping they might solve the problem, which of course they didn't.
    I get the same problem if I just use

    oName.item(i)->text()
    


  • Could you show us some of your parsing code?



  • @Leonardo

    Sure.
    As I mentioned before, I use rapidxml for the parsing job. It works with C strings, so I created an interface between the parser and the Qt framework to handle the conversion between QString and char *. It goes something like this:

    int clParser::parseXml(QString qstrDoc)  {// qstrDoc holds the XML 
       // I found that the easiest way to handle the conversion is through a QByteArray
       bytes = new QByteArray(qstrDoc.toLocal8Bit().constData(), qstrDoc.toLocal8Bit().size());
       bytes_allocated = true;
    
       try {
          oXml.parse<parse_validate_closing_tags>(bytes->data());
       } catch (const parse_error &e) {
          return -1;
       }
       if(oXml.first_node()->first_node() == 0) {
          return -2;
       } else {
          return 0;
       }
    }
    

    Now for the actual data extraction part. I wrote some logic to translate an XML document into a table. The method below feeds a QStandardItemModel with values from the column "qstrColName", belonging to entity "qstrEntName", whose parent is "qstrParentName".

    void clParser::getColumn(QString qstrParentName, QString qstrEntName, QString qstrColName, QStandardItemModel &oCol) {
    
        int i = 0;
        int iDepth;
    
        // Again, the QByteArrays are used to convert between QString and char *
        QByteArray byteParent(qstrParentName.toLocal8Bit().constData(), qstrParentName.toLocal8Bit().size());
        QByteArray byteEnt(qstrEntName.toLocal8Bit().constData(), qstrEntName.toLocal8Bit().size());
        QByteArray byteCol(qstrColName.toLocal8Bit().constData(), qstrColName.toLocal8Bit().size());
    
        char *strColName = byteCol.data();
        char *strParentName = byteParent.data();
        char *strEntName = byteEnt.data();
    
        // Look for the first occurrence of the entity
        xml_node<> *nodeFirst = oXml.first_node();
        xml_node<> *nodeEnt = find_first(strEntName, iDepth);
        xml_node<> *nodeParent = 0;
        if(nodeEnt==0) { 
           return;
        }
    
        xml_node<> *nodeCol;
        xml_attribute<> *attr;
        while(nodeEnt != 0) {
            // Look for a node with the same name as the column
            nodeCol = nodeEnt->first_node(strColName);
            if(nodeCol == 0) {
                // No node found, I loop through the attributes
                attr = nodeEnt->first_attribute(strColName,byteCol.size(), false);
                if(attr != 0) {
                    i++;
                    // HERE IS WHERE THE DATA ACTUALLY GETS TRANSFERRED
                    oCol.appendRow(new QStandardItem());
                    std::string strtmp(attr->value());
                    oCol.item(i-1)->setText(QString::fromStdString(strtmp));
                } else {
                    nodeEnt++;
                }
            } else {
                while(nodeCol != 0) {
                    i++;
                    oCol.appendRow(new QStandardItem());
                    if(nodeCol->value_size() == 0)
                        // Missing data
                        oCol.item(i-1)->setData(QString("NA"));
                    else {
                        // HERE, AGAIN, DATA GETS TRANSFERRED
                        std::string strtmp(nodeCol->value());
                        oCol.item(i-1)->setText(QString::fromStdString(strtmp));
                    }
    
                    nodeCol = nodeCol->next_sibling(strColName);
                }
            }
    
            // To treat cases where there are entities and parents with the same name
            if(i > 0) {
                if(nodeEnt->next_sibling(strEntName) == 0 && qstrParentName.compare(QString("")) != 0) {
                    nodeParent = nodeEnt->parent();
                    while(nodeParent != nodeFirst && strcmp(nodeParent->name(), strParentName) != 0) {
                        nodeParent = nodeParent->parent();
                    }
                    if(nodeParent == nodeFirst){
                        return;
                    }
                    nodeParent = nodeParent->next_sibling();
                    if(nodeParent) {
                        nodeEnt = first(strEntName, nodeParent);
                        while(nodeEnt == 0) {
                            nodeParent = nodeParent->next_sibling();
                            if(nodeParent == 0) {
                                break;
                            }
                            nodeEnt = first(strEntName, nodeParent);
                        }
                    } else {
                        break;
                    }
                } else {
                    nodeEnt = nodeEnt->next_sibling(strEntName);
                }
            }
        }
    }
    


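  • Hi. Why are you calling toLocal8Bit() and not toUtf8() when converting your XML from QString to char*? toLocal8Bit() uses the system locale's encoding, which is typically UTF-8 on Ubuntu but usually a legacy code page on Windows. Since you know for sure the data is UTF-8, you shouldn't rely on the system to pick the proper encoding.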

    http://doc.qt.io/qt-5/qstring.html#toLocal8Bit
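
To see why bytes in a local 8-bit encoding can break later decoding, here is a minimal sketch in plain C++ with no Qt dependency. It assumes, purely for illustration, that the Windows local 8-bit encoding behaves like Latin-1. The helper names encodeLatin1, encodeUtf8 and decodeUtf8 are hypothetical stand-ins, not Qt functions. The decoder is simplified (no overlong or surrogate checks) and replaces each undecodable byte with U+FFFD, the replacement character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// U+FFFD, the Unicode replacement character.
const char32_t kReplacement = 0xFFFD;

// Encode code points as Latin-1: one byte per code point.
// Stand-in (assumption) for a Western Windows local 8-bit encoding.
std::string encodeLatin1(const std::vector<char32_t>& cps) {
    std::string out;
    for (char32_t cp : cps) {
        assert(cp <= 0xFF);  // Latin-1 cannot represent anything above U+00FF
        out.push_back(static_cast<char>(cp));
    }
    return out;
}

// Encode code points as UTF-8 (1 to 4 bytes per code point).
std::string encodeUtf8(const std::vector<char32_t>& cps) {
    std::string out;
    for (char32_t cp : cps) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Decode bytes as UTF-8. A byte that cannot start or complete a valid
// sequence is replaced with U+FFFD and decoding resumes at the next byte.
std::vector<char32_t> decodeUtf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }

        bool ok = (i + extra < bytes.size());  // sequence must not be truncated
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(kReplacement); ++i; }
    }
    return out;
}
```

For the word "Órgão" (U+00D3, r, g, U+00E3, o), decodeUtf8(encodeUtf8(word)) gives back the original code points, while decodeUtf8(encodeLatin1(word)) turns both accented letters into U+FFFD. In Qt terms, the mismatch arises when bytes produced with one encoding are later interpreted as UTF-8.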



  • Oh well. I didn't have a reason for doing that, it was just plain ignorance.
    I corrected that, and also one more thing: I eliminated the std::string from

    std::string strtmp(attr->value());
    oCol.item(i-1)->setText(QString::fromStdString(strtmp));
    

    I suspected that standard C++ strings might not cope well with UTF-8 encoding, so I changed it to

    QString strtmp(attr->value());
    oCol.item(i-1)->setText(strtmp);
    

    These two changes combined have solved it. Thank you very much!
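
A note on the second change: in Qt 5, both QString::fromStdString() and the QString(const char *) constructor are documented to decode their input as UTF-8, so the switch from toLocal8Bit() to toUtf8() was most likely the decisive one. A std::string is just a byte container and leaves UTF-8 data untouched. The sketch below illustrates this in plain C++, assuming valid UTF-8 input. The helper names viaStdString and countCodePoints are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Copy a NUL-terminated byte sequence into a std::string.
// No transcoding happens: the bytes are stored exactly as given.
std::string viaStdString(const char* raw) {
    return std::string(raw);
}

// Count code points in valid UTF-8: every byte that is not a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the UTF-8 bytes of "Órgão", size() reports 7 bytes while countCodePoints reports 5 characters. The only thing to keep in mind is that std::string lengths are byte counts, not character counts.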


