Unexpected result from QTextCodec::canEncode(QString&)



  • Dear Qt-ists,

    I have created a QTextCodec instance, named codec, for the encoding "US-ASCII", using the method QTextCodec::codecForName.

    Now, I have a QString object str that contains symbols that cannot be encoded using this encoding (© and é, precisely). Indeed, these symbols are replaced with character ? in the QByteArray resulting from a call to codec->fromUnicode(str).

    However, the call to codec->canEncode(str) returns true, which I find rather counterintuitive.

    Is this expected behaviour? If so, I suppose the documentation of canEncode should be expanded to say so.


  • Lifetime Qt Champion

    Hi and welcome to devnet,

    What version of Qt are you using ?
    On what platform ?

    Can you post a minimal compilable sample code that reproduces that ?


  • Moderators

    @DDEH said in Unexpected result from QTextCodec::canEncode(QString&):

    I have created a QTextCodec instance, named codec, for the encoding "US-ASCII", using the method QTextCodec::codecForName.

    What do you get when you do qDebug() << codec-> mibEnum() << codec->name(); ?

    Indeed, these symbols are replaced with character ? in the QByteArray resulting from a call to codec->fromUnicode(str).

    1. How did you create the unicode string?
    2. How did you display the encoded string?
    3. Call toHex() on your QByteArray. Are the ? characters 0x3F?


  • @SGaist said in Unexpected result from QTextCodec::canEncode(QString&):

    Hi and welcome to devnet,

    Hi,

    Thanks for the welcome.

    What version of Qt are you using ?

    5.5.1

    On what platform ?

    linux-x86_64

    Can you post a minimal compilable sample code that reproduces that ?

    Yes, kind of.

    main.cpp

    #include "monediteur.h"
    #include <QApplication>
    
    int main(int argc, char *argv[])
    {
        QApplication a(argc, argv);
        MonEditeur w;
        w.show();
    
        return a.exec();
    }
    

    monediteur.h

    #ifndef MONEDITEUR_H
    #define MONEDITEUR_H
    
    #include <QMainWindow>
    
    namespace Ui {
    class MonEditeur;
    }
    
    class MonEditeur : public QMainWindow
    {
        Q_OBJECT
    
    public:
        explicit MonEditeur(QWidget *parent = 0);
        ~MonEditeur();
    
    private:
        Ui::MonEditeur *ui;
    
    private slots:
        void process();
    };
    
    #endif // MONEDITEUR_H
    

    monediteur.cpp

    #include "monediteur.h"
    #include "ui_monediteur.h"
    
    #include <QTextCodec>
    #include <QDebug>
    
    MonEditeur::MonEditeur(QWidget *parent) :
        QMainWindow(parent),
        ui(new Ui::MonEditeur)
    {
        ui->setupUi(this);
        connect(ui->pushButton, SIGNAL(clicked(bool)), this, SLOT(process()));
    }
    
    MonEditeur::~MonEditeur()
    {
        delete ui;
    }
    
    void MonEditeur::process()
    {
        QString codecId("US-ASCII");
        const QString contents= ui->textEditor->toPlainText();
        QTextCodec* codec=QTextCodec::codecForName(codecId.toLatin1());
        qDebug() << "Some attributes of this codec:";
        qDebug() << codec-> mibEnum() << codec->name();
        if (codec->canEncode(contents)) {
            qDebug() << codecId << " can encode the contents";
            qDebug() << contents;
            QByteArray ba=codec->fromUnicode(contents);
            QString check=codec->toUnicode(ba);
            qDebug() << "check: ";
            qDebug() << check;
            qDebug() << "--";
            QByteArray hex = ba.toHex();
            qDebug() << "toHex: ";
            qDebug() << hex;
        }
    }
    

    monediteur.ui

    <?xml version="1.0" encoding="UTF-8"?>
    <ui version="4.0">
     <class>MonEditeur</class>
     <widget class="QMainWindow" name="MonEditeur">
      <property name="geometry">
       <rect>
        <x>0</x>
        <y>0</y>
        <width>381</width>
        <height>324</height>
       </rect>
      </property>
      <property name="windowTitle">
       <string>MonEditeur</string>
      </property>
      <widget class="QWidget" name="centralWidget">
       <widget class="QPlainTextEdit" name="textEditor">
        <property name="geometry">
         <rect>
          <x>0</x>
          <y>0</y>
          <width>381</width>
          <height>241</height>
         </rect>
        </property>
       </widget>
       <widget class="QPushButton" name="pushButton">
        <property name="geometry">
         <rect>
          <x>130</x>
          <y>250</y>
          <width>80</width>
          <height>25</height>
         </rect>
        </property>
        <property name="text">
         <string>Process</string>
        </property>
       </widget>
      </widget>
      <widget class="QToolBar" name="mainToolBar">
       <attribute name="toolBarArea">
        <enum>TopToolBarArea</enum>
       </attribute>
       <attribute name="toolBarBreak">
        <bool>false</bool>
       </attribute>
      </widget>
      <widget class="QStatusBar" name="statusBar"/>
     </widget>
     <layoutdefault spacing="6" margin="11"/>
     <resources/>
     <connections/>
    </ui>
    


  • Thanks for taking some of your time to look at this issue.

    My answers are embedded in your post.
    @JKSH said in Unexpected result from QTextCodec::canEncode(QString&):

    @DDEH said in Unexpected result from QTextCodec::canEncode(QString&):

    I have created a QTextCodec instance, named codec, for the encoding "US-ASCII", using the method QTextCodec::codecForName.

    What do you get when you do qDebug() << codec-> mibEnum() << codec->name(); ?

    3 "US-ASCII"

    Indeed, these symbols are replaced with character ? in the QByteArray resulting from a call to codec->fromUnicode(str).

    1. How did you create the unicode string?

    The string is the result of toPlainText() from a QPlainTextEdit instance.

    2. How did you display the encoded string?

    With qDebug() for instance.

    3. Call toHex() on your QByteArray. Are the ? characters 0x3F?

    Yes they are.

    The output of the program I posted in my previous post is the following:

    Some attributes of this codec:
    3 "US-ASCII"
    "US-ASCII"  can encode the contents
    "© André Cymone"
    check: 
    "? Andr? Cymone"
    --
    toHex: 
    "3f20416e64723f2043796d6f6e65"
    
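    For anyone who wants to double-check the dump without Qt, here is a minimal sketch in standard C++. The helper names decodeHex and positionsOf are made up for this illustration and are not part of any library. It turns the hex string back into bytes and reports where a given byte value occurs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: convert a hex string into raw bytes.
// Assumes an even number of valid hex digits; there is no error handling.
std::string decodeHex(const std::string &hex)
{
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return 0; // invalid digits are treated as 0 in this sketch
    };
    std::string bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        bytes.push_back(static_cast<char>(nibble(hex[i]) * 16 + nibble(hex[i + 1])));
    return bytes;
}

// Hypothetical helper: list every index at which `value` occurs in `bytes`.
std::vector<std::size_t> positionsOf(const std::string &bytes, char value)
{
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        if (bytes[i] == value)
            positions.push_back(i);
    return positions;
}
```

    Applied to the dump above, decodeHex("3f20416e64723f2043796d6f6e65") gives the 14 bytes "? Andr? Cymone", and positionsOf(bytes, 0x3F) gives indices 0 and 6, which are exactly where © and é stood in the input.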

  • Moderators

    @DDEH said in Unexpected result from QTextCodec::canEncode(QString&):

    The output of the program I posted in my previous post is the following:

    Some attributes of this codec:
    3 "US-ASCII"
    "US-ASCII"  can encode the contents
    "© André Cymone"
    check: 
    "? Andr? Cymone"
    --
    toHex: 
    "3f20416e64723f2043796d6f6e65"
    

    Looks like you found some incorrect behaviour; I agree that canEncode() should return false in your example.

    If it still behaves the same in the latest release (Qt 5.11.1), then you can submit a bug report at https://bugreports.qt.io/. However, I'm guessing that the report will be given low priority, since US-ASCII is not a recommended encoding nowadays. (The devs are already putting all their time and energy into fixing much more serious bugs and adding new features.)
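
    To make the expected behaviour concrete, here is a rough model in standard C++. It is not Qt's implementation. The function names canEncodeAscii and encodeAsciiLossy are invented for this sketch, and std::u16string stands in for QString's UTF-16 storage. The point is only that the two answers should agree: if the lossy encoder had to substitute a ?, the "can encode" check should say false.

```cpp
#include <string>

// Returns true only when every UTF-16 code unit lies in the US-ASCII
// range 0x00-0x7F. Surrogate halves are >= 0xD800, so they are rejected too.
bool canEncodeAscii(const std::u16string &text)
{
    for (char16_t unit : text)
        if (unit > 0x7F)
            return false;
    return true;
}

// Encodes to US-ASCII and substitutes '?' (0x3F) for every code unit that
// is out of range. In this simplified model a surrogate pair becomes two '?'.
std::string encodeAsciiLossy(const std::u16string &text)
{
    std::string out;
    out.reserve(text.size());
    for (char16_t unit : text)
        out.push_back(unit <= 0x7F ? static_cast<char>(unit) : '?');
    return out;
}
```

    With the string from your test, u"\u00A9 Andr\u00E9 Cymone", encodeAsciiLossy returns "? Andr? Cymone" (matching your output) while canEncodeAscii returns false, which is the result I would expect from canEncode() as well.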



  • @JKSH Thanks.

    I would not bet that the bug is restricted to this encoding. I will investigate this later and, if it is confirmed, report it through the proper channel.

    Thanks again for your time and attention.

