How can I convert Unicode string into Shift-JIS?

  • Hello all,

    I have a QString in Unicode format that comes from the page source of a Japanese website.
    I need to convert that Unicode string into Japanese character format.
    I have tried this code, but it crashes my program.

    QString htmlString="Amazonベーシック ハイスピードHDMIケーブル 2.0m (タイプAオス- タイプAオス、イーサネット、

     QTextCodec *codec = QTextCodec::codecForName("Shift-JIS");
     QByteArray encodedString = codec->fromUnicode(htmlString);


    Please help me find out what I am doing wrong, or suggest another way to achieve this.

    Thanks in advance.

  • Lifetime Qt Champion


    You don't check whether codec is null. Are you sure you have one that supports "Shift-JIS"?

  • Thanks SGaist for quick reply.

    Yes, I need to convert into Shift-JIS format.
    There is no content type set in the head tag of the main page source. It comes from a Japanese web page that supports Shift-JIS, so it will work for me.

  • Moderators

    What SGaist meant was whether you have the codec on your system.
    QTextCodec::codecForName() returns a null pointer if it can't find the codec, which leads to a crash because you access the pointer on the very next line.

  • Lifetime Qt Champion

    I understood you correctly. My question is: are you sure the text codec is available? In my list of text codecs I have Shift_JIS (I don't think the - or _ is really relevant; I tried with both and didn't get a crash).

    Take a look at the output of "QTextCodec::availableCodecs()".

    Also, try running the application with a debugger to see what happens.

  • Here is the output of @ QTextCodec::availableCodecs(); @

    ("UTF-8", "ISO-8859-1", "latin1", "CP819", "IBM819", "iso-ir-100", "csISOLatin1", "ISO-8859-15", "latin9", "UTF-32LE", "UTF-32BE", "UTF-32", "UTF-16LE", "UTF-16BE", "UTF-16", "System", "roman8", "hp-roman8", "csHPRoman8", "TIS-620", "ISO 8859-11", "WINSAMI2", "WS2", "Apple Roman", "macintosh", "MacRoman", "windows-1258", "CP1258", "windows-1257", "CP1257", "windows-1256", "CP1256", "windows-1255", "CP1255", "windows-1254", "CP1254", "windows-1253", "CP1253", "windows-1252", "CP1252", "windows-1251", "CP1251", "windows-1250", "CP1250", "IBM866", "CP866", "csIBM866", "IBM874", "CP874", "IBM850", "CP850", "csPC850Multilingual", "ISO-8859-16", "iso-ir-226", "latin10", "ISO-8859-14", "iso-ir-199", "latin8", "iso-celtic", "ISO-8859-13", "ISO-8859-10", "iso-ir-157", "latin6", "ISO-8859-10:1992", "csISOLatin6", "ISO-8859-9", "iso-ir-148", "latin5", "csISOLatin5", "ISO-8859-8", "ISO 8859-8-I", "iso-ir-138", "hebrew", "csISOLatinHebrew", "ISO-8859-7", "ECMA-118", "greek", "iso-ir-126", "csISOLatinGreek", "ISO-8859-6", "ISO-8859-6-I", "ECMA-114", "ASMO-708", "arabic", "iso-ir-127", "csISOLatinArabic", "ISO-8859-5", "cyrillic", "iso-ir-144", "csISOLatinCyrillic", "ISO-8859-4", "latin4", "iso-ir-110", "csISOLatin4", "ISO-8859-3", "latin3", "iso-ir-109", "csISOLatin3", "ISO-8859-2", "latin2", "iso-ir-101", "csISOLatin2", "KOI8-U", "KOI8-RU", "KOI8-R", "csKOI8R", "Iscii-Mlm", "Iscii-Knd", "Iscii-Tlg", "Iscii-Tml", "Iscii-Ori", "Iscii-Gjr", "Iscii-Pnj", "Iscii-Bng", "Iscii-Dev", "TSCII", "GB18030", "GBK", "GB2312", "CP936", "MS936", "windows-936", "EUC-JP", "ISO-2022-JP", "Shift_JIS", "JIS7", "SJIS", "MS_Kanji", "EUC-KR", "cp949", "Big5", "Big5-HKSCS", "Big5-ETen", "CP950")

    The crash is resolved.
    But when I append my final string to the text box, I get the same string back, not converted into Japanese.

    The expected result is:

    Amazonベーシック ハイスピードHDMIケーブル 2.0m (タイプAオス- タイプAオス、イーサネット、3D、オーディオリターン対応)

    I noticed that my string contains HTML numeric character codes, which might need to be converted to Unicode.
    I am confused here.

  • Lifetime Qt Champion

    Wait, before going any further... Why not use:

    @ ui->txtPageSource->appendHtml(htmlString);@


    No need to do any conversion

  • Moderators

    htmlString contains no Japanese characters at all from what I see. There is just plain Latin1 text with HTML escape sequences that should be ignored by QString.

  • Hey SGaist, thanks, it now works for the text box.
    But I need to show the same result in a QTableWidgetItem.

    I am trying, but I still get the old string format.

  • Thanks Tobias Hunger for reply,

    Can you please suggest how I can convert this Latin1 text to the corresponding Unicode or HTML format, so that I get the right Japanese characters when adding the string to a QTableWidgetItem?

  • Below is sample code that converts an HTML-format string to a QString.

    @QString htmlString="Amazonベーシック ...";

    QString str;
    QRegExp rx("&#(\\d+);"); // the backslash must be escaped inside a C++ string literal
    int pos1 = 0, pos2 = 0;
    while ((pos2 = rx.indexIn(htmlString, pos2)) != -1) {
        str.append(htmlString.mid(pos1, pos2-pos1)); // copy the text before the match
        str.append(QChar(rx.cap(1).toInt())); // "&#xxxxx;" -> QChar(xxxxx)
        pos2 += rx.matchedLength();
        pos1 = pos2;
    }
    str.append(htmlString.mid(pos1)); // copy the text after the last match@

    Hope it helps.
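
For anyone who wants to try the same idea outside Qt, here is a minimal sketch in plain standard C++. It assumes the input only uses decimal references of the form "&#NNNN;" (hex references like "&#x30D9;" and named ones like "&amp;" are left untouched) and that the output should be UTF-8. The names appendUtf8 and decodeNumericEntities are made up for this example; they are not part of Qt or any library.

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of the Unicode code point `cp` to `out`.
// The caller is expected to pass a valid code point (<= 0x10FFFF).
void appendUtf8(std::string &out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replace every decimal reference "&#NNNN;" in `in` with the UTF-8 bytes
// of that code point. Anything that is not a well-formed decimal
// reference is copied through unchanged.
std::string decodeNumericEntities(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.compare(i, 2, "&#") == 0) {
            std::size_t j = i + 2;
            unsigned long cp = 0;
            // Read digits; stop early once the value is out of range
            // so the accumulator cannot overflow.
            while (j < in.size() && in[j] >= '0' && in[j] <= '9' && cp <= 0x10FFFF) {
                cp = cp * 10 + static_cast<unsigned long>(in[j] - '0');
                ++j;
            }
            const bool hasDigits = j > i + 2;
            const bool terminated = j < in.size() && in[j] == ';';
            const bool isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (hasDigits && terminated && cp <= 0x10FFFF && !isSurrogate) {
                appendUtf8(out, cp);
                i = j + 1; // continue after the ';'
                continue;
            }
        }
        out += in[i]; // ordinary character, or a malformed reference
        ++i;
    }
    return out;
}
```

Calling decodeNumericEntities("Amazon&#12505;") gives "Amazon" followed by the UTF-8 bytes of ベ (U+30D9). Unlike the QChar-based loop, this version also handles code points above U+FFFF, since it writes UTF-8 directly instead of a single 16-bit unit.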
