How can I convert a Unicode string into Shift-JIS?



  • Hello all,

    I have a QString in Unicode format that comes from the page source of a Japanese website.
    I need to convert that Unicode string into Japanese characters.
    I tried the code below, but it crashes my program.

    @
    QString htmlString="Amazonベーシック ハイスピードHDMIケーブル 2.0m (タイプAオス- タイプAオス、イーサネット、3D、オーディオリターン対応)";

    QTextCodec *codec = QTextCodec::codecForName("Shift-JIS");
    QByteArray encodedString = codec->fromUnicode(htmlString);

    QString htmlString1=encodedString.data();
    @

    Please help me find what I am doing wrong, or suggest another way to achieve this.

    Thanks in advance.


  • Lifetime Qt Champion

    Hi,

    You don't check whether codec is null. Are you sure you have one that supports "Shift-JIS"?



  • Thanks SGaist for quick reply.

    Yes, I need to convert into Shift-JIS format.
    There is no content type set in the head tag of the main page source. The source comes from a Japanese web page that supports Shift-JIS, so it should work for me.


  • Moderators

    What SGaist meant was whether you have the codec on your system.
    QTextCodec::codecForName() returns a null pointer if it can't find the codec, which leads to a crash because you access the pointer on the very next line.
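
To make the failure mode concrete, here is a minimal Qt-free sketch of the check being asked for. `Codec`, `findCodec` and `describeCodec` are hypothetical stand-ins invented for illustration, not Qt types or functions; the only thing taken from the thread is the contract that the lookup returns a null pointer when the codec is missing.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a text codec; this is NOT a Qt type.
struct Codec {
    std::string name;
};

// Hypothetical lookup that mimics the contract described above:
// it returns a null pointer when no codec with that name is registered.
const Codec *findCodec(const std::string &name)
{
    static const std::map<std::string, Codec> registry = {
        {"UTF-8", Codec{"UTF-8"}},
        {"Shift_JIS", Codec{"Shift_JIS"}},
    };
    const auto it = registry.find(name);
    return it == registry.end() ? nullptr : &it->second;
}

// Returns the codec's name, or an empty string if the lookup failed.
// The null check is the point: dereferencing a failed lookup would crash.
std::string describeCodec(const std::string &name)
{
    const Codec *codec = findCodec(name);
    if (codec == nullptr) {
        return std::string();  // caller can report "codec not available"
    }
    return codec->name;
}
```

In the Qt code from the question, the equivalent check would sit between the call to `QTextCodec::codecForName()` and the call to `codec->fromUnicode()`.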


  • Lifetime Qt Champion

    I understood you right. My question is: are you sure the text codec is available? In my list of text codecs I have Shift_JIS (I don't think the - or _ is really relevant; I tried with both and didn't get a crash).

    Take a look at the output of "QTextCodec::availableCodecs()":http://qt-project.org/doc/qt-4.8/qtextcodec.html#availableCodecs

    Also, try running the application with a debugger to see what happens.



  • Here is the output of @ QTextCodec::availableCodecs(); @

    ("UTF-8", "ISO-8859-1", "latin1", "CP819", "IBM819", "iso-ir-100", "csISOLatin1", "ISO-8859-15", "latin9", "UTF-32LE", "UTF-32BE", "UTF-32", "UTF-16LE", "UTF-16BE", "UTF-16", "System", "roman8", "hp-roman8", "csHPRoman8", "TIS-620", "ISO 8859-11", "WINSAMI2", "WS2", "Apple Roman", "macintosh", "MacRoman", "windows-1258", "CP1258", "windows-1257", "CP1257", "windows-1256", "CP1256", "windows-1255", "CP1255", "windows-1254", "CP1254", "windows-1253", "CP1253", "windows-1252", "CP1252", "windows-1251", "CP1251", "windows-1250", "CP1250", "IBM866", "CP866", "csIBM866", "IBM874", "CP874", "IBM850", "CP850", "csPC850Multilingual", "ISO-8859-16", "iso-ir-226", "latin10", "ISO-8859-14", "iso-ir-199", "latin8", "iso-celtic", "ISO-8859-13", "ISO-8859-10", "iso-ir-157", "latin6", "ISO-8859-10:1992", "csISOLatin6", "ISO-8859-9", "iso-ir-148", "latin5", "csISOLatin5", "ISO-8859-8", "ISO 8859-8-I", "iso-ir-138", "hebrew", "csISOLatinHebrew", "ISO-8859-7", "ECMA-118", "greek", "iso-ir-126", "csISOLatinGreek", "ISO-8859-6", "ISO-8859-6-I", "ECMA-114", "ASMO-708", "arabic", "iso-ir-127", "csISOLatinArabic", "ISO-8859-5", "cyrillic", "iso-ir-144", "csISOLatinCyrillic", "ISO-8859-4", "latin4", "iso-ir-110", "csISOLatin4", "ISO-8859-3", "latin3", "iso-ir-109", "csISOLatin3", "ISO-8859-2", "latin2", "iso-ir-101", "csISOLatin2", "KOI8-U", "KOI8-RU", "KOI8-R", "csKOI8R", "Iscii-Mlm", "Iscii-Knd", "Iscii-Tlg", "Iscii-Tml", "Iscii-Ori", "Iscii-Gjr", "Iscii-Pnj", "Iscii-Bng", "Iscii-Dev", "TSCII", "GB18030", "GBK", "GB2312", "CP936", "MS936", "windows-936", "EUC-JP", "ISO-2022-JP", "Shift_JIS", "JIS7", "SJIS", "MS_Kanji", "EUC-KR", "cp949", "Big5", "Big5-HKSCS", "Big5-ETen", "CP950")

    The crash is resolved.
    But when I append my final string to a text box as follows:

    @ui->txtPageSource->appendPlainText(htmlString1);@

    I get the same string, not converted into Japanese.

    The expected result is:

    Amazonベーシック ハイスピードHDMIケーブル 2.0m (タイプAオス- タイプAオス、イーサネット、3D、オーディオリターン対応)

    One thing I found is that my string contains HTML numeric codes, which might need to be converted to Unicode.
    I am confused here.


  • Lifetime Qt Champion

    Wait, before going any further... Why not use:

    @ ui->txtPageSource->appendHtml(htmlString);@

    ?

    No need to do any conversion


  • Moderators

    htmlString contains no Japanese characters at all from what I see. There is just plain Latin1 text with HTML escape sequences that should be ignored by QString.
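
For readers unsure what such escape sequences look like: an HTML numeric character reference spells a character as `&#`, then its Unicode code point in decimal, then `;`, so ベ (U+30D9) is written `&#12505;`. Below is a minimal Qt-free sketch in standard C++ that produces this form. `toNumericRefs` is a hypothetical helper name, and the sketch assumes the page source used decimal references only.

```cpp
#include <string>

// Rewrite every non-ASCII code point as a decimal numeric character
// reference ("&#NNNN;"); ASCII characters are copied through unchanged.
// Note: a literal '&' is not escaped here; this is only an illustration.
std::string toNumericRefs(const std::u32string &text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
        }
    }
    return out;
}
```

The result is pure ASCII, which is why the string holds nothing for a Shift-JIS codec to convert until the references are decoded.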



  • Hey SGaist, thanks, it now works for the text box.
    But I need to show the same result in a QTableWidgetItem.

    I am trying, but I still get the old string format.



  • Thanks Tobias Hunger for the reply.

    Could you please suggest how I can convert this Latin1 text to its Unicode or HTML form, so that I get the Japanese characters when adding the string to a QTableWidgetItem?



  • Below is sample code that converts an HTML-format string to a QString.

    @QString htmlString="Amazonベーシック ...";
    ui->textEdit->setHtml(htmlString);

    QString str;
    QRegExp rx("&#(\\d+);"); // backslash doubled for the C++ string literal
    int pos1 = 0, pos2 = 0;
    while ((pos2 = rx.indexIn(htmlString, pos2)) != -1) {
        str.append(htmlString.mid(pos1, pos2-pos1)); // copy the text before the match
        str.append(QChar(rx.cap(1).toInt())); // "&#xxxxx;" -> QChar(xxxxx)
        pos2 += rx.matchedLength();
        pos1 = pos2;
    }
    str.append(htmlString.mid(pos1)); // copy the remaining tail
    ui->textEdit_2->setText(str);
    @

    Hope it helps.
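
One caveat on the QChar-based approach: a QChar holds a single 16-bit UTF-16 code unit, so it only covers references up to `&#65535;`. As a hedged, Qt-free illustration of the same decoding idea that also handles larger code points, here is a sketch in standard C++ that turns decimal references into UTF-8. The names `appendUtf8` and `decodeNumericRefs` are hypothetical, and hexadecimal (`&#x...;`) and named references are deliberately left untouched.

```cpp
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` as UTF-8 bytes.
void appendUtf8(std::string &out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replace every decimal reference "&#NNNN;" in `in` with its UTF-8 encoding.
// Anything that is not a well-formed decimal reference is copied unchanged.
std::string decodeNumericRefs(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.compare(i, 2, "&#") == 0) {
            std::size_t j = i + 2;
            std::uint32_t cp = 0;
            // Stop accumulating once the value leaves the Unicode range,
            // which also keeps the arithmetic from overflowing.
            while (j < in.size() && in[j] >= '0' && in[j] <= '9' && cp <= 0x10FFFF) {
                cp = cp * 10 + static_cast<std::uint32_t>(in[j] - '0');
                ++j;
            }
            const bool hasDigits = j > i + 2;
            const bool terminated = j < in.size() && in[j] == ';';
            const bool isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (hasDigits && terminated && cp <= 0x10FFFF && !isSurrogate) {
                appendUtf8(out, cp);
                i = j + 1;  // continue after the ';'
                continue;
            }
        }
        out += in[i];  // ordinary character or malformed reference
        ++i;
    }
    return out;
}
```

For example, `decodeNumericRefs("Amazon&#12505;&#12540;")` yields the UTF-8 bytes for "Amazonベー". In a Qt program the result could then be handed to `QString::fromUtf8()` before setting it on a widget or table item.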

