How can I convert a Unicode string into Shift-JIS?
-
Hi,
You don't check whether the codec is null. Are you sure you have one that supports "Shift-JIS"?
-
What SGaist meant was whether you have the codec on your system.
QTextCodec::codecForName() returns a null pointer if it can't find the codec, and that leads to a crash because you access the pointer on the very next line.
-
I understood you right. My question is: are you sure the text codec is available? In my list of text codecs I have Shift_JIS (I don't think the - or _ is really relevant; I tried with both and didn't get a crash).
Take a look at the output of "QTextCodec::availableCodecs()":http://qt-project.org/doc/qt-4.8/qtextcodec.html#availableCodecs
Also, try running the application under a debugger to see what happens.
-
Here is the output of @ QTextCodec::availableCodecs(); @
("UTF-8", "ISO-8859-1", "latin1", "CP819", "IBM819", "iso-ir-100", "csISOLatin1", "ISO-8859-15", "latin9", "UTF-32LE", "UTF-32BE", "UTF-32", "UTF-16LE", "UTF-16BE", "UTF-16", "System", "roman8", "hp-roman8", "csHPRoman8", "TIS-620", "ISO 8859-11", "WINSAMI2", "WS2", "Apple Roman", "macintosh", "MacRoman", "windows-1258", "CP1258", "windows-1257", "CP1257", "windows-1256", "CP1256", "windows-1255", "CP1255", "windows-1254", "CP1254", "windows-1253", "CP1253", "windows-1252", "CP1252", "windows-1251", "CP1251", "windows-1250", "CP1250", "IBM866", "CP866", "csIBM866", "IBM874", "CP874", "IBM850", "CP850", "csPC850Multilingual", "ISO-8859-16", "iso-ir-226", "latin10", "ISO-8859-14", "iso-ir-199", "latin8", "iso-celtic", "ISO-8859-13", "ISO-8859-10", "iso-ir-157", "latin6", "ISO-8859-10:1992", "csISOLatin6", "ISO-8859-9", "iso-ir-148", "latin5", "csISOLatin5", "ISO-8859-8", "ISO 8859-8-I", "iso-ir-138", "hebrew", "csISOLatinHebrew", "ISO-8859-7", "ECMA-118", "greek", "iso-ir-126", "csISOLatinGreek", "ISO-8859-6", "ISO-8859-6-I", "ECMA-114", "ASMO-708", "arabic", "iso-ir-127", "csISOLatinArabic", "ISO-8859-5", "cyrillic", "iso-ir-144", "csISOLatinCyrillic", "ISO-8859-4", "latin4", "iso-ir-110", "csISOLatin4", "ISO-8859-3", "latin3", "iso-ir-109", "csISOLatin3", "ISO-8859-2", "latin2", "iso-ir-101", "csISOLatin2", "KOI8-U", "KOI8-RU", "KOI8-R", "csKOI8R", "Iscii-Mlm", "Iscii-Knd", "Iscii-Tlg", "Iscii-Tml", "Iscii-Ori", "Iscii-Gjr", "Iscii-Pnj", "Iscii-Bng", "Iscii-Dev", "TSCII", "GB18030", "GBK", "GB2312", "CP936", "MS936", "windows-936", "EUC-JP", "ISO-2022-JP", "Shift_JIS", "JIS7", "SJIS", "MS_Kanji", "EUC-KR", "cp949", "Big5", "Big5-HKSCS", "Big5-ETen", "CP950")
The crash is resolved.
But when I append my final string to the text box like this: @ui->txtPageSource->appendPlainText(htmlString1);@
I get the same string back, not converted into Japanese.
The expected result is:
Amazonベーシック ハイスピードHDMIケーブル 2.0m (タイプAオス- タイプAオス、イーサネット、3D、オーディオリターン対応)
One thing I noticed is that my string contains HTML numeric character codes, which might need to be converted to Unicode first.
I am confused here.
-
Wait, before going any further... Why not use:
@ ui->txtPageSource->appendHtml(htmlString);@
?
No need to do any conversion.
-
From what I see, htmlString contains no Japanese characters at all. It is just plain Latin1 text with HTML escape sequences, which QString does not interpret on its own.
-
Below is sample code that converts an HTML-format string to a QString.
@QString htmlString="Amazonベーシック ...";
ui->textEdit->setHtml(htmlString);
QString str;
QRegExp rx("&#(\\d+);"); // the backslash must be escaped inside a C++ string literal
int pos1 = 0, pos2 = 0;
while ((pos2 = rx.indexIn(htmlString, pos2)) != -1) {
    str.append(htmlString.mid(pos1, pos2-pos1)); // copy the text before the match
    str.append(QChar(rx.cap(1).toInt())); // "&#xxxxx;" -> QChar(xxxxx)
    pos2 += rx.matchedLength();
    pos1 = pos2;
}
str.append(htmlString.mid(pos1)); // copy whatever follows the last match
ui->textEdit_2->setText(str);
@
Hope it helps.
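One caveat with the loop above: QChar holds a single UTF-16 code unit, so a reference above 65535 (an emoji, for example) would need a surrogate pair instead of one QChar. Below is a minimal sketch of that handling in plain C++ without Qt, so it can be compiled on its own. decodeNumericRefs is a hypothetical helper name, not a Qt function, and the sketch assumes the input is ASCII/Latin1 text containing decimal references only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): replaces decimal "&#NNNNN;" references
// with UTF-16 code units, emitting a surrogate pair for code points above U+FFFF.
// Assumption: every other byte of the input is ASCII/Latin1.
std::u16string decodeNumericRefs(const std::string &in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.compare(i, 2, "&#") == 0) {
            std::size_t j = i + 2;
            unsigned long cp = 0;
            // Accumulate digits; stop early once the value leaves the Unicode range.
            while (j < in.size() && in[j] >= '0' && in[j] <= '9' && cp <= 0x10FFFF) {
                cp = cp * 10 + static_cast<unsigned long>(in[j] - '0');
                ++j;
            }
            const bool hasDigits = j > i + 2;
            const bool terminated = j < in.size() && in[j] == ';';
            const bool isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (hasDigits && terminated && cp != 0 && cp <= 0x10FFFF && !isSurrogate) {
                if (cp <= 0xFFFF) {
                    out.push_back(static_cast<char16_t>(cp));
                } else {
                    cp -= 0x10000;
                    out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
                    out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
                }
                i = j + 1; // skip past the ';'
                continue;
            }
        }
        // Not a valid reference: copy the byte unchanged.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(in[i])));
        ++i;
    }
    return out;
}
```

In Qt code the same idea could probably be expressed with QChar::highSurrogate() and QChar::lowSurrogate() when the parsed value is above 0xFFFF; for the Japanese text in this thread every character fits in a single QChar, so the original loop is enough.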