Problem with Japanese encoding/display
-
I have a MySQL database with lots of Japanese in it. I usually connect to it with PHP and display the data on a web page with a UTF-8 character set, which works fine.
When I use phpMyAdmin to view the database, I see this - ä½è—¤ã€€ç´€å - where I should see this - 佐藤 紀子. Adding and retrieving Japanese text works fine, but that phpMyAdmin view is odd.
My problem is that I'm retrieving the same data with Qt and it displays the wrong way there too. I thought I might simply need to change the encoding of something - maybe the .ui file - to UTF-8, but that seems to be the standard encoding in Qt Creator.
Any idea what I can do here?
Thanks a lot.
-
Thanks for the reply. Here's the code. The relevant bit is in the query at the bottom. 'snamej' is the bit with the problem.
@
#include "db.h"
#include "ui_db.h"

#include <QDebug>
#include <QSqlDatabase>
#include <QSqlQuery>

db::db(QWidget *parent) :
    QMainWindow(parent),
    ui(new Ui::db)
{
    ui->setupUi(this);
    fillStudentDrops();
}

db::~db()
{
    delete ui;
}

void db::on_pushButton_clicked()
{
}

// Fill the combo box with the names and ids of the current students.
void db::fillStudentDrops()
{
    connect();
    QSqlQuery query;
    query.exec("SELECT sname, id FROM students WHERE old='n' || old='' ORDER BY sname ASC");
    while (query.next()) {
        QString sname = query.value(0).toString();
        QVariant id = query.value(1);
        qDebug() << sname << id;
        QString output = sname + " " + id.toString();
        ui->combo1->addItem(output, id);
    }
}

// Open the default database connection used by all queries below.
void db::connect()
{
    QSqlDatabase cdb = QSqlDatabase::addDatabase("QMYSQL");
    cdb.setHostName("xxx.xxx.xxx.xxx");
    cdb.setDatabaseName("mydatabase");
    cdb.setUserName("myname");
    cdb.setPassword("mypassword");
    if (!cdb.open())
        qDebug() << "Failed to connect to root mysql admin";
}

// Load the selected student's details; 'snamej' holds the Japanese name.
void db::on_combo1_currentIndexChanged(int index)
{
    QString stu_id = ui->combo1->itemData(index).toString();
    // connect();
    QSqlQuery query;
    query.exec("SELECT sname, id, snamej, email, email2, phone, mobile, dob, info, intro, "
               "uctype, ssdiscount, startdate, pass, onapack, joint, address "
               "FROM students WHERE id=" + stu_id);
    while (query.next()) {
        QString sname_t = query.value(0).toString();
        QVariant id_t = query.value(1);
        QString snamej_t = query.value(2).toString();
        QString email_t = query.value(3).toString();
        QString email2_t = query.value(4).toString();
        QString phone_t = query.value(5).toString();
        QString mobile_t = query.value(6).toString();
        QString dob_t = query.value(7).toString();
        QString info_t = query.value(8).toString();
        QString intro_t = query.value(9).toString();
        QString uctype_t = query.value(10).toString();
        QString ssdiscount_t = query.value(11).toString();
        QString startdate_t = query.value(12).toString();
        QString pass_t = query.value(13).toString();
        QString onapack_t = query.value(14).toString();
        QString joint_t = query.value(15).toString();
        QString address_t = query.value(16).toString();
        qDebug() << sname_t << id_t << snamej_t << email_t << email2_t << phone_t
                 << mobile_t << dob_t << info_t << intro_t << uctype_t << ssdiscount_t
                 << startdate_t << pass_t << onapack_t << joint_t << address_t;
        ui->email->setText(email_t);
        ui->phone->setText(phone_t);
        ui->email2->setText(email2_t);
        ui->mobile->setText(mobile_t);
        ui->snamej->setText(snamej_t);
    }
}
@
Thanks for any help. It might be something obvious to you - I'm just getting started with Qt.
-
I wouldn't say it's obvious to me. I would expect Qt to handle the encoding bit from the db correctly. To be sure however, I think you could try the following:
@QString theString = QString::fromUtf8(query.value(x).toByteArray());@
And see if that yields the desired results.
Of course, hard-coding the encoding like this rules out changing it in the future (as if you would want to move away from Unicode).
-
Uhm, I would try storing some known characters in the database. Then I would read it out with the above method, using both toString() and toByteArray(). Then see what the actual data is and then try to match it to the unicode table. I'd probably put the same known characters into a QString and see what the contents are:
@QString str = QString::fromUtf8("whatever\u03c0");@
str = whateverπ (That's lower case pi)
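If it helps with the matching, here is a minimal sketch in plain C++ (no Qt needed) of what the UTF-8 bytes for a given code point should look like, so they can be compared with whatever toByteArray().toHex() returns. The function names encodeUtf8 and toHex are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point (up to U+10FFFF) as UTF-8 bytes.
std::string encodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Render raw bytes as lowercase hex, two digits per byte.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

With this, toHex(encodeUtf8(0x03C0)) gives cf80 for the pi above. If the hex coming back from the database is longer than expected for a character, the text has probably been through an extra conversion somewhere.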
-
I've been trying all sorts of things but I haven't found an answer yet.
This character - 藤 - is \u85E4 in Unicode.
I put that character into the database through my normal HTML/PHP web page. When I look at it in phpMyAdmin, it looks like this - è—¤. It also looks like that when I read it in the Qt app I'm building.
I did this:
@QString st = snamej_t.toUtf8().toHex();@
and got this: c3a8e28094c2a4
I put that number into this page here - http://www.string-functions.com/hex-string.aspx - and got this - è—¤.
I tried putting the same character (藤) into the database with my Qt interface and directly from the .cpp file. Both times, when I retrieved the data, I got something stranger - something like this - �?��.
I was wondering again about Qt Creator and the encoding of the files. I changed the encoding of all files to UTF-8, but when I re-opened them in Qt Creator, they seemed to have changed back to 'System'. As far as I can work out, the system encoding for this Windows PC I'm using should be Unicode, because it's a Japanese OS.
I hope you can help me find some kind of answer to this. It's driving me nuts.
Thanks a lot.
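That hex value is consistent with the text having been encoded as UTF-8 twice. Here is a rough sketch in plain C++ (no Qt; doubleEncode, encodeUtf8 and toHex are hypothetical helper names) that takes the three UTF-8 bytes of 藤 (e8 97 a4), misreads each byte as one Windows-1252 character, and encodes the result as UTF-8 again. One assumption: the few bytes Windows-1252 leaves unassigned are passed through with the same value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Windows-1252 code points for bytes 0x80..0x9F. All other bytes map to the
// code point with the same value. Unassigned bytes (0x81, 0x8D, 0x8F, 0x90,
// 0x9D) are passed through unchanged, which is an assumption of this sketch.
static const std::uint16_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Encode one code point below U+10000 as UTF-8 (enough for Windows-1252).
std::string encodeUtf8(std::uint32_t cp) {
    assert(cp < 0x10000);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Treat every input byte as one Windows-1252 character and re-encode it
// as UTF-8. Feeding it UTF-8 text reproduces the double-encoded form.
std::string doubleEncode(const std::string& utf8Bytes) {
    std::string out;
    for (unsigned char b : utf8Bytes) {
        std::uint32_t cp = (b >= 0x80 && b <= 0x9F) ? kCp1252High[b - 0x80] : b;
        out += encodeUtf8(cp);
    }
    return out;
}

// Render raw bytes as lowercase hex, two digits per byte.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

toHex(doubleEncode("\xE8\x97\xA4")) comes out as c3a8e28094c2a4, the same value as above. That suggests the bytes stored for the web-entered text are not the plain UTF-8 of the character, but UTF-8 that was converted a second time somewhere between the web page and the table.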
-
Try that page's "Character Encoding Errors Analyzer":http://www.string-functions.com/encodingerror.aspx.
I also think that you should look into "QTextCodec::setCodecForCStrings()":http://doc.trolltech.com/latest/qtextcodec.html#setCodecForCStrings. The results look like latin-1 versions of utf-8 encoded text.
-
Maybe "this older thread":http://developer.qt.nokia.com/forums/viewthread/7048 is of help for you.
-
Thank you both.
I put that one character and the strange output into that error-analyzer page and got this:
Displaying 4 results
utf-8 (65001, Unicode (UTF-8)) -> Windows-1252 (1252, Western European (Windows))
utf-8 (65001, Unicode (UTF-8)) -> windows-1254 (1254, Turkish (Windows))
utf-8 (65001, Unicode (UTF-8)) -> windows-1256 (1256, Arabic (Windows))
utf-8 (65001, Unicode (UTF-8)) -> windows-1258 (1258, Vietnamese (Windows))
I tried this:
@QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));@
but it didn't seem to do anything.
The thread Volker pointed me to seemed very promising, but... I tried removing the collation of the MySQL database through phpMyAdmin, but it wouldn't let me. When I changed it to utf8_general_ci (from utf8_unicode_ci) I was able to put the character into the database via my Qt UI and read it in phpMyAdmin, but when I looked at my web page (which uses a UTF-8 character set) I just got a question mark.
Thanks for any more help. Sorry if this is just getting boring now...
-
Here's a minimal case. (Is this enough?)
@
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;

CREATE TABLE IF NOT EXISTS `students` (
  `id` smallint(3) NOT NULL auto_increment,
  `sname` varchar(30) collate utf8_unicode_ci default NULL,
  `snamej` mediumtext collate utf8_unicode_ci NOT NULL,
  `email` varchar(60) collate utf8_unicode_ci default NULL,
  `email2` varchar(50) collate utf8_unicode_ci NOT NULL,
  `phone` varchar(20) collate utf8_unicode_ci default NULL,
  `mobile` varchar(15) collate utf8_unicode_ci default NULL,
  `dob` date default NULL,
  `dobY` year(4) NOT NULL default '0000',
  `dobM` smallint(2) default NULL,
  `dobD` smallint(2) default NULL,
  `uclass` varchar(20) collate utf8_unicode_ci default NULL,
  `info` longtext collate utf8_unicode_ci,
  `intro` varchar(30) collate utf8_unicode_ci default NULL,
  `lessons` decimal(2,1) NOT NULL,
  `freect` smallint(2) NOT NULL,
  `level` mediumtext collate utf8_unicode_ci NOT NULL,
  `type` varchar(20) collate utf8_unicode_ci default NULL,
  `uctype` varchar(20) collate utf8_unicode_ci default NULL,
  `old` tinytext collate utf8_unicode_ci NOT NULL,
  `ssdiscount` tinytext collate utf8_unicode_ci,
  `paidforby` mediumtext collate utf8_unicode_ci,
  `paidforby_id` int(11) NOT NULL,
  `paysfor` mediumtext collate utf8_unicode_ci NOT NULL,
  `paysfor_id` int(11) NOT NULL,
  `intschool` tinytext collate utf8_unicode_ci,
  `booked` varchar(1) collate utf8_unicode_ci NOT NULL,
  `startdate` varchar(10) collate utf8_unicode_ci NOT NULL,
  `notcomenotes` longtext collate utf8_unicode_ci NOT NULL,
  `paysfor2` varchar(30) collate utf8_unicode_ci NOT NULL,
  `pass` varchar(8) collate utf8_unicode_ci NOT NULL,
  `onapack` tinytext collate utf8_unicode_ci NOT NULL,
  `joint` tinytext collate utf8_unicode_ci NOT NULL,
  `e1onlist` tinyint(1) NOT NULL,
  `e2onlist` tinyint(1) NOT NULL,
  `address` varchar(200) collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1428 ;

INSERT INTO `students` (`id`, `sname`, `snamej`, `email`, `email2`, `phone`, `mobile`, `dob`, `dobY`, `dobM`, `dobD`, `uclass`, `info`, `intro`, `lessons`, `freect`, `level`, `type`, `uctype`, `old`, `ssdiscount`, `paidforby`, `paidforby_id`, `paysfor`, `paysfor_id`, `intschool`, `booked`, `startdate`, `notcomenotes`, `paysfor2`, `pass`, `onapack`, `joint`, `e1onlist`, `e2onlist`, `address`) VALUES
(1007, 'Noriko Sato', 'ä½è—¤ã€€ç´€å', 'noriko@phonecompany.jp', '', '022-333-9999', '090-2222-0000', '1971-12-10', 1971, 12, 10, '', '', '', '0.0', 0, '', '', 'Korean 50', '', '', '', 0, '', 0, '', 'y', '1226732681', '', '', '26ndjokmh4', '', '', 1, 0, '');
@
The one character I keep referring to is here: è—¤ (in the 'snamej' field).
-
Thanks for the response. Here's what I got:
Variable_name Value
auto_increment_increment 1
auto_increment_offset 1
automatic_sp_privileges ON
back_log 50
basedir /
binlog_cache_size 32768
bulk_insert_buffer_size 8388608
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
collation_connection utf8_general_ci
collation_database latin1_swedish_ci
collation_server latin1_swedish_ci
completion_type 0
concurrent_insert 1
connect_timeout 10
datadir /var/lib/mysql/
date_format %Y-%m-%d
datetime_format %Y-%m-%d %H:%i:%s
default_week_format 0
delay_key_write ON
delayed_insert_limit 100
delayed_insert_timeout 300
delayed_queue_size 1000
div_precision_increment 4
keep_files_on_create OFF
engine_condition_pushdown OFF
expire_logs_days 0
flush OFF
flush_time 0
ft_boolean_syntax + -><()~*:""&|
ft_max_word_len 84
ft_min_word_len 4
ft_query_expansion_limit 20
ft_stopword_file (built-in)
group_concat_max_len 1024
have_archive YES
have_bdb NO
have_blackhole_engine YES
have_compress YES
have_community_features NO
have_profiling NO
have_crypt YES
have_csv YES
have_dynamic_loading YES
have_example_engine YES
have_federated_engine YES
have_geometry YES
have_innodb YES
have_isam NO
have_merge_engine YES
have_ndbcluster NO
have_openssl NO
have_ssl NO
have_query_cache YES
have_raid NO
have_rtree_keys YES
have_symlink YES
hostname biz107.inmotionhosting.com
init_connect
init_file
init_slave
innodb_additional_mem_pool_size 1048576
innodb_autoextend_increment 8
innodb_buffer_pool_awe_mem_mb 0
innodb_buffer_pool_size 134217728
innodb_checksums ON
innodb_commit_concurrency 0
innodb_concurrency_tickets 500
innodb_data_file_path ibdata1:10M:autoextend
innodb_data_home_dir
innodb_adaptive_hash_index ON
innodb_doublewrite ON
innodb_fast_shutdown 1
innodb_file_io_threads 4
innodb_file_per_table OFF
innodb_flush_log_at_trx_commit 1
innodb_flush_method
innodb_force_recovery 0
innodb_lock_wait_timeout 50
innodb_locks_unsafe_for_binlog OFF
innodb_log_arch_dir
innodb_log_archive OFF
innodb_log_buffer_size 1048576
innodb_log_file_size 5242880
innodb_log_files_in_group 2
innodb_log_group_home_dir ./
innodb_max_dirty_pages_pct 90
innodb_max_purge_lag 0
innodb_mirrored_log_groups 1
innodb_open_files 300
innodb_rollback_on_timeout OFF
innodb_support_xa ON
innodb_sync_spin_loops 20
innodb_table_locks ON
innodb_thread_concurrency 8
innodb_thread_sleep_delay 10000
innodb_use_legacy_cardinality_algorithm ON
interactive_timeout 30
join_buffer_size 131072
key_buffer_size 805306368
key_cache_age_threshold 300
key_cache_block_size 1024
key_cache_division_limit 100
language /usr/share/mysql/english/
large_files_support ON
large_page_size 0
large_pages OFF
lc_time_names en_US
license GPL
local_infile ON
locked_in_memory OFF
log ON
log_bin OFF
log_bin_trust_function_creators OFF
log_error
log_queries_not_using_indexes OFF
log_slave_updates OFF
log_slow_queries ON
log_warnings 1
long_query_time 3
low_priority_updates OFF
lower_case_file_system OFF
lower_case_table_names 0
max_allowed_packet 5242880
max_binlog_cache_size 18446744073709547520
max_binlog_size 1073741824
max_connect_errors 10
max_connections 500
max_delayed_threads 20
max_error_count 64
max_heap_table_size 16777216
max_insert_delayed_threads 20
max_join_size 18446744073709551615
max_length_for_sort_data 1024
max_prepared_stmt_count 16382
max_relay_log_size 0
max_seeks_for_key 18446744073709551615
max_sort_length 1024
max_sp_recursion_depth 0
max_tmp_tables 32
max_user_connections 30
max_write_lock_count 18446744073709551615
multi_range_count 256
myisam_data_pointer_size 6
myisam_max_sort_file_size 9223372036853727232
myisam_mmap_size 18446744073709551615
myisam_recover_options OFF
myisam_repair_threads 1
myisam_sort_buffer_size 8388608
myisam_stats_method nulls_unequal
net_buffer_length 16384
net_read_timeout 30
net_retry_count 10
net_write_timeout 60
new OFF
old_passwords OFF
open_files_limit 8702
optimizer_prune_level 1
optimizer_search_depth 62
pid_file /var/lib/mysql/biz107.inmotionhosting.com.pid
plugin_dir
port 3306
preload_buffer_size 32768
protocol_version 10
query_alloc_block_size 8192
query_cache_limit 1048576
query_cache_min_res_unit 4096
query_cache_size 536870912
query_cache_type ON
query_cache_wlock_invalidate OFF
query_prealloc_size 8192
range_alloc_block_size 4096
read_buffer_size 268435456
read_only OFF
read_rnd_buffer_size 16777216
relay_log
relay_log_index
relay_log_info_file relay-log.info
relay_log_purge ON
relay_log_space_limit 0
rpl_recovery_rank 0
secure_auth OFF
secure_file_priv
server_id 0
skip_external_locking ON
skip_networking OFF
skip_show_database OFF
slave_compressed_protocol OFF
slave_load_tmpdir /tmp/
slave_net_timeout 3600
slave_skip_errors OFF
slave_transaction_retries 10
slow_launch_time 2
socket /var/lib/mysql/mysql.sock
sort_buffer_size 268435456
sql_big_selects ON
sql_mode
sql_notes ON
sql_warnings OFF
ssl_ca
ssl_capath
ssl_cert
ssl_cipher
ssl_key
storage_engine MyISAM
sync_binlog 0
sync_frm ON
system_time_zone PDT
table_cache 4096
table_lock_wait_timeout 50
table_type MyISAM
thread_cache_size 384
thread_stack 262144
time_format %H:%i:%s
time_zone SYSTEM
timed_mutexes OFF
tmp_table_size 33554432
tmpdir /tmp/
transaction_alloc_block_size 8192
transaction_prealloc_size 4096
tx_isolation REPEATABLE-READ
updatable_views_with_limit YES
version 5.0.92-community-log
version_comment MySQL Community Edition (GPL)
version_compile_machine x86_64
version_compile_os unknown-linux-gnu
wait_timeout 30
-
OK. I tried that, but it didn't help.
Here's the frustrating situation now.
I want to use the same database with two different interfaces.
Right now, the web interface is in UTF-8. I can input, retrieve and display Japanese characters this way, but they seem to be stored as nonsense-looking characters.
The Qt interface that I'm working on can input, retrieve and display Japanese characters, and they seem to be stored as Japanese characters too, rather than as nonsense-style stuff.
Unfortunately, if I use the Qt interface to retrieve data that was inputted with the web interface, it displays as nonsense. If I use the web interface to retrieve data that was stored with the Qt interface, it displays as question marks.
The Qt data at least looks correct in the database (viewed with phpMyAdmin). I thought maybe if I could convert all the data to be like that, it would be good, but firstly, I don't know how to do that, and secondly, as I said, when I try to display that data in a browser, it comes out as question marks, so...
Any ideas??? [frustrated or confused smilie goes here]
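One possible way to convert the web-entered data, assuming it really is UTF-8 that was misread as Windows-1252 and then stored as UTF-8 again (which is what the analyzer result earlier in the thread points to), is to run the conversion backwards. This is only a sketch in plain C++ with made-up function names (repairDoubleEncoded, decodeUtf8, toCp1252Byte); it handles sequences of up to three bytes and leaves anything that doesn't fit the pattern untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Windows-1252 code points for bytes 0x80..0x9F; unassigned bytes are
// passed through with the same value (an assumption of this sketch).
static const std::uint16_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Map a code point back to its Windows-1252 byte, or -1 if it has none.
int toCp1252Byte(std::uint32_t cp) {
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) return static_cast<int>(cp);
    for (int i = 0; i < 32; ++i)
        if (kCp1252High[i] == cp) return 0x80 + i;
    return -1;
}

// Decode one UTF-8 sequence of 1 to 3 bytes starting at s[i] and advance i.
// Returns false for malformed input and for 4-byte sequences.
bool decodeUtf8(const std::string& s, std::size_t& i, std::uint32_t& cp) {
    assert(i < s.size());
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;
    if (b0 < 0x80) { cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else return false;
    if (i + extra >= s.size()) return false;  // sequence is cut off
    for (std::size_t k = 1; k <= extra; ++k) {
        unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false;
        cp = (cp << 6) | (b & 0x3F);
    }
    i += extra + 1;
    return true;
}

// Undo one round of "UTF-8 read as Windows-1252, then stored as UTF-8".
// Returns the input unchanged if it does not look double-encoded.
std::string repairDoubleEncoded(const std::string& stored) {
    std::string out;
    for (std::size_t i = 0; i < stored.size();) {
        std::uint32_t cp = 0;
        if (!decodeUtf8(stored, i, cp)) return stored;
        int b = toCp1252Byte(cp);
        if (b < 0) return stored;  // character cannot come from a misread byte
        out += static_cast<char>(b);
    }
    // Accept the result only if it is itself well-formed UTF-8.
    for (std::size_t j = 0; j < out.size();) {
        std::uint32_t cp = 0;
        if (!decodeUtf8(out, j, cp)) return stored;
    }
    return out;
}
```

Given the bytes c3 a8 e2 80 94 c2 a4 from earlier in the thread, this returns e8 97 a4, which is the plain UTF-8 for 藤. Text that was stored correctly through the Qt interface is returned as it is. I'd try it on a copy of the table first, and the web interface would presumably also need to talk to MySQL over a utf8 connection afterwards, otherwise it will keep writing the old form.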