Problem with Japanese encoding/display



  • I have a MySQL database with lots of Japanese text in it. I usually connect to it with PHP and display the data on a web page with a UTF-8 character set, which works fine.

    When I use phpMyAdmin to view the database, I see garbled characters (藤, for example, shows up as è—¤) where I should see this - 佐藤 紀子. When I add and retrieve Japanese text through the web page, everything works fine, but the phpMyAdmin view is odd.

    My problem is that when I retrieve the same data with Qt, it displays incorrectly. I thought I might simply need to change the encoding of something (maybe the .ui file) to UTF-8, but that already seems to be the default encoding in Qt Creator.

    Any idea what I can do here?

    Thanks a lot.



  • Show us the code. The encoding of the .ui file doesn't matter if you get the text from a database.



  • Thanks for the reply. Here's the code. The relevant part is the query at the bottom; 'snamej' is the field with the problem.

    @#include "db.h"
    #include "ui_db.h"

    #include <QDebug>
    #include <QSqlDatabase>
    #include <QSqlError>
    #include <QSqlQuery>
    #include <QVariant>

    db::db(QWidget *parent) :
        QMainWindow(parent),
        ui(new Ui::db)
    {
        ui->setupUi(this);
        fillStudentDrops();
    }

    db::~db()
    {
        delete ui;
    }

    void db::on_pushButton_clicked()
    {
    }

    // Fill combo1 with "name id" entries, keeping the student id as item data.
    void db::fillStudentDrops()
    {
        connect();
        QSqlQuery query;
        query.exec("SELECT sname, id FROM students WHERE old='n' || old='' ORDER BY sname ASC");
        while (query.next()) {
            QString sname = query.value(0).toString();
            QVariant id = query.value(1);
            qDebug() << sname << id;
            QString output = sname + " " + id.toString();
            ui->combo1->addItem(output, id);
        }
    }

    // Open the default MySQL connection that the QSqlQuery objects use.
    void db::connect()
    {
        QSqlDatabase cdb = QSqlDatabase::addDatabase("QMYSQL");
        cdb.setHostName("xxx.xxx.xxx.xxx");
        cdb.setDatabaseName("mydatabase");
        cdb.setUserName("myname");
        cdb.setPassword("mypassword");
        if (!cdb.open())
            qDebug() << "Failed to connect to root mysql admin:" << cdb.lastError().text();
    }

    // Load the selected student's details into the form.
    void db::on_combo1_currentIndexChanged(int index)
    {
        QString stu_id = ui->combo1->itemData(index).toString();

        // Bind the id instead of concatenating it into the SQL string.
        QSqlQuery query;
        query.prepare("SELECT sname, id, snamej, email, email2, phone, mobile, dob, info, intro, uctype, ssdiscount, startdate, pass, onapack, joint, address FROM students WHERE id = ?");
        query.addBindValue(stu_id);
        query.exec();
        while (query.next()) {
            QString sname_t = query.value(0).toString();
            QVariant id_t = query.value(1);
            QString snamej_t = query.value(2).toString();
            QString email_t = query.value(3).toString();
            QString email2_t = query.value(4).toString();
            QString phone_t = query.value(5).toString();
            QString mobile_t = query.value(6).toString();
            QString dob_t = query.value(7).toString();
            QString info_t = query.value(8).toString();
            QString intro_t = query.value(9).toString();
            QString uctype_t = query.value(10).toString();
            QString ssdiscount_t = query.value(11).toString();
            QString startdate_t = query.value(12).toString();
            QString pass_t = query.value(13).toString();
            QString onapack_t = query.value(14).toString();
            QString joint_t = query.value(15).toString();
            QString address_t = query.value(16).toString();
            qDebug() << sname_t << id_t << snamej_t << email_t << email2_t << phone_t << mobile_t << dob_t << info_t << intro_t << uctype_t << ssdiscount_t << startdate_t << pass_t << onapack_t << joint_t << address_t;
            ui->email->setText(email_t);
            ui->phone->setText(phone_t);
            ui->email2->setText(email2_t);
            ui->mobile->setText(mobile_t);
            ui->snamej->setText(snamej_t);
        }
    }@

    Thanks for any help. It might be something obvious to you - I'm just getting started with Qt.



  • I wouldn't say it's obvious to me. I would expect Qt to handle the encoding of data coming from the database correctly. To be sure, though, you could try the following:

    @QString theString = QString::fromUtf8(query.value(x).toByteArray());@

    And see if that yields the desired results.

    Of course, hard-coding UTF-8 like this rules out changing the encoding later (not that you would want to move away from Unicode).



  • Thanks. That doesn't seem to do it though.

    Here's what I get from that: 佐�?��??��?子



  • Did you output the actual Unicode code points (numbers) and compare them with the expected values?



  • Well, I've searched around for how to do what you suggest, but I can't find it. How would I do that?



  • Uhm, I would try storing some known characters in the database, then read them back with the above method, using both toString() and toByteArray(). Look at what the actual data is and try to match it against the Unicode table. I'd probably also put the same known characters into a QString and compare the contents:

    @QString str = QString::fromUtf8("whatever\xcf\x80"); // CF 80 is the UTF-8 encoding of U+03C0, independent of the compiler's source charset@

    str now contains "whateverπ" (that's a lowercase pi).
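    To answer "how would I do that?" without depending on any Qt conversion, one option is to decode the raw bytes yourself and print each code point as U+XXXX. The sketch below is plain C++17; it assumes you already have the bytes as a std::string (from Qt, a QByteArray can be copied with std::string(ba.constData(), ba.size())). The function names are made up for illustration, and the decoder is deliberately simplified (it does not reject overlong or surrogate encodings).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Decode UTF-8 bytes into Unicode code points (simplified: overlong and
// surrogate encodings are not rejected). A byte that does not start a
// well-formed sequence is reported as U+FFFD and skipped.
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        int extra = -1;
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }

        bool ok = extra >= 0;
        for (int k = 1; ok && k <= extra; ++k) {
            if (i + k >= bytes.size()) { ok = false; break; }
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) {
            out.push_back(cp);
            i += static_cast<std::size_t>(extra) + 1;
        } else {
            out.push_back(0xFFFD);  // resynchronise on the next byte
            ++i;
        }
    }
    return out;
}

// Format the code points of a UTF-8 string as "U+XXXX U+XXXX ...".
std::string codePointList(const std::string& utf8) {
    std::string result;
    for (std::uint32_t cp : decodeUtf8(utf8)) {
        char buf[16];
        int n = std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(cp));
        assert(n > 0 && n < static_cast<int>(sizeof buf));
        if (!result.empty()) result += ' ';
        result += buf;
    }
    return result;
}
```

    For the bytes E8 97 A4 this gives "U+85E4" (藤), whereas for the garbled bytes C3 A8 E2 80 94 C2 A4 it gives "U+00E8 U+2014 U+00A4": three separate characters where there should be one.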



  • I've been trying all sorts of things but I haven't found an answer yet.

    This character, 藤, is \u85E4 (U+85E4) in Unicode.
    I put that character into the database through my normal HTML/PHP web page. When I look at it in phpMyAdmin, it looks like this: è—¤. It also looks like that when I retrieve it in the Qt program I'm building.

    I did this: @QString st = snamej_t.toUtf8().toHex();@
    and got this: c3a8e28094c2a4
    I put that value into this page - http://www.string-functions.com/hex-string.aspx - and got this: è—¤

    I tried putting the same character (藤) into the database with my Qt interface, and also directly from the .cpp file. Both times, when I retrieved the data, I got something even stranger, like this: �?��.

    I was wondering again about Qt Creator and the encoding of the files. I changed the encoding of all files to UTF-8, but when I reopened them in Qt Creator, they seemed to have changed back to 'System'. As far as I can work out, the system encoding for this Windows PC should be Unicode, because it's a Japanese OS.

    I hope you can help me find some kind of answer to this. It's driving me nuts.

    Thanks a lot.
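    The hex value c3a8e28094c2a4 is exactly what you get if the three UTF-8 bytes of 藤 (E8 97 A4) are each read as a separate Windows-1252 character and the result is re-encoded as UTF-8. The plain C++17 sketch below reproduces that. It assumes MySQL's "latin1" behaves like Windows-1252, which is how the MySQL manual describes it; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Unicode values for bytes 0x80..0x9F in Windows-1252 (MySQL "latin1").
// The five bytes Windows-1252 leaves undefined map to the C1 control with
// the same number; every other byte maps to the code point of equal value.
const std::uint32_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

std::uint32_t cp1252ToUnicode(unsigned char b) {
    return (b >= 0x80 && b <= 0x9F) ? kCp1252High[b - 0x80] : b;
}

// Append one code point to `out` as UTF-8.
void appendUtf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Treat every byte of a UTF-8 string as one Windows-1252 character and
// re-encode the result as UTF-8 ("double encoding").
std::string doubleEncode(const std::string& utf8) {
    std::string out;
    for (char c : utf8) {
        appendUtf8(out, cp1252ToUnicode(static_cast<unsigned char>(c)));
    }
    return out;
}

// Lower-case hex dump, the same format QByteArray::toHex() produces.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (char c : bytes) {
        unsigned char b = static_cast<unsigned char>(c);
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

    toHex(doubleEncode("\xe8\x97\xa4")) yields "c3a8e28094c2a4", the same value reported above, while plain ASCII such as "Noriko Sato" passes through unchanged. That is consistent with only the non-ASCII names being affected.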



  • Try that page's "Character Encoding Errors Analyzer":http://www.string-functions.com/encodingerror.aspx.

    I also think that you should look into "QTextCodec::setCodecForCStrings()":http://doc.trolltech.com/latest/qtextcodec.html#setCodecForCStrings. The results look like UTF-8 encoded text that has been decoded as Latin-1 somewhere along the way.
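    If that diagnosis is right, the damage can in principle be reversed: decode the garbled text as UTF-8, turn each resulting character back into its single Windows-1252 byte, and treat those bytes as UTF-8 again. A plain C++17 sketch, assuming (as the MySQL manual describes) that the server's "latin1" is Windows-1252; the names are hypothetical. It returns no value when the text contains a character Windows-1252 cannot represent, which is the case for Japanese text that was stored correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Windows-1252 (MySQL "latin1") characters for bytes 0x80..0x9F. The five
// bytes Windows-1252 leaves undefined map to the C1 control of equal value.
const std::uint32_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Map a code point back to its single Windows-1252 byte, if it has one.
std::optional<unsigned char> unicodeToCp1252(std::uint32_t cp) {
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
        return static_cast<unsigned char>(cp);
    }
    for (int i = 0; i < 32; ++i) {
        if (kCp1252High[i] == cp) return static_cast<unsigned char>(0x80 + i);
    }
    return std::nullopt;
}

// Undo one round of "UTF-8 bytes read as Windows-1252, stored as UTF-8".
// Returns std::nullopt if the input is not well-formed UTF-8 (simplified
// check) or contains a character with no Windows-1252 byte.
std::optional<std::string> undoDoubleEncoding(const std::string& garbled) {
    std::string out;
    std::size_t i = 0;
    while (i < garbled.size()) {
        unsigned char lead = static_cast<unsigned char>(garbled[i]);
        std::uint32_t cp = 0;
        int extra = 0;
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return std::nullopt;

        if (garbled.size() - i <= static_cast<std::size_t>(extra)) {
            return std::nullopt;  // truncated sequence
        }
        for (int k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(garbled[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;
            cp = (cp << 6) | (c & 0x3F);
        }

        std::optional<unsigned char> byte = unicodeToCp1252(cp);
        if (!byte) return std::nullopt;
        out += static_cast<char>(*byte);
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;  // the recovered bytes; not re-validated as UTF-8 here
}
```

    Applied to the bytes c3a8e28094c2a4 this returns e897a4 (藤), and applied to a correctly stored 藤 it returns no value. One caveat: correctly stored accented Latin text (é, for instance) would also be "repaired" into an invalid byte, so something like this should only be run on rows known to be damaged.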



  • Maybe "this older thread":http://developer.qt.nokia.com/forums/viewthread/7048 is of help for you.



  • Thank you both.

    I put that one character and the strange output into that error-analyzer page and got this:

    Displaying 4 results
    utf-8 (65001, Unicode (UTF-8)) -> Windows-1252 (1252, Western European (Windows))
    utf-8 (65001, Unicode (UTF-8)) -> windows-1254 (1254, Turkish (Windows))
    utf-8 (65001, Unicode (UTF-8)) -> windows-1256 (1256, Arabic (Windows))
    utf-8 (65001, Unicode (UTF-8)) -> windows-1258 (1258, Vietnamese (Windows))

    I tried this:
    @QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));@
    but it didn't seem to do anything.

    The thread Volker pointed me to seemed very promising, but... I tried removing the collation of the MySQL database through phpMyAdmin, but it wouldn't let me. When I changed it to utf8_general_ci (from utf8_unicode_ci), I was able to put the character into the database via my Qt UI and read it in phpMyAdmin, but when I looked at my web page (which uses a UTF-8 character set), I just got a question mark.

    Thanks for any more help. Sorry if this is just getting boring now...



  • Could you write up an example with a database dump we can test?



  • Here's a minimal case. (Is this enough?)

    @
    SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

    /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
    /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
    /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
    /*!40101 SET NAMES utf8 */;

    CREATE TABLE IF NOT EXISTS students (
    id smallint(3) NOT NULL auto_increment,
    sname varchar(30) collate utf8_unicode_ci default NULL,
    snamej mediumtext collate utf8_unicode_ci NOT NULL,
    email varchar(60) collate utf8_unicode_ci default NULL,
    email2 varchar(50) collate utf8_unicode_ci NOT NULL,
    phone varchar(20) collate utf8_unicode_ci default NULL,
    mobile varchar(15) collate utf8_unicode_ci default NULL,
    dob date default NULL,
    dobY year(4) NOT NULL default '0000',
    dobM smallint(2) default NULL,
    dobD smallint(2) default NULL,
    uclass varchar(20) collate utf8_unicode_ci default NULL,
    info longtext collate utf8_unicode_ci,
    intro varchar(30) collate utf8_unicode_ci default NULL,
    lessons decimal(2,1) NOT NULL,
    freect smallint(2) NOT NULL,
    level mediumtext collate utf8_unicode_ci NOT NULL,
    type varchar(20) collate utf8_unicode_ci default NULL,
    uctype varchar(20) collate utf8_unicode_ci default NULL,
    old tinytext collate utf8_unicode_ci NOT NULL,
    ssdiscount tinytext collate utf8_unicode_ci,
    paidforby mediumtext collate utf8_unicode_ci,
    paidforby_id int(11) NOT NULL,
    paysfor mediumtext collate utf8_unicode_ci NOT NULL,
    paysfor_id int(11) NOT NULL,
    intschool tinytext collate utf8_unicode_ci,
    booked varchar(1) collate utf8_unicode_ci NOT NULL,
    startdate varchar(10) collate utf8_unicode_ci NOT NULL,
    notcomenotes longtext collate utf8_unicode_ci NOT NULL,
    paysfor2 varchar(30) collate utf8_unicode_ci NOT NULL,
    pass varchar(8) collate utf8_unicode_ci NOT NULL,
    onapack tinytext collate utf8_unicode_ci NOT NULL,
    joint tinytext collate utf8_unicode_ci NOT NULL,
    e1onlist tinyint(1) NOT NULL,
    e2onlist tinyint(1) NOT NULL,
    address varchar(200) collate utf8_unicode_ci NOT NULL,
    PRIMARY KEY (id)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1428 ;

    INSERT INTO students (id, sname, snamej, email, email2, phone, mobile, dob, dobY, dobM, dobD, uclass, info, intro, lessons, freect, level, type, uctype, old, ssdiscount, paidforby, paidforby_id, paysfor, paysfor_id, intschool, booked, startdate, notcomenotes, paysfor2, pass, onapack, joint, e1onlist, e2onlist, address) VALUES
    (1007, 'Noriko Sato', '佐藤 紀子', 'noriko@phonecompany.jp', '', '022-333-9999', '090-2222-0000', '1971-12-10', 1971, 12, 10, '', '', '', '0.0', 0, '', '', 'Korean 50', '', '', '', 0, '', 0, '', 'y', '1226732681', '', '', '26ndjokmh4', '', '', 1, 0, '');

    @

    The one character I keep referring to, 藤 (which phpMyAdmin shows as è—¤), is in the 'snamej' field.



  • Sorry, but... bump.

    I'm getting nowhere with this encoding issue and still hoping for help.

    Thanks a lot.



  • Hi. I'm not really sure what to do anymore except bump this again. Any ideas?



  • Can you run the SHOW VARIABLES; command on your MySQL server and post the output?



  • Thanks for the response. Here's what I got:

    Variable_name Value
    auto_increment_increment 1
    auto_increment_offset 1
    automatic_sp_privileges ON
    back_log 50
    basedir /
    binlog_cache_size 32768
    bulk_insert_buffer_size 8388608
    character_set_client utf8
    character_set_connection utf8
    character_set_database latin1
    character_set_filesystem binary
    character_set_results utf8
    character_set_server latin1
    character_set_system utf8
    character_sets_dir /usr/share/mysql/charsets/
    collation_connection utf8_general_ci
    collation_database latin1_swedish_ci
    collation_server latin1_swedish_ci
    completion_type 0
    concurrent_insert 1
    connect_timeout 10
    datadir /var/lib/mysql/
    date_format %Y-%m-%d
    datetime_format %Y-%m-%d %H:%i:%s
    default_week_format 0
    delay_key_write ON
    delayed_insert_limit 100
    delayed_insert_timeout 300
    delayed_queue_size 1000
    div_precision_increment 4
    keep_files_on_create OFF
    engine_condition_pushdown OFF
    expire_logs_days 0
    flush OFF
    flush_time 0
    ft_boolean_syntax + -><()~*:""&|
    ft_max_word_len 84
    ft_min_word_len 4
    ft_query_expansion_limit 20
    ft_stopword_file (built-in)
    group_concat_max_len 1024
    have_archive YES
    have_bdb NO
    have_blackhole_engine YES
    have_compress YES
    have_community_features NO
    have_profiling NO
    have_crypt YES
    have_csv YES
    have_dynamic_loading YES
    have_example_engine YES
    have_federated_engine YES
    have_geometry YES
    have_innodb YES
    have_isam NO
    have_merge_engine YES
    have_ndbcluster NO
    have_openssl NO
    have_ssl NO
    have_query_cache YES
    have_raid NO
    have_rtree_keys YES
    have_symlink YES
    hostname biz107.inmotionhosting.com
    init_connect
    init_file
    init_slave
    innodb_additional_mem_pool_size 1048576
    innodb_autoextend_increment 8
    innodb_buffer_pool_awe_mem_mb 0
    innodb_buffer_pool_size 134217728
    innodb_checksums ON
    innodb_commit_concurrency 0
    innodb_concurrency_tickets 500
    innodb_data_file_path ibdata1:10M:autoextend
    innodb_data_home_dir
    innodb_adaptive_hash_index ON
    innodb_doublewrite ON
    innodb_fast_shutdown 1
    innodb_file_io_threads 4
    innodb_file_per_table OFF
    innodb_flush_log_at_trx_commit 1
    innodb_flush_method
    innodb_force_recovery 0
    innodb_lock_wait_timeout 50
    innodb_locks_unsafe_for_binlog OFF
    innodb_log_arch_dir
    innodb_log_archive OFF
    innodb_log_buffer_size 1048576
    innodb_log_file_size 5242880
    innodb_log_files_in_group 2
    innodb_log_group_home_dir ./
    innodb_max_dirty_pages_pct 90
    innodb_max_purge_lag 0
    innodb_mirrored_log_groups 1
    innodb_open_files 300
    innodb_rollback_on_timeout OFF
    innodb_support_xa ON
    innodb_sync_spin_loops 20
    innodb_table_locks ON
    innodb_thread_concurrency 8
    innodb_thread_sleep_delay 10000
    innodb_use_legacy_cardinality_algorithm ON
    interactive_timeout 30
    join_buffer_size 131072
    key_buffer_size 805306368
    key_cache_age_threshold 300
    key_cache_block_size 1024
    key_cache_division_limit 100
    language /usr/share/mysql/english/
    large_files_support ON
    large_page_size 0
    large_pages OFF
    lc_time_names en_US
    license GPL
    local_infile ON
    locked_in_memory OFF
    log ON
    log_bin OFF
    log_bin_trust_function_creators OFF
    log_error
    log_queries_not_using_indexes OFF
    log_slave_updates OFF
    log_slow_queries ON
    log_warnings 1
    long_query_time 3
    low_priority_updates OFF
    lower_case_file_system OFF
    lower_case_table_names 0
    max_allowed_packet 5242880
    max_binlog_cache_size 18446744073709547520
    max_binlog_size 1073741824
    max_connect_errors 10
    max_connections 500
    max_delayed_threads 20
    max_error_count 64
    max_heap_table_size 16777216
    max_insert_delayed_threads 20
    max_join_size 18446744073709551615
    max_length_for_sort_data 1024
    max_prepared_stmt_count 16382
    max_relay_log_size 0
    max_seeks_for_key 18446744073709551615
    max_sort_length 1024
    max_sp_recursion_depth 0
    max_tmp_tables 32
    max_user_connections 30
    max_write_lock_count 18446744073709551615
    multi_range_count 256
    myisam_data_pointer_size 6
    myisam_max_sort_file_size 9223372036853727232
    myisam_mmap_size 18446744073709551615
    myisam_recover_options OFF
    myisam_repair_threads 1
    myisam_sort_buffer_size 8388608
    myisam_stats_method nulls_unequal
    net_buffer_length 16384
    net_read_timeout 30
    net_retry_count 10
    net_write_timeout 60
    new OFF
    old_passwords OFF
    open_files_limit 8702
    optimizer_prune_level 1
    optimizer_search_depth 62
    pid_file /var/lib/mysql/biz107.inmotionhosting.com.pid
    plugin_dir
    port 3306
    preload_buffer_size 32768
    protocol_version 10
    query_alloc_block_size 8192
    query_cache_limit 1048576
    query_cache_min_res_unit 4096
    query_cache_size 536870912
    query_cache_type ON
    query_cache_wlock_invalidate OFF
    query_prealloc_size 8192
    range_alloc_block_size 4096
    read_buffer_size 268435456
    read_only OFF
    read_rnd_buffer_size 16777216
    relay_log
    relay_log_index
    relay_log_info_file relay-log.info
    relay_log_purge ON
    relay_log_space_limit 0
    rpl_recovery_rank 0
    secure_auth OFF
    secure_file_priv
    server_id 0
    skip_external_locking ON
    skip_networking OFF
    skip_show_database OFF
    slave_compressed_protocol OFF
    slave_load_tmpdir /tmp/
    slave_net_timeout 3600
    slave_skip_errors OFF
    slave_transaction_retries 10
    slow_launch_time 2
    socket /var/lib/mysql/mysql.sock
    sort_buffer_size 268435456
    sql_big_selects ON
    sql_mode
    sql_notes ON
    sql_warnings OFF
    ssl_ca
    ssl_capath
    ssl_cert
    ssl_cipher
    ssl_key
    storage_engine MyISAM
    sync_binlog 0
    sync_frm ON
    system_time_zone PDT
    table_cache 4096
    table_lock_wait_timeout 50
    table_type MyISAM
    thread_cache_size 384
    thread_stack 262144
    time_format %H:%i:%s
    time_zone SYSTEM
    timed_mutexes OFF
    tmp_table_size 33554432
    tmpdir /tmp/
    transaction_alloc_block_size 8192
    transaction_prealloc_size 4096
    tx_isolation REPEATABLE-READ
    updatable_views_with_limit YES
    version 5.0.92-community-log
    version_comment MySQL Community Edition (GPL)
    version_compile_machine x86_64
    version_compile_os unknown-linux-gnu
    wait_timeout 30



  • The server settings seem fine, the same as on my MySQL server (I use UTF-8 for Russian text).

    Well...
    Just as a test, right after connecting in your program, try executing this query:

    bq. SET CHARACTER SET utf8;

    and then run the program.



  • OK. I tried that, but it didn't help.

    Here's the frustrating situation now.

    I want to use the same database with two different interfaces.

    Right now, the web interface is in UTF-8. I can enter, retrieve and display Japanese characters this way, but they seem to be stored as nonsense-looking characters.

    The Qt interface that I'm working on can enter, retrieve and display Japanese characters, and they seem to be stored as real Japanese characters rather than as nonsense.

    Unfortunately, if I use the Qt interface to retrieve data that was entered with the web interface, it displays as nonsense. If I use the web interface to retrieve data that was stored with the Qt interface, it displays as question marks.

    The Qt data at least looks correct in the database (viewed with phpMyAdmin). I thought that if I could convert all the data to that form, everything would be fine, but firstly, I don't know how to do that, and secondly, as I said, when I try to display that data in a browser, it comes out as question marks.

    Any ideas??? [frustrated or confused smilie goes here]
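    One hypothesis that would explain all of these symptoms at once: the PHP pages talk to MySQL over a connection whose character set is latin1 (the server default in the SHOW VARIABLES output above), while the Qt driver uses utf8, and the server converts between the connection character set and the column's utf8 on every write and read. That is an assumption, not something confirmed in this thread. The plain C++17 sketch below simulates only that conversion; Conn, store and fetch are hypothetical names, and latin1 is modelled as Windows-1252 per the MySQL manual.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Character set of a client connection (assumed per client).
enum class Conn { Latin1, Utf8 };

// Windows-1252 (MySQL "latin1") characters for bytes 0x80..0x9F. The five
// bytes Windows-1252 leaves undefined map to the C1 control of equal value.
const std::uint32_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Append a BMP code point to `out` as UTF-8.
void appendUtf8(std::string& out, char32_t cp) {
    assert(cp < 0x10000);  // this sketch only handles the BMP
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Simplified UTF-8 decoder for 1- to 3-byte sequences; anything else,
// including malformed input, becomes U+FFFD.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        int extra = -1;
        char32_t cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        bool ok = extra >= 0 && bytes.size() - i > static_cast<std::size_t>(extra);
        for (int k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            ok = (c & 0xC0) == 0x80;
            cp = (cp << 6) | (c & 0x3F);
        }
        out += ok ? cp : U'\uFFFD';
        i += ok ? static_cast<std::size_t>(extra) + 1 : 1;
    }
    return out;
}

// What ends up in a utf8 column when a client sends `bytes`: the server
// interprets them according to the connection character set.
std::u32string store(Conn conn, const std::string& bytes) {
    if (conn == Conn::Utf8) return decodeUtf8(bytes);
    std::u32string text;
    for (char c : bytes) {
        unsigned char b = static_cast<unsigned char>(c);
        text += (b >= 0x80 && b <= 0x9F) ? static_cast<char32_t>(kCp1252High[b - 0x80])
                                         : static_cast<char32_t>(b);
    }
    return text;
}

// What a client receives when it selects the column: the stored characters
// are converted to the connection character set, and anything latin1
// cannot represent becomes '?'.
std::string fetch(Conn conn, const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (conn == Conn::Utf8) {
            appendUtf8(out, cp);
        } else if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
            out += static_cast<char>(cp);
        } else {
            char byte = '?';
            for (int i = 0; i < 32; ++i) {
                if (kCp1252High[i] == cp) byte = static_cast<char>(0x80 + i);
            }
            out += byte;
        }
    }
    return out;
}
```

    With 藤 (UTF-8 bytes E8 97 A4) as input: a row stored over latin1 holds U+00E8 U+2014 U+00A4 (è—¤, what phpMyAdmin shows), comes back unchanged over latin1 (the web page looks fine), and comes back as c3a8e28094c2a4 over utf8 (what Qt showed). A row stored over utf8 holds the real U+85E4 but comes back over latin1 as a single "?". Under this model, each interface only reads correctly the rows it wrote itself.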

