Does Qt check which encoding a file uses?

    qwe3 wrote on 26 Jun 2021, 18:32 (last edited by qwe3):

    Hi,

    I have files in different encodings: UTF-8, UTF-8 with BOM, windows-1250, UTF-16LE, and UTF-16BE.

    I know that when a file is encoded as UTF-8 (without a BOM) and I load it into a QString and then pass that QString to label->setText(), I see strange characters. So I have to use:

        QTextStream ts1(file1.readAll(), QIODevice::ReadOnly);
        ts1.setCodec(QTextCodec::codecForName("UTF-8"));
    

    And now it works.

    With the other encodings (UTF-8 with BOM, windows-1250, UTF-16LE, UTF-16BE) I don't have this problem, so my question is: can Qt autodetect that a file is, for example, UTF-8 with BOM and automatically set the codec to UTF-8?

    I think Qt can't detect plain UTF-8 because there is no BOM, so the default (ANSI) codec is used. UTF-8 with BOM, UTF-16LE, and UTF-16BE all have a BOM, so Qt checks the first bytes of the file and finds it. But maybe I'm wrong.
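
The BOM check guessed at above can be sketched in plain C++, without Qt. This is only an illustration of the idea, not Qt's actual implementation; `Bom`, `startsWith`, and `detectBom` are hypothetical names introduced here. The byte signatures are the standard Unicode ones.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not a Qt type.
enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Returns true if `data` begins with the `n` bytes in `sig`.
static bool startsWith(const std::string &data, const unsigned char *sig, std::size_t n)
{
    if (data.size() < n)
        return false;
    for (std::size_t i = 0; i < n; ++i) {
        if (static_cast<unsigned char>(data[i]) != sig[i])
            return false;
    }
    return true;
}

// Inspects the first bytes of a buffer and reports which BOM, if any, it starts with.
Bom detectBom(const std::string &data)
{
    static const unsigned char utf32le[] = {0xFF, 0xFE, 0x00, 0x00};
    static const unsigned char utf32be[] = {0x00, 0x00, 0xFE, 0xFF};
    static const unsigned char utf8[]    = {0xEF, 0xBB, 0xBF};
    static const unsigned char utf16le[] = {0xFF, 0xFE};
    static const unsigned char utf16be[] = {0xFE, 0xFF};

    // UTF-32LE must be tested before UTF-16LE because its BOM also starts with FF FE.
    if (startsWith(data, utf32le, 4)) return Bom::Utf32LE;
    if (startsWith(data, utf32be, 4)) return Bom::Utf32BE;
    if (startsWith(data, utf8, 3))    return Bom::Utf8;
    if (startsWith(data, utf16le, 2)) return Bom::Utf16LE;
    if (startsWith(data, utf16be, 2)) return Bom::Utf16BE;

    // No BOM: plain UTF-8 and windows-1250 both end up here.
    return Bom::None;
}
```

A file saved as plain UTF-8 or as windows-1250 yields `Bom::None`, which is why a BOM check alone cannot tell those two apart.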

    EDIT:
    In the QTextStream docs I found:

    Internally, QTextStream uses a Unicode based buffer, and QTextCodec is used by QTextStream to automatically support different character sets. By default, QTextCodec::codecForLocale() is used for reading and writing, but you can also set the codec by calling setCodec(). Automatic Unicode detection is also supported. When this feature is enabled (the default behavior), QTextStream will detect the UTF-16 or the UTF-32 BOM (Byte Order Mark) and switch to the appropriate UTF codec when reading. QTextStream does not write a BOM by default, but you can enable this by calling setGenerateByteOrderMark(true). When QTextStream operates on a QString directly, the codec is disabled.
    

    So UTF-16 is detected, but what about UTF-8 with BOM?

    In the Qt source at

    https://code.woboq.org/qt5/qtbase/src/corelib/serialization/qtextstream.cpp.html

    I found this:

    QTextCodec *QTextStream::codec() const
    {
        Q_D(const QTextStream);
        return d->codec;
    }
    /*!
        If \a enabled is true, QTextStream will attempt to detect Unicode encoding
        by peeking into the stream data to see if it can find the UTF-8, UTF-16, or
        UTF-32 Byte Order Mark (BOM). If this mark is found, QTextStream will
        replace the current codec with the UTF codec.
        This function can be used together with setCodec(). It is common
        to set the codec to UTF-8, and then enable UTF-16 detection.
        \sa autoDetectUnicode(), setCodec(), QTextCodec::codecForUtfText()
    */
    void QTextStream::setAutoDetectUnicode(bool enabled)
    {
        Q_D(QTextStream);
        d->autoDetectUnicode = enabled;
    }
    
    

    So... what about UTF-8 with BOM?

      ChrisW67 wrote on 27 Jun 2021, 01:00:

      QTextStream, by default, will detect a UTF-32, UTF-16, or UTF-8 BOM and switch codec accordingly. Any other sequence of bytes will be interpreted with whatever QTextCodec::codecForLocale() returns on the machine running the code (UTF-8 on my Linux box, probably a Windows-125x 8-bit encoding on Windows).
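
Since a file without a BOM falls back to the locale codec, one possible workaround is to test whether the bytes are well-formed UTF-8 before choosing a codec. The sketch below is a simplified heuristic in plain C++, not something Qt does for you; `looksLikeUtf8` is a hypothetical helper name. It checks lead and continuation bytes only, and does not reject every overlong or surrogate sequence.

```cpp
#include <cstddef>
#include <string>

// Returns true if `data` consists only of structurally valid UTF-8 sequences.
// Simplified: lead bytes C0, C1 and F5..FF are rejected, but overlong
// three- and four-byte forms and encoded surrogates are not checked.
bool looksLikeUtf8(const std::string &data)
{
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(data[i]);

        // Number of continuation bytes expected after this lead byte.
        std::size_t extra;
        if (c < 0x80)
            extra = 0;                       // ASCII
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;                       // two-byte sequence
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;                       // three-byte sequence
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;                       // four-byte sequence
        else
            return false;                    // stray continuation or invalid lead byte

        // The sequence must not run past the end of the buffer.
        if (extra > n - i - 1)
            return false;

        // Every continuation byte must have the bit pattern 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;
        }
        i += extra + 1;
    }
    return true;
}
```

If the check passes, the data could be decoded as UTF-8 (for example with the setCodec() call from the first post); otherwise the locale codec or an explicit windows-1250 codec is the likelier match. Pure ASCII passes the check too, which is harmless because ASCII bytes decode identically in both encodings.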
