Can I read/write Windows-1252 and other legacy encodings in Qt 6?
-
My application uses Qt 5 on Windows and Mac. Via QTextStream::setCodec() it is able to access 59 different text encodings.
I have mostly ported my application to Qt 6. But it seems that Qt 6 only supports a handful of text encodings. Qt 6.7 on Windows supports just:
UTF-8
UTF-16
UTF-16LE
UTF-16BE
UTF-32
UTF-32LE
UTF-32BE
ISO-8859-1
My customers want to be able to read and write using encodings such as Windows-1252 and Windows-1256. This is stopping me moving to Qt 6.
Does anyone know if these legacy encodings will be added back into Qt 6? There are some open issues, but I don't see anything definitive:
https://bugreports.qt.io/browse/QTBUG-109254
https://bugreports.qt.io/browse/QTBUG-117362
https://codereview.qt-project.org/c/qt/qtbase/+/393373
https://codereview.qt-project.org/c/qt/qtbase/+/429820
Any information gratefully received.
-
I cannot answer the primary question.
Are the files of a size manageable in RAM, or are we talking multi gigabyte monsters?
If they are "small" then you could possibly use QStringDecoder to decode the source file and re-encode the result as UTF-8 in a QByteArray. Wrap the byte array with QBuffer and feed that to your existing QTextStream logic.
-
If that feature is important for you, then comment on the bug reports stating the need and/or ask what's needed to finish the open patches.
If I understand it correctly, most of the work is already done and what's missing is the connection of the ICU library to QTextStream.
-
From QStringConverter::encodingForName, it says
Such a name may, none the less, be accepted by the QStringConverter constructor when Qt is built with ICU, if ICU provides a converter with the given name.
So it should be possible to load an icu-supported codec name. (Seems to start from 6.6 according to this SO post.)
But I'm not sure if the prebuilt Qt is built with ICU or not.
-
@AndyBrice said in Can I read/write Windows-1252 and other legacy encodings in Qt 6?:
My customers want to be able to read and write using encodings such as Windows-1252 and Windows-1256. This is stopping me moving to Qt 6.
I can understand the need to read existing data in old formats, but is there really a point in writing to those old formats?
-
I can understand the need to read existing data in old formats, but is there really a point in writing to those old formats?
Interoperability with existing, old applications?
-
@ankou29666 said in Can I read/write Windows-1252 and other legacy encodings in Qt 6?:
I can understand the need to read existing data in old formats, but is there really a point in writing to those old formats?
In an ideal world, no. But people have to work with all sorts of legacy systems.
-
@Bonnie said in Can I read/write Windows-1252 and other legacy encodings in Qt 6?:
So it should be possible to load an icu-supported codec name. (Seems to start from 6.6 according to this SO post.)
But I'm not sure if the prebuilt Qt is built with ICU or not.
As far as I can make out, it isn't possible to access these additional codecs from the Qt 6.7 or 6.8 binaries.
Years ago, I used to build my own Qt binaries from source. But it just got too difficult.
-
Seems like the QTextCodec in the Qt5 compatibility module supports Win1250 to 1258 encodings.
https://doc.qt.io/qt-6/qtextcodec.html
-
I guess my other option is to build a command line encoding converter in Qt 5 and call it from my Qt 6 application. Hardly ideal though.
@AndyBrice Or you can link to ICU and use its API yourself, just like Qt does in its internal code, or even use other third-party codec libraries.
-
@ankou29666 said in Can I read/write Windows-1252 and other legacy encodings in Qt 6?:
Seems like the QTextCodec in the Qt5 compatibility module supports Win1250 to 1258 encodings.
https://doc.qt.io/qt-6/qtextcodec.html
Ok, I didn't spot that. Thanks.
So I can handle these encodings, but to read windows-1252 where I used to do this in Qt 5:
    QFile f( path );
    if ( f.open( QIODevice::ReadOnly ) )
    {
        QTextStream t( &f );
        QTextCodec* codec = QTextCodec::codecForName( "windows-1252" );
        t.setCodec( codec );
        ...
    }
I have to do this in Qt 6:
    QByteArray encodedString = "..."; // read from file
    QTextCodec* codec = QTextCodec::codecForName("windows-1252");
    QString unencodedString = codec->toUnicode(encodedString);
Is that right?
-
If willing to use a sub process, you can always use an iconv binary to pre/post process input or output to/from UTF-16. There are many ways to tackle this issue, which is probably part of the reason no one was motivated enough so far to push the patch to extend Qt 6's QTextCodec to the finish line.
-
@Bonnie said in Can I read/write Windows-1252 and other legacy encodings in Qt 6?:
Or you can link to ICU and use its API yourself, just like Qt does in its internal code, or even use other third-party codec libraries.
Are there prebuilt ICU binaries for Windows and Mac? I had a quick look on https://unicode-org.github.io/icu/, but didn't see them.
Do you know what the licensing of the binaries is? If they are GPL, I won't be able to use them in my commercial product.
-
@IgKh said in Can I read/write Windows-1252 and other legacy encodings in Qt 6?:
If willing to use a sub process, you can always use an iconv binary to pre/post process input or output to/from UTF-16.
Is iconv related to the ICU libraries, or completely different?
-
@AndyBrice
iconv isn't related to ICU. It is a very old POSIX API and a corresponding CLI binary that's included in every UNIX-like/Linux system, and it is not hard to find compatible Windows versions of it. It can usually work with any text encoding ever known to mankind.
It can be integrated using QProcess, i.e. something like:
    QProcess* proc = new QProcess(parent);
    proc->setStandardInputFile("path/to/input/file");
    proc->start("path/to/iconv", QStringList() << "-f" << "WINDOWS-1252" << "-t" << "UTF16");
And then the QProcess can be used as the source device for QTextStream, since it is a kind of QIODevice. Likewise for the output.
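Since iconv is also a library API, the same conversion can be done in-process without QProcess. A minimal sketch using the POSIX iconv_open/iconv/iconv_close calls. convertWithIconv is a hypothetical helper name, the UTF-8 target in the usage is an assumption made here for illustration (the example above uses UTF16), and on Windows or macOS a separate libiconv may need to be linked:

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from encoding `from` to encoding `to`
// with the POSIX iconv(3) API. Throws std::runtime_error on failure.
std::string convertWithIconv(const std::string &input, const char *from, const char *to)
{
    iconv_t cd = iconv_open(to, from);   // note the order: target encoding first
    if (cd == (iconv_t)(-1))
        throw std::runtime_error("iconv_open failed: unsupported conversion");

    std::string output;
    char buffer[4096];
    char *in = const_cast<char *>(input.data());   // iconv takes a non-const char**
    std::size_t inLeft = input.size();

    while (inLeft > 0) {
        char *out = buffer;
        std::size_t outLeft = sizeof(buffer);
        const std::size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
        const int err = errno;                      // save before other calls touch it
        output.append(buffer, sizeof(buffer) - outLeft);
        // E2BIG only means the output buffer filled up, so loop again.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }
    // Stateful target encodings would additionally need a final flush call;
    // that is not needed for the single-byte to UTF-8 case sketched here.
    iconv_close(cd);
    return output;
}
```

A call such as convertWithIconv(bytes, "WINDOWS-1252", "UTF-8") gives a UTF-8 string that could be turned into a QString with QString::fromUtf8(). Encoding names accepted by iconv_open vary between implementations, so they should be checked on each target platform.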
-
@AndyBrice said in Can I read/write Windows-1252 and other legacy encodings in Qt 6?:
Are there prebuilt ICU binaries for Windows and Mac? I had a quick look on https://unicode-org.github.io/icu/, but didn't see them.
Binaries are located on their GitHub page under release: https://github.com/unicode-org/icu/releases/tag/release-76-rc. Upon a quick glance I'm not sure if any of these are for macOS, though.
@AndyBrice said in Can I read/write Windows-1252 and other legacy encodings in Qt 6?:
Do you know what the licensing of the binaries is? If they are GPL, I won't be able to use them in my commercial product.
They have a list of all the licenses (including 3rd party) that apply: https://github.com/unicode-org/icu?tab=License-1-ov-file. ICU itself seems to be very permissive. Some of the 3rd party libs seem to require a mention with their copyright notice. Overall it should be usable for commercial products.