Solved How to save file in specific encoding?
-
I tried the examples and referred to the docs, but I am still not able to resolve this issue. I used the setCodec method to set the encoding of the QTextStream output, but when I open the document in another text editor, it shows UTF-8 although I have not set the codec to UTF-8. How can I save a file in an encoding other than UTF-8? Is it possible in Qt?
-
@Yet-Zio said in How to save file in specific encoding?:
I have not set the codec to UTF-8
You have: QTextStream::setCodec()
-
But the docs say this for setCodec() : Sets the codec for this stream to the QTextCodec for the encoding specified by codecName. Common values for codecName include "ISO 8859-1", "UTF-8", and "UTF-16". If the encoding isn't recognized, nothing happens.
You mean I should not use setCodec() ?
-
@Yet-Zio said in How to save file in specific encoding?:
You mean I should not use setCodec() ?
No, I meant that you have to use setCodec() when you want something != UTF-8, since the locale codec (usually UTF-8 on Linux) is the default: By default, QTextCodec::codecForLocale() is used, and automatic unicode detection is enabled.
-
I used setCodec("Windows-1250") as a test, but that doesn't seem to work. It still shows UTF-8 in another text editor.
-
Not recognized then:
for (auto codecstr : QTextCodec::availableCodecs())
    if (codecstr.startsWith("windows-"))
        qInfo() << codecstr;
Apparently it starts with lower case. Man there are a LOT of codecs in there (806 on linux).
-
I tried several aliases before too, and other encodings with setCodec(codecName). It doesn't work even if it's lower-case. You can try for yourself.
-
Please provide a minimal testcase.
-
// This is the code used for save method.
QString fileName;
// If we don't have a filename from before, get one.
if (currentFile.isEmpty()) {
    fileName = QFileDialog::getSaveFileName(this, "Save");
    currentFile = fileName;
} else {
    fileName = currentFile;
}
QFile file(fileName);
// If i use QIODevice::Text, then it can't handle line endings.
if (!file.open(QIODevice::WriteOnly)) {
    warnMsgBox *openWarnBox = new warnMsgBox(this);
    openWarnBox->setText("Cannot save file");
    openWarnBox->setDetailedText(file.errorString());
    QSpacerItem* warnhorizontalSpacer = new QSpacerItem(500, 0, QSizePolicy::Minimum, QSizePolicy::Expanding);
    QGridLayout* warnlayout = (QGridLayout*)openWarnBox->layout();
    warnlayout->addItem(warnhorizontalSpacer, warnlayout->rowCount(), 0, 1, warnlayout->columnCount());
    openWarnBox->exec();
    return;
}
QFileInfo fileInfo(file);
ui->tabWidget->setTabText(ui->tabWidget->currentIndex(), fileInfo.fileName());
CodeEditor* tempEditor = (CodeEditor*)ui->tabWidget->widget(ui->tabWidget->currentIndex());
tempEditor->changesSet = false;
QTextStream out(&file);
QByteArray tempCodec;
tempCodec.append(tempEditor->codecToUse);
QTextCodec *fileCodec = QTextCodec::codecForName(tempCodec);
out.setCodec(fileCodec);
out << tempEditor->toPlainText();
file.close();
codecToUse is my own variable that records the encoding of the current editor's file from when it was opened. It is then used to set the codec of the QTextStream output when saving.
In case the problem was with codecToUse, I also tried setting the codec manually with "windows-1250" and others such as "ISO-...", and I still get the same problem.
-
Make sure you are writing non-trivial text that contains non-ASCII characters. Otherwise there is nothing in the file from which a tool could determine the encoding. Even then it might get it wrong. I inserted an emoji into the text and used a Linux program called "uchardet" to check the file type. Without the emoji it always returned ASCII. With the emoji it returns UTF-8 when I encode as UTF-8. When I encode as windows-1250 the program returns ISO-8859-3. I have no idea what that means. However, the encoding is definitely different if non-trivial data is in the file.
QFile testfile("output.txt");
if (testfile.open(QFile::WriteOnly | QFile::Truncate)){
    QTextStream out(&testfile);
    out.setCodec("windows-1250");
    //out.setCodec("UTF-8");
    qInfo() << out.codec()->name();
    out << "💩Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";
    out << "\n";
    out << "💩Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.";
    out << "\n";
}
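To illustrate why only non-trivial text reveals a difference, here is a minimal sketch in plain C++ (no Qt). `encodeUtf8` is a hand-written encoder and `encodeCp1250Sample` is a hypothetical helper that only knows four windows-1250 characters; neither is part of Qt, and the '?' fallback is an assumption of this sketch, not a statement about what QTextStream does.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8.
// Sketch only: surrogate code points are not rejected.
std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF);  // anything larger is outside Unicode
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: windows-1250 bytes for ASCII plus four sample letters.
// Everything else becomes '?' (an assumption made for this sketch).
std::string encodeCp1250Sample(char32_t cp) {
    if (cp < 0x80) return std::string(1, static_cast<char>(cp));
    switch (cp) {
        case 0x0160: return "\x8A";  // Š
        case 0x017D: return "\x8E";  // Ž
        case 0x0161: return "\x9A";  // š
        case 0x017E: return "\x9E";  // ž
        default:     return "?";
    }
}
```

For "A" both functions produce the same single byte, so the file content cannot tell the encodings apart. For "ž" (U+017E) the UTF-8 form is the two bytes C5 BE while the windows-1250 form is the single byte 9E, and that difference is what a detection tool has to work with.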
-
Thanks. Although I can easily convert characters between encodings, it seems there's no way to figure out the exact encoding which was used to write the file.
-
@Yet-Zio said in How to save file in specific encoding?:
there's no way to figure out the exact encoding which was used to write the file.
no, since e.g. ascii is a subset of utf-8, and other encodings also share byte ranges, so how should that be possible?
-
@Christian-Ehrlicher said in How to save file in specific encoding?:
no, since e.g. ascii is a subset of utf-8
you could possibly detect the BOM for Unicode-encoded files
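A minimal sketch of such a BOM check in plain C++ (no Qt), assuming the first bytes of the file have already been read into a `std::string`. The function name `detectBom` and the returned labels are made up for this example.

```cpp
#include <cassert>
#include <string>

// Returns a label for the byte order mark at the start of `bytes`,
// or an empty string when no BOM is present.
std::string detectBom(const std::string& bytes) {
    auto startsWith = [&bytes](const std::string& prefix) {
        return bytes.compare(0, prefix.size(), prefix) == 0;
    };
    // The UTF-32 marks contain zero bytes, so build them with an explicit length.
    // UTF-32LE must be tested before UTF-16LE because it begins with FF FE too.
    if (startsWith(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";
    if (startsWith(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (startsWith("\xEF\xBB\xBF")) return "UTF-8";
    if (startsWith("\xFF\xFE")) return "UTF-16LE";
    if (startsWith("\xFE\xFF")) return "UTF-16BE";
    return "";
}

// Usage sketch: a file without a BOM gives no answer at all.
void bomExamples() {
    assert(detectBom(std::string("\xEF\xBB\xBF") + "hello") == "UTF-8");
    assert(detectBom("plain ascii").empty());
}
```

An empty result only means "no BOM", not "not Unicode", and single-byte encodings such as windows-1250 have no BOM at all.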
-
@Pablo-J-Rogina But it's not mandatory. If there is a BOM then QTextStream will respect it.
-
Maybe to extend on what @fcarney has said: The encoding of a file is usually not stored in the file itself and needs to be guessed. Even for UTF-8 the BOM (which would help with detection) is optional. Most text editors look at the full text and then try to guess the encoding. So, one question would be whether the text editor you are using for verification is able to guess your encoding.
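One heuristic an editor could use for this guessing is to check whether the bytes form well-formed UTF-8 sequences. The sketch below is plain C++ (no Qt); `looksLikeUtf8` is a hypothetical name, and the check is simplified: it only looks at lead and continuation byte structure and does not reject every overlong or surrogate form. It is not a description of how any particular editor works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified structural check: every lead byte must be followed by the
// right number of continuation bytes (10xxxxxx).
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (lead < 0x80) extra = 0;                          // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;    // 2-byte sequence
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;    // 3-byte sequence
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;    // 4-byte sequence
        else return false;  // stray continuation byte or invalid lead byte
        if (bytes.size() - i - 1 < extra) return false;      // sequence cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Usage sketch with the letter "ž" in two encodings.
void utf8GuessExamples() {
    assert(looksLikeUtf8("\xC5\xBE"));  // "ž" as UTF-8
    assert(!looksLikeUtf8("\x9E"));     // "ž" as windows-1250
}
```

A failed check rules UTF-8 out, but a passed check proves nothing for ASCII-only text, since that is valid in almost every encoding.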
Then, obviously, the text file you are looking at needs something in there to be able to distinguish encodings. If you use plain ASCII characters (only 128 characters, the first 128 in Unicode) then the editor is allowed to guess any encoding it wants. So, in order for us to be able to help, you should also show the text you are trying to save.
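A quick way to find out whether your test text falls into this "plain ASCII" case is to look at the raw bytes. A minimal sketch in plain C++ (no Qt), with `isPureAscii` being a made-up name:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// True when every byte is below 0x80, i.e. the data is plain ASCII and
// looks identical in UTF-8, windows-1250, ISO 8859-1 and similar encodings.
bool isPureAscii(const std::string& bytes) {
    return std::all_of(bytes.begin(), bytes.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}

// Usage sketch.
void asciiExamples() {
    assert(isPureAscii("Lorem ipsum dolor sit amet"));
    assert(!isPureAscii("\xC5\xBE"));  // "ž" as UTF-8 has bytes >= 0x80
}
```

If this returns true for the saved file, no editor or tool can tell which codec was used to write it.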
One last thing is that I am not sure how Qt would handle text (which is Unicode internally) with characters that are not in the encoding you are using for saving. One possibility would be that it just writes the Unicode character as UTF-8 (though I don't know). In that case it could happen that the editor guesses UTF-8 again (though I find this unlikely).
Anyway, you should create text specific to the encoding you are trying out. You cannot have one text (hopefully with non-ASCII characters as otherwise it would be a futile exercise) and try to save it with any encodings just for fun. Encoding and text have to be a match.