
Replacing German Umlaute



  • Hello, I am downloading a CSV file that contains German umlauts such as "ä" or "ö".
    First, I saw that the "ä" shows up as "\xC3\xA4" in the output of reply->readAll(); then I tried to replace that with "ae", but it does not seem to work.

    
    void Download::downloading()
    {
        manager = new QNetworkAccessManager(this);
        connect(manager, SIGNAL(finished(QNetworkReply*)),
                   this, SLOT(replyFinished(QNetworkReply*)));
    
        manager->get(QNetworkRequest(QUrl("https://www.feiertage.net/csvfile.php?state=SL&year=2020&type=csv")));
    }
    
    void Download::replyFinished(QNetworkReply *reply)
    {
        if(reply->error())
           {
               qDebug() << "ERROR!";
               qDebug() << reply->errorString();
           }
           else
           {
               qDebug() << reply->readAll();//"ä" is "\xC3\xA4"
               //reply->readAll().replace("\xC3\xA4", "ae"); --> this line does not work
               qDebug() << reply->readAll();
    
               QFile *file = new QFile("/somePath/myFile.csv");
               if(file->open(QFile::Append))
               {
                   file->write(reply->readAll());
                   file->flush();
                   file->close();
               }
            delete file;
           }
           reply->deleteLater();
    }
    

  • Lifetime Qt Champion

    @Chaki said in Replacing German Umlaute:

    \xC3\xA4

    This is the correct UTF-8 encoding of 'ä', so all is fine - see https://www.utf8-zeichentabelle.de/
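To see where the two bytes come from, here is a minimal sketch of the UTF-8 rule for code points below U+0800, written in plain standard C++ rather than Qt. The helper name encodeUtf8Small is made up for this illustration, and anything from U+0800 upwards is deliberately left out.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper for illustration: encode one Unicode code point in the
// range U+0000..U+07FF as UTF-8. Larger code points are not handled and
// yield an empty string.
std::string encodeUtf8Small(std::uint32_t codePoint)
{
    std::string out;
    if (codePoint < 0x80) {
        // ASCII: a single byte, unchanged.
        out += static_cast<char>(codePoint);
    } else if (codePoint < 0x800) {
        // Two bytes with the bit layout 110xxxxx 10xxxxxx.
        out += static_cast<char>(0xC0 | (codePoint >> 6));
        out += static_cast<char>(0x80 | (codePoint & 0x3F));
    }
    return out;
}
```

For 'ä' (U+00E4) the upper bits give 0xC3 and the lower six bits give 0xA4, which is exactly the "\xC3\xA4" that qDebug() prints for the QByteArray.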



  • Thanks for your answer, but how do I replace the "\xC3\xA4"?



  • Is the above your real code?
    I think you should get an empty file, since the earlier readAll() calls have already taken all the data out before you write to the file.
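The effect can be reproduced without any network code. The sketch below uses a std::istringstream as a stand-in for the reply and a hypothetical readAll() helper that drains the stream; it is only an analogy for the behaviour described here, not Qt's implementation.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for a readAll() call: returns everything that is
// still unread in the stream and leaves the stream drained.
std::string readAll(std::istream &in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling readAll() twice on the same stream returns the data the first time and an empty string the second time. Keeping the first result in a variable (in the Qt code, a QByteArray) and using that variable for both the debug output and the file write avoids the problem.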


  • Lifetime Qt Champion

    @Chaki You do not replace it. For writing into a file, the QByteArray is already correct.

    For displaying it to the user, use QString::fromUtf8()

    Regards



  • Oh, well! Like the others mentioned, there are so many things wrong here.

    • Calling readAll() repeatedly does not work as you would expect. The first call reads the whole content of the buffer, and afterwards the buffer is empty, so your next call to readAll() reads from an empty buffer. It is like reading a file: the file pointer always moves forward when you read, though with a network reply you cannot go back. Store the result in a QByteArray right away if you want to use it multiple times.
    • reply->readAll().replace("\xC3\xA4", "ae"); does exactly what you tell it to do, though this is not what you want. reply->readAll() gives you a temporary object, and you call replace on that temporary object. Don't expect changes made to a temporary object to magically still be there afterwards. If you follow my first point, this discussion will be moot anyway.
    • As mentioned before, the CSV file you download seems to be in UTF8. Thus, if you save exactly what you receive, you will get a CSV file in UTF8. If you open this file in an editor, you have to make sure that it supports UTF8 and detects your file as UTF8.
    • readAll() returns a QByteArray. As the name implies this is just a bunch of bytes without any interpretation like UTF8. If you know that the content is UTF8 you can use QString::fromUtf8(...) to create a new QString. As a trick (though today I would always advise to use UTF8) you can then call toLatin1() on this string to convert it to the proper encoding and it will automatically replace UTF8 characters with the correct ones. Then you don't need to call replace.
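If the "ae" spelling is really wanted, the points above boil down to: keep the bytes in a named buffer and do the replacement on that buffer. Below is a minimal sketch in plain standard C++, with std::string standing in for QByteArray. The names replaceAll and transliterateGerman and the contents of the table are assumptions made for this example. Replacing at the byte level is safe here because a UTF-8 lead byte such as 0xC3 never occurs in the middle of another character's encoding.

```cpp
#include <string>
#include <utility>
#include <vector>

// Replace every occurrence of `from` in `text` with `to`, in place.
void replaceAll(std::string &text, const std::string &from, const std::string &to)
{
    if (from.empty())
        return;
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue searching after the inserted text
    }
}

// Hypothetical helper: transliterate UTF-8 encoded German umlauts and the
// sharp s to ASCII. The buffer is taken by value, so the caller's original
// bytes stay untouched and the modified copy is returned.
std::string transliterateGerman(std::string utf8)
{
    static const std::vector<std::pair<std::string, std::string>> table = {
        {"\xC3\xA4", "ae"}, {"\xC3\xB6", "oe"}, {"\xC3\xBC", "ue"},
        {"\xC3\x84", "Ae"}, {"\xC3\x96", "Oe"}, {"\xC3\x9C", "Ue"},
        {"\xC3\x9F", "ss"},
    };
    for (const auto &entry : table)
        replaceAll(utf8, entry.first, entry.second);
    return utf8;
}
```

In the Qt code from the question, the same idea would mean storing the result of reply->readAll() in a QByteArray variable once, calling replace() on that variable, and writing that variable to the file.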

  • Lifetime Qt Champion

    @SimonSchroeder said in Replacing German Umlaute:

    As a trick (though today I would always advise to use UTF8) you can then call toLatin1() on this string to convert it to the proper encoding and it will automatically replace UTF8 characters with the correct ones. Then you don't need to call replace.

    But only if all characters in that QString can be represented in Latin-1. I would not use that anymore; UTF-8 is just so much more generic.

    Regards
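The restriction can be made concrete with a small sketch in plain standard C++. The function utf8ToLatin1 is a hypothetical helper written for this illustration; it is not how Qt implements toLatin1(). It reports failure for anything outside U+0000..U+00FF, which is a design choice of the example and not a description of Qt's behaviour. Note also that a Latin-1 conversion turns 'ä' into the single byte 0xE4, not into "ae".

```cpp
#include <string>

// Hypothetical helper: convert UTF-8 text to Latin-1 (ISO 8859-1).
// Returns false if the input contains a character outside U+0000..U+00FF
// or a malformed sequence; `latin1` is then only partially filled.
bool utf8ToLatin1(const std::string &utf8, std::string &latin1)
{
    latin1.clear();
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {
            latin1 += static_cast<char>(lead);  // ASCII, copied as is
        } else if ((lead == 0xC2 || lead == 0xC3) && i + 1 < utf8.size()) {
            const unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
            if ((next & 0xC0) != 0x80)
                return false;  // second byte is not a continuation byte
            // Two-byte sequence 110xxxxx 10xxxxxx covering U+0080..U+00FF.
            latin1 += static_cast<char>(((lead & 0x1F) << 6) | (next & 0x3F));
            ++i;  // skip the continuation byte
        } else {
            return false;  // outside Latin-1, or truncated/malformed input
        }
    }
    return true;
}
```

"März" converts fine, while a string containing the euro sign (U+20AC, bytes E2 82 AC) is rejected, because that character has no Latin-1 representation.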

