Qdatastreams and binary files.

36 Posts 7 Posters 5.2k Views 2 Watching
  • Styx wrote (#21):

    I have to read somewhere around 2000 binary files. None of them are the same, but some contain the same data.

    How would I use seek and read dynamically to read each file (QFile API)?

    • Styx wrote:

      @JKSH Since .data() is null terminated. Think it would be better to use shift left.

      // Assuming that your file is little-endian...
      m_binaryVersion = *reinterpret_cast<const quint8* >(data +  0 ) >> 8;
      bookCount       = *reinterpret_cast<const quint8* >(data +  5) >> 12;
      bookHash        = *reinterpret_cast<const quint32*>(data + 13) >> 16;
      
      // example
      bookCount=256 the first byte is '\0' then all the rest will be undetermined.
      

      There shouldn't be any issue with indexing into the QByteArray and then looping through it to print out the data as well.

      JKSH (Moderators) wrote (#22):

      @Styx said in Qdatastreams and binary files.:

      @JKSH Since .data() is null terminated. Think it would be better to use shift left.

      m_binaryVersion = *reinterpret_cast<const quint8* >(data +  0 ) >> 8;
      bookCount       = *reinterpret_cast<const quint8* >(data +  5) >> 12;
      bookHash        = *reinterpret_cast<const quint32*>(data + 13) >> 16;
      
      // example
      bookCount=256 the first byte is '\0' then all the rest will be undetermined.
      

      I don't get it. Could you please explain how this works?

      @Styx said in Qdatastreams and binary files.:

      How would i use seek and read dynamically to read each file. (Qfile api).

      Take the code that reads one file and put it in a loop. Pass a different filename each loop iteration.

      Qt Doc Search for browsers: forum.qt.io/topic/35616/web-browser-extension-for-improved-doc-searches


        JonB wrote (#23, last edited by JonB):

        @Styx said in Qdatastreams and binary files.:

        @JKSH Since .data() is null terminated. Think it would be better to use shift left.

        // Assuming that your file is little-endian...
        m_binaryVersion = *reinterpret_cast<const quint8* >(data +  0 ) >> 8;
        bookCount       = *reinterpret_cast<const quint8* >(data +  5) >> 12;
        bookHash        = *reinterpret_cast<const quint32*>(data + 13) >> 16;
        
        // example
        bookCount=256 the first byte is '\0' then all the rest will be undetermined.
        

        I don't know what you're trying to achieve here (as @JKSH said), but:

        • You are using shift right, not left.

        • *reinterpret_cast<const quint8* >(data + 0 ) returns a quint8. Since that is (unsigned) 8-bits in size, >> 8 always returns 0 regardless of content.

        • Similarly for *reinterpret_cast<const quint8* >(data + 5) >> 12, except that >> 12 makes even less sense for an 8-bit value.

        • QByteArray::data() is indeed (extra) \0 terminated, but that has no relevance to any of the lines of code you wrote.

        The code without any shifts written by @JKSH makes sense. I'm afraid yours does not!
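To make the points above concrete, here is a minimal standalone sketch in plain C++. It uses std::uint8_t in place of quint8 so that it compiles without Qt, and the helper names are hypothetical, not taken from the thread.

```cpp
#include <cstdint>

// Hypothetical helper: the computation performed by
// `*reinterpret_cast<const quint8*>(p) >> 8`.
// The 8-bit value is promoted to int before the shift, so all eight
// significant bits are shifted out and the result is 0 for every input.
inline int shiftByteRight8(std::uint8_t value)
{
    return value >> 8;
}

// Hypothetical helper: where shifts are actually useful when parsing.
// Two separate bytes are combined into one 16-bit little-endian value,
// so the LEFT shift is applied to the high byte only.
inline std::uint16_t combineLittleEndian16(std::uint8_t low, std::uint8_t high)
{
    return static_cast<std::uint16_t>(low | (high << 8));
}
```

A value such as 256 cannot be stored in a single byte at all; it needs at least two, for example combineLittleEndian16(0x00, 0x01).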

        • JKSH wrote:

          @Christian-Ehrlicher said in Qdatastreams and binary files.:

          Work with a plain const char * pointer

          To add to @Christian-Ehrlicher's point: Call QByteArray::data() or QByteArray::constData() to get a raw pointer to your data. Then, you can use pointer arithmetic to extract your data.

          QByteArray ba = file.readAll();
          const char* data = ba.constData();
          
          // Assuming that your file is little-endian...
          memcpy(&m_binaryVersion, data +  0, sizeof(quint8 ));
          memcpy(&bookCount,       data +  5, sizeof(quint8 ));
          memcpy(&bookHash,        data + 13, sizeof(quint32));
          

          EDIT: Code above changed from reinterpret_cast<> to memcpy() for cross-platform safety

          JonB wrote (#24, last edited by JonB):

          @JKSH said in Qdatastreams and binary files.:

          bookHash = *reinterpret_cast<const quint32*>(data + 13);

          Have you actually tried this line? Because I would assume it will "segment fault" (or whatever, probably something else). You are trying to dereference a 32-bit int from data + 13, which will be an odd numbered address. Whoops! :) [I have a feeling static_cast<> would warn/prohibit this at compile-time?]

          You must be very careful recommending to treat a binary block like this as though you can index into it directly for the types you know were serialized there, for this kind of reason. Here you need to pull the 4 bytes out of the buffer (e.g. memcpy() directly into an &quint32 if you know endian-ness is same on host as in file), or some other safe approach.
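The memcpy() approach mentioned above could look roughly like this. It is a sketch in plain C++ (std::uint32_t standing in for quint32), the helper name is hypothetical, and it assumes the file and the host share the same byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy a 32-bit value out of a byte buffer at any offset.
// memcpy() has no alignment requirement, so an odd offset such as 13 is fine.
// The bytes are interpreted in the host's byte order, so the result is only
// correct when the file was written with the same endianness as the host.
// The caller must guarantee that offset + 4 does not exceed the buffer size.
inline std::uint32_t loadU32HostOrder(const char* data, std::size_t offset)
{
    std::uint32_t value = 0;
    std::memcpy(&value, data + offset, sizeof(value));
    return value;
}
```

With a QByteArray named ba, a call such as loadU32HostOrder(ba.constData(), 13) would replace the reinterpret_cast line, provided ba.size() is at least 17.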


            JKSH (Moderators) wrote (#25):

            @JonB said in Qdatastreams and binary files.:

            @JKSH said in Qdatastreams and binary files.:

            bookHash = *reinterpret_cast<const quint32*>(data + 13);

            Have you actually tried this line? Because I would assume it will "segment fault" (or whatever, probably something else). You are trying to dereference a 32-bit int from data + 13, which will be an odd numbered address. Whoops! :)

            Thanks for the heads-up. I tried compiling it using MinGW 7.3.0 32-bit, MSVC 2017 32-bit, and MSVC2017 64-bit (all with Qt 5.14.0, release mode) and got the expected results every time. However, your comment prompted me to do some digging which led me to this question: Should I worry about the alignment during pointer casting?

            I'll update my sample code.

            [I have a feeling static_cast<> would warn/prohibit this at compile-time?]

            Static casting cannot be used to convert a byte array into an integer at all, no matter where the bytes sit in memory.

            • Christian Ehrlicher (Lifetime Qt Champion) wrote (#26):

              @JKSH said in Qdatastreams and binary files.:

              You are trying to dereference a 32-bit int from data + 13, which will be an odd numbered address.

              This works fine on x86_64, only slowly. It does not work on some ARM processors, see e.g. here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html

              Qt Online Installer direct download: https://download.qt.io/official_releases/online_installers/
              Visit the Qt Academy at https://academy.qt.io/catalog

              • JonB wrote (#27):

                @JKSH , @Christian-Ehrlicher
                Very interesting! I thought processors just "bus-dumped" or whatever on an odd address, I didn't know they would "trap" the alignment and "recover", and thereby work but run slowly. I wonder what the last "friendly" processor architecture I saw --- Motorola 68000 family, like 68010 or 68020, not this x86-type stuff --- would have done? :)

                • Styx wrote (#28):

                  So I have some 3000 files to go through and read. Currently I have been using indexOf() and mid() to find strings and variables.

                  QByteArray filedata = file.readAll();
                  int j = 0;
                  while ((j = filedata.indexOf("books", j)) != -1) {
                      qDebug() << "Found string at index position" << j;
                      ++j;
                      // put the QByteArray into a QString
                  }
                  

                  This method can get ugly: some of the files contain over 50 strings, which would make the source code messy.

                  Should I just seek to the start position and then read from that point on? Should I use readLine() or read()? Or readAll() into a QByteArray and then store it in another buffer? Is there a way to extract strings from a QByteArray?
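One generic way to pull strings out of a binary buffer, without hard-coding a search per string, is to collect runs of printable characters, much like the Unix strings tool does. The sketch below is plain C++ with a hypothetical helper name. It assumes the strings in the file are plain ASCII, which the thread does not confirm.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collect every run of printable ASCII characters
// (0x20..0x7E) that is at least minLength bytes long.
inline std::vector<std::string> extractPrintableRuns(const std::string& bytes,
                                                     std::size_t minLength = 4)
{
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char c : bytes) {
        if (c >= 0x20 && c <= 0x7E) {
            current.push_back(static_cast<char>(c));
        } else {
            // A non-printable byte ends the current run.
            if (current.size() >= minLength)
                runs.push_back(current);
            current.clear();
        }
    }
    // Flush a run that reaches the end of the buffer.
    if (current.size() >= minLength)
        runs.push_back(current);
    return runs;
}
```

A QByteArray can be handed to this helper via QByteArray::toStdString(), which keeps embedded \0 bytes. If the format stores an explicit length before each string, reading that length and then exactly that many bytes is more reliable than scanning.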

                  • kuzulis (Qt Champions 2020) wrote (#29, last edited by kuzulis):

                    Stop, guys... As I remember, you can read simple data types (int, uint, etc.) using QDataStream, and even your own structures that were not written by QDataStream (use raw reads for this). You can even read strings as raw objects.

                    • Styx wrote (#30):

                      @kuzulis Can you explain what you're talking about?

                      I always thought QDataStream couldn't parse padded structures, and that you could only read data with it if the data had been written by Qt.

                      Mind showing an example?


                        JonB wrote (#31, last edited by JonB):

                        @Styx
                        Despite what @kuzulis has written, it does not follow that you can use QDataStream to deserialize, say, an int, even if there is no padding at all in its serialization (I don't know what QDataStream does or does not put in). The point is you have said that the file format you are trying to read is produced by someone else, not using QDataStream to serialize, right? In that case, as an example, even if it outputs, say, a 32-bit int in 4 bytes you do not know whether that means low->high or high->low bytes. And nor does QDataStream. So how can that correctly deserialize if the way it was saved differs from however QDataStream int-order deserialization works?

                        EDIT See @kuzulis's code below which shows that you must tell QDataStream which order to expect if the output was not produced with the default QDataStream order, which then allows you to proceed.
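The byte-order point can be illustrated with a small sketch in plain C++: the same four bytes decode to two different integers depending on the order the reader assumes. The helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: treat b[0] as the MOST significant byte (big-endian).
inline std::uint32_t decodeBigEndian32(const unsigned char* b)
{
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16)
         | (std::uint32_t{b[2]} << 8)  |  std::uint32_t{b[3]};
}

// Hypothetical helper: treat b[0] as the LEAST significant byte (little-endian).
inline std::uint32_t decodeLittleEndian32(const unsigned char* b)
{
    return  std::uint32_t{b[0]}        | (std::uint32_t{b[1]} << 8)
         | (std::uint32_t{b[2]} << 16) | (std::uint32_t{b[3]} << 24);
}
```

For the bytes 'R', 'I', 'F', 'F' (0x52 0x49 0x46 0x46), the big-endian reading gives 0x52494646 and the little-endian reading gives 0x46464952. Only the party that defined the file format can say which one is intended.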

                        • kuzulis (Qt Champions 2020) wrote (#32, last edited by kuzulis):

                          What? Here is an example of how to parse a WAV file using QDataStream:

                          ...
                          const quint32 kRiffId = 0x52494646;
                          const quint32 kWaveId = 0x57415645;
                          const quint32 kFmtId = 0x666d7420;
                          const quint32 kPcmFmtSize = 16; // for PCM only
                          const quint16 kAudioFormatId = 1; // WAVE_FORMAT_PCM
                          const quint32 kDataId = 0x64617461;
                          
                          ...
                              bool readFormat()
                              {
                                  file.seek(0);
                          
                                  QDataStream in(&file);
                          
                                  quint32 chunkId = 0;
                                  in.setByteOrder(QDataStream::BigEndian);
                                  in >> chunkId;
                                  if (chunkId != kRiffId) // "RIFF"
                                      return false;
                          
                                  quint32 chunkSize = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> chunkSize; // file size
                          
                                  quint32 formatId = 0;
                                  in.setByteOrder(QDataStream::BigEndian);
                                  in >> formatId;
                                  if (formatId != kWaveId) // "WAVE"
                                      return false;
                          
                                  quint32 subchunk1Id = 0;
                                  in.setByteOrder(QDataStream::BigEndian);
                                  in >> subchunk1Id;
                                  if (subchunk1Id != kFmtId) // "fmt "
                                      return false;
                          
                                  quint32 subchunk1Size = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> subchunk1Size;
                                  if (subchunk1Size != kPcmFmtSize) // for PCM format only
                                      return false;
                          
                                  quint16 audioFormat = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> audioFormat;
                                  if (audioFormat != kAudioFormatId) // for PCM format only
                                      return false;
                          
                                  quint16 numChannels = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> numChannels;
                                  if (numChannels == 0)
                                      return false;
                          
                                  quint32 sampleRate = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> sampleRate;
                                  if (sampleRate == 0)
                                      return false;
                          
                                  quint32 byteRate = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> byteRate;
                          
                                  quint16 blockAlign = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> blockAlign;
                                  if (blockAlign == 0)
                                      return false;
                          
                                  quint16 bitsPerSample = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> bitsPerSample;
                                  if (bitsPerSample == 0)
                                      return false;
                          
                                  quint32 subchunk2Id = 0;
                                  in.setByteOrder(QDataStream::BigEndian);
                                  in >> subchunk2Id;
                                  if (subchunk2Id != kDataId) // "data"
                                      return false;
                          
                                  quint32 subchunk2Size = 0;
                                  in.setByteOrder(QDataStream::LittleEndian);
                                  in >> subchunk2Size;
                                  if (subchunk2Size == 0)
                                      return false;
                          
                                  startDataOffset = sizeof(chunkId)
                                          + sizeof(chunkSize)
                                          + sizeof(formatId)
                                          + sizeof(subchunk1Id)
                                          + sizeof(subchunk1Size)
                                          + sizeof(audioFormat)
                                          + sizeof(numChannels)
                                          + sizeof(sampleRate)
                                          + sizeof(byteRate)
                                          + sizeof(blockAlign)
                                          + sizeof(bitsPerSample)
                                          + sizeof(subchunk2Id)
                                          + sizeof(subchunk2Size);
                          
                                  format.setCodec(QLatin1String(kAudioCodec));
                                  format.setChannelCount(numChannels);
                                  format.setSampleRate(sampleRate);
                                  format.setSampleSize(bitsPerSample);
                                  format.setSampleType(QAudioFormat::SignedInt); // TODO: Is this correct?
                                  format.setByteOrder(QAudioFormat::LittleEndian);
                          
                                  return file.seek(startDataOffset);
                              }
                          
                            
                            JonB wrote (#33, last edited by JonB):

                            @kuzulis
                            Sorry, what I meant was, this works because you know at each stage whether you expect little-endian or big-endian order in the external data stream format, and you explicitly code to tell QDataStream that. I meant that a pure call to whichever way round the default of QDataStream expects will not work, whereas it would if QDataStream (with default settings) had also been used at the output side.

                            I won't delete my post now, but I will amend it to point out that what you have written allows it to be accomplished.

                            • Styx wrote (#34):

                              As far as the format goes, it's little-endian (binary)... it goes string to value throughout the entire file. Some things are confusing: what about an unknown format? What is startDataOffset?

                              • kuzulis (Qt Champions 2020) wrote (#35, last edited by kuzulis):

                                Some things are confusing: what about an unknown format?

                                What do you mean?

                                What is startDataOffset?

                                It is the position in a WAVE file where the data samples begin (after the WAVE header).

                                PS: It is just an example. You yourself should know the format of your file. Then you can use the stream operators for 1-, 2-, 4-, and 8-byte integers and for 4- and 8-byte floats/doubles, use readRawData() to read BLOBs, and use seek() if needed. With that, you can use QDataStream.
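The workflow described above (typed reads, raw reads for BLOBs, and seeking) can be sketched without Qt as a small cursor over a byte buffer. This is not QDataStream, and the class and method names are hypothetical. It assumes a little-endian file, as Styx describes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical Qt-free sketch of sequential reads from a little-endian buffer.
class LittleEndianReader
{
public:
    explicit LittleEndianReader(std::string bytes) : m_bytes(std::move(bytes)) {}

    // False once any read or seek went past the end of the buffer.
    bool ok() const { return m_ok; }

    std::uint8_t  readU8()  { return static_cast<std::uint8_t>(readUnsigned(1)); }
    std::uint16_t readU16() { return static_cast<std::uint16_t>(readUnsigned(2)); }
    std::uint32_t readU32() { return readUnsigned(4); }

    // Raw read of `length` bytes (the role readRawData() plays for BLOBs).
    std::string readRaw(std::size_t length)
    {
        if (!ensure(length))
            return {};
        std::string out = m_bytes.substr(m_pos, length);
        m_pos += length;
        return out;
    }

    // Jump to an absolute position (the role seek() plays).
    bool seek(std::size_t pos)
    {
        if (pos > m_bytes.size()) {
            m_ok = false;
            return false;
        }
        m_pos = pos;
        return true;
    }

private:
    // Checks that n more bytes are available; latches the error flag if not.
    bool ensure(std::size_t n)
    {
        if (!m_ok || n > m_bytes.size() - m_pos)
            m_ok = false;
        return m_ok;
    }

    // Assembles `width` bytes (1, 2 or 4), least significant byte first.
    std::uint32_t readUnsigned(std::size_t width)
    {
        if (!ensure(width))
            return 0;
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < width; ++i)
            value |= std::uint32_t{static_cast<unsigned char>(m_bytes[m_pos + i])} << (8 * i);
        m_pos += width;
        return value;
    }

    std::string m_bytes;
    std::size_t m_pos = 0;
    bool m_ok = true;
};
```

With the offsets used earlier in the thread, reading would be readU8() at position 0, seek(5) then readU8(), and seek(13) then readU32(), followed by a single ok() check at the end.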

                                • Styx wrote (#36):

                                  The format is just some custom-made binary file. I'm just trying to create a parser that converts the binary to XML. I can open and read the file fine; it's just that my method of finding data and converting it isn't the best. There aren't many good examples of QFile or QByteArray doing all the things I'm trying to do.
