How to design a Qt file reader ?

9 Posts 3 Posters 3.1k Views
    dridk
    wrote on 30 Jan 2016, 12:45 last edited by dridk
    #1

    Hi,
    I wonder how to design a file reader class (in a Qt fashion) that handles several formats. I may have huge text files in different formats that hold DNA sequences (*.fasta, *.fastq). How would you suggest designing one file reader that reads all available formats?

    Either one SequenceFile class with a loader for each format built in:

    SequenceFile::fromFasta("file.fasta")
    SequenceFile::fromFastq("file.fastq")
    

    Or create a reader for each format: FastaReader, FastqReader, ...

     Sequence * seq = new Sequence(AbstractReader * reader) 
    

    The main problem is that those files are huge and cannot be loaded into memory.
    If you have other proposals, thank you.

      kshegunov
      Moderators
      wrote on 30 Jan 2016, 14:07 last edited by kshegunov
      #2

      @dridk
      Hello,
      What kind of files are you working with, binary or text (I see that you mentioned text, but are all of them text)? Can you seek around in them? Do they have some kind of meta-information about the format embedded? What kind of data are you supposed to read from them?

      Kind regards.

        mrjj
        Lifetime Qt Champion
        wrote on 30 Jan 2016, 14:11 last edited by
        #3

        @kshegunov
        It's related to this:
        https://forum.qt.io/topic/63463/model-with-large-data-from-file

          kshegunov
          Moderators
          wrote on 30 Jan 2016, 14:12 last edited by
          #4

          @mrjj
          Oh, OK, I'll look that up. It was not referenced in the original post, so I had no idea. Thanks for the link.

            mrjj
            Lifetime Qt Champion
            wrote on 30 Jan 2016, 14:14 last edited by
            #5

            @kshegunov
            I think a memory-mapped file with a sliding section would be perfect for his needs, but
            I have never tried it with Qt and the map function.
            Have you used that functionality?

              kshegunov
              Moderators
              wrote on 30 Jan 2016, 14:23 last edited by kshegunov
              #6

              @mrjj , @dridk
              No, I haven't used them, but any binary file will do. In my work on signal processing for large-scale scintillation detectors, I first converted the text file to binary. This is necessary to be able to seek in anything less than a lifetime. You could also add a very simple "compression", because DNA has only 4 bases, so each base can be encoded in just 2 bits! This means that a codon (a triplet of bases) takes 6 bits, which is less than a byte. If the files are not too large (2-4 GB), such a scheme will even allow you to map the whole data set into memory, which is the fastest option by any standard. So, my advice is:

              1. Convert the text file to a (possibly temporary) binary file on open (it'll take some time, but should be manageable)
              2. Use an appropriate encoding scheme for the data
              3. (Possibly) Have a simple QAbstractItemModel referencing that data (keeping offsets for the data stream should be sufficient)
              4. Make a custom widget that draws the data
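
As a rough sketch of step 2 above, this is one way a 2-bit encoding could look. It is plain standard C++ rather than Qt so that it stands on its own. The function names (`encodeBase`, `packBases`, `baseAt`) and the mapping A=00, C=01, G=10, T=11 are assumptions made for illustration, not part of any file format. Anything other than A, C, G or T is rejected, so files that contain other letters would need a wider code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed mapping (illustrative only): A=00, C=01, G=10, T=11.
std::uint8_t encodeBase(char base)
{
    switch (base) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:
        // N, IUPAC ambiguity codes and the like do not fit into 2 bits.
        throw std::invalid_argument("base outside A/C/G/T");
    }
}

// Packs four bases per byte, with the first base in the two most
// significant bits. Unused bits in the last byte are left as zero.
std::vector<std::uint8_t> packBases(const std::string &bases)
{
    std::vector<std::uint8_t> packed((bases.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < bases.size(); ++i) {
        const unsigned shift = 6 - 2 * static_cast<unsigned>(i % 4);
        packed[i / 4] |= static_cast<std::uint8_t>(encodeBase(bases[i]) << shift);
    }
    return packed;
}

// Reads the base at zero-based position `index` back out of the buffer.
char baseAt(const std::vector<std::uint8_t> &packed, std::size_t index)
{
    assert(index / 4 < packed.size());
    const unsigned shift = 6 - 2 * static_cast<unsigned>(index % 4);
    return "ACGT"[(packed[index / 4] >> shift) & 3];
}
```

With this layout, `packBases("ACGT")` yields a single byte, and `baseAt` can fetch any base without decoding the ones before it.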

              ADDENDUM

              Back to your original question, which I actually forgot to answer, sorry:
              Consider separating the presentation from the reading of the file. So you could have a class that reads and parses the data by accepting an open QFile/QTextStream instance. Same for the internal format you're using, if you decide to convert the file to a binary. This:

              SequenceFile::fromFasta("file.fasta");
              

              doesn't seem very promising.
              I'd suggest something of the following kind:

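              #include <QObject>
              #include <QTextStream>

              // FastASectionData is a placeholder for whatever type holds one parsed section
              class FastAReader : public QObject
              {
                  Q_OBJECT

              public:
                  explicit FastAReader(QTextStream & ts, QObject * parent = nullptr)
                      : QObject(parent), stream(ts)
                  {
                  }

                  // Reads one section and emits it; returns false once the stream is exhausted
                  bool nextSection()
                  {
                       if (stream.atEnd())
                           return false;

                       FastASectionData data;
                       // ... Read a data section and fill up your `data` variable

                       emit sectionReady(data);
                       return true;
                  }

                  // Reads the whole file, emitting sectionReady() for every section
                  bool read()
                  {
                      while (nextSection())
                          ;

                      return stream.status() == QTextStream::Ok;
                  }

              signals:
                  void sectionReady(FastASectionData data);

              private:
                  QTextStream & stream; //< Not owned, must outlive the reader
              };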
              

              With a class like this you can use QObject::connect to attach any processing object that does what you want with the data. It can be an object that writes the data to your binary file, one that fills your internal data representation, or something of the sort. You use it simply by providing a valid QTextStream and then invoking the read() function.
              Example usage:

              QFile file("myfile.fasta");
              if (!file.open(QFile::Text | QFile::ReadOnly))
                  ; //< Can't open the file, handle error appropriately
              
              QTextStream stream(&file); //< Attach a stream to the file
              FastAReader reader(stream); //< Initialize the file reader and/or parser
              
              FastADataProcessor processor; //< This would hypothetically process the data
              QObject::connect(&reader, &FastAReader::sectionReady, &processor,  &FastADataProcessor::processSection);
              
              if (!reader.read())
                  ; //< There was a problem reading the file, handle accordingly
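
To give an idea of what the "read a data section" step might involve, here is a minimal sketch of a streaming FASTA parser. It is written in plain standard C++ instead of Qt so that it is self-contained, and it uses a callback where the class above uses a signal. `FastaRecord`, `readFasta` and `parseFastaText` are hypothetical names. It assumes the usual FASTA layout: a header line that starts with `>`, followed by one or more sequence lines. Only one record is held in memory at a time, so a single very long record would still need chunked handling.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the section data type used above.
struct FastaRecord {
    std::string header;    // header line without the leading '>'
    std::string sequence;  // sequence lines joined, without line breaks
};

// Reads records one at a time and hands each to `onRecord`.
// Returns the number of records delivered.
std::size_t readFasta(std::istream &in,
                      const std::function<void(const FastaRecord &)> &onRecord)
{
    assert(onRecord);  // an empty callback would make the call pointless
    std::size_t count = 0;
    FastaRecord current;
    bool inRecord = false;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();  // tolerate Windows line endings
        if (line.empty())
            continue;
        if (line[0] == '>') {
            if (inRecord) {   // a new header closes the previous record
                onRecord(current);
                ++count;
            }
            current.header = line.substr(1);
            current.sequence.clear();
            inRecord = true;
        } else if (inRecord) {
            current.sequence += line;
        }
    }
    if (inRecord) {           // deliver the last record
        onRecord(current);
        ++count;
    }
    return count;
}

// Convenience for small inputs only: parses FASTA text held in a string.
std::vector<FastaRecord> parseFastaText(const std::string &text)
{
    std::istringstream in(text);
    std::vector<FastaRecord> records;
    readFasta(in, [&records](const FastaRecord &r) { records.push_back(r); });
    return records;
}
```

For a huge file you would pass an open file stream to `readFasta` and let the callback write each record to the binary file or feed it to a model, rather than collecting everything as `parseFastaText` does.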
              

              I know it's not a complete solution but I hope it'll help you for a start.

              Kind regards.

                  dridk
                  wrote on 30 Jan 2016, 21:17 last edited by dridk
                  #8

                  Thanks all for your replies. Yes, it's a text file, but I cannot compress it because there are more letters than just ACGT.
                  I didn't understand why it's faster to read a binary file than a text file. In the end, isn't a text file also a binary file?
                  I will test all of your solutions and let you know.

                    kshegunov
                    Moderators
                    wrote on 31 Jan 2016, 00:16 last edited by kshegunov
                    #9

                    @dridk
                    Hello,
                    What I suggested is not compression per se, but a way to encode (that is, represent) base data more efficiently. As I noted, this is in no way a complete solution, but I think it should give you a starting point. Since adenine is complementary to thymine, the first could be encoded as the bit sequence 00 and the other as 11, while cytosine and guanine could be encoded as 01 and 10 respectively. This way you get the complementary base just by inverting the bits. Suppose you have encoded one strand of DNA; then you get the complementary strand simply by inverting all the bits. Since each base has a fixed size of 2 bits, you can use offsets to calculate exactly where a piece of data is located in a long sequence. Suppose you have a sequence of codons (triplets of bases) and you know that some gene contains 3 codons and starts with the 35th codon of the sequence; then you can access the gene very easily. The gene starts at base offset (35 - 1) * 3 = 102 (that is, bit offset 102 * 2 = 204), and its size is simply 9 bases, or 18 bits. I just hope my biology is not failing me with the calculations. So if you had the whole sequence stored in a binary file, to read the gene you seek directly to the correct position using those offsets:

                    QFile file("dnasequence.dna");
                    if (!file.open(QFile::ReadOnly))
                        ; //< You know the drill with handling errors
                    
                    file.seek(25); //< Go to the 25-th byte (200th bit)
                    QByteArray geneSequenceData = file.read(3);  //< Read 3 bytes (up to bit 224)
                    
                    // So in the byte array we've read we have the gene we're interested in, and it starts at the 4-th bit and ends at bit 22
                    // The total number of bits read is 24
                    

                    The whole point of having a structured binary file is to be able to seek around in it without actually reading everything. Obviously my example is pretty superficial, and it's much better to have a dedicated class that represents a base sequence, a class that represents gene offsets, and classes for any other data you might want to handle. Additionally, you'd probably need some meta-information written in that file (offsets of sequences, genes or other things) so you can locate what you need. This is not possible with text files, especially in a platform-independent fashion. Moreover, a sequence of 4 bases takes 4 bytes in a text file, while with the proposed encoding scheme you only need a single byte!
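
The offset arithmetic and the bit-inversion trick can be sketched in a few lines. This is plain standard C++, not Qt, and `BitRange`, `locateBases` and `complementPacked` are hypothetical names. It assumes 2 bits per base, four bases per byte, and the encoding A=00, C=01, G=10, T=11 described in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Where a run of bases lives inside a 2-bits-per-base file.
struct BitRange {
    std::size_t firstByte;  // byte offset to seek to
    std::size_t byteCount;  // number of bytes to read
    unsigned bitOffset;     // first useful bit inside the bytes read
    std::size_t bitCount;   // number of useful bits
};

// Locates `baseCount` bases starting at zero-based base offset `firstBase`.
BitRange locateBases(std::size_t firstBase, std::size_t baseCount)
{
    assert(baseCount > 0);
    const std::size_t firstBit = firstBase * 2;
    const std::size_t bitCount = baseCount * 2;
    const std::size_t firstByte = firstBit / 8;
    const std::size_t endByte = (firstBit + bitCount + 7) / 8;  // one past the last byte
    return { firstByte, endByte - firstByte,
             static_cast<unsigned>(firstBit % 8), bitCount };
}

// With A=00, C=01, G=10, T=11, inverting every bit swaps A with T and
// C with G. Note that padding bits in the last byte are inverted too.
std::vector<std::uint8_t> complementPacked(const std::vector<std::uint8_t> &packed)
{
    std::vector<std::uint8_t> result(packed.size());
    for (std::size_t i = 0; i < packed.size(); ++i)
        result[i] = static_cast<std::uint8_t>(~packed[i]);
    return result;
}
```

For the gene in the example, `locateBases(102, 9)` gives byte offset 25, a read of 3 bytes, and a bit offset of 4 with 18 useful bits, which matches the seek and read calls in the snippet above.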

                    Kind regards.
