Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Get Qt Extensions
  • Unsolved
Collapse
Brand Logo
  1. Home
  2. Qt Development
  3. General and Desktop
  4. Which type of database should I choose to store JSON data?
Forum Updated to NodeBB v4.3 + New Features

Which type of database should I choose to store JSON data?

Scheduled Pinned Locked Moved Unsolved General and Desktop
10 Posts 4 Posters 900 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • N Offline
    N Offline
    npatil15
    wrote on 16 Jun 2020, 06:49 last edited by npatil15
    #1

    Hello All,

    I have number of JSON files, like 20,000 or 30,000, and more to process at once.
    I just need count about XYZ key present in particular file.
    I have found some solutions as below:

    • Store data in a variable like QHash<QString, QHash<QString, QList<s_DATA>>> HashData and extract from it for further processing. Now here it takes time to extract required data from JSON and then put into variable HashData. Extracting was done using the recursive method by digging into files.
    • Store data in Model like QSqlTableModel. Create a standard database as per JSON file structure and then extract data using applying query for further processing.
    • Store data in Model like QStandardItemModel

    I have applied the first solution and it takes normally 1.5 minutes to process 3000 files of JSON, so I'm not sure does it will be good to implement/try other solutions to make it fast enough.
    Please provide me suggestions about other solutions or if you have any other solution for it and how it will be good to implement over first solution.

    Thanks

    J 1 Reply Last reply 16 Jun 2020, 07:51
    0
    • M Offline
      M Offline
      mrjj
      Lifetime Qt Champion
      wrote on 16 Jun 2020, 07:08 last edited by
      #2

      Hi
      If you MUST have data in JSON in first place, then the cost is to parse the files so i don't see putting the json
      files into any other db or model can help with processing time at all.

      I would try another parser and see
      like this
      https://github.com/simdjson/simdjson

      Also, you might be able to optimize your code to run faster but its hard to say from pure guessing.

      1 Reply Last reply
      4
      • N npatil15
        16 Jun 2020, 06:49

        Hello All,

        I have number of JSON files, like 20,000 or 30,000, and more to process at once.
        I just need count about XYZ key present in particular file.
        I have found some solutions as below:

        • Store data in a variable like QHash<QString, QHash<QString, QList<s_DATA>>> HashData and extract from it for further processing. Now here it takes time to extract required data from JSON and then put into variable HashData. Extracting was done using the recursive method by digging into files.
        • Store data in Model like QSqlTableModel. Create a standard database as per JSON file structure and then extract data using applying query for further processing.
        • Store data in Model like QStandardItemModel

        I have applied the first solution and it takes normally 1.5 minutes to process 3000 files of JSON, so I'm not sure does it will be good to implement/try other solutions to make it fast enough.
        Please provide me suggestions about other solutions or if you have any other solution for it and how it will be good to implement over first solution.

        Thanks

        J Offline
        J Offline
        JonB
        wrote on 16 Jun 2020, 07:51 last edited by JonB
        #3

        @npatil15
        As @mrjj said, no point putting these in a database per se.

        However, if processing time is critical, the key does not occur in a fair proportion of the files, and you are willing to use a tiny bit of "fuzzy" logic: you could probably save a lot of time if there are many files to eliminate simply be pre-searching your files for XYZ key (or indeed for "XYZ":) --- remember JSON is text --- and only passing on those files which contain it to be JSON-parsed. Text searching will be fast, parsing will by much slower.

        If that improves for you, you could then further store all the files in a database table, provided the DB provider allows it to be stored as a text blob which can be searched fast, to avoid the overhead of having to open that many files.

        Finally, alternatively/additionally, you could do the reading/searching across a number of threads. I don't know how much difference it would make --- you'd have to test --- but there may be some latency in read I/O which could be saved (i.e. one thread may be able to search while another is "sleeping" on I/O read).

        1 Reply Last reply
        4
        • M Offline
          M Offline
          mrjj
          Lifetime Qt Champion
          wrote on 16 Jun 2020, 09:04 last edited by
          #4

          Hi
          Btw how big are each file?
          If they are not so big, you could also load to memory and use threading as @JonB suggest to parse them in parallel. That could potentially cut time time in 4
          or more depending on the hardware/cpu.

          1 Reply Last reply
          0
          • N Offline
            N Offline
            npatil15
            wrote on 16 Jun 2020, 09:12 last edited by
            #5

            @mrjj , normally file on average below 300kb.

            Let me correct if I'm wrong, so as per @JonB solution, I have to create n threads for n number of JSON files. So I'll try this.

            And FYI, I have used this script (QJsonPath) to dig into the JSON file to extract data and from the data, I used to take count of it only.

            J 1 Reply Last reply 16 Jun 2020, 09:19
            0
            • N npatil15
              16 Jun 2020, 09:12

              @mrjj , normally file on average below 300kb.

              Let me correct if I'm wrong, so as per @JonB solution, I have to create n threads for n number of JSON files. So I'll try this.

              And FYI, I have used this script (QJsonPath) to dig into the JSON file to extract data and from the data, I used to take count of it only.

              J Offline
              J Offline
              J.Hilk
              Moderators
              wrote on 16 Jun 2020, 09:19 last edited by
              #6

              @npatil15 said in Which type of database should I choose to store JSON data?:

              I have to create n threads for n number of JSON files.

              nonono!

              You create n threads where n < Number of CPU cores, as a rule of thumb

              To make it more professional, use https://doc.qt.io/qt-5/qthread.html#idealThreadCount


              Be aware of the Qt Code of Conduct, when posting : https://forum.qt.io/topic/113070/qt-code-of-conduct


              Q: What's that?
              A: It's blue light.
              Q: What does it do?
              A: It turns blue.

              1 Reply Last reply
              4
              • N Offline
                N Offline
                npatil15
                wrote on 16 Jun 2020, 09:28 last edited by
                #7

                At this (threadandconcurrent), I have found some threading solutions, so which I should prefer?

                I'll explain whatever I understood about implementation:

                • Create threads, based on number fo CPU core available
                • Will have one global structure where I'll store all information extracted from threads. I'll implement a mutex lock to not affect by deadlock. A semaphore will be useless because it has no dependency on other files.
                • Then I can use that global variable/structure for further processing. (This part is already working fine/fast).

                Correct me if anything is wrong.

                J J 2 Replies Last reply 16 Jun 2020, 09:31
                1
                • N npatil15
                  16 Jun 2020, 09:28

                  At this (threadandconcurrent), I have found some threading solutions, so which I should prefer?

                  I'll explain whatever I understood about implementation:

                  • Create threads, based on number fo CPU core available
                  • Will have one global structure where I'll store all information extracted from threads. I'll implement a mutex lock to not affect by deadlock. A semaphore will be useless because it has no dependency on other files.
                  • Then I can use that global variable/structure for further processing. (This part is already working fine/fast).

                  Correct me if anything is wrong.

                  J Offline
                  J Offline
                  J.Hilk
                  Moderators
                  wrote on 16 Jun 2020, 09:31 last edited by
                  #8

                  @npatil15 sounds about right, as one solution

                  but instead of

                  Will have one global structure where I'll store all information extracted from threads. I'll implement a mutex lock to not affect by deadlock. A semaphore will be useless because it has no dependency on other files.

                  you could add a copy constructor to a smaller struct and use signal & slots to transfer it across threads.

                  Depending on the complexity of your struct it may however be just as complex to write than your mutex solution :)


                  Be aware of the Qt Code of Conduct, when posting : https://forum.qt.io/topic/113070/qt-code-of-conduct


                  Q: What's that?
                  A: It's blue light.
                  Q: What does it do?
                  A: It turns blue.

                  1 Reply Last reply
                  2
                  • N npatil15
                    16 Jun 2020, 09:28

                    At this (threadandconcurrent), I have found some threading solutions, so which I should prefer?

                    I'll explain whatever I understood about implementation:

                    • Create threads, based on number fo CPU core available
                    • Will have one global structure where I'll store all information extracted from threads. I'll implement a mutex lock to not affect by deadlock. A semaphore will be useless because it has no dependency on other files.
                    • Then I can use that global variable/structure for further processing. (This part is already working fine/fast).

                    Correct me if anything is wrong.

                    J Offline
                    J Offline
                    JonB
                    wrote on 16 Jun 2020, 09:38 last edited by JonB
                    #9

                    @npatil15

                    • You do not want to try to create 25,000 threads for that number of files! :) As @mrjj & you have discovered, a small number.

                    • [I see @J-Hilk has replied while I am typing this.] You could use mutexs, or you may find that sending a (queued, cross-thread) signal is faster, you'd have to test.

                    • My guess, based on no experience and holding a finger to the wind, is that this approach will approximately halve your processing time. (Let us know!) However, if you can afford to do what I said about text-searching the files for the required string and only JSON-parsing the resultant hit files, and if that eliminates a large number of contenders (say >50%), this will make a big difference, as the JSON parsing will be really slow compared to reading/searching....

                    1 Reply Last reply
                    4
                    • N Offline
                      N Offline
                      npatil15
                      wrote on 16 Jun 2020, 09:45 last edited by
                      #10

                      @JonB Yeah I got the scenario for implementation.
                      I'll try to implement it, and get back with how it works compared to the previous implementation.

                      1 Reply Last reply
                      0

                      1/10

                      16 Jun 2020, 06:49

                      • Login

                      • Login or register to search.
                      1 out of 10
                      • First post
                        1/10
                        Last post
                      0
                      • Categories
                      • Recent
                      • Tags
                      • Popular
                      • Users
                      • Groups
                      • Search
                      • Get Qt Extensions
                      • Unsolved