Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Get Qt Extensions
  • Unsolved
Collapse
Brand Logo
  1. Home
  2. Qt Development
  3. General and Desktop
  4. Huge number of threads => out of memory?
Forum Updated to NodeBB v4.3 + New Features

Huge number of threads => out of memory?

Scheduled Pinned Locked Moved Solved General and Desktop
27 Posts 5 Posters 10.8k Views 3 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • N Offline
    N Offline
    Nmut
    wrote on last edited by
    #1

    Hello

    I have to decode huge text data (about 150GB) composed of "messages" (about 5kB) and of course, I use multithreading to help to reduce the decoding time like for all my heavy tasks.

    The program is really simple, there is no interface, only data to proceed. Unfortunately, I have a problem. The main loop reads the data and sends it to threads via MyFutureWatcher::setFuture(QtConcurrent::run(this, &decodeMessage, messageString). I expected the setFuture call to be blocked after a certain number of calls waiting for some threads to be completed. This is not the case and of course as the file reading is faster than the message decoding on a i5 4 cores PC (there is no problem on a R7 6C/12T PC), there is a stack overflow somewhere after 5 minutes, strangely not trapped in debug mode.

    My first attempt to fix the problem was to add a MyFutureWatcher::waitForFinished() each 1000 messages read. Unexpectedly the problem is only reduced but not solved!?!? The only dirty way to solve it is to add a Sleep(400) :-/ to speed down the main loop. I tried in Qt5.4, Qt5.7 and Qt5.8 mingw and Qt5.7 and Qt5.8 VS2013 with the same behavior.

    My questions now:

    • Has somebody had the problem and solved it?
    • Is there something to check stacks size in the watcher to wait when stacks are full or more than a limit? At least is there a method to know the number pending threads in the watcher. I didn't see anything in the doc... :-(
    • Why MyFutureWatcher::waitForFinished() does not work as I expected? It seems to be OK in all my other programs.

    Thanx for your anwers!

    raven-worxR 1 Reply Last reply
    0
    • VRoninV Offline
      VRoninV Offline
      VRonin
      wrote on last edited by
      #2

      There is something fishy going on here. can you post the code for MyFutureWatcher and decodeMessage

      "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
      ~Napoleon Bonaparte

      On a crusade to banish setIndexWidget() from the holy land of Qt

      1 Reply Last reply
      1
      • N Nmut

        Hello

        I have to decode huge text data (about 150GB) composed of "messages" (about 5kB) and of course, I use multithreading to help to reduce the decoding time like for all my heavy tasks.

        The program is really simple, there is no interface, only data to proceed. Unfortunately, I have a problem. The main loop reads the data and sends it to threads via MyFutureWatcher::setFuture(QtConcurrent::run(this, &decodeMessage, messageString). I expected the setFuture call to be blocked after a certain number of calls waiting for some threads to be completed. This is not the case and of course as the file reading is faster than the message decoding on a i5 4 cores PC (there is no problem on a R7 6C/12T PC), there is a stack overflow somewhere after 5 minutes, strangely not trapped in debug mode.

        My first attempt to fix the problem was to add a MyFutureWatcher::waitForFinished() each 1000 messages read. Unexpectedly the problem is only reduced but not solved!?!? The only dirty way to solve it is to add a Sleep(400) :-/ to speed down the main loop. I tried in Qt5.4, Qt5.7 and Qt5.8 mingw and Qt5.7 and Qt5.8 VS2013 with the same behavior.

        My questions now:

        • Has somebody had the problem and solved it?
        • Is there something to check stacks size in the watcher to wait when stacks are full or more than a limit? At least is there a method to know the number pending threads in the watcher. I didn't see anything in the doc... :-(
        • Why MyFutureWatcher::waitForFinished() does not work as I expected? It seems to be OK in all my other programs.

        Thanx for your anwers!

        raven-worxR Offline
        raven-worxR Offline
        raven-worx
        Moderators
        wrote on last edited by
        #3

        @Nmut
        why do you create a new thread for each message?! This is a you encountered just overkill.
        Isn't it enough to simply have one thread and just send the message (one-by-one) via a queued signal to the thread?
        The thread then can use QQueue to queue each message and processes them.

        --- SUPPORT REQUESTS VIA CHAT WILL BE IGNORED ---
        If you have a question please use the forum so others can benefit from the solution in the future

        1 Reply Last reply
        0
        • N Offline
          N Offline
          Nmut
          wrote on last edited by Nmut
          #4

          @VRonin
          I will not post my code :-P.
          Here is a small code that shows the problem.

          #include "widget.h"
          
          #include <QFutureWatcher>
          #include <QtConcurrent>
          
          Widget::Widget(QWidget *parent) :
              QWidget(parent)
          {
              setupUi(this);
          }
          
          void Widget::on__pbRun_clicked()
          {
              QFutureWatcher<void> watcher;
          
              for (int i = 1; i < INT_MAX; ++i)       // from 1 to avoid watcher to wait at first loop!
              {
                  QByteArray message;     // new instance to avoid Qt lazy copy optimizations! :-D
                  message.fill('x', 4096);
                  watcher.setFuture(QtConcurrent::run(this, Widget::decodeMessage, message));
          
                  if (i % 100000 == 0)
                      watcher.waitForFinished();
              }
          
              watcher.waitForFinished();
          }
          
          void Widget::decodeMessage(const QByteArray & message)
          {
              Q_UNUSED(message)
          
              // do some really heavy and long stuff!!! :-P
              while(true);
          }
          

          In this example, the waitForFinish in the main loop works, but you have the problem of overflow if you comment this waitForFinish...

          @raven-worx
          I'm doing this way as it is very simple (2 lines of codes for thread management). As the decoding task is heavy, the thread management overhead is not noticeable.
          Anyway, your solution will not solve the stack oversize problem as the queue will crow forever.

          Thanks for your answers.

          raven-worxR 1 Reply Last reply
          0
          • N Nmut

            @VRonin
            I will not post my code :-P.
            Here is a small code that shows the problem.

            #include "widget.h"
            
            #include <QFutureWatcher>
            #include <QtConcurrent>
            
            Widget::Widget(QWidget *parent) :
                QWidget(parent)
            {
                setupUi(this);
            }
            
            void Widget::on__pbRun_clicked()
            {
                QFutureWatcher<void> watcher;
            
                for (int i = 1; i < INT_MAX; ++i)       // from 1 to avoid watcher to wait at first loop!
                {
                    QByteArray message;     // new instance to avoid Qt lazy copy optimizations! :-D
                    message.fill('x', 4096);
                    watcher.setFuture(QtConcurrent::run(this, Widget::decodeMessage, message));
            
                    if (i % 100000 == 0)
                        watcher.waitForFinished();
                }
            
                watcher.waitForFinished();
            }
            
            void Widget::decodeMessage(const QByteArray & message)
            {
                Q_UNUSED(message)
            
                // do some really heavy and long stuff!!! :-P
                while(true);
            }
            

            In this example, the waitForFinish in the main loop works, but you have the problem of overflow if you comment this waitForFinish...

            @raven-worx
            I'm doing this way as it is very simple (2 lines of codes for thread management). As the decoding task is heavy, the thread management overhead is not noticeable.
            Anyway, your solution will not solve the stack oversize problem as the queue will crow forever.

            Thanks for your answers.

            raven-worxR Offline
            raven-worxR Offline
            raven-worx
            Moderators
            wrote on last edited by raven-worx
            #5

            @Nmut said in Huge number of threads => out of memory?:

            As the decoding task is heavy, the thread management overhead is not noticeable.

            sure, the thread management slows down the whole process. Since there are so many threads one cycle/round of all threads takes longer of course.

            Anyway, your solution will not solve the stack oversize problem as the queue will crow forever.

            as i suggested to use QQueue it won't grow to infinity. Every consumption (dequeue) will remove if from the queue. Thats why it is called queue.

            --- SUPPORT REQUESTS VIA CHAT WILL BE IGNORED ---
            If you have a question please use the forum so others can benefit from the solution in the future

            N 1 Reply Last reply
            1
            • raven-worxR raven-worx

              @Nmut said in Huge number of threads => out of memory?:

              As the decoding task is heavy, the thread management overhead is not noticeable.

              sure, the thread management slows down the whole process. Since there are so many threads one cycle/round of all threads takes longer of course.

              Anyway, your solution will not solve the stack oversize problem as the queue will crow forever.

              as i suggested to use QQueue it won't grow to infinity. Every consumption (dequeue) will remove if from the queue. Thats why it is called queue.

              N Offline
              N Offline
              Nmut
              wrote on last edited by Nmut
              #6

              @raven-worx

              1. I checked the overhead of thread management, I'm not a complete beginner you know 8-D!
                In this specific case, I have 11.3 x the mono thread speed (direct call) using a 6C/12T processor, that is very satisfying result... :-P

              2. Sure it is growing as we queue faster than we dequeue! You are implementing the same behaviour as Qt one by yourself... My concern is about the control of this Qt queuing process (in QFuture watcher or in QThreadPool, I don't know).

              For sure I have a problem in my code as the dummy sample (really dummy with infinite loops :-D) I wrote has no problem when implementing the waitForFinish in the main loop, but this is only part of the problem.

              1 Reply Last reply
              0
              • JohanSoloJ Offline
                JohanSoloJ Offline
                JohanSolo
                wrote on last edited by
                #7

                I might have missed something, but wouldn't QtConcurrent help? This would at least reduce the painful task of creating threads and setting QFuture...

                `They did not know it was impossible, so they did it.'
                -- Mark Twain

                N 1 Reply Last reply
                1
                • VRoninV Offline
                  VRoninV Offline
                  VRonin
                  wrote on last edited by VRonin
                  #8

                  ok, lets remove everything that has to do with multithreading.

                  INT_MAX is (usually) 2,147,483,647

                  you are creating 2bln QByteArray each of size 4kB = 8TB of RAM

                  No system can run your code

                  QtConcurrent::mapped is probably what you need if you can find a way not to load all the input data at once

                  "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
                  ~Napoleon Bonaparte

                  On a crusade to banish setIndexWidget() from the holy land of Qt

                  1 Reply Last reply
                  2
                  • JohanSoloJ JohanSolo

                    I might have missed something, but wouldn't QtConcurrent help? This would at least reduce the painful task of creating threads and setting QFuture...

                    N Offline
                    N Offline
                    Nmut
                    wrote on last edited by Nmut
                    #9

                    @JohanSolo
                    I only use QFutureWatcher to be blocked until all the thread are finished.
                    If you check my code, it is VERY simple, as a monothreading code but 2 declarations and 2 lines of code... I don't understand your comment.

                    @VRonin
                    Sorry for the confusion, this is not real code but some dummy one to highlight my problem (no guards for overflows in Qt)!!! :-D
                    Of course infinite loops are not very good programming. And I don't deliver buggy code with smileys! :-P

                    I'm doing some coding and answering to users by phone in parallel to this thread, may be some things are not clear.

                    JohanSoloJ 1 Reply Last reply
                    0
                    • VRoninV Offline
                      VRoninV Offline
                      VRonin
                      wrote on last edited by
                      #10

                      My point is: the argument you pass to the method run in parallel has to be stored in memory both before and after executing your code (as long as the QFuture is alive).

                      So if you do something similar to the above is only natural you are running out of memory

                      "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
                      ~Napoleon Bonaparte

                      On a crusade to banish setIndexWidget() from the holy land of Qt

                      N 1 Reply Last reply
                      0
                      • N Nmut

                        @JohanSolo
                        I only use QFutureWatcher to be blocked until all the thread are finished.
                        If you check my code, it is VERY simple, as a monothreading code but 2 declarations and 2 lines of code... I don't understand your comment.

                        @VRonin
                        Sorry for the confusion, this is not real code but some dummy one to highlight my problem (no guards for overflows in Qt)!!! :-D
                        Of course infinite loops are not very good programming. And I don't deliver buggy code with smileys! :-P

                        I'm doing some coding and answering to users by phone in parallel to this thread, may be some things are not clear.

                        JohanSoloJ Offline
                        JohanSoloJ Offline
                        JohanSolo
                        wrote on last edited by JohanSolo
                        #11

                        @Nmut said in Huge number of threads => out of memory?:

                        @JohanSolo
                        I only use QFutureWatcher to be blocked until all the thread are finished.
                        If you check my code, it is VERY simple, as a monothreading code but 2 declarations and 2 lines of code... I don't understand your comment.

                        I read a bit too fast your code actually, my bad!

                        Is was suggesting without saying something similar to @VRonin, QtConcurrent::mapped does not require you to set the QFuture manually, and still blocks until the processing is finished.

                        Edit: sometimes I should just shut up.

                        `They did not know it was impossible, so they did it.'
                        -- Mark Twain

                        1 Reply Last reply
                        0
                        • VRoninV VRonin

                          My point is: the argument you pass to the method run in parallel has to be stored in memory both before and after executing your code (as long as the QFuture is alive).

                          So if you do something similar to the above is only natural you are running out of memory

                          N Offline
                          N Offline
                          Nmut
                          wrote on last edited by Nmut
                          #12

                          @VRonin
                          Yes, then I hope that some Qt multi threading guru will help me to manage by myself the size of the "stacks".
                          What solutions I imagine:

                          • check the data stack size => wait if > 200MB for example
                          • check the number of pending threads => wait if > 50 for example

                          QFutureWatcher::waitForFinish method I usually use in my multithreading codes doesn't seam to work in this case... Maybe I found a bug with my "unaccurate" usage.

                          The goal is to have very simple code that works on all the client's different desktop as fast as possible.

                          @JohanSolo
                          I never used QtConcurrent::mapped in this context. In this case I have some input text data that I interpret, I can't easily do some "batches" for QtConcurrent::mapped. Maybe you highlight a usage I don't know?
                          Edit: Every answer is a new step to the truth, whatever the direction! :-)

                          1 Reply Last reply
                          0
                          • VRoninV Offline
                            VRoninV Offline
                            VRonin
                            wrote on last edited by
                            #13

                            How are you storing/acquiring the 5kB "messages" at the moment? I mean before doing any computation on them

                            "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
                            ~Napoleon Bonaparte

                            On a crusade to banish setIndexWidget() from the holy land of Qt

                            N 1 Reply Last reply
                            0
                            • VRoninV VRonin

                              How are you storing/acquiring the 5kB "messages" at the moment? I mean before doing any computation on them

                              N Offline
                              N Offline
                              Nmut
                              wrote on last edited by
                              #14

                              @VRonin
                              I'm reading the data from different files (28 days, 1 file a day) and doing some fast pre-computations (cleaning, formatting, validation) and then I send the data in a QByteArray to the decoding method (here are the treads) to save delta with other data source in a binary format.
                              The problem I have is to adapt the data throughput (messages read from files) to the decoding threads capacity (no storage needed in this phase) to have the better efficiency whatever the client PC from 2 cores to 16 cores/32 threads (no problem here as my 6C/12T does the job correctly, the SSD throughput is the limit!). Users are so impatient! :-D

                              1 Reply Last reply
                              0
                              • VRoninV Offline
                                VRoninV Offline
                                VRonin
                                wrote on last edited by VRonin
                                #15

                                Could you save the pre-computed file (in a QTemporaryFile?), store all of them in a container (QVector<QTemporaryFile*> ?) and use QtConcurrent::mapped on them?

                                mapped will use QThread::idealThreadCount threads simultaneously

                                "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
                                ~Napoleon Bonaparte

                                On a crusade to banish setIndexWidget() from the holy land of Qt

                                N 1 Reply Last reply
                                1
                                • VRoninV VRonin

                                  Could you save the pre-computed file (in a QTemporaryFile?), store all of them in a container (QVector<QTemporaryFile*> ?) and use QtConcurrent::mapped on them?

                                  mapped will use QThread::idealThreadCount threads simultaneously

                                  N Offline
                                  N Offline
                                  Nmut
                                  wrote on last edited by
                                  #16

                                  @VRonin
                                  This can be a solution. The size of temporary data is then not a problem. But the data size (drive space usage) and the time can be a problem (disk writes and reads)...
                                  I see another simple solution when I was reading my answer before replying to you. The best way to split the tasks is to have a thread for each day, each thread is pre-processing and decoding in one pass. This is far to be optimal (bad load balancing, hard drive concurrent access) but it is a very simple solution.

                                  I continue to dig in the problem, I feel my solution is really to use QFuturWatcher::waitForFinished(). I have to understand why it doesn't work as expected.

                                  N 1 Reply Last reply
                                  0
                                  • N Nmut

                                    @VRonin
                                    This can be a solution. The size of temporary data is then not a problem. But the data size (drive space usage) and the time can be a problem (disk writes and reads)...
                                    I see another simple solution when I was reading my answer before replying to you. The best way to split the tasks is to have a thread for each day, each thread is pre-processing and decoding in one pass. This is far to be optimal (bad load balancing, hard drive concurrent access) but it is a very simple solution.

                                    I continue to dig in the problem, I feel my solution is really to use QFuturWatcher::waitForFinished(). I have to understand why it doesn't work as expected.

                                    N Offline
                                    N Offline
                                    Nmut
                                    wrote on last edited by
                                    #17

                                    I gave up with this interesting problem, too much things to manage.
                                    There is a problem with QFutureWatcher::waitForFinished in some specific cases and some missing controls for QThreadPool and QFutureWatcher, but I can find a dirty workaround... I don't have enough time to work on Qt these days.
                                    Why projects are so time critical? :-D

                                    kshegunovK 1 Reply Last reply
                                    0
                                    • N Nmut

                                      I gave up with this interesting problem, too much things to manage.
                                      There is a problem with QFutureWatcher::waitForFinished in some specific cases and some missing controls for QThreadPool and QFutureWatcher, but I can find a dirty workaround... I don't have enough time to work on Qt these days.
                                      Why projects are so time critical? :-D

                                      kshegunovK Offline
                                      kshegunovK Offline
                                      kshegunov
                                      Moderators
                                      wrote on last edited by kshegunov
                                      #18

                                      If I understand your issue correctly, you'd only need one semaphore for blocking to ensure you don't overflow the QtConcurent run queue. It'd look something like this:

                                      static QSemaphore barrier(5000);
                                      
                                      barrier.acquire();
                                      QtConcurrent::run([&barrier] () -> void {
                                          QThread::sleep(1000); //< Simulate a long running operation
                                          barrier.release(); //< Free up a resource for the data producer
                                      });
                                      

                                      You may also want to look at the producer-consumer example that comes with Qt.

                                      Read and abide by the Qt Code of Conduct

                                      N 1 Reply Last reply
                                      2
                                      • kshegunovK kshegunov

                                        If I understand your issue correctly, you'd only need one semaphore for blocking to ensure you don't overflow the QtConcurent run queue. It'd look something like this:

                                        static QSemaphore barrier(5000);
                                        
                                        barrier.acquire();
                                        QtConcurrent::run([&barrier] () -> void {
                                            QThread::sleep(1000); //< Simulate a long running operation
                                            barrier.release(); //< Free up a resource for the data producer
                                        });
                                        

                                        You may also want to look at the producer-consumer example that comes with Qt.

                                        N Offline
                                        N Offline
                                        Nmut
                                        wrote on last edited by Nmut
                                        #19

                                        @kshegunov
                                        You are right, my problem is very similar to this producer-consumer example. The producer (files reading, main loop) must wait for the consumer (data decoding) on 2 or 4 threads processors and fast HDD but on more than 10 treads processors, the consumers (the threads) must wait for the producer.
                                        I will study these examples (semaphore and wait condition examples).
                                        Unfortunately, this will not explain why the example I posted here works as expected (waitForFinished that synchronizes the threads) but not my final code.

                                        Edit: I just tried with semaphore. It works as expected BUT this is REALLY not efficient. On 12 threads, it is about 2 times slower than my version with QFutureWatcher::waitForFinished(). I don't have other machines for the moment to test with but I suppose the efficiency is poor on all type of processors.

                                        kshegunovK 1 Reply Last reply
                                        0
                                        • N Nmut

                                          @kshegunov
                                          You are right, my problem is very similar to this producer-consumer example. The producer (files reading, main loop) must wait for the consumer (data decoding) on 2 or 4 threads processors and fast HDD but on more than 10 treads processors, the consumers (the threads) must wait for the producer.
                                          I will study these examples (semaphore and wait condition examples).
                                          Unfortunately, this will not explain why the example I posted here works as expected (waitForFinished that synchronizes the threads) but not my final code.

                                          Edit: I just tried with semaphore. It works as expected BUT this is REALLY not efficient. On 12 threads, it is about 2 times slower than my version with QFutureWatcher::waitForFinished(). I don't have other machines for the moment to test with but I suppose the efficiency is poor on all type of processors.

                                          kshegunovK Offline
                                          kshegunovK Offline
                                          kshegunov
                                          Moderators
                                          wrote on last edited by kshegunov
                                          #20

                                          @Nmut said in Huge number of threads => out of memory?:

                                          Unfortunately, this will not explain why the example I posted here works as expected (waitForFinished that synchronizes the threads) but not my final code.

                                          waitForFinished is not synchronizing any threads, and you have an error in the example posted above. You're resetting the future to the watcher at each iteration and at some point you're stopping the loop to wait for the last operation (not all of them).

                                          I just tried with semaphore. It works as expected BUT this is REALLY not efficient.

                                          You'll have to show benchmark results with the test code to claim that. Designing benchmark code isn't exactly trivial.

                                          On 12 threads, it is about 2 times slower than my version with QFutureWatcher::waitForFinished(). I don't have other machines for the moment to test with but I suppose the efficiency is poor on all type of processors.

                                          Firstly, QtConcurrent::run doesn't run the job immediately, it has a thread pool and it puts the job in a queue and whenever there's a free thread from that pool then it executes it. Calling QtConcurrent::run multiple times does not create new threads.
                                          Secondly, on the i5 you have 4 cores, meaning that you can have 4 threads executing in parallel everything else is scheduled. One core can execute a single thread at any one time, the "illusion" of threading on single core is created by the OS's scheduler, which allocates time slots and puts to sleep one thread to switch execution to another (so called context switching). Having more threads than the number of cores will not give you any efficiency, on the contrary, it will even reduce the speed.

                                          Read and abide by the Qt Code of Conduct

                                          N 1 Reply Last reply
                                          5

                                          • Login

                                          • Login or register to search.
                                          • First post
                                            Last post
                                          0
                                          • Categories
                                          • Recent
                                          • Tags
                                          • Popular
                                          • Users
                                          • Groups
                                          • Search
                                          • Get Qt Extensions
                                          • Unsolved