Qt Forum

    Solved speed of different loop implementations

    General and Desktop
    • gde23
      gde23 last edited by

      Hello,

      I need to perform a lot of matrix * vector multiplications and want to find out the best way to store the data.
      The vectors and matrices are Eigen3 objects.

      For testing I have implemented some useless loops, and I get different timings for each of them.

      A big Eigen3 matrix from which I pick rows
      (at work I have had to do a lot of MATLAB over the last few years, so I expected this to be the fastest):

      void calcMatrix(Matrix4 M, Matrix1000 raysM)
      {
          Vector4 ray;
          #pragma omp parallel for
          for(int j=0;j<10000;j++)
          {
              for(int i = 0;i<1000;i++)
              {
                  ray = raysM.row(i);
                  ray = M*ray;
              }
          }
      }
      

      A QList of Eigen3 vectors:

      void calcList(Matrix4 M, QList<Vector4> *raysL)
      {
          Vector4 ray;
          #pragma omp parallel for
          for(int i=0;i<1000;i++)
          {
              for(int j = 0;j<10000;j++)
              {
                  ray = raysL->at(i);
                  ray = M*ray;
              }
          }
      }
      

      A QList of objects that contain the Eigen3 vectors:

      void calcListClass(Matrix4 M, QList<RayClass> *raysC)
      {
          Vector4 ray;
          #pragma omp parallel for
          for(int i=0;i<1000;i++)
          {
              for(int j = 0;j<10000;j++)
              {
                  ray = raysC->at(i).pos;
                  ray = M*ray;
              }
          }
      }
      

      A QList of objects that contain the Eigen3 vectors and have a method (trace) that computes the useless loop:

      void calcListClassMethod(Matrix4 M, QList<RayClass> *raysC)
      {
          RayClass ray;
          #pragma omp parallel for
          for(int i=0;i<1000;i++)
          {
              ray = raysC->at(i);
              ray.trace(M);
          }
      }
      

      When I measure the time each computation takes with QElapsedTimer, I get the following results:

      Eigen3: 7120 milliseconds
      QList: 5458 milliseconds
      RayClass: 5425 milliseconds
      RayClassWithMethod: 5088 milliseconds

      It seems that the object.method() variant (RayClassWithMethod) is the fastest.
      But I want to understand why. And is there maybe an even faster version?

      Thanks in advance
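A note on the benchmark itself: in the loops above, `ray` is declared outside the `#pragma omp parallel for`, so under OpenMP's default data-sharing rules it is shared by all threads, and the result of `M*ray` is never read, so an optimizing compiler is free to drop part of the work. Both effects may distort the timings. The sketch below shows one way to avoid them. It uses plain `std::array` instead of Eigen to stay self-contained, and `Vec4`, `Mat4`, `multiply` and `transformAll` are hypothetical names, not part of the original code.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][col]

// Plain 4x4 matrix times 4-vector.
Vec4 multiply(const Mat4& m, const Vec4& v)
{
    Vec4 out{};  // value-initialized to zeros
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

// Transforms every ray and returns the sum of all result components,
// so the work has an observable result that cannot be optimized away.
double transformAll(const Mat4& m, const std::vector<Vec4>& rays)
{
    double checksum = 0.0;
    const int n = static_cast<int>(rays.size());
    #pragma omp parallel for reduction(+ : checksum)
    for (int i = 0; i < n; ++i) {
        // Declared inside the loop body, so each iteration has its own copy.
        const Vec4 out = multiply(m, rays[i]);
        checksum += out[0] + out[1] + out[2] + out[3];
    }
    return checksum;
}
```

Compile with OpenMP enabled (for example `-fopenmp` with GCC). Without it the pragma is ignored and the loop simply runs serially.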

      • SGaist
        SGaist Lifetime Qt Champion last edited by

        Hi,

        You should rather use a QVector if you want to go the Qt way. It should perform better than QList.

        Interested in AI ? www.idiap.ch
        Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

        • gde23
          gde23 last edited by

          @SGaist: Thanks for the quick answer.
          I tested QVector as well as std::vector as the container and get more or less the same results as for QList in all cases.
          QVector seems to be slightly faster; however, the difference is less than 1%.

          Variant                        Time (ms)
          Eigen3 4x1000                  61874
          Eigen2 4x1 QList               49248
          RayClass QList                 49127
          RayClass QVector               49536
          RayClassMethode QList          47555
          RayClassMethode QVector        47347
          RayClassMethode std::vector    47126

          I think I will implement the real algorithm and test it again with the different variants.

          • kshegunov
            kshegunov Moderators @gde23 last edited by kshegunov

            Instead of doing matrix-vector multiplications in a loop do a single matrix-matrix multiplication and drop the OpenMP stuff. Eigen (if that's the library you're using) already features threading internally and makes use of the extensions your processor supports. Put your vectors as columns in a rectangular matrix (4x1000) and do the multiplication with the 4x4 matrix from the left. The resulting (multiplied) vectors will be the columns of the produced (4x1000) rectangular matrix. Basically:

            void calcMatrix(const Matrix<qreal, 4, 4> & M, Matrix<qreal, 4, 1000> & rays)
            {
                rays = M * rays;
            }
            

            Read and abide by the Qt Code of Conduct

            • gde23
              gde23 last edited by

              @kshegunov Thanks. That is really a lot faster.
              However, I'm getting into trouble with large matrices (4x10000).

              I get the following error:

              /usr/include/eigen3/Eigen/src/Core/DenseStorage.h:33: error: 'OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG' is not a member of 'Eigen::internal::static_assertion<false>' EIGEN_STATIC_ASSERT(Size * sizeof(T) <= EIGEN_STACK_ALLOCATION_LIMIT, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG);

              The matrices I created should not be on the stack, so I think Eigen allocates some memory on the stack internally? Can this be changed?

              • mrjj
                mrjj Lifetime Qt Champion @gde23 last edited by

                @gde23 said in speed of different loop implementations:

                OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG

                Google tells me you can put
                #define EIGEN_STACK_ALLOCATION_LIMIT 1000000
                before including Eigen/Core to alter the limit.
                Whether that is enough, I can't tell :)
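
Whether 1000000 is enough can be estimated with a little arithmetic, since the assertion in the error message compares `Size * sizeof(T)` with the limit. A minimal sketch, assuming `qreal` is `double` and that Eigen's default limit is 128 KB (an assumption worth checking against the installed version); `kAssumedDefaultLimit`, `fixedSizeBytes` and `fitsLimit` are hypothetical helpers, not Eigen API.

```cpp
#include <cstddef>

// Assumption: default limit for fixed-size objects is 128 KB (131072 bytes).
constexpr std::size_t kAssumedDefaultLimit = 128 * 1024;

// Bytes stored inline by a fixed-size rows x cols matrix of double.
constexpr std::size_t fixedSizeBytes(std::size_t rows, std::size_t cols)
{
    return rows * cols * sizeof(double);
}

// Mirrors the "Size * sizeof(T) <= limit" comparison from the error message.
constexpr bool fitsLimit(std::size_t rows, std::size_t cols, std::size_t limit)
{
    return fixedSizeBytes(rows, cols) <= limit;
}
```

Under these assumptions a fixed-size 4x1000 matrix (32000 bytes) passes the default check, a 4x10000 one (320000 bytes) does not, and a limit of 1000000 is large enough for it. Because the check is a compile-time assertion on the fixed-size type, it fires no matter where the object would end up being stored.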

                • VRonin
                  VRonin last edited by

                  #define EIGEN_STACK_ALLOCATION_LIMIT 0 removes the limit completely. I'm not sure whether this will just cause a stack overflow anyway, as that flag is designed to catch this kind of problem at compile time instead of at runtime.

                  "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
                  ~Napoleon Bonaparte

                  On a crusade to banish setIndexWidget() from the holy land of Qt

                  • gde23
                    gde23 last edited by

                    @mrjj Thanks, that solved the problem

                    • kshegunov
                      kshegunov Moderators @gde23 last edited by

                      Don't mess with the stack! Instead make your (big) matrix, the one holding the vectors, dynamically sized (i.e. allocated on the heap). Use:

                      Matrix<qreal, 4, Dynamic>
                      

                      instead of a fixed number of columns. And don't forget to initialize it before use. See the documentation for more details.

                      Kind regards.

                      • VRonin
                        VRonin @kshegunov last edited by

                        @kshegunov Can I upvote you 10 times?
                        • kshegunov
                          kshegunov Moderators @VRonin last edited by

                          Yes. I allow it. :]
