speed of different loop implementations

11 Posts 5 Posters 3.1k Views
    gde23 wrote on 22 Nov 2016, 20:31 (#1)

    Hello,

    I need to perform a lot of matrix * vector multiplications and want to find out the best way to store the data.
    The vectors and matrices are Eigen3 objects.

    For testing I have implemented some useless loops, and I get different timings for each of them.

    A big Eigen3 matrix from which I pick rows
    (at work I have had to do a lot of MATLAB over the last few years, so I expected this to be the fastest):

    void calcMatrix(Matrix4 M, Matrix1000 raysM)
    {
        Vector4 ray;
        #pragma omp parallel for
        for(int j=0;j<10000;j++)
        {
            for(int i = 0;i<1000;i++)
            {
                ray = raysM.row(i);
                ray = M*ray;
            }
        }
    }
    

    Using a QList of Eigen3 vectors:

    void calcList(Matrix4 M, QList<Vector4> *raysL)
    {
        Vector4 ray;
        #pragma omp parallel for
        for(int i=0;i<1000;i++)
        {
            for(int j = 0;j<10000;j++)
            {
                ray = raysL->at(i);
                ray = M*ray;
            }
        }
    }
    

    A QList of objects that contain the Eigen3 vectors:

    void calcListClass(Matrix4 M, QList<RayClass> *raysC)
    {
        Vector4 ray;
        #pragma omp parallel for
        for(int i=0;i<1000;i++)
        {
            for(int j = 0;j<10000;j++)
            {
                ray = raysC->at(i).pos;
                ray = M*ray;
            }
        }
    }
    

    A QList of objects that contain the Eigen3 vectors and have a method (trace) that computes the useless loop:

    void calcListClassMethod(Matrix4 M, QList<RayClass> *raysC)
    {
        RayClass ray;
        #pragma omp parallel for
        for(int i=0;i<1000;i++)
        {
            ray = raysC->at(i);
            ray.trace(M);
        }
    }
    

    When I measure the time each computation takes with QElapsedTimer, I get the following results:

    Eigen3: 7120 milliseconds
    QList: 5458 milliseconds
    RayClass: 5425 milliseconds
    RayClassWithMethod: 5088 milliseconds

    It seems that the object-method version (RayClassWithMethod) is the fastest.
    But I want to understand why. And is there maybe an even faster version?

    Thanks in advance
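
One thing worth checking in the loops above: `ray` is declared outside the `#pragma omp parallel for` region, so under OpenMP's default rules it is shared between threads and every thread writes to it at the same time. The products are also never used, so an optimizing compiler is free to drop part of the work, which makes the timings hard to interpret. Below is a minimal sketch of a loop that avoids both issues. It is plain C++ without Eigen so that it stands alone, and `Vec4`, `Mat4`, `mul` and `checksum` are hypothetical names that do not appear in the posts.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][col]

// Plain 4x4 matrix times 4-vector.
Vec4 mul(const Mat4& m, const Vec4& v)
{
    Vec4 out{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

// Transforms every ray and folds the results into one number, so the
// work is observable and cannot simply be optimised away.
double checksum(const Mat4& m, const std::vector<Vec4>& rays)
{
    double sum = 0.0;
    const long n = static_cast<long>(rays.size());
    // The pragma is ignored when OpenMP is not enabled at compile time.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i)
    {
        const Vec4 ray = mul(m, rays[i]);  // declared in the loop body, so private to each thread
        sum += ray[0] + ray[1] + ray[2] + ray[3];
    }
    return sum;
}
```

The inputs are passed by const reference here, which also avoids copying the whole container on every call, as passing a large matrix by value would.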

      SGaist (Lifetime Qt Champion) wrote on 22 Nov 2016, 21:37 (#2)

      Hi,

      You should rather use a QVector if you want to go the Qt way. It should perform better than QList.

      Interested in AI ? www.idiap.ch
      Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

        gde23 wrote on 22 Nov 2016, 23:02 (#3)

        @SGaist: Thanks for the quick answer.
        I tested QVector as well as std::vector as the container and get more or less the same result as for QList in all cases.
        QVector seems to be slightly faster; however, the difference is less than 1%.

        Configuration                 Time (ms)
        Eigen3 4x1000                 61874
        Eigen2 4x1 QList              49248
        RayClass QList                49127
        RayClass QVector              49536
        RayClassMethode QList         47555
        RayClassMethode QVector       47347
        RayClassMethode std::vector   47126

        I think I will implement the real algorithm and test it again with the different
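
Differences of less than 1% between single runs can easily be within run-to-run noise, so it may help to repeat each measurement and compare the best times. The sketch below assumes `std::chrono::steady_clock` as a stand-in for QElapsedTimer, purely so that it compiles without Qt. `bestOfMs` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Runs `work` `repeats` times and returns the shortest wall-clock time
// in milliseconds. Taking the minimum reduces the influence of other
// processes and cold caches on the result.
template <typename Work>
double bestOfMs(Work&& work, int repeats)
{
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r)
    {
        const auto start = Clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

It could be called as `bestOfMs([&] { calcList(M, &rays); }, 10)` for each variant, provided the work inside the call has an observable result.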

          kshegunov (Moderators) wrote on 23 Nov 2016, 01:04, last edited by kshegunov (#4)

          Instead of doing matrix-vector multiplications in a loop, do a single matrix-matrix multiplication and drop the OpenMP stuff. Eigen (if that's the library you're using) already features threading internally and makes use of the extensions your processor supports. Put your vectors as columns in a rectangular matrix (4x1000) and multiply it from the left by the 4x4 matrix. The transformed vectors are then the columns of the resulting 4x1000 matrix. Basically:

          void calcMatrix(const Matrix<qreal, 4, 4> & M, Matrix<qreal, 4, 1000> & rays)
          {
              rays = M * rays;
          }
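
The reason the single product yields the same vectors is that column j of `M * rays` is just `M` times column j of `rays`. Here is a library-free sketch of that identity, assuming column-major storage (Eigen's default layout). `Mat4` and `transformAll` are hypothetical names used for illustration only, and this is not how Eigen implements the product.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Mat4 = std::array<double, 16>;  // column-major: element (r, c) at [c * 4 + r]

// Multiplies the 4x4 matrix m with a 4xN matrix whose columns are the rays.
// `rays` is column-major too, so ray j occupies rays[4*j] .. rays[4*j + 3].
std::vector<double> transformAll(const Mat4& m, const std::vector<double>& rays)
{
    const std::size_t n = rays.size() / 4;
    std::vector<double> out(rays.size(), 0.0);  // separate output, so no aliasing
    for (std::size_t j = 0; j < n; ++j)         // one column (ray) at a time
        for (std::size_t c = 0; c < 4; ++c)
            for (std::size_t r = 0; r < 4; ++r)
                out[4 * j + r] += m[c * 4 + r] * rays[4 * j + c];
    return out;
}
```

With Eigen the whole loop nest collapses into the one-line expression in the post. As far as I know, Eigen evaluates matrix products into a temporary by default, so having `rays` on both sides of the assignment should not cause aliasing problems.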
          

          Read and abide by the Qt Code of Conduct

            gde23 wrote on 23 Nov 2016, 14:39 (#5)

            @kshegunov Thanks. That really is a lot faster.
            However, I am running into trouble with larger matrices (4x10000).

            I get the following error:

            /usr/include/eigen3/Eigen/src/Core/DenseStorage.h:33: error: 'OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG' is not a member of 'Eigen::internal::static_assertion<false>' EIGEN_STATIC_ASSERT(Size * sizeof(T) <= EIGEN_STACK_ALLOCATION_LIMIT, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG);

            The matrices I created should not be on the stack, so I think Eigen allocates some memory on the stack internally? Can this be changed?
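
A note on why the assertion fires no matter where the object lives: a fixed-size Eigen matrix keeps its coefficients inline in the object, and the check in the error message compares `Size * sizeof(T)` against the limit at compile time. The arithmetic can be sketched as follows, assuming `qreal` is `double` and assuming the default `EIGEN_STACK_ALLOCATION_LIMIT` is 128 KB, which is what the Eigen documentation states. `kAssumedLimit`, `fixedMatrixBytes` and `fitsLimit` are hypothetical names.

```cpp
#include <cstddef>

// Assumed default of EIGEN_STACK_ALLOCATION_LIMIT (128 KB, per the Eigen docs).
constexpr std::size_t kAssumedLimit = 128 * 1024;

// Bytes needed by the coefficients of a fixed-size rows x cols matrix of doubles.
constexpr std::size_t fixedMatrixBytes(std::size_t rows, std::size_t cols)
{
    return rows * cols * sizeof(double);
}

// Mirrors the comparison quoted in the error message.
constexpr bool fitsLimit(std::size_t rows, std::size_t cols)
{
    return fixedMatrixBytes(rows, cols) <= kAssumedLimit;
}

static_assert(fitsLimit(4, 1000), "4x1000 doubles need 32000 bytes, under the limit");
static_assert(!fitsLimit(4, 10000), "4x10000 doubles need 320000 bytes, over the limit");
```

So the 4x1000 case compiles because 32000 bytes is below the limit, while the 4x10000 case needs 320000 bytes and trips the static assertion.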

              mrjj (Lifetime Qt Champion) wrote on 23 Nov 2016, 14:53 (#6)

              @gde23 said in speed of different loop implementations:

              OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG

              Google tells me you can put
              #define EIGEN_STACK_ALLOCATION_LIMIT 1000000
              before including Eigen/Core
              to alter the limit.
              Whether that is enough, I can't tell :)

                VRonin wrote on 23 Nov 2016, 15:12 (#7)

                #define EIGEN_STACK_ALLOCATION_LIMIT 0 removes the limit completely. I'm not sure whether this will just cause a stack overflow anyway, as that flag is designed to catch this kind of problem at compile time instead of at runtime.

                "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
                ~Napoleon Bonaparte

                On a crusade to banish setIndexWidget() from the holy land of Qt

                  gde23 wrote on 23 Nov 2016, 15:15 (#8)

                  @mrjj Thanks, that solved the problem

                    kshegunov (Moderators) wrote on 23 Nov 2016, 15:39 (#9)

                    Don't mess with the stack! Instead make your (big) matrix, the one holding the vectors, dynamically sized (i.e. allocated on the heap). Use:

                    Matrix<qreal, 4, Dynamic>
                    

                    instead of a fixed number of columns. And don't forget to initialize it before use. Follow the documentation for more details.
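
To illustrate what "dynamically sized" buys here, the following library-free sketch stores a 4 x N matrix with a run-time column count in a `std::vector`. `RayMatrix` is a hypothetical stand-in and not part of Eigen; it only shows the idea that the coefficients live on the heap and are initialised before use.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4 x N matrix with a run-time column count.
// The coefficients live in a std::vector, i.e. on the heap, so the object
// itself stays small no matter how many columns it has.
class RayMatrix
{
public:
    explicit RayMatrix(std::size_t cols)
        : cols_(cols), data_(4 * cols, 0.0)  // zero-initialised before use
    {
    }

    std::size_t cols() const { return cols_; }

    // Column-major access to element (row, col).
    double& operator()(std::size_t row, std::size_t col) { return data_[col * 4 + row]; }
    double operator()(std::size_t row, std::size_t col) const { return data_[col * 4 + row]; }

private:
    std::size_t cols_;
    std::vector<double> data_;
};
```

In Eigen itself, my understanding is that a dynamically sized matrix gets its dimensions at construction or through `resize()`, and that its coefficients are uninitialised until you set them, for example with `setZero()`.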

                    Kind regards.

                      VRonin wrote on 23 Nov 2016, 15:56 (#10)

                      @kshegunov Can I upvote you 10 times?
                        kshegunov (Moderators) wrote on 23 Nov 2016, 20:59 (#11)

                        Yes. I allow it. :]
