speed of different loop implementations



  • Hello,

    I need to perform a lot of matrix * vector multiplications and want to find out the best way to store the data.
    The vectors and matrices are Eigen3 objects.

    For testing I have implemented some useless loops, and I get different timings for each of them.

    A big Eigen3 matrix from which I pick rows:
    (at work I have had to use MATLAB a lot over the last few years, so I expected this to be the fastest)

    void calcMatrix(Matrix4 M, Matrix1000 raysM)
    {
        Vector4 ray;
        #pragma omp parallel for
        for(int j=0;j<10000;j++)
        {
            for(int i = 0;i<1000;i++)
            {
                ray = raysM.row(i);
                ray = M*ray;
            }
        }
    }
    

    Using a QList of Eigen3 vectors:

    void calcList(Matrix4 M, QList<Vector4> *raysL)
    {
        Vector4 ray;
        #pragma omp parallel for
        for(int i=0;i<1000;i++)
        {
            for(int j = 0;j<10000;j++)
            {
                ray = raysL->at(i);
                ray = M*ray;
            }
        }
    }
    

    A QList of objects that contain the Eigen3 vectors:

    void calcListClass(Matrix4 M, QList<RayClass> *raysC)
    {
        Vector4 ray;
        #pragma omp parallel for
        for(int i=0;i<1000;i++)
        {
            for(int j = 0;j<10000;j++)
            {
                ray = raysC->at(i).pos;
                ray = M*ray;
            }
        }
    }
    

    A QList of objects that contain the Eigen3 vectors and have a method (trace) that computes the useless loop:

    void calcListClassMethod(Matrix4 M, QList<RayClass> *raysC)
    {
        RayClass ray;
        #pragma omp parallel for
        for(int i=0;i<1000;i++)
        {
            ray = raysC->at(i);
            ray.trace(M);
        }
    }
    

    When I measure the time each computation takes with QElapsedTimer, I get the following results:

    Eigen3: 7120 milliseconds
    QList: 5458 milliseconds
    RayClass: 5425 milliseconds
    RayClassWithMethod: 5088 milliseconds

    It seems that the object-with-method version (RayClassWithMethod) is the fastest.
    But I want to understand why. And is there maybe an even faster version?

    Thanks in advance


  • Lifetime Qt Champion

    Hi,

    You should rather use a QVector if you want to go the Qt way. It should perform better than QList.



  • @SGaist: Thanks for the quick answer.
    I tested QVector as well as std::vector as the container and get more or less the same results as for QList in all cases.
    QVector seems to be slightly faster, but the difference is less than 1%.

    Eigen3 4x1000                   61874 milliseconds
    Eigen3 4x1 QList                49248 milliseconds
    RayClass QList                  49127 milliseconds
    RayClass QVector                49536 milliseconds
    RayClassMethod QList            47555 milliseconds
    RayClassMethod QVector          47347 milliseconds
    RayClassMethod std::vector      47126 milliseconds

    I think I will implement the real algorithm and test it again with the different containers.


  • Qt Champions 2016

    Instead of doing matrix-vector multiplications in a loop, do a single matrix-matrix multiplication and drop the hand-written OpenMP loops. Eigen (if that's the library you're using) can already parallelize large products internally when it is built with OpenMP enabled, and it makes use of the SIMD extensions your compiler targets. Put your vectors as columns in a rectangular matrix (4x1000) and multiply it by the 4x4 matrix from the left. The resulting (multiplied) vectors will be the columns of the produced (4x1000) rectangular matrix. Basically:

    void calcMatrix(const Matrix<qreal, 4, 4> & M, Matrix<qreal, 4, 1000> & rays)
    {
        rays = M * rays;
    }
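
    To see why one product gives the same numbers as the loop, here is a library-free sketch in plain C++ (no Eigen). It is only an illustration of the arithmetic: Mat4, Vec4, multiply and multiplyColumns are made-up names, and the column-after-column storage is an assumption chosen to mirror "vectors as columns". Eigen should compute the same values, typically with vectorized code.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the Eigen types in this thread (plain C++, no Eigen).
using Mat4 = std::array<std::array<double, 4>, 4>;  // indexed as M[row][col]
using Vec4 = std::array<double, 4>;

// One matrix * vector product, as in the body of the original inner loop.
Vec4 multiply(const Mat4& M, const Vec4& v)
{
    Vec4 out{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out[r] += M[r][c] * v[c];
    return out;
}

// The batched form: rays holds one vector per column, stored column after
// column, so element (r, j) of the 4xN matrix lives at rays[4 * j + r].
// Column j of the result is exactly M times column j of the input.
std::vector<double> multiplyColumns(const Mat4& M, const std::vector<double>& rays)
{
    std::vector<double> out(rays.size(), 0.0);
    const std::size_t columns = rays.size() / 4;
    for (std::size_t j = 0; j < columns; ++j)
        for (std::size_t r = 0; r < 4; ++r)
            for (std::size_t c = 0; c < 4; ++c)
                out[4 * j + r] += M[r][c] * rays[4 * j + c];
    return out;
}
```

    Calling multiplyColumns once on a 4xN buffer returns the same values as calling multiply on each of the N columns in turn, which is all the single matrix-matrix product relies on.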
    


  • @kshegunov Thanks. That is really a lot faster.
    However, I run into trouble with large matrices (4x10000).

    I get the following error:

    /usr/include/eigen3/Eigen/src/Core/DenseStorage.h:33: error: 'OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG' is not a member of 'Eigen::internal::static_assertion<false>' EIGEN_STATIC_ASSERT(Size * sizeof(T) <= EIGEN_STACK_ALLOCATION_LIMIT, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG);

    The matrices I created should not be on the stack, so I think Eigen allocates some memory on the stack internally? Can this be changed?


  • Qt Champions 2016

    @gde23 said in speed of different loop implementations:

    OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG

    Google tells me you can put
    #define EIGEN_STACK_ALLOCATION_LIMIT 1000000
    before including Eigen/Core to alter the limit.
    Whether that is enough, I can't tell :)



  • #define EIGEN_STACK_ALLOCATION_LIMIT 0 removes the limit completely. I'm not sure whether this will just cause a stack overflow anyway, as that flag is designed to catch this kind of problem at compile time instead of at runtime.
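
    For a rough idea of the sizes involved, the check in the error message (Size * sizeof(T) <= EIGEN_STACK_ALLOCATION_LIMIT) can be redone by hand. The sketch below assumes qreal is double (8 bytes) and that the default limit is 128 KB, which is what the Eigen 3 documentation states as far as I know; verify both for your setup. fixedSizeBytes is a made-up helper, not an Eigen function.

```cpp
#include <cstddef>

// Assumption: qreal is double, as in a default Qt build.
constexpr std::size_t kScalarSize = sizeof(double);

// Assumption: 128 KB is the default EIGEN_STACK_ALLOCATION_LIMIT in Eigen 3;
// check the value used by your installed version.
constexpr std::size_t kDefaultLimit = 128 * 1024;

// Bytes that a fixed-size matrix with these dimensions stores inside the object.
constexpr std::size_t fixedSizeBytes(std::size_t rows, std::size_t cols)
{
    return rows * cols * kScalarSize;
}

// With 8-byte scalars: 4x1000 is 32000 bytes, 4x10000 is 320000 bytes.
static_assert(fixedSizeBytes(4, 1000) <= kDefaultLimit, "4x1000 passes the check");
static_assert(fixedSizeBytes(4, 10000) > kDefaultLimit, "4x10000 trips the check");
static_assert(fixedSizeBytes(4, 10000) <= 1000000, "a limit of 1000000 covers 4x10000");
```

    The assertion looks only at the size of the fixed-size type, so it fires no matter where the object itself ends up living.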



  • @mrjj Thanks, that solved the problem


  • Qt Champions 2016

    Don't mess with the stack! Instead make your (big) matrix, the one holding the vectors, dynamically sized (i.e. allocated on the heap). Use:

    Matrix<qreal, 4, Dynamic>
    

    instead of a fixed number of columns. And don't forget to initialize it before using it. See the documentation for more details.

    Kind regards.



  • @kshegunov Can I upvote you 10 times?


  • Qt Champions 2016

    Yes. I allow it. :]

