Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Get Qt Extensions
  • Unsolved
Collapse
Brand Logo
  1. Home
  2. Special Interest Groups
  3. C++ Gurus
  4. casts, if's and array-access.
Forum Updated to NodeBB v4.3 + New Features

casts, if's and array-access.

Scheduled Pinned Locked Moved Unsolved C++ Gurus
16 Posts 5 Posters 4.0k Views 4 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J.HilkJ J.Hilk

    @jsulm it returns a default value, 0 is this case.

    @sierdzio I'll look into SIMD instructions, sounds at the very least, interesting;-)

    jsulmJ Offline
    jsulmJ Offline
    jsulm
    Lifetime Qt Champion
    wrote on last edited by
    #7

    @J.Hilk I don't think SIMD will improve anything in this case. SIMD helps if you have many iterations but in this case it is only one. It will probably take longer to load the data into appropriate SIMD registers then do those 3 if's.

    https://forum.qt.io/topic/113070/qt-code-of-conduct

    1 Reply Last reply
    3
    • J.HilkJ J.Hilk

      @jsulm it returns a default value, 0 is this case.

      @sierdzio I'll look into SIMD instructions, sounds at the very least, interesting;-)

      kshegunovK Offline
      kshegunovK Offline
      kshegunov
      Moderators
      wrote on last edited by kshegunov
      #8

      You could in principle go with one if, but as @sierdzio said: benchmark your code first and make sure this is a bottle neck. Here's one suggestion how you could "simplify" the checking:

      T m_data[15][15][255]
      static const T * const end = &m_data[14][14][254];
      
      // Then in the function that makes the check:
      T * item = &m_data[x][y][z]; //< You may need to modify this, so there's no dereferencing, but you get the point
      return item > end ? defaultValue : *item;
      

      Read and abide by the Qt Code of Conduct

      1 Reply Last reply
      3
      • J.HilkJ Offline
        J.HilkJ Offline
        J.Hilk
        Moderators
        wrote on last edited by
        #9

        Thanks to everyone for the contribution.

        as suggested I did some testing, with the 3 methodes discuss in this thread.

        I'll post my code, and my result for further discussions:
        Its my first test in this direction so I'm not sure if I've done everything like I should have.

        I forced the random numbers to be valid, to guarantee the max time for each validation check in each method. I also fell back to Qt for this...

        //storage  Class 
        //-----------------------------------------headers-------------------------
        #ifndef DATASTORAGE_H
        #define DATASTORAGE_H
        
        #include <QObject>
        
        class dataStorage : public QObject
        {
            Q_OBJECT
        public:
            explicit dataStorage(QObject *parent = nullptr);
        
            uint64_t &getvalueMethod_1(const uint &x, const uint &y, const uint &z);
            uint64_t &getvalueMethod_2(const uint &x, const uint &y, const uint &z);
            uint64_t &getvalueMethod_3(const uint &x, const uint &y, const uint &z);
        signals:
        
        public slots:
        
        protected:
            uint64_t m_data[16][16][256];
            const uint64_t *m_last;
        };
        
        #endif // DATASTORAGE_H
        
        //------------------------------Sources------------------------
        
        #include "datastorage.h"
        
        dataStorage::dataStorage(QObject *parent) : QObject(parent)
        {
            for(int x(0); x < 16;x++)
                for(int y(0); y < 16; y++)
                    for(int z(0); z<256;z++)
                        m_data[x][y][z] = qrand();
            m_last = &m_data[15][15][255];
        }
        
        uint64_t &dataStorage::getvalueMethod_1(const uint &x, const uint &y, const uint &z)
        {
            if(x < 16)
                if(y <16)
                    if(z < 256)
                        return m_data[x][y][z];
            return m_data[0][0][0];
        }
        
        uint64_t &dataStorage::getvalueMethod_2(const uint &x, const uint &y, const uint &z)
        {
            if(x <16 && y < 16 && z < 256)
                return m_data[x][y][z];
            return m_data[0][0][0];
        }
        
        uint64_t &dataStorage::getvalueMethod_3(const uint &x, const uint &y, const uint &z)
        {
            uint64_t *item = &m_data[x][y][z];
            return item > m_last ? m_data[0][0][0] : *item;
        }
        
        // -------------------------------------Main---------------------------------
        #include "datastorage.h"
        
        #include <QDebug>
        #include <QVector>
        #include <QElapsedTimer>
        
        uint getRand(int max, int min){
            return rand()%(max-min+1)+min;
        }
        
        int main(int argc, char *argv[])
        {
            QCoreApplication a(argc, argv);
        
            dataStorage dStorage;
        
            QVector<QVector<uint> > vindex;
        
            for(uint32_t i = 0; i < 10000000; i++ )
                vindex.append(QVector<uint>({getRand(15,0),getRand(15,0),getRand(255,0)}));
        
            QElapsedTimer eTime;
        
            uint64_t value;
            qint64 average1(0),average2(0),average3(0);
        
            for(int k(0); k <20; k++){
                qDebug() << "Test-cycle:" << k+1;
                eTime.start();
                for(int j(0); j <10; j++)
                    for(uint32_t i = 0; i < 10000000;i++){
                        value = dStorage.getvalueMethod_1(vindex.at(i).at(0),vindex.at(i).at(1), vindex.at(i).at(2));
                    }
                qDebug() << "Time for Method 1: "<< eTime.elapsed();
                average1 += eTime.elapsed();
        
                eTime.start();
                for(int j(0); j <10; j++)
                    for(uint32_t i = 0; i < 10000000;i++){
                        value = dStorage.getvalueMethod_2(vindex.at(i).at(0),vindex.at(i).at(1), vindex.at(i).at(2));
                    }
                qDebug() << "Time for Method 2: "<< eTime.elapsed();
                average2 += eTime.elapsed();
        
                eTime.start();
                for(int j(0); j <10; j++)
                    for(uint32_t i = 0; i < 10000000;i++){
                        value = dStorage.getvalueMethod_3(vindex.at(i).at(0),vindex.at(i).at(1), vindex.at(i).at(2));
                    }
                qDebug() << "Time for Method 3: "<< eTime.elapsed();
                average3 += eTime.elapsed();
            }
        
            average1 = average1/20; average2 = average2/20; average3 = average3/20;
            qDebug() << "Average Time for Method 1:" << average1;
            qDebug() << "Average Time for Method 2:" << average2;
            qDebug() << "Average Time for Method 3:" << average3;
        
            return a.exec();
        }
        
        

        the result was:

        • Average Time for Method 1: 11757ms
        • Average Time for Method 2: 11587ms
        • Average Time for Method 3: 12012ms

        My conclusion of this would be:
        Overall the 3 methods are equally fast!
        With method 3, the pointer check, being slightly slower, and the inline if-check beeing the fastest


        Be aware of the Qt Code of Conduct, when posting : https://forum.qt.io/topic/113070/qt-code-of-conduct


        Q: What's that?
        A: It's blue light.
        Q: What does it do?
        A: It turns blue.

        kshegunovK 1 Reply Last reply
        1
        • J.HilkJ J.Hilk

          Thanks to everyone for the contribution.

          as suggested I did some testing, with the 3 methodes discuss in this thread.

          I'll post my code, and my result for further discussions:
          Its my first test in this direction so I'm not sure if I've done everything like I should have.

          I forced the random numbers to be valid, to guarantee the max time for each validation check in each method. I also fell back to Qt for this...

          //storage  Class 
          //-----------------------------------------headers-------------------------
          #ifndef DATASTORAGE_H
          #define DATASTORAGE_H
          
          #include <QObject>
          
          class dataStorage : public QObject
          {
              Q_OBJECT
          public:
              explicit dataStorage(QObject *parent = nullptr);
          
              uint64_t &getvalueMethod_1(const uint &x, const uint &y, const uint &z);
              uint64_t &getvalueMethod_2(const uint &x, const uint &y, const uint &z);
              uint64_t &getvalueMethod_3(const uint &x, const uint &y, const uint &z);
          signals:
          
          public slots:
          
          protected:
              uint64_t m_data[16][16][256];
              const uint64_t *m_last;
          };
          
          #endif // DATASTORAGE_H
          
          //------------------------------Sources------------------------
          
          #include "datastorage.h"
          
          dataStorage::dataStorage(QObject *parent) : QObject(parent)
          {
              for(int x(0); x < 16;x++)
                  for(int y(0); y < 16; y++)
                      for(int z(0); z<256;z++)
                          m_data[x][y][z] = qrand();
              m_last = &m_data[15][15][255];
          }
          
          uint64_t &dataStorage::getvalueMethod_1(const uint &x, const uint &y, const uint &z)
          {
              if(x < 16)
                  if(y <16)
                      if(z < 256)
                          return m_data[x][y][z];
              return m_data[0][0][0];
          }
          
          uint64_t &dataStorage::getvalueMethod_2(const uint &x, const uint &y, const uint &z)
          {
              if(x <16 && y < 16 && z < 256)
                  return m_data[x][y][z];
              return m_data[0][0][0];
          }
          
          uint64_t &dataStorage::getvalueMethod_3(const uint &x, const uint &y, const uint &z)
          {
              uint64_t *item = &m_data[x][y][z];
              return item > m_last ? m_data[0][0][0] : *item;
          }
          
          // -------------------------------------Main---------------------------------
          #include "datastorage.h"
          
          #include <QDebug>
          #include <QVector>
          #include <QElapsedTimer>
          
          uint getRand(int max, int min){
              return rand()%(max-min+1)+min;
          }
          
          int main(int argc, char *argv[])
          {
              QCoreApplication a(argc, argv);
          
              dataStorage dStorage;
          
              QVector<QVector<uint> > vindex;
          
              for(uint32_t i = 0; i < 10000000; i++ )
                  vindex.append(QVector<uint>({getRand(15,0),getRand(15,0),getRand(255,0)}));
          
              QElapsedTimer eTime;
          
              uint64_t value;
              qint64 average1(0),average2(0),average3(0);
          
              for(int k(0); k <20; k++){
                  qDebug() << "Test-cycle:" << k+1;
                  eTime.start();
                  for(int j(0); j <10; j++)
                      for(uint32_t i = 0; i < 10000000;i++){
                          value = dStorage.getvalueMethod_1(vindex.at(i).at(0),vindex.at(i).at(1), vindex.at(i).at(2));
                      }
                  qDebug() << "Time for Method 1: "<< eTime.elapsed();
                  average1 += eTime.elapsed();
          
                  eTime.start();
                  for(int j(0); j <10; j++)
                      for(uint32_t i = 0; i < 10000000;i++){
                          value = dStorage.getvalueMethod_2(vindex.at(i).at(0),vindex.at(i).at(1), vindex.at(i).at(2));
                      }
                  qDebug() << "Time for Method 2: "<< eTime.elapsed();
                  average2 += eTime.elapsed();
          
                  eTime.start();
                  for(int j(0); j <10; j++)
                      for(uint32_t i = 0; i < 10000000;i++){
                          value = dStorage.getvalueMethod_3(vindex.at(i).at(0),vindex.at(i).at(1), vindex.at(i).at(2));
                      }
                  qDebug() << "Time for Method 3: "<< eTime.elapsed();
                  average3 += eTime.elapsed();
              }
          
              average1 = average1/20; average2 = average2/20; average3 = average3/20;
              qDebug() << "Average Time for Method 1:" << average1;
              qDebug() << "Average Time for Method 2:" << average2;
              qDebug() << "Average Time for Method 3:" << average3;
          
              return a.exec();
          }
          
          

          the result was:

          • Average Time for Method 1: 11757ms
          • Average Time for Method 2: 11587ms
          • Average Time for Method 3: 12012ms

          My conclusion of this would be:
          Overall the 3 methods are equally fast!
          With method 3, the pointer check, being slightly slower, and the inline if-check beeing the fastest

          kshegunovK Offline
          kshegunovK Offline
          kshegunov
          Moderators
          wrote on last edited by kshegunov
          #10

          Can you pull the assembly for those three.
          Additional notes:

          1. You will want to make them methods inline, otherwise I feel the call instruction cost is pretty significant compared to the function's body (and also the QVector::at where the arguments are taken from the precomputed array).
            Edit: Have also a brief look at this table

          2. I couldn't see getRand defined, but if it does what I think it does, then you have a systematic error - you're accessing only real elements of the array (not out-of-bound ones). Which may put test 3 at a disadvantage (or it may not, I'm not sure either way).

          3. There's no difference and can never be any between:

          if(x < 16)
                  if(y <16)
                      if(z < 256)
                          return m_data[x][y][z];
          
          if(x <16 && y < 16 && z < 256)
          

          which is due to the short-circuit evaluation of && and ||. That's also the most probable reason you get virtually identical results for test 1 and 2.

          Read and abide by the Qt Code of Conduct

          J.HilkJ 1 Reply Last reply
          1
          • kshegunovK kshegunov

            Can you pull the assembly for those three.
            Additional notes:

            1. You will want to make them methods inline, otherwise I feel the call instruction cost is pretty significant compared to the function's body (and also the QVector::at where the arguments are taken from the precomputed array).
              Edit: Have also a brief look at this table

            2. I couldn't see getRand defined, but if it does what I think it does, then you have a systematic error - you're accessing only real elements of the array (not out-of-bound ones). Which may put test 3 at a disadvantage (or it may not, I'm not sure either way).

            3. There's no difference and can never be any between:

            if(x < 16)
                    if(y <16)
                        if(z < 256)
                            return m_data[x][y][z];
            
            if(x <16 && y < 16 && z < 256)
            

            which is due to the short-circuit evaluation of && and ||. That's also the most probable reason you get virtually identical results for test 1 and 2.

            J.HilkJ Offline
            J.HilkJ Offline
            J.Hilk
            Moderators
            wrote on last edited by
            #11

            hi, @kshegunov

            like you suggested, I wrapped the getvalueMethod_X functions in an inline statement.

            getRand is defined right above main:

            uint getRand(int max, int min){
                return rand()%(max-min+1)+min;
            }
            

            But you're right it would be better to also have values that are outside the intended scope:
            so I made the following changes:

            for(uint32_t i = 0; i < 3333333; i++ )
                    vindex.append(QVector<uint>({getRand(15,0),getRand(15,0),getRand(255,0)}));
                for(uint32_t i = 3333333; i < 6666666; i++ )
                    vindex.append(QVector<uint>({getRand(30,15),getRand(15,0),getRand(255,0)}));
                for(uint32_t i = 6666666; i < 10000000; i++ )
                    vindex.append(QVector<uint>({getRand(15,0),getRand(15,0),getRand(400,254)}));
            

            and pushed the number of cycles to 60.

            I'm however not sure, what you mean with:

            and also the QVector::at where the arguments are taken from the precomputed array
            

            AFAIK uint64_t m_data[16][16][256]; doesn't have an at opperator. I changed the initializaion however, for the CPU to have one less step:

            inline uint64_t &getvalueMethod_3(const uint &x, const uint &y, const uint &z)
                {
                    uint64_t *item(&m_data[x][y][z]);
                    return item > m_last ? m_data[0][0][0] : *item;
                }
            

            The result with those changes:

            • Average Time for Method 1: 10813
            • Average Time for Method 2: 10819
            • Average Time for Method 3: 11313

            Overall 10% faster than previously, so the call instruction did indeed take up a significant amount of time!
            Method 1 and 2 are even more equal, like expected.
            But no change for method 3.


            Be aware of the Qt Code of Conduct, when posting : https://forum.qt.io/topic/113070/qt-code-of-conduct


            Q: What's that?
            A: It's blue light.
            Q: What does it do?
            A: It turns blue.

            kshegunovK 1 Reply Last reply
            1
            • J.HilkJ J.Hilk

              hi, @kshegunov

              like you suggested, I wrapped the getvalueMethod_X functions in an inline statement.

              getRand is defined right above main:

              uint getRand(int max, int min){
                  return rand()%(max-min+1)+min;
              }
              

              But you're right it would be better to also have values that are outside the intended scope:
              so I made the following changes:

              for(uint32_t i = 0; i < 3333333; i++ )
                      vindex.append(QVector<uint>({getRand(15,0),getRand(15,0),getRand(255,0)}));
                  for(uint32_t i = 3333333; i < 6666666; i++ )
                      vindex.append(QVector<uint>({getRand(30,15),getRand(15,0),getRand(255,0)}));
                  for(uint32_t i = 6666666; i < 10000000; i++ )
                      vindex.append(QVector<uint>({getRand(15,0),getRand(15,0),getRand(400,254)}));
              

              and pushed the number of cycles to 60.

              I'm however not sure, what you mean with:

              and also the QVector::at where the arguments are taken from the precomputed array
              

              AFAIK uint64_t m_data[16][16][256]; doesn't have an at opperator. I changed the initializaion however, for the CPU to have one less step:

              inline uint64_t &getvalueMethod_3(const uint &x, const uint &y, const uint &z)
                  {
                      uint64_t *item(&m_data[x][y][z]);
                      return item > m_last ? m_data[0][0][0] : *item;
                  }
              

              The result with those changes:

              • Average Time for Method 1: 10813
              • Average Time for Method 2: 10819
              • Average Time for Method 3: 11313

              Overall 10% faster than previously, so the call instruction did indeed take up a significant amount of time!
              Method 1 and 2 are even more equal, like expected.
              But no change for method 3.

              kshegunovK Offline
              kshegunovK Offline
              kshegunov
              Moderators
              wrote on last edited by kshegunov
              #12

              @J.Hilk said in casts, if's and array-access.:

              I'm however not sure, what you mean with

              I meant here:

              value = dStorage.getvalueMethod_1(vindex.at(i).at(0),vindex.at(i).at(1), vindex.at(i).at(2));
              

              but I just realized QVector::at returns a const reference so nevermind that. However you may want to consider using a statically sized c-style array, as you may be getting a lot of cache misses due to the implicit sharing pointer indirection from the vector -
              another thing to try.

              But no change for method 3.

              I never gave any guarantees it'd be faster; that's why you're profiling to begin with ;)
              And while on the subject, did you get that assembly? And do say how do you compile your tests, with -g -O2?

              Read and abide by the Qt Code of Conduct

              J.HilkJ 1 Reply Last reply
              1
              • kshegunovK kshegunov

                @J.Hilk said in casts, if's and array-access.:

                I'm however not sure, what you mean with

                I meant here:

                value = dStorage.getvalueMethod_1(vindex.at(i).at(0),vindex.at(i).at(1), vindex.at(i).at(2));
                

                but I just realized QVector::at returns a const reference so nevermind that. However you may want to consider using a statically sized c-style array, as you may be getting a lot of cache misses due to the implicit sharing pointer indirection from the vector -
                another thing to try.

                But no change for method 3.

                I never gave any guarantees it'd be faster; that's why you're profiling to begin with ;)
                And while on the subject, did you get that assembly? And do say how do you compile your tests, with -g -O2?

                J.HilkJ Offline
                J.HilkJ Offline
                J.Hilk
                Moderators
                wrote on last edited by
                #13

                @kshegunov

                I actually tested it with '[]' calls and it doubled the execution time.

                may be getting a lot of cache misses due to the implicit sharing pointer indirection from the vector

                the variables are passed as:
                inline uint64_t &getvalueMethod_2(const uint &x, const uint &y, const uint &z) with QVector::at returning a const reference I'm directly passing the memory, am I not?

                And while on the subject, did you get that assembly?

                I'm sorry, but I'm not familiar with that term! Its something I would do to start my lawn mower. But thats probably just me as a non native speaker x)

                And do say how do you compile your tests, with -g -O2?

                Haven't touch the default compiler settings. My project, that is potentially using this class, would use the default settings as well, so I thought better not to touch it.

                I did however make an embarrassing mistake, that I shall not comment further on...

                but I'm now down to 300-1000 nsec per function call. The 3 functions are virtually the same in access time.

                Save to say: "Use whatever is better for your OCD-Code-structure!"


                Be aware of the Qt Code of Conduct, when posting : https://forum.qt.io/topic/113070/qt-code-of-conduct


                Q: What's that?
                A: It's blue light.
                Q: What does it do?
                A: It turns blue.

                kshegunovK 1 Reply Last reply
                1
                • sierdzioS Offline
                  sierdzioS Offline
                  sierdzio
                  Moderators
                  wrote on last edited by
                  #14

                  By assembly, @kshegunov meant this: https://stackoverflow.com/questions/137038/how-do-you-get-assembler-output-from-c-c-source-in-gcc

                  By comparing assembly code that the compiler produces we can really see how well optimized the code is after the compiler had done it's magic.

                  I did however make an embarrassing mistake, that I shall not comment further on...
                  but I'm now down to 300-1000 nsec per function call. The 3 functions are virtually the same in access time.

                  Oh now you've made me super curious! :-)

                  All in all, good to know that all these solutions are equally valid. We've all learned a bit here.

                  (Z(:^

                  1 Reply Last reply
                  1
                  • J.HilkJ J.Hilk

                    @kshegunov

                    I actually tested it with '[]' calls and it doubled the execution time.

                    may be getting a lot of cache misses due to the implicit sharing pointer indirection from the vector

                    the variables are passed as:
                    inline uint64_t &getvalueMethod_2(const uint &x, const uint &y, const uint &z) with QVector::at returning a const reference I'm directly passing the memory, am I not?

                    And while on the subject, did you get that assembly?

                    I'm sorry, but I'm not familiar with that term! Its something I would do to start my lawn mower. But thats probably just me as a non native speaker x)

                    And do say how do you compile your tests, with -g -O2?

                    Haven't touch the default compiler settings. My project, that is potentially using this class, would use the default settings as well, so I thought better not to touch it.

                    I did however make an embarrassing mistake, that I shall not comment further on...

                    but I'm now down to 300-1000 nsec per function call. The 3 functions are virtually the same in access time.

                    Save to say: "Use whatever is better for your OCD-Code-structure!"

                    kshegunovK Offline
                    kshegunovK Offline
                    kshegunov
                    Moderators
                    wrote on last edited by
                    #15

                    @J.Hilk said in casts, if's and array-access.:

                    I actually tested it with '[]' calls and it doubled the execution time.

                    This is very odd, are you sure you didn't do anything strange. I'd expect it to be a bit faster, but noticeable.

                    I'm directly passing the memory, am I not?

                    Yes, but the QVector has a data pointer inside which is there to implement implicit sharing. So you're actually getting something like:

                    QVectorData * d; //< Already in the vector implementation
                    return d->elements[index]; //< Indirection through the data pointer
                    

                    albeit most of it is probably inlined by the compiler. Still the memory fetch is there.

                    I'm sorry, but I'm not familiar with that term! Its something I would do to start my lawn mower. But thats probably just me as a non native speaker x)

                    Okay, I mean the compiled binary code - the assembly. You can get that from Creator by selecting Debug > Operate by instruction from the menu when you have an active debug session.

                    Haven't touch the default compiler settings.

                    Well, perhaps you should or rather create a profile target for Creator. It'd include -g -O2 switches if you work with gcc, meaning the compiler will generate debug info, but will also perform the standard release optimizations. The idea is to have the optimized code (what you'd have in release) without stripping all the debug info - allowing for a reasonable profiling.

                    but I'm now down to 300-1000 nsec per function call.

                    That's in agreement with the table I posted earlier - a couple or so (short) instructions; a few clock ticks.

                    The 3 functions are virtually the same in access time.

                    Don't rush it, we would still love to do some more dissecting ;P
                    I share @sierdzio's curiosity.

                    Read and abide by the Qt Code of Conduct

                    1 Reply Last reply
                    0
                    • SGaistS Offline
                      SGaistS Offline
                      SGaist
                      Lifetime Qt Champion
                      wrote on last edited by
                      #16

                      Hi,

                      Shouldn't it rather be:

                      QVector<int> paramVector = vindex.at(i);
                      value = dStorage.getvalueMethod_1(paramVector.at(0), paramVector.at(1), paramVector.at(2));
                      

                      to ensure the compiler can optimise access a bit ?

                      Interested in AI ? www.idiap.ch
                      Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

                      1 Reply Last reply
                      1

                      • Login

                      • Login or register to search.
                      • First post
                        Last post
                      0
                      • Categories
                      • Recent
                      • Tags
                      • Popular
                      • Users
                      • Groups
                      • Search
                      • Get Qt Extensions
                      • Unsolved