Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Get Qt Extensions
  • Unsolved
Collapse
Brand Logo
  1. Home
  2. Special Interest Groups
  3. C++ Gurus
  4. Integer vs floating point division

Integer vs floating point division

Scheduled Pinned Locked Moved Unsolved C++ Gurus
8 Posts 4 Posters 2.9k Views
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • nageshN Offline
    nageshN Offline
    nagesh
    wrote on last edited by kshegunov
    #1

    @jeremy_k

    It is also likely faster.
    

    I got following thread which discusses about floating arithmatic vs integer. It states floating aritchmatic whether it's multiplication or divisionis faster than integer.

    [Forked from QProgressbar not updating ~kshegunov]

    jeremy_kJ kshegunovK 2 Replies Last reply
    0
    • nageshN nagesh

      @jeremy_k

      It is also likely faster.
      

      I got following thread which discusses about floating arithmatic vs integer. It states floating aritchmatic whether it's multiplication or divisionis faster than integer.

      [Forked from QProgressbar not updating ~kshegunov]

      jeremy_kJ Offline
      jeremy_kJ Offline
      jeremy_k
      wrote on last edited by
      #2

      It's the conversion to and from double that caught my attention.

      Asking a question about code? http://eel.is/iso-c++/testcase/

      1 Reply Last reply
      1
      • nageshN nagesh

        @jeremy_k

        It is also likely faster.
        

        I got following thread which discusses about floating arithmatic vs integer. It states floating aritchmatic whether it's multiplication or divisionis faster than integer.

        [Forked from QProgressbar not updating ~kshegunov]

        kshegunovK Offline
        kshegunovK Offline
        kshegunov
        Moderators
        wrote on last edited by kshegunov
        #3

        @nagesh said in QProgressbar not updating:

        I got following thread which discusses about floating arithmatic vs integer. It states floating aritchmatic whether it's multiplication or divisionis faster than integer.

        Your conclusion from that thread is wrong on so many levels, but if you want to discuss it, we should fork this ...

        Read and abide by the Qt Code of Conduct

        1 Reply Last reply
        0
        • nageshN Offline
          nageshN Offline
          nagesh
          wrote on last edited by
          #4

          @kshegunov fork the thread for discussion.. Whether it's true that floating arithmetic operation (division / multiplication) is faster than integer operation.

          Chris KawaC kshegunovK 2 Replies Last reply
          0
          • nageshN nagesh

            @kshegunov fork the thread for discussion.. Whether it's true that floating arithmetic operation (division / multiplication) is faster than integer operation.

            Chris KawaC Offline
            Chris KawaC Offline
            Chris Kawa
            Lifetime Qt Champion
            wrote on last edited by Chris Kawa
            #5

            @nagesh said in Integer vs floating point division:

            Whether it's true that floating arithmetic operation (division / multiplication) is faster than integer operation.

            There's no one answer to that. It depends on the underlying hardware. For couple of decades now most consumer grade CPUs have dedicated silicon for floating point operations so that's often true, but there's also the embedded and smaller scale market where CPUs could have half precision floats in hardware and floats and doubles emulated in software or no hardware floating point support at all. Performance of floating point vs integer will also vary depending on a CPU vendor and model.

            The point though is that, as @jeremy_k noted, the conversions are the expensive part, so don't do something like this:

            int foo = 666;
            int bar = foo / 42.0;  //foo converted to double -> double division -> result truncation to int
            

            If you want integer results try to stick to integer operations (by rearranging expressions if needed) and if you want floating point results stick to floating point. Also watch out for float to double conversions, as those can be costly too, so don't forget your f postfix when using float constants.

            1 Reply Last reply
            6
            • nageshN nagesh

              @kshegunov fork the thread for discussion.. Whether it's true that floating arithmetic operation (division / multiplication) is faster than integer operation.

              kshegunovK Offline
              kshegunovK Offline
              kshegunov
              Moderators
              wrote on last edited by kshegunov
              #6

              Right, so to start the ball rolling I should note that what you see in C++ isn't what you get translated into asm, and what you see in asm may not be what is executed on the CPU (directly). There are several significant points to cover, so let's make a list (lists are the best, aren't they):

              1. There's no information of how said code was compiled, what flags were passed to the compiler
                1.1) As there's no information about the compiler and flags, there's no knowledge of what assembly was generated
                1.2) There's an implicit truncation for each FP operation, which may or may not matter
              2. The accepted answer talks about latency, but that's relevant only sometimes and is most certainly not a sole reason to claim performance
              3. The accepted answer claims all integer divisions are done in the natural DWORD (a.k.a. int32), which isn't the case. If the compiler can it will not promote the arguments needlessly

              So from the top:

              1.1:
              The compiler flags matter as they influence the order (and sometimes the types) of instructions generated. It's more relevant in that sense to show the generated asm (with notable mention of the compiler that produced it) instead of reasoning about C++ code.

              1.2:
              This is possibly hurting the performance of the FP code, as it typically requires a cvttss2si instruction to convert back to an integer before the addition happens. On the other hand if the CPU supports it up to eight float divisions can be done at once (and this usually happens with arrays if the code is vectorized).

              Here's an example of unrolling the loop which triggers the compiler to automatically vectorize (SIMD) the division (divps): https://godbolt.org/z/bEEE59PTh

              Notice how the four divisions are now a single instruction as the compiler recognized that there's no dependence between them and aggressively optimized for it.

              2:
              This is a long one. There's a different number of FPU and ALUs on a modern CPU (they're per core), so you should take that into account. This is also why there's a computer inside the computer (ghost in the machine) - what you see as instructions is not necessarily what's executed in that specific order. The CPU may (and it will) reorder the operations based on how full the pipeline is and what can be put where, to fill the empty gaps (in terms of free cycles) - this is the so called out-of-order execution.

              Out-of-order execution is also the reason you have atomic semantics with integers, this isn't because the increment isn't atomic as such, but rather because the CPU may reorder it somewhere else, which you don't typically want if it is to be shared between cores.

              All that said, latency comes into play whenever the pipeline is full. That's the number of cycles you need to wait before your instruction can get executed, but note, this doesn't happen all the time, hence it's a very poor predictor of "performance".
              So back to our specific case. My CPU (a Ryzen) has 4 ALUs and a single FPU per core, meaning that I can get 4 integer operations in parallel out of the box, while I get only one FP operation per turn. This matters very much whenever you crunch numbers, where you want to squeeze out every last bit out of the FPU. For integer operations that's already done for you by the prefetcher (if the CPU decides it's possible) so you get it "for free" so to speak.

              3:
              The claim that the division is always going to happen in a DWORD (a.k.a. 32 bit integer) is flat wrong, at least on the side of the asm, how the CPU does it under the hood is really an implementation which have no control over.
              So here's the example: https://godbolt.org/z/3xjx375b3
              (Also very similar for signed integers: https://godbolt.org/z/EnbxcKe8v)

              Notice that the div uses a source operand that is of WORD size, thus it operates solely on the 16 bits, not on the whole of the eax register (32 bits). This can also exemplified from the next line, where the compiler reads back the value directly from the low WORD of the eax register: mov WORD PTR [rsp-104+rcx], ax.

              So is this faster? Well, unfortunately it really depends, but if the ALUs are free to do it, maybe, even probably. If the FP operation gets vectorized on the other hand, probably not. If the FP results are truncated to integers after the fact, however, probably sticking to integer divisions is going to be better.

              As you can see there's a lot of "if"s and "when"s here, but just claiming that an FP operation is faster than an integer one isn't correct, not by a long shot.

              Read and abide by the Qt Code of Conduct

              1 Reply Last reply
              5
              • nageshN Offline
                nageshN Offline
                nagesh
                wrote on last edited by nagesh
                #7

                Thanks @Chris-Kawa and @kshegunov for the detailed answer to the problem statement.

                I do agree that there is no one answer to this problem statement, as it depends upon lot of factors such as hardware architecture, compiler flag options, FPU and many more.

                From the above discussion it clarifies that outcome of the result is uncertain/depends upon factors/scenario. (well detailed in @kshegunov post)

                So my next questions is to broaden the discussion (slight deviation from topic) will be as follows:

                1. What does the programmer needs to follow as the guidelines
                  by keeping this uncertainty in mind?
                  Questioning here because since there is uncertainty about the performance
                  (integer vs float) how can the single guidelines suits here?

                2. If at all some guidelines for performance is set, Does the programmer needs to worry or pay attention to the each instruction for performance point of view? or when it really matters?

                  Eg: Does the performance really matter in updating the slider value in event driven/timer or in the Computation which is taking place in Time critical domain in real time systems (eg:Missile Fire Control system)

                3. Some times by rearranging expressions the original meaning of the statement is difficult to grasp from the code maintainability point of view.

                   stick to integer operations (by rearranging expressions if needed)
                
                1. Can this be followed as guidelines?
                however, probably sticking to integer divisions is going to be better
                

                There was some discussion in thread for multiplication vs division performance

                Is it true/universal that multiplication is faster than division?
                If the above statements true (multiplication is faster) with many instruction cycles,
                then why don't we write the instruction using multiplication.

                How about this statements performance?

                constexpr float MultFactor = (1/42.0f); //One Time calculation
                int foo = 666;
                int bar = foo * MultFactor;
                
                1. code sample to check performance of integer division vs float multiplication as discussed in point(4)
                void PerformanceTest()
                {
                
                    int randomvalue   = 0;
                    int intExpCount   = 0;//keep track of no. of times int division is faster
                    int floatExpCount = 0; //keep track of no. of times float multiplication is faster
                    int loopCount     = 100;
                
                    constexpr float MultFactor = (1/42.0f); //One Time calculation
                    for( int i=0; i<loopCount; ++i )
                    {
                        randomvalue = qrand();
                
                        QElapsedTimer elTimer;
                
                        elTimer.start();
                        int val3 = randomvalue;
                        int val4 = val3 * MultFactor; //modified expression
                        //int val4 = val3 / 42.0f; //original expression
                        quint64 elapsed2 = elTimer.nsecsElapsed();
                
                        elTimer.start();
                        int val1 = randomvalue;
                        int val2 = val1 / 42;
                        quint64 elapsed1 = elTimer.nsecsElapsed();
                
                        //        qDebug()<<"elapsed1 \t"<<elapsed1;
                        //        qDebug()<<"elapsed2 \t"<<elapsed2;
                        //        qDebug()<<"Diff Time \t"<<DiffTime;
                        //        qDebug()<<"\tRandValue is "<<BigValue<<"Answer is "<<val2<<" "<<val4<<"";
                
                        qint64 DiffTime = ( elapsed1 - elapsed2 );
                        if(DiffTime >= 0)
                            ++floatExpCount;//floating operation takes less time
                        else
                            ++intExpCount;
                    }
                
                    float percentfloat = (floatExpCount * 100)  / loopCount;
                    float percentint   = (intExpCount   * 100 ) / loopCount;
                
                    qDebug()<<"(% Performance) ExprMultiplication in Float \t"<<percentfloat;
                    qDebug()<<"(% Performance) ExprDivision in int \t"<<percentint;
                }
                

                TimeElapsed is used to check, how fast the computation is performed.

                kshegunovK 1 Reply Last reply
                0
                • nageshN nagesh

                  Thanks @Chris-Kawa and @kshegunov for the detailed answer to the problem statement.

                  I do agree that there is no one answer to this problem statement, as it depends upon lot of factors such as hardware architecture, compiler flag options, FPU and many more.

                  From the above discussion it clarifies that outcome of the result is uncertain/depends upon factors/scenario. (well detailed in @kshegunov post)

                  So my next questions is to broaden the discussion (slight deviation from topic) will be as follows:

                  1. What does the programmer needs to follow as the guidelines
                    by keeping this uncertainty in mind?
                    Questioning here because since there is uncertainty about the performance
                    (integer vs float) how can the single guidelines suits here?

                  2. If at all some guidelines for performance is set, Does the programmer needs to worry or pay attention to the each instruction for performance point of view? or when it really matters?

                    Eg: Does the performance really matter in updating the slider value in event driven/timer or in the Computation which is taking place in Time critical domain in real time systems (eg:Missile Fire Control system)

                  3. Some times by rearranging expressions the original meaning of the statement is difficult to grasp from the code maintainability point of view.

                     stick to integer operations (by rearranging expressions if needed)
                  
                  1. Can this be followed as guidelines?
                  however, probably sticking to integer divisions is going to be better
                  

                  There was some discussion in thread for multiplication vs division performance

                  Is it true/universal that multiplication is faster than division?
                  If the above statements true (multiplication is faster) with many instruction cycles,
                  then why don't we write the instruction using multiplication.

                  How about this statements performance?

                  constexpr float MultFactor = (1/42.0f); //One Time calculation
                  int foo = 666;
                  int bar = foo * MultFactor;
                  
                  1. code sample to check performance of integer division vs float multiplication as discussed in point(4)
                  void PerformanceTest()
                  {
                  
                      int randomvalue   = 0;
                      int intExpCount   = 0;//keep track of no. of times int division is faster
                      int floatExpCount = 0; //keep track of no. of times float multiplication is faster
                      int loopCount     = 100;
                  
                      constexpr float MultFactor = (1/42.0f); //One Time calculation
                      for( int i=0; i<loopCount; ++i )
                      {
                          randomvalue = qrand();
                  
                          QElapsedTimer elTimer;
                  
                          elTimer.start();
                          int val3 = randomvalue;
                          int val4 = val3 * MultFactor; //modified expression
                          //int val4 = val3 / 42.0f; //original expression
                          quint64 elapsed2 = elTimer.nsecsElapsed();
                  
                          elTimer.start();
                          int val1 = randomvalue;
                          int val2 = val1 / 42;
                          quint64 elapsed1 = elTimer.nsecsElapsed();
                  
                          //        qDebug()<<"elapsed1 \t"<<elapsed1;
                          //        qDebug()<<"elapsed2 \t"<<elapsed2;
                          //        qDebug()<<"Diff Time \t"<<DiffTime;
                          //        qDebug()<<"\tRandValue is "<<BigValue<<"Answer is "<<val2<<" "<<val4<<"";
                  
                          qint64 DiffTime = ( elapsed1 - elapsed2 );
                          if(DiffTime >= 0)
                              ++floatExpCount;//floating operation takes less time
                          else
                              ++intExpCount;
                      }
                  
                      float percentfloat = (floatExpCount * 100)  / loopCount;
                      float percentint   = (intExpCount   * 100 ) / loopCount;
                  
                      qDebug()<<"(% Performance) ExprMultiplication in Float \t"<<percentfloat;
                      qDebug()<<"(% Performance) ExprDivision in int \t"<<percentint;
                  }
                  

                  TimeElapsed is used to check, how fast the computation is performed.

                  kshegunovK Offline
                  kshegunovK Offline
                  kshegunov
                  Moderators
                  wrote on last edited by
                  #8

                  @nagesh said in Integer vs floating point division:

                  So my next questions is to broaden the discussion (slight deviation from topic) will be as follows:

                  1. What does the programmer needs to follow as the guidelines
                    by keeping this uncertainty in mind?
                    Questioning here because since there is uncertainty about the performance
                    (integer vs float) how can the single guidelines suits here?

                  Conventional wisdom applies:
                  The output type rules them all, so choose the operation type based on that, which is what Chris mentioned. If you need the end result as int do integer division, if you need a float - do a FP division. Don't do type conversions between the two just because you can, be it explicitly or implicitly.

                  <rant> C/C++ are notorious for their implicit type conversions, don't fall into that trap </rant>

                  1. If at all some guidelines for performance is set, Does the programmer needs to worry or pay attention to the each instruction for performance point of view? or when it really matters?

                  99.999999% of the time you write decent code in C++ and leave the compiler to figure out the instructions, and I really mean it. Compilers are very, very, very smart nowadays. If you're optimizing for performance, algorithmic complexity is king, then cache coherency and optimizing for cache misses is going to matter much more than anything else you can do on the lower-level, after that comes vectorization (SSE/AVX) and last comes fiddling with single instructions.

                  Eg: Does the performance really matter in updating the slider value in event driven/timer

                  Not really.

                  or in the Computation which is taking place in Time critical domain in real time systems (eg:Missile Fire Control system)

                  This is entirely different universe. Time critical applications don't run on a preemptive scheduler, so there the design is very much different to begin with.

                  1. Some times by rearranging expressions the original meaning of the statement is difficult to grasp from the code maintainability point of view.

                  Write a comment to explain if you feel it's not straightforward. While the compiler is going to optimize many calculations it will not do magic by any measure - it can optimize out something iff (if and only if) it can prove the result is going to be correct.

                  1. Can this be followed as guidelines?
                    Is it true/universal that multiplication is faster than division?

                  Mostly true, yes.

                  If the above statements true (multiplication is faster) with many instruction cycles,
                  then why don't we write the instruction using multiplication.

                  I assume you mean FP multiplication by the reciprocal. You don't write it as a multiplication because of rounding errors. Even if your reciprocal is exact to the extent of the representable significance of the floating point number, the multiplication introduces half an epsilon of rounding error. This in essence means that your last binary digit might not be correct if you use multiplication by a reciprocal instead of a division.

                  There are two cases where you can:

                  1. The reciprocal is known and exactly representable, e.g. 0.5, 0.25 and all the integers are exactly representable.
                  2. You don't care about the last bit of the mantissa.

                  How about this statements performance?

                  constexpr float MultFactor = (1/42.0f); //One Time calculation
                  int foo = 666;
                  int bar = foo * MultFactor;
                  

                  The performance of this is absolutely constant. Everything is evaluated at compile time.

                  1. code sample to check performance of integer division vs float multiplication as discussed in point(4)

                  Designing such tests is hard and fiddly. You must be very sure of what exactly the compiler generates and how to break some of the possible optimizations it may do. If you notice in my example code above I used a volatile as an index to read from the array. This I did so the compiler can never realize (as it's obligated to always do a read from memory for each volatile) that it can remove the whole of the for loop and just calculate one single value.

                  I'll think about how you could conceivably test this, bit it may take some time ...

                  Read and abide by the Qt Code of Conduct

                  1 Reply Last reply
                  5

                  • Login

                  • Login or register to search.
                  • First post
                    Last post
                  0
                  • Categories
                  • Recent
                  • Tags
                  • Popular
                  • Users
                  • Groups
                  • Search
                  • Get Qt Extensions
                  • Unsolved