
How to increase speed of large for loops

C++ Gurus · Unsolved · 30 posts · 8 posters

aha_1980

@kane9x

If your code does not run as fast as expected, you should do two things first:

1. Ask yourself whether you are using the best algorithm for the given problem.
2. Profile your algorithm to find the slowest part. Store the result for later comparison.

You cannot start optimizing before these two steps are finished. Next, set up good unit tests that make sure the behavior does not change when refactoring. Then, replace the slowest part with a better implementation.

Regards
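A crude way to record the "before" number from step 2 and to guard behaviour while refactoring might look like the sketch below. The helper names timeMilliseconds and matchesBaseline are hypothetical, and wall-clock timing like this is only a first approximation: a real profiler shows where inside the loop the time goes, which this does not.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, so a baseline can be stored and compared later.
template <typename Work>
double timeMilliseconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical regression check for the unit tests: true if a refactored
// result agrees with the stored baseline within a relative tolerance.
// Exact equality is too strict once SIMD or threading reorders sums.
inline bool matchesBaseline(double baseline, double candidate,
                            double relTolerance = 1e-9)
{
    assert(relTolerance >= 0.0);
    const double scale = std::fmax(1.0, std::fabs(baseline));
    return std::fabs(baseline - candidate) <= relTolerance * scale;
}
```

Typical use: time the original loop once and keep both the time and the result; after each change, re-time it and check matchesBaseline(oldResult, newResult) in a unit test.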

#11 · kane9x (Banned)

@aha_1980 Thank you. I think what I should do now is find a better algorithm to optimize it.

#12 · JonB

@kane9x

nearly about 8e+12 iterations

I don't quite know what you're trying to do or why, but if you mean you have on the order of trillions of iterations/square roots etc. to calculate, that's a very large number to be executing if speed is critical....
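For scale, a back-of-the-envelope estimate, assuming (purely for illustration) a single core that completes one loop iteration per nanosecond, i.e. 10^9 iterations per second, which is optimistic for a body containing a square root:

```latex
t \approx \frac{8 \times 10^{12}\ \text{iterations}}{10^{9}\ \text{iterations/s}}
  = 8 \times 10^{3}\ \text{s} \approx 2.2\ \text{h}
```

A slower loop body scales this linearly, so cutting the number of iterations tends to pay off more than shaving cycles off each one.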

#13 · kshegunov (Moderator)

@JonB said in How to increase speed of large for loops:

I don't quite know what you're trying to do or why, but if you mean you have on the order of trillions of iterations/square roots etc. to calculate, that's a very large number to be executing if speed is critical....

Maybe I can help with your confusion. The OP is trying to calculate the Euclidean distance for a set of three points, and to do so by picking those three points from the whole set through a permutation. That is something they should have precalculated and stored, and something they should have used SIMD instructions for.

Read and abide by the Qt Code of Conduct
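The "precalculate and store" suggestion can be sketched as below, assuming a plain hypothetical Point3 struct rather than the OP's actual point type (cloudB->points suggests a PCL-style cloud, but the thread doesn't show it). Each pairwise distance is computed once and mirrored, since the distance from A to B equals the distance from B to A. The table holds n × n floats, so this only fits in memory when n is modest.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type standing in for the OP's cloud points.
struct Point3 { float x, y, z; };

// Computes every pairwise Euclidean distance exactly once (j > i) and mirrors
// it, because d(i, j) == d(j, i). Result is a row-major n x n table.
std::vector<float> distanceTable(const std::vector<Point3>& pts)
{
    const std::size_t n = pts.size();
    std::vector<float> table(n * n, 0.0f);   // diagonal stays 0
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const float dx = pts[i].x - pts[j].x;
            const float dy = pts[i].y - pts[j].y;
            const float dz = pts[i].z - pts[j].z;
            const float d = std::sqrt(dx * dx + dy * dy + dz * dz);
            table[i * n + j] = d;
            table[j * n + i] = d;
        }
    }
    return table;
}

// Later loops look distances up instead of recomputing them.
inline float lookup(const std::vector<float>& table, std::size_t n,
                    std::size_t i, std::size_t j)
{
    assert(i < n && j < n && table.size() == n * n);
    return table[i * n + j];
}
```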

#14 · kshegunov (Moderator)

PS.
Just to be clear, the indirection cloudB->points[v[p0]] is likely to be a cache miss every time.
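One way to act on this, assuming the permutation vector v can't simply be dropped: apply it once, copying the points into a contiguous array in visiting order, so the hot loop then walks memory sequentially. gatherInOrder and Point3 are hypothetical names; the copy costs O(n) time and memory, which pays off only if the reordered array is traversed many times.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical point type standing in for the OP's cloud points.
struct Point3 { float x, y, z; };

// Applies the permutation once: ordered[k] == points[order[k]]. The hot loop
// can then read `ordered` front to back instead of jumping through `order`.
std::vector<Point3> gatherInOrder(const std::vector<Point3>& points,
                                  const std::vector<std::size_t>& order)
{
    std::vector<Point3> ordered;
    ordered.reserve(order.size());
    for (const std::size_t idx : order) {
        assert(idx < points.size());
        ordered.push_back(points[idx]);
    }
    return ordered;
}
```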

#15 · JonB

@kshegunov said in How to increase speed of large for loops:

Maybe I can help with your confusion. The OP is trying to calculate the Euclidean distance for a set of three points

Yes, I realised it was this sort of thing. However, AFAIK Euclid did not have the aid of a PC and presumably would have struggled to calculate a trillion distances by hand... :)

#16 · kshegunov (Moderator)

@JonB said in How to increase speed of large for loops:

AFAIK Euclid did not have the aid of a PC and presumably would have struggled to calculate a trillion distances by hand...

Probably not. But I imagine that, being a smart guy, he'd have tabulated whatever he had already calculated so he didn't need to do it again ... at least that seems logical to me.

#17 · JonB

@kshegunov
Trouble is, writing down the answers to a trillion square roots takes a lot of space. And with that many, even look-up time is going to get considerable....
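The storage concern is easy to quantify. Assuming one 8-byte double per stored value (a 4-byte float would halve it), a full table of a trillion results needs the first figure below; the second line is the number of distinct unordered pairs among n points, which is what a symmetric table actually has to hold:

```latex
10^{12}\ \text{values} \times 8\ \text{B/value} = 8 \times 10^{12}\ \text{B} = 8\ \text{TB}
\qquad
\#\text{pairs}(n) = \frac{n(n-1)}{2}
```

Storing each unordered pair only once, and only the pairs that pass a threshold, can shrink this considerably.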

#18 · kshegunov (Moderator)

@JonB said in How to increase speed of large for loops:

Trouble is, writing down the answers to a trillion square roots takes a lot of space. And with that many, even look-up time is going to get considerable....

Mayhaps. I do like the "we create hardware out of software" approach, I admit; unfortunately, it rarely works in practice. Leaving the metaphors to rest for a moment, I implore you to really try to imagine how this is supposed to work and do the following:

1. Notice the inner loop is only interesting if the distance between two points is more than some magic number (not having semi-divine in-code numbers is a matter for another discussion).
2. Notice the inner if is checking whether two distances (between two pairs of points) are larger than some arbitrary numbers.
3. Notice that the distance between two points is the same no matter which is first and which is second.
4. Notice that distances are recalculated for every conceivable point pairing.
5. Finally (and least importantly), notice that the indirection through some permutation vector breaks data locality and thus defeats the cache.

Now, after a quick think, I hallucinate that 1), 2), 3) and 4) can be fixed rather easily in a single step, without throwing recursive template instantiations at pow, mind you. My "genius" idea is as follows:

1. Go through the pairs of points and save in a container only those pairs (and the distance between them) that satisfy the threshold.
  1.1) When doing that, it's useful not to repeat work: the distance from A to B is the same as the distance from B to A, unless living in an alternate world. This should help shave off some unnecessary duplication.
  1.2) Before doing that, it's also useful to throw away the permutation vector if possible, so that 5) is solved by construction.
2. For the resulting container from 1) (probably a vector), one can see that the innermost if is directly satisfied for any pair of elements ...
3. Step 1) can be parallelized very easily for additional yield.
4. Step 1) can make use of SSE/AVX.
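Step 1 together with 1.1 could look roughly like the following sketch. The names Point3, PairDist, collectPairs and the minDistance parameter are hypothetical (the original threshold is a magic number not shown in the thread). The inner loop starts at i + 1 so each unordered pair is visited once, and comparing squared distances against minDistance squared avoids sqrt. Steps 3 and 4 (threading, SSE/AVX) are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for the sketch.
struct Point3 { float x, y, z; };
struct PairDist { std::size_t i, j; float dist2; };   // i < j, squared distance

inline float squaredDistance(const Point3& a, const Point3& b)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Step 1 + 1.1: visit each unordered pair once (j starts at i + 1) and keep
// only pairs farther apart than minDistance. For minDistance >= 0, comparing
// squared values gives the same decision (up to rounding at the boundary)
// without calling sqrt.
std::vector<PairDist> collectPairs(const std::vector<Point3>& pts, float minDistance)
{
    assert(minDistance >= 0.0f);
    const float min2 = minDistance * minDistance;
    std::vector<PairDist> result;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const float d2 = squaredDistance(pts[i], pts[j]);
            if (d2 > min2)
                result.push_back({i, j, d2});
        }
    }
    return result;
}
```

The later loop then only walks the returned vector instead of every combination of points.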

#19 · JonB

@kshegunov Indeedy, these are all good points, FAO the OP, @kane9x, not me!

#20 · kshegunov (Moderator)

And I was just thinkin' we were having a nice easygoing conversation ... 'cause when I see this:

template< int exponent, typename T >
T power( T base )
{
    // ...
}

I cringe so badly my face is contorted for a week.

#21 · JonB

@kshegunov
No idea what's foul about it, or the bit you've quoted, so you'd better explain? Unless you mean the whole idea of using templates, which of course I never used: C didn't need them, C++ added them as an obfuscation layer, so I'm quite happy without ;-)

Mind you, I looked at @JohanSolo's code above. His definition is a recursive one (return power< exponent / 2 >( base * base ) * base;). I'm surprised. This would be all very well in my old Prolog, but I don't think the C++ compiler is going to recognise & remove tail recursion in the definition. So I don't know what he means by "trivially replaced"; why would one want to use such a definition?

#22 · JohanSolo

I never thought my little post could produce so much noise... First, the snippet is not mine: as I already stated, I took it from a lecture I followed at CERN in 2009. The lecturer was Dr Walter Brown, who was presented as: "Dr. Brown has worked for Fermilab since 1996. He is now part of the Computing Division's Future Programs and Experiments Quadrant, specializing in C++ consulting and programming. He participates in the international C++ standardization process and is responsible for several aspects of the forthcoming updated C++ Standard. In addition, he is the Project Editor for the forthcoming C++ Standard on Mathematical Special Functions."

About the recursive template: the compiler expands it at compile time, therefore leading to power< 4 >( x ) being replaced by x*x * x*x, which is apparently (or at least was) way faster than calling std::pow. Therefore, I expect power< 2 >( something ) to be faster than std::pow( something, 2 ).

`They did not know it was impossible, so they did it.'
-- Mark Twain
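Only one line of the lecture snippet appears in this thread, so the following is a hedged reconstruction, not the original code: a C++17 sketch (it relies on if constexpr) of compile-time exponentiation by squaring, built around the recursive step quoted earlier, return power< exponent / 2 >( base * base ) * base; (used here for odd exponents). Because the exponent is a template parameter, each distinct exponent instantiates a separate function; whether those calls collapse into a handful of multiplications is up to the optimizer.

```cpp
// Hypothetical reconstruction (C++17): compile-time exponentiation by squaring.
// Each distinct `exponent` produces a distinct function instantiation.
template <int exponent, typename T>
constexpr T power(T base)
{
    static_assert(exponent >= 0, "this sketch handles non-negative exponents only");
    if constexpr (exponent == 0)
        return T(1);
    else if constexpr (exponent == 1)
        return base;
    else if constexpr (exponent % 2 == 0)
        return power<exponent / 2>(base * base);          // x^(2k)   = (x*x)^k
    else
        return power<exponent / 2>(base * base) * base;   // x^(2k+1) = (x*x)^k * x
}
```

For example, power<4>(x) becomes power<2>(x * x), then power<1>((x * x) * (x * x)), i.e. two multiplications; an optimizing compiler will typically emit the same code as writing those multiplications out by hand.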

#23 · kshegunov (Moderator)

@JonB said in How to increase speed of large for loops:

No idea what's foul about it, or the bit you've quoted, so you'd better explain? Unless you mean the whole idea of using templates, which of course I never used: C didn't need them, C++ added them as an obfuscation layer, so I'm quite happy without ;-)

Recursively instantiating a function for no apparent reason, basically invoking the sophisticated copy-paste machinery that is the compiler's template engine to produce x * x, especially when writing x * x directly would suffice.

Mind you, I looked at @JohanSolo's code above. His definition is a recursive one (return power< exponent / 2 >( base * base ) * base;). I'm surprised. This would be all very well in my old Prolog, but I don't think the C++ compiler is going to recognise & remove tail recursion in the definition. So I don't know what he means by "trivially replaced"; why would one want to use such a definition?

Code inlining is kind of a religion. Surely it has its value in the proper places, and most certainly templates make some things easier; then again ... it's very much like chocolate: when you don't eat it, you want it; when you eat it, you want more of it; but in the ultimate scheme of things it makes you fat ...

The ugliest thing about templates, however, is that everything has to be defined for instantiation to take place, which is of course expected. So you can't have abstractions without spilling the guts of the implementations. And of course there is no such thing as binary compatibility, as everything is recompiled every time ... such a wonderful idea.

@JohanSolo said in How to increase speed of large for loops:

I never thought my little post could produce so much noise...

Well yeah, I'm from Eastern Europe - all simmering under the hood.

First, the snippet is not mine: as I already stated, I took it from a lecture I followed at CERN in 2009.

Yes, I glanced at the slides. FYI, even Boost's math module doesn't do that kind of nonsense, because fast exponentiation algorithms for integral powers have been known for 50+ years. And if the compiler actually inlines all the (unnecessary) instantiations, depending on the optimizations it applies, you could end up with the same x * x * x * ... * x. The point is that computers are rather stupid: they do what we tell them to do, and ultimately everything you write is going to be compiled to binary, not to a cool concept from a book (or lecture, or w/e).
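For contrast, the classic runtime form of fast exponentiation (square-and-multiply) needs no templates or recursion and accepts an exponent known only at run time. powInt is a hypothetical name; this sketch uses O(log n) multiplications and handles non-negative integer exponents only.

```cpp
// Hypothetical helper (C++14 constexpr): iterative exponentiation by squaring.
// Walks the exponent's bits from least significant upward, squaring `base`
// each step and multiplying it into the result whenever the bit is set.
constexpr double powInt(double base, unsigned exponent)
{
    double result = 1.0;
    while (exponent != 0) {
        if (exponent & 1u)
            result *= base;   // this bit contributes base^(2^k)
        base *= base;         // base^1 -> base^2 -> base^4 -> ...
        exponent >>= 1;
    }
    return result;
}
```

Unlike power<n>(x), powInt(x, n) also compiles when n is an ordinary variable.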

The lecturer was Dr Walter Brown, who was presented as: "Dr. Brown has worked for Fermilab since 1996. He is now part of the Computing Division's Future Programs and Experiments Quadrant, specializing in C++ consulting and programming. He participates in the international C++ standardization process and is responsible for several aspects of the forthcoming updated C++ Standard. In addition, he is the Project Editor for the forthcoming C++ Standard on Mathematical Special Functions."

Good for him. I don't know him, nor do I hold people in esteem for their titles. He might be a contemporary Einstein for all I know, but I give credit where I judge there is reason to. In this case, I have not. The lecture, with all its proof boiling down to a synthetic test, is not nearly enough for me.
Just as a disclaimer, I've seen enough "scientific code" to be cynical to the point of not believing academia can (or should) write programs.

About the recursive template: the compiler expands it at compile time, therefore leading to power< 4 >( x ) being replaced by x*x * x*x

No, it leads to power<4>(x) being replaced by power<2>(x) * power<2>(x), where power<2> is a distinct function. This may become x * x * x * x in assembly, which of course has the same performance as multiplying the argument out manually, or it may be evaluated as (x * x) multiplied by itself, which saves a multiplication. The point is, your template can't tell the compiler how to produce efficient binary code.

Therefore, I expect power< 2 >( something ) to be faster than std::pow( something, 2 ).

I expect them to be exactly the same, up to a couple of push/pops and a single call.

I did find it rather surprising that pow and sqrt were implicated here. I'd like to top off this missive with a quotation that I love from a fictional character:

You wake up in the morning, your paint's peeling, your curtains are gone, and the water is boiling. Which problem do you deal with first?
...
None of them! The building's on fire!

#24 · JonB

@JohanSolo

I never thought my little post could produce so much noise...

It's OK, this is all a friendly debate, not a mud-slinging contest!

@JohanSolo , @kshegunov
I don't know what you are going on about with this power() stuff and in-line expansion. Just maybe the compiler is clever enough to expand it in-line and avoid recursion if your code goes power<4>(x), where the 4 is a compile-time constant. However, that definition of power<> takes the exponent as a variable/parameter. So if your code calls power<n>(x) where n is a variable, I don't see how any amount of in-lining or optimization can do anything at all, and you are left with code that will compile to a ridiculously inefficient (time & space) tail-recursive implementation, which you would be mad to use. If you're going to do in-lining, it seems to me it should be done iteratively rather than recursively in C++, no? That is what I was commenting on....

#25 · JohanSolo

@JonB said in How to increase speed of large for loops:

@JohanSolo
However, that definition of power<> takes the exponent as a variable/parameter. So if your code calls power<n>(x) where n is a variable

In the power< n >( x ) expression, n must be known at compile time; it's a template parameter. If it is a variable, it won't compile (I've just checked to be 1000% sure).
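The rule applies to any non-type template parameter: the argument must be a constant expression. A minimal illustration with a hypothetical multiplyBy template standing in for power<>; the commented-out lines show the form that fails to compile.

```cpp
// Hypothetical stand-in for power<>: N is a non-type template parameter,
// so it must be a compile-time constant at every call site.
template <int N, typename T>
constexpr T multiplyBy(T value)
{
    return value * N;
}

constexpr int kFactor = 3;                             // constant expression: OK as N
constexpr double tripled = multiplyBy<kFactor>(2.5);   // evaluated at compile time

// int runtimeFactor = 3;
// double bad = multiplyBy<runtimeFactor>(2.5);        // error: not a constant expression
```

When the exponent really is only known at run time, a loop-based routine or std::pow is the tool instead.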

#26 · JonB

@JohanSolo
Ohhh, I had no idea templates worked like that...! I get it now.

I hope the compiler-generated code copies your (first) parameter into a temporary variable/register when it expands that code in-line, else it could actually be slower....

In any case, to belabour the perhaps-obvious: the squaring won't take much time; it's the square-rooting that will be slow....
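A common trick follows from this: when a distance is only compared against a non-negative threshold t, sqrt(d²) > t holds exactly when d² > t², so the square root can be skipped altogether. A small sketch with a hypothetical Point3 and fartherThan (floating-point rounding can differ only for values right at the boundary):

```cpp
// Hypothetical point type for the sketch.
struct Point3 { double x, y, z; };

// Squared Euclidean distance: three subtractions, three multiplies, two adds, no sqrt.
constexpr double squaredDistance(const Point3& a, const Point3& b)
{
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    const double dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// For threshold >= 0, "distance > threshold" is equivalent to
// "squaredDistance > threshold * threshold".
constexpr bool fartherThan(const Point3& a, const Point3& b, double threshold)
{
    return squaredDistance(a, b) > threshold * threshold;
}
```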

#27 · kshegunov (Moderator)

@JonB said in How to increase speed of large for loops:

Ohhh, I had no idea templates worked like that...!

Your childish naïveté really made me chuckle. :)
A template is (hence the name) a template for a function or class. It's nothing by itself; it does not produce binary on its own. The magic happens when instantiation takes place, that is, when you supply all the template arguments (the stuff in the angle brackets) to it. Then the compiler knows how to generate actual code out of it and does so. That's what all my babbling about inlining was about, too: since the compiler has all the code for a fully specialized (i.e. everything templatey provided) template, it can inline whatever it decides into the binary, which it often does. The thing is, however, that each instantiation is a distinct function/class. So power<3> is a function that is different from power<2>, which is different from power<1>, and so on ...
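The last point can be observed directly: a function-local static variable exists once per function, so giving a template one shows that each instantiation is a separate function. callsOfThisInstantiation and demonstrateSeparateInstantiations are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical illustration: every distinct `exponent` produces a separate
// function, and therefore a separate copy of the function-local static.
template <int exponent>
int callsOfThisInstantiation()
{
    static int calls = 0;
    return ++calls;
}

// Usage check: <2> is called twice, then <3> once. One shared function would
// count 1, 2, 3; separate instantiations give 1, 2 and then 1 again.
inline void demonstrateSeparateInstantiations()
{
    const int first2 = callsOfThisInstantiation<2>();
    const int second2 = callsOfThisInstantiation<2>();
    const int first3 = callsOfThisInstantiation<3>();
    assert(first2 == 1 && second2 == 2 && first3 == 1);
}
```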

                                      JonBJ Offline
                                      JonB
                                      wrote on last edited by JonB
                                      #28

                                      @kshegunov said in How to increase speed of large for loops:

                                      Your childish naïveté

                                      Maybe a touch harsh :) [Though brownie points for typing in those two accents.]

                                      I thought when I saw templates they were to do with providing type-"independent" generic functions, aka "generics", e.g. in C#. Nothing to do with in-lining....

                                        kshegunovK Offline
                                        kshegunov
                                        Moderators
                                        wrote on last edited by
                                        #29

                                        @JonB said in How to increase speed of large for loops:

                                        Maybe a touch harsh :)

                                        Oh, you know I wrote that with loving condescension, as I usually do! ;)

                                        I thought when I saw templates they were to do with providing type-"independent" generic functions, aka "generics", e.g. in C#.

                                        Well, yes, templates are for that: providing type-independent code, or rather, as you put it, generic code, because there may be limitations put on the types involved (e.g. the type may be required to be floating point, or integral). This is fine and very useful for many purposes. Consider writing an algorithm that operates on a matrix: the matrix may contain rational numbers, floating-point numbers, complex numbers, or quaternions. The point is that the algorithm is the same regardless of the type it operates on (within some sane limitations).

                                        C# isn't a good example, as it isn't compiled ahead of time to native assembly. Much like its mother Java, its source is compiled to an intermediate bytecode (which is similar to assembly), which a virtual machine then runs, translating it to native code just-in-time.

                                        Nothing to do with in-lining....

                                        Well, not nothing. A template's instantiations are fully known, including all template arguments and the whole source. While you can hide the definitions in a source file (explicitly instantiating only the types you want to expose) and thus prevent inlining, that's an extremely rare case. Usually the point of keeping templates in headers, rather than exposing only a fixed set of instantiations, is to allow the compiler to freely inline everything it wants to. So they're also used to hint that to the compiler. That's the idea in this case; otherwise you'd just write the simple fast exponentiation which takes 2 arguments (i.e. basically a rewrite of std::pow) instead of giving the compiler enough rope to hang itself. As I said, every instantiation is different, so the compiler is going to generate one function for each template argument: calling power<8> leads your compiler to generate code for power<8>, power<4>, power<2>, power<1> and so on. These are separate functions, mind you. Then it may (or may not) decide to inline some (or all) of them into the others.

                                        • Kent-DorfmanK Offline
                                          Kent-Dorfman
                                          wrote on last edited by
                                          #30

                                          I'm surprised no one bit on using pointer access to the array elements instead of array indexing throughout. Pointer access can be faster than array indexing, but that is an implementation-specific detail. I'd also replace pow() with plain multiplication of the terms, as in (x*x).
