Solved Speed Optimization of C++ console application
-
@beecksche said in Speed Optimization of C++ console application:
QMAKE_CXXFLAGS_RELEASE += -fp:fast
Don't use this unless you really, really, really (and I can't emphasize that enough) know what you're doing (which is almost never). This can break promises made by the IEEE FP standard in regards to behavior and optimize out expressions that are not to be optimized. It can break proper rounding and error propagation, and floating point exceptions' diagnostics.
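As a minimal sketch of what relaxed floating-point modes are allowed to change (the function names below are hypothetical, not from the thread): floating-point addition is not associative, and modes such as MSVC's /fp:fast or GCC's -ffast-math permit the compiler to regroup operations. The C++ below computes the same three-term sum with two groupings. Under strict IEEE 754 double arithmetic they give different results, so a compiler that is free to regroup may silently change the answer.

```cpp
#include <cstdio>

// Sum three doubles grouped as (a + b) + c.
double sum_left(double a, double b, double c) { return (a + b) + c; }

// Sum the same three doubles grouped as a + (b + c).
double sum_right(double a, double b, double c) { return a + (b + c); }

// Print both groupings for values where the rounding differs.
// Assumption: the values are chosen for illustration only.
void show_reassociation() {
    const double a = 1e16, b = -1e16, c = 1.0;
    // (a + b) is exactly 0, so adding c keeps the 1.0.
    std::printf("(a + b) + c = %g\n", sum_left(a, b, c));
    // (b + c) cannot be represented exactly and rounds back to -1e16,
    // so the 1.0 is lost before a is added.
    std::printf("a + (b + c) = %g\n", sum_right(a, b, c));
}
```

Built with default (strict) floating-point settings on a typical x86-64 toolchain, show_reassociation() should print 1 for the first grouping and 0 for the second.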
-
A code slowdown of a factor of 10 wouldn't be normal just with an optimization flag of -O0 vs -O2. Something else is going on here. The OP only states C++ in VS2017... No mention of CLR or native code generation in VS. Actually I'd expect the converse of the reported behaviour, where the native C++ Qt runs faster if the VS C++ code is done as CLR and not native. If I had to WAG, I'd guess that the VS code is taking advantage of a .NET library optimization that isn't present in native C++ Qt. Without seeing the algorithms and the library links it's hard to know what exactly is going on. Heap managed memory could also play a large part in the time differences being reported.
-
@Kent-Dorfman said in Speed Optimization of C++ console application:
A code slowdown of a factor of 10 wouldn't be normal just with an optimization flag of -O0 vs -O2.
Why not? Did you see the code? Maybe there are lots of asserts in there or other stuff... without code it's just wild guessing.
-
@Kent-Dorfman said in Speed Optimization of C++ console application:
A code slowdown of a factor of 10 wouldn't be normal just with an optimization flag of -O0 vs -O2.
Actually it can be pretty normal. I have at least two rather small codebases that exhibit such speedups between debug and release (i.e. -g -O0 vs -O2). There's nothing odd about it, because a debug build represents what you wrote faithfully, which isn't at all true for release builds.
Something else is going on here.
Not necessarily. It depends on the type of code. If you have code with a lot of templates, for example, the debug build is going to emit a call instruction for every function call and do the regular push and pop on the stack. When the optimizer runs, most to all of this gets stripped away and the code is inlined, to an extreme degree. So yes, a 10 times speedup between debug and release is nothing to be suspicious about.
-
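To make the template point above concrete, here is a minimal sketch (the Wrapped class and both functions are hypothetical, not taken from any codebase mentioned in the thread). Every addition goes through a constructor, an operator+ and an accessor, plus the vector's iterator functions. In an unoptimized build each of these is typically a real function call. With -O2 a compiler will usually inline all of them and reduce the loop to plain integer additions.

```cpp
#include <vector>

// Hypothetical thin wrapper, standing in for template-heavy code.
template <typename T>
class Wrapped {
public:
    explicit Wrapped(T v) : value_(v) {}
    T get() const { return value_; }
    // Each addition constructs a new Wrapped; at -O0 this is a real call.
    Wrapped operator+(const Wrapped& other) const {
        return Wrapped(value_ + other.value_);
    }
private:
    T value_;
};

// Sum any iterable container of integers through the wrapper type.
template <typename Container>
Wrapped<long long> sum_wrapped(const Container& items) {
    Wrapped<long long> total(0);
    for (const auto& item : items) {
        total = total + Wrapped<long long>(item);
    }
    return total;
}

// Build 1..n and sum it through the wrapper layers.
long long sum_first_n(int n) {
    std::vector<int> values;
    for (int i = 1; i <= n; ++i) {
        values.push_back(i);
    }
    return sum_wrapped(values).get();
}
```

Comparing the assembly for sum_first_n from a -O0 build and a -O2 build (for example with g++ -S) is one way to see how many calls disappear. The actual speedup depends on the compiler and the code.
-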
@dooley
What the others are saying about optimization vs debug is probably correct; you can be surprised by how much difference it can make, depending on the code.
However, if you are sure about your compiler flags etc. but are still stumped by the speed behaviour, it may be time to compile/link your application for profiling. Both gcc & msvc have profiling (unless the free msvc does not, I don't know). This does take a bit of reading the first time to set up and interpret the output, but it is well worth it if you wish to investigate speed/performance over time in future.
-
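Before setting up a full profiler, a coarse first step is to time suspect sections by hand. The sketch below is not a profiler and is not what the post above describes. It is only a hypothetical helper (time_ms and busy_sum are made-up names) built on std::chrono::steady_clock, which can show whether a debug and a release build differ in a particular piece of work.

```cpp
#include <chrono>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical workload: sums 0..n-1. The volatile accumulator keeps
// the optimizer from deleting the loop entirely.
unsigned long long busy_sum(unsigned long long n) {
    volatile unsigned long long acc = 0;
    for (unsigned long long i = 0; i < n; ++i) {
        acc = acc + i;
    }
    return acc;
}
```

A call such as time_ms([] { busy_sum(100000000ULL); }) in both build configurations gives a rough comparison. A real profiler is still needed to see where the time goes inside larger code.
-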
@kshegunov I wrote absolutely nothing about "-g". I still maintain that simple -O0 vs -O2 is NOT going to divide performance by a factor of 10. I cannot begin to imagine how badly a person would have to design their algorithm to validate that level of performance hit. Something other than compiler optimization is causing his hit...
-
@Kent-Dorfman said in Speed Optimization of C++ console application:
@kshegunov I wrote absolutely nothing about "-g".
Fair enough.
I still maintain that simple -O0 vs -O2 is NOT going to divide performance by a factor of 10. I cannot begin to imagine how badly a person would have to design their algorithm to validate that level of performance hit.
https://bitbucket.org/kshegunov/ans-utilities/src/master/hermite/
Knock yourself out, if you so desire. I'm certainly not investing the time to see if -g makes a significant difference, which I strongly suspect it doesn't.
-
@Kent-Dorfman said in Speed Optimization of C++ console application:
@kshegunov I wrote absolutely nothing about "-g". I still maintain that simple -O0 vs -O2 is NOT going to divide performance by a factor of 10. I cannot begin to imagine how badly a person would have to design their algorithm to validate that level of performance hit. Something other than compiler optimization is causing his hit...
That strongly depends on the algorithm, I'd say.
Just imagine a non-optimized build that does not fit in the cache, so the CPU has to reload stuff from memory all the time, vs. the optimized build that runs smoothly.
A factor of 10 is probably not the normal case when you have to wait for I/O anyway, but for heavy computation it is easily possible.
Regards
-
This is starting to sound like a coding challenge. Can you write an algorithm that is slow, but that the compiler can optimize and make fast? Like turning lead into gold.
-
@fcarney said in Speed Optimization of C++ console application:
Can you write an algorithm that is slow, but that the compiler can optimize and make fast? Like turning lead into gold.
As I wrote, any template nonsense you have (the deeper and nastier the better) fits into this category.
-
Eh, the OP kind of disappeared so I guess it isn't that important to him. I'm more interested in knowing whether the windoze version in this exercise was compiled to CLR bytecode, which he never answered, and which IMHO invalidates any real comparison.