I think the compiler already does a fairly good job of inlining on its own, since marking the method inline explicitly makes no measurable difference. Either that, or the method is too complex to inline.
bq. In high performance code, appending to a string in the inner loop is insane.
I reserve plenty of space to ensure I avoid potential reallocation.
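For anyone who wants to see the effect of reserving up front, here is a minimal sketch. It uses `std::u16string` from the standard library as a stand-in for a 16-bit string type so that it compiles without Qt. The helper name `countReallocations` is hypothetical and not part of the code discussed in this thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends `count` characters to a buffer and reports
// how many times the capacity changed during the loop, which is a proxy
// for the number of reallocations.
std::size_t countReallocations(std::size_t count, bool reserveFirst)
{
    std::u16string buffer;
    if (reserveFirst)
        buffer.reserve(count);  // single allocation before the loop starts

    std::size_t reallocations = 0;
    std::size_t lastCapacity = buffer.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        buffer.push_back(u'a');
        if (buffer.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = buffer.capacity();
        }
    }
    return reallocations;
}
```

With the reservation in place the loop never has to grow the buffer. Without it, the exact number of reallocations depends on the standard library's growth policy.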
What I found odd is that passing a QChar by reference is actually faster than passing a copy, even though I am using a 64-bit build and QChar is only 16 bits. In the worst case I would expect the performance to be the same, but passing by value takes a small hit.
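A rough way to compare the two calling conventions in isolation is sketched below. `Char16` is a hypothetical stand-in for a 16-bit character class such as QChar, and all function names are made up for illustration. The result depends heavily on the compiler, optimization level, and whether the call gets inlined, so treat any timing from it as a hint rather than proof.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a 16-bit character class such as QChar.
struct Char16 { char16_t unicode; };

// The same check, once taking a copy and once taking a const reference.
bool isSpaceByValue(Char16 c) { return c.unicode == u' '; }
bool isSpaceByRef(const Char16 &c) { return c.unicode == u' '; }

// Counts the characters in `text` for which `isSpace` returns true.
template <typename Predicate>
std::size_t countSpaces(const std::u16string &text, Predicate isSpace)
{
    std::size_t n = 0;
    for (char16_t ch : text)
        if (isSpace(Char16{ch}))
            ++n;
    return n;
}

// Runs countSpaces once and returns the elapsed time in nanoseconds.
// The count is written to `result` so the work cannot be optimized away.
template <typename Predicate>
std::chrono::nanoseconds::rep timeCount(const std::u16string &text,
                                        Predicate isSpace,
                                        std::size_t &result)
{
    const auto start = std::chrono::steady_clock::now();
    result = countSpaces(text, isSpace);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `timeCount(text, isSpaceByValue, n)` and `timeCount(text, isSpaceByRef, n)` several times on the same large input, and comparing the best runs, gives a first impression of whether the difference is real or just noise.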
I compared this algorithm against the word count in MS Word. Keep in mind that Word only counts characters and words; it does not go into unique words or per-word usage counts. Analyzing the same text takes about 4-5 seconds in MS Word, while my algorithm takes only 230 ms, and I still have the option to make it multithreaded.
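To make it concrete what "unique words and word usage count" involves beyond a plain word count, here is a minimal single-threaded sketch. It is not the algorithm measured above. It assumes ASCII input, treats any run of letters and digits as a word, and folds case; `TextStats` and `analyze` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical result type: totals plus a usage count per unique word.
struct TextStats {
    std::size_t characters = 0;
    std::size_t words = 0;
    std::unordered_map<std::string, std::size_t> usage;
};

// One pass over the text; assumes ASCII and lower-cases every word.
TextStats analyze(const std::string &text)
{
    TextStats stats;
    stats.characters = text.size();

    std::string word;
    word.reserve(64);  // reserved once, reused for every word

    // Records the current word (if any) and resets the buffer.
    auto flush = [&]() {
        if (!word.empty()) {
            ++stats.words;
            ++stats.usage[word];
            word.clear();  // does not release the reserved storage in practice
        }
    };

    for (unsigned char ch : text) {
        if (std::isalnum(ch))
            word.push_back(static_cast<char>(std::tolower(ch)));
        else
            flush();
    }
    flush();  // the text may end in the middle of a word
    return stats;
}
```

For the input `"The cat and the hat"` this reports 19 characters, 5 words, and 4 unique words, with `"the"` used twice.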
But hey, I am pretty happy with having "ONLY" about 20x better performance (4-5 seconds versus 230 ms) than a top-tier professional text editor that doesn't even do as much work :)