Most performant byte reordering

  • Kent-Dorfman wrote (#4):

    The first thing you need to do is to verify that your remote slave is in fact sending the bytes in the order that you think it is. It probably is not.

    I light my way forward with the fires of all the bridges I've burned behind me.

    • jars121 wrote (#5):

      @JonB said in Most performant byte reordering:

      @jars121 said in Most performant byte reordering:

      The SPI master initiates transfers with a word length of 32 bits.

      I know nothing about "SPI", but from the code & output you show it looks like it is being sent as a 32-bit integer in reverse order?

      Thanks for your input. It certainly appears that way, but as I'll detail in a response below, the data on the wire is in the correct order.

      @Christian-Ehrlicher said in Most performant byte reordering:

      @jars121 said in Most performant byte reordering:

      as well as the most performant way to ensure the data is correctly ordered.

      How well this compiles into machine code depends on your compiler. A simple loop like this should be enough:

      // Reverse the byte order of each 32-bit word in a 256-byte buffer.
      // in and out must not overlap.
      void reverse(const char *in, char *out)
      {
          for (int i = 0; i < 256; i += 4) {
              out[i + 0] = in[i + 3];
              out[i + 1] = in[i + 2];
              out[i + 2] = in[i + 1];
              out[i + 3] = in[i + 0];
          }
      }
      

      Thanks for providing that! This is the loop approach I'd already tested which works perfectly well. I'm hoping to understand why the ordering issue is occurring so perhaps another approach could be explored.

      @Kent-Dorfman said in Most performant byte reordering:

      The first thing you need to do is to verify that your remote slave is in fact sending the bytes in the order that you think it is. It probably is not.

      I've checked the MISO line with my oscilloscope and can see that the data out of the slave is in the correct order. I.e. 0, 1, 2, 3, 4, 5, 6, 7, 8. This leads me to believe that the 32-bit SPI word length is the culprit here and is using a reverse byte order for some reason.

      • jars121 wrote (#6):

        I've come across the following, which is included in the description of the spi_transfer struct within the Linux kernel SPI driver:

        In-memory data values are always in native CPU byte order, translated from the wire byte order (big-endian except with SPI_LSB_FIRST)
        

        I've tried setting SPI_LSB_FIRST, but this has no impact (despite not returning an error), so it may be a hardware limitation.
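
To illustrate the kernel note quoted above, here is a minimal sketch. It assumes the controller packs four wire bytes into one 32-bit word with the first byte on the wire as the most significant, and that the host CPU is little-endian. The names `wordFromWire` and `littleEndianLayout` are hypothetical and not part of any SPI API.

```cpp
#include <array>
#include <cstdint>

// Pack four wire bytes into one 32-bit value; the first byte received
// becomes the most significant byte (big-endian wire order).
std::uint32_t wordFromWire(const std::array<std::uint8_t, 4>& wire)
{
    return (static_cast<std::uint32_t>(wire[0]) << 24) |
           (static_cast<std::uint32_t>(wire[1]) << 16) |
           (static_cast<std::uint32_t>(wire[2]) << 8)  |
            static_cast<std::uint32_t>(wire[3]);
}

// Byte sequence a little-endian CPU stores for that value:
// least significant byte at the lowest address.
std::array<std::uint8_t, 4> littleEndianLayout(std::uint32_t word)
{
    return {{ static_cast<std::uint8_t>(word & 0xFFu),
              static_cast<std::uint8_t>((word >> 8) & 0xFFu),
              static_cast<std::uint8_t>((word >> 16) & 0xFFu),
              static_cast<std::uint8_t>((word >> 24) & 0xFFu) }};
}
```

With wire bytes 0, 1, 2, 3 the word is 0x00010203, and reading the little-endian buffer byte by byte gives 3, 2, 1, 0. Under these assumptions that is the per-word reversal seen in the receive buffer, even though the wire order is correct.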

        • SimonSchroeder wrote (#7):

          The reason for this is most likely endianness. There are two byte orders, big endian and little endian. If one side uses one and the other side uses the other, transmission reverses the byte order. In a simplified view, Intel x86 was the only one doing little endian and everybody else was doing big endian. In the modern world, ARM processors share little endian with x86 but can be switched to big endian.

          However, I cannot provide you with any shortcut solution as I don't know SPI either. I would expect that all processors have a way to do the byte swap efficiently (maybe some SSE on x86). https://stackoverflow.com/questions/105252/how-do-i-convert-between-big-endian-and-little-endian-values-in-c lists some built-in functions for byte swaps with MSVC and GCC.

          • jars121 wrote (#8):

            Thanks for your input everyone. This is definitely an Endianness issue. In the end I've packaged each 32-bit sequence in reverse order on the remote processor so the data is received and parsed correctly on the SPI Master. I had hoped there was a hardware configuration that would change the SPI parsing Endianness but it looks like there wasn't.

            • JonB wrote (#9):

              @jars121
              Now that we are happy you do indeed need to swap the bytes/endianness, let's go back to your original question:

              Most performant byte reordering

              The algorithm @Christian-Ehrlicher showed you does indeed do the job, simply. But is it the most "performant"? I haven't looked at the code it generates, and I don't know how well an optimizing compiler handles it.

              But "byte swappers" have been around for a long time in C/C++. Presumably they can take advantage of dedicated machine instructions to be efficient. You don't say which platform/compiler you are on, but I note (for 32-bit) that MSVC has

              unsigned long _byteswap_ulong(unsigned long value);
              

              and GCC has

              uint32_t __builtin_bswap32(uint32_t x);
              

              If you are going to do this a lot and really care about "performant" you might examine how these compare to your own code?! :)
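
As a starting point for such a comparison, here is a minimal sketch that wraps the two intrinsics named above and applies them word by word to a buffer. The helper names `swap32` and `swapWords` are hypothetical, and the shift-based fallback is only an assumption for compilers that offer neither intrinsic.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(_MSC_VER)
#include <stdlib.h> // _byteswap_ulong
#endif

// Reverse the four bytes of one 32-bit value.
inline std::uint32_t swap32(std::uint32_t v)
{
#if defined(_MSC_VER)
    return static_cast<std::uint32_t>(_byteswap_ulong(v));
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#else
    // Portable fallback using shifts and masks.
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
#endif
}

// Reverse the byte order of every complete 32-bit word in a buffer.
// memcpy avoids unaligned access and strict-aliasing problems.
void swapWords(const unsigned char* in, unsigned char* out, std::size_t byteCount)
{
    for (std::size_t i = 0; i + 4 <= byteCount; i += 4) {
        std::uint32_t w;
        std::memcpy(&w, in + i, sizeof w);
        w = swap32(w);
        std::memcpy(out + i, &w, sizeof w);
    }
}
```

For the 256-byte SPI buffer discussed in this thread the call would be `swapWords(rx, fixed, 256)`, where `rx` and `fixed` stand for the caller's own buffers.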

              • artwaw wrote (#10):

                @jars121 if you know that your source uses a certain endianness, you can make use of https://doc.qt.io/qt-5/qtendian.html#details and save yourself trouble?

                For more information please re-read.

                Kind Regards,
                Artur

                • JonB wrote (#11):

                  @jars121
                  From @artwaw's link, one of the qFromLittleEndian/qFromBigEndian() functions looks like it will do your swapping, and only if necessary on the platform. Whether it does it efficiently I don't know because I didn't look at its definition....
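
The "only if necessary" idea can also be sketched in plain C++. This is not Qt's implementation. Building each value from its bytes with shifts gives the same result on little- and big-endian hosts, so no platform check is needed. `decodeBigEndian32` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>

// Decode `count` big-endian 32-bit words from `src` into native integers.
// The result does not depend on the byte order of the host CPU.
void decodeBigEndian32(const unsigned char* src, std::size_t count, std::uint32_t* dest)
{
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned char* p = src + 4 * i;
        dest[i] = (static_cast<std::uint32_t>(p[0]) << 24) |
                  (static_cast<std::uint32_t>(p[1]) << 16) |
                  (static_cast<std::uint32_t>(p[2]) << 8)  |
                   static_cast<std::uint32_t>(p[3]);
    }
}
```

Optimizing compilers can often reduce the shifts to a load plus a byte swap, though that is worth checking in the generated code for your target.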

                  • kkoehne (Moderators) wrote (#12):

                    @JonB said in Most performant byte reordering:

                    Whether it does it efficiently I don't know because I didn't look at its definition....

                    It's not hard to find the definition though ...

                    https://code.woboq.org/qt5/qtbase/src/corelib/global/qendian.cpp.html

                    You can see that there are special #ifdefs for SSSE3, AVX2, and SSE2. And because Thiago (the Qt Core maintainer) works for Intel, I think it's most likely rather well optimized, at least on the x86/x64 architectures ;)

                    Director R&D, The Qt Company

                    • JonB wrote (#13):

                      @kkoehne
                      Indeed. And the fact that there are "special calls" in code looks promising. But knowing when those code cases apply and whether they are "performant" compared to one's own C++ loop is beyond me! Hence left as an exercise to the reader ;-)

                      • Christian Ehrlicher (Lifetime Qt Champion) wrote (#14):

                        You're aware that we're talking about 256 bytes here? How high is the data rate, that we have to discuss whether SIMD instructions are really needed? Measure before use!
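
A crude way to take such a measurement, as a sketch only: time many calls of the scalar loop on a 256-byte buffer with `std::chrono`. The names `reverseWords` and `nanosecondsPerCall` are hypothetical, and a micro-benchmark like this is sensitive to compiler flags and caching, so treat the number as a rough indication.

```cpp
#include <chrono>
#include <cstddef>

// Scalar per-word byte reversal, the same idea as the loop quoted earlier in the thread.
void reverseWords(const char* in, char* out, std::size_t byteCount)
{
    for (std::size_t i = 0; i + 4 <= byteCount; i += 4) {
        out[i + 0] = in[i + 3];
        out[i + 1] = in[i + 2];
        out[i + 2] = in[i + 1];
        out[i + 3] = in[i + 0];
    }
}

// Average wall-clock time per call, in nanoseconds, over `iterations` runs.
double nanosecondsPerCall(int iterations)
{
    char in[256];
    char out[256];
    for (int i = 0; i < 256; ++i)
        in[i] = static_cast<char>(i);

    // Writing one result byte to a volatile is a crude guard against
    // the compiler optimizing the whole loop away.
    volatile char sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int it = 0; it < iterations; ++it) {
        in[3] = static_cast<char>(it); // vary the input between calls
        reverseWords(in, out, sizeof in);
        sink = out[0];
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}
```

Comparing the result against the time one 256-byte SPI transfer takes on your bus shows whether the swap matters at all.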

                        Qt Online Installer direct download: https://download.qt.io/official_releases/online_installers/
                        Visit the Qt Academy at https://academy.qt.io/catalog

                        • SimonSchroeder wrote (#15):

                          @kkoehne said in Most performant byte reordering:

                          You can see that there are special #ifdefs for SSSE3, AVX2, and SSE2. And because Thiago (the Qt Core maintainer) works for Intel, I think it's most likely rather well optimized, at least on the x86/x64 architectures ;)

                          There are #ifdefs distinguishing between different platforms. Which version is used is decided at compile time, not at runtime. I am not sure which platform the precompiled Qt binaries target (and most people will use a precompiled version of Qt). Most definitely you will not get AVX2. For that extra bit of performance (if there is any) you would need to compile Qt yourself accordingly. And this depends entirely on which processors (and how old) you target.

                          Also note that the source code only has SIMD implementations for x86. For other processors, like ARM, there is no such optimization.
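
Compile-time selection of this kind can be sketched as follows. The macros `__AVX2__`, `__SSSE3__` and `__SSE2__` are the ones GCC and Clang predefine when the corresponding instruction sets are enabled (for example via `-mavx2`). The function name `compiledSwapPath` is hypothetical and this is not Qt's code.

```cpp
// Report which code path the preprocessor selected for this build.
// The choice is fixed when the file is compiled; nothing is probed at runtime.
const char* compiledSwapPath()
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "scalar";
#endif
}
```

Calling this from a binary built with the same flags as your Qt build gives a hint about which branch a precompiled library could contain, under the assumption that the library uses the same macros.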
