Most performant byte reordering

    jars121
    wrote on last edited by
    #1

    Hi,

    I receive SPI data from a remote processor which is acting as the SPI slave. The remote processor packs bytes into a buffer, which is shifted out via DMA. The SPI master initiates transfers with a word length of 32 bits. I'm having an issue whereby the received bytes are incorrectly ordered:

    //Array as sent via DMA from the remote SPI slave

    uint8_t remoteArray[256];
    // A uint8_t counter can never reach 256, so use a wider type to avoid an endless loop
    for (int i = 0; i < 256; i++) {
        remoteArray[i] = (uint8_t)i;
    }
    

    //Array as received on the SPI master

    // int counter: a uint8_t would wrap around at 255 and the loop would never end
    for (int i = 0; i < 256; i++) {
        qDebug() << masterArray[i];
    }
    

    I expected the masterArray output to be 0, 1, 2, 3, 4, 5, 6, 7, 8, etc., but the bytes within each group of 4 are reversed, i.e. 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8. I can reorder these in a loop, but I'd like to understand what's happening here, as well as the most performant way to ensure the data is correctly ordered.

    If I use an SPI transfer word length of 8 on the SPI master, the data is correctly ordered. However I need to use a word length of 32 bits for hardware-specific reasons.

      JonB
      wrote on last edited by
      #2

      @jars121 said in Most performant byte reordering:

      The SPI master initiates transfers with a word length of 32 bits.

      I know nothing about "SPI", but from the code & output you show it looks like it is being sent as a 32-bit integer in reverse order?

        Christian Ehrlicher
        Lifetime Qt Champion
        wrote on last edited by
        #3

        @jars121 said in Most performant byte reordering:

        as well as the most performant way to ensure the data is correctly ordered.

        How well this compiles down to machine code depends on your compiler. A simple loop like this should be enough:

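        // Reverse the byte order within each 32-bit word of a 256-byte buffer
        void reverse(const char *in, char *out)
        {
            // ofs is the byte offset of the current word, so it advances by 4
            for (int ofs = 0; ofs < 256; ofs += 4) {
                out[ofs + 0] = in[ofs + 3];
                out[ofs + 1] = in[ofs + 2];
                out[ofs + 2] = in[ofs + 1];
                out[ofs + 3] = in[ofs + 0];
            }
        }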
        

        Qt Online Installer direct download: https://download.qt.io/official_releases/online_installers/
        Visit the Qt Academy at https://academy.qt.io/catalog

          Kent-Dorfman
          wrote on last edited by
          #4

          The first thing you need to do is to verify that your remote slave is in fact sending the bytes in the order that you think it is. It probably is not.

            jars121
            wrote on last edited by
            #5

            @JonB said in Most performant byte reordering:

            @jars121 said in Most performant byte reordering:

            The SPI master initiates transfers with a word length of 32 bits.

            I know nothing about "SPI", but from the code & output you show it looks like it is being sent as a 32-bit integer in reverse order?

            Thanks for your input. It certainly appears that way, but as I'll detail in a response below, the data on the wire is in the correct order.

            @Christian-Ehrlicher said in Most performant byte reordering:

            @jars121 said in Most performant byte reordering:

            as well as the most performant way to ensure the data is correctly ordered.

            How well this compiles down to machine code depends on your compiler. A simple loop like this should be enough:

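            // Reverse the byte order within each 32-bit word of a 256-byte buffer
            void reverse(const char *in, char *out)
            {
                // ofs is the byte offset of the current word, so it advances by 4
                for (int ofs = 0; ofs < 256; ofs += 4) {
                    out[ofs + 0] = in[ofs + 3];
                    out[ofs + 1] = in[ofs + 2];
                    out[ofs + 2] = in[ofs + 1];
                    out[ofs + 3] = in[ofs + 0];
                }
            }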
            

            Thanks for providing that! This is the loop approach I'd already tested which works perfectly well. I'm hoping to understand why the ordering issue is occurring so perhaps another approach could be explored.

            @Kent-Dorfman said in Most performant byte reordering:

            The first thing you need to do is to verify that your remote slave is in fact sending the bytes in the order that you think it is. It probably is not.

            I've checked the MISO line with my oscilloscope and can see that the data out of the slave is in the correct order. I.e. 0, 1, 2, 3, 4, 5, 6, 7, 8. This leads me to believe that the 32-bit SPI word length is the culprit here and is using a reverse byte order for some reason.

              jars121
              wrote on last edited by
              #6

              I've come across the following, which is included in the description of the spi_transfer struct within the Linux kernel SPI driver:

              In-memory data values are always in native CPU byte order, translated from the wire byte order (big-endian except with SPI_LSB_FIRST)
              

              I've tried setting SPI_LSB_FIRST, but this has no impact (despite not returning an error), so it may be a hardware limitation.
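The quoted kernel comment would explain the pattern seen here. The following is only a minimal sketch of that idea, assuming the controller assembles each 32-bit word most-significant-byte first and the master CPU is little-endian; `assembleWord` and `bytesInMemory` are hypothetical helper names, not part of any driver API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Assemble four wire bytes into one 32-bit value, first byte received = most significant.
// This mimics "wire byte order is big-endian" for a 32-bit word length (assumption).
std::uint32_t assembleWord(const std::uint8_t *wire)
{
    assert(wire != nullptr);
    return (static_cast<std::uint32_t>(wire[0]) << 24) |
           (static_cast<std::uint32_t>(wire[1]) << 16) |
           (static_cast<std::uint32_t>(wire[2]) << 8) |
           static_cast<std::uint32_t>(wire[3]);
}

// Return the bytes of the word exactly as the CPU stores them (native byte order).
std::array<std::uint8_t, 4> bytesInMemory(std::uint32_t word)
{
    std::array<std::uint8_t, 4> out{};
    std::memcpy(out.data(), &word, sizeof word);
    return out;
}
```

With wire bytes 0, 1, 2, 3 the word value is 0x00010203 on any host. Reading that word back byte by byte on a little-endian CPU yields 3, 2, 1, 0, which matches what was observed with the 32-bit word length, while an 8-bit word length has no bytes to reorder.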

                SimonSchroeder
                wrote on last edited by
                #7

                The reason for this is most likely endianness: there is big endian and little endian. If one computer uses one byte order and the other computer uses the other, multi-byte values arrive with their bytes reversed. In a simplified view, Intel x86 was the only one doing little endian and everybody else was doing big endian. In the modern world ARM processors share little endian with x86, but can be switched to big endian.

                However, I cannot provide you with any short-cut solution as I don't know SPI either. I would expect that all processors have a way to do the byte swap efficiently (maybe some SSE on x86). https://stackoverflow.com/questions/105252/how-do-i-convert-between-big-endian-and-little-endian-values-in-c lists some built-in commands to do byte swaps with VS and GCC.
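To check which byte order a given machine uses, a small probe is usually enough. This is a sketch only; `hostIsLittleEndian` is a made-up name for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when the least significant byte of a multi-byte value
// is stored at the lowest address (little endian).
bool hostIsLittleEndian()
{
    const std::uint32_t probe = 0x01020304u;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);  // read the byte at the lowest address
    return first == 0x04;
}
```

On x86 and on ARM in its usual configuration this is expected to return true.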

                  jars121
                  wrote on last edited by
                  #8

                  Thanks for your input everyone. This is definitely an Endianness issue. In the end I've packaged each 32-bit sequence in reverse order on the remote processor so the data is received and parsed correctly on the SPI Master. I had hoped there was a hardware configuration that would change the SPI parsing Endianness but it looks like there wasn't.

                    JonB
                    wrote on last edited by
                    #9

                    @jars121
                    Now that we are happy you do indeed need to swap the bytes/endianness, let's go back to your original question:

                    Most performant byte reordering

                    The algorithm @Christian-Ehrlicher showed you does indeed do the job, simply. But is it the most "performant"? I haven't looked at the code it generates, and I don't know how clever an optimized build might make it.

                    But "byte swappers" have been around for a long time in C/C++. Presumably they can take advantage of machine code to be efficient. You don't say which platform/compiler you are on, but I note (for 32-bit) that MSVC has

                    unsigned long _byteswap_ulong(unsigned long value);
                    

                    and GCC has

                    uint32_t __builtin_bswap32 (uint32_t x)
                    

                    If you are going to do this a lot and really care about "performant" you might examine how these compare to your own code?! :)
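For comparison, here is a rough sketch of how those two intrinsics could be applied to a whole receive buffer. The wrapper names `bswap32` and `swapBuffer32` are hypothetical, and the sketch assumes the buffer length is a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(_MSC_VER)
#include <stdlib.h>  // _byteswap_ulong
#endif

// Thin wrapper that picks the compiler's byte swap intrinsic.
inline std::uint32_t bswap32(std::uint32_t v)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(v);
#else
    return __builtin_bswap32(v);  // GCC and Clang
#endif
}

// Reverse the bytes of every 32-bit word in place.
void swapBuffer32(std::uint8_t *data, std::size_t len)
{
    assert(len % 4 == 0);
    for (std::size_t i = 0; i < len; i += 4) {
        std::uint32_t w;
        std::memcpy(&w, data + i, sizeof w);  // memcpy avoids alignment and aliasing problems
        w = bswap32(w);
        std::memcpy(data + i, &w, sizeof w);
    }
}
```

Calling `swapBuffer32(masterArray, 256)` on the data from the first post should give back 0, 1, 2, 3, and so on. Whether this beats the plain loop is something only a measurement on the target can tell.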

                      artwaw
                      wrote on last edited by artwaw
                      #10

                      @jars121 if you know that your source uses a certain endianness, you can make use of this https://doc.qt.io/qt-5/qtendian.html#details and save yourself trouble?

                      For more information please re-read.

                      Kind Regards,
                      Artur

                        JonB
                        wrote on last edited by
                        #11

                        @jars121
                        From @artwaw's link, one of the qFromLittleEndian()/qFromBigEndian() functions looks like it will do your swapping, and only if necessary on the platform. Whether it does it efficiently I don't know because I didn't look at its definition....
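The "only if necessary" property can also be had in plain C++ by writing each word out with shifts. The sketch below is not Qt's implementation; `storeBigEndian32` and `restoreWireOrder` are hypothetical names, and it assumes the received data is available as native 32-bit words.

```cpp
#include <cstddef>
#include <cstdint>

// Write a native 32-bit value as four bytes, most significant byte first.
// Shifts operate on the value, so the result is the same on any host.
inline void storeBigEndian32(std::uint32_t w, std::uint8_t *out)
{
    out[0] = static_cast<std::uint8_t>(w >> 24);
    out[1] = static_cast<std::uint8_t>(w >> 16);
    out[2] = static_cast<std::uint8_t>(w >> 8);
    out[3] = static_cast<std::uint8_t>(w);
}

// Turn `count` received words back into the byte sequence seen on the wire.
void restoreWireOrder(const std::uint32_t *words, std::size_t count, std::uint8_t *out)
{
    for (std::size_t i = 0; i < count; ++i) {
        storeBigEndian32(words[i], out + 4 * i);
    }
}
```

On a big-endian host this amounts to a plain copy, on a little-endian host to a swap, and compilers commonly recognise the pattern. With Qt at hand, qToBigEndian() from the linked header should be the library counterpart.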

                          kkoehne
                          Moderators
                          wrote on last edited by
                          #12

                          @JonB said in Most performant byte reordering:

                          Whether it does it efficiently I don't know because I didn't look at its definition....

                          It's not hard to find the definition though ...

                          https://code.woboq.org/qt5/qtbase/src/corelib/global/qendian.cpp.html

                          You see that there are special #ifdefs for SSSE3, AVX2, and SSE2. And because Thiago (the Qt Core maintainer) works for Intel, I think it's most likely rather optimized, at least on the x86/x64 architectures ;)

                          Director R&D, The Qt Company

                            JonB
                            wrote on last edited by JonB
                            #13

                            @kkoehne
                            Indeed. And the fact that there are "special calls" in code looks promising. But knowing when those code cases apply and whether they are "performant" compared to one's own C++ loop is beyond me! Hence left as an exercise to the reader ;-)

                              Christian Ehrlicher
                              Lifetime Qt Champion
                              wrote on last edited by
                              #14

                              You're aware that we're talking about 256 bytes here? How high is the data rate that we need to discuss whether SIMD instructions are really needed? Measure before use!
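A measurement does not need much code. The following is a rough sketch using std::chrono; `swapLoop` and `nsPerCall` are hypothetical names, and a serious benchmark would also need warm-up runs and several repetitions.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <utility>

// Plain in-place swap of the bytes within each 32-bit word.
void swapLoop(std::uint8_t *data, std::size_t len)
{
    for (std::size_t i = 0; i + 4 <= len; i += 4) {
        std::swap(data[i + 0], data[i + 3]);
        std::swap(data[i + 1], data[i + 2]);
    }
}

// Average time of one swapLoop() call in nanoseconds over `iterations` calls.
double nsPerCall(std::uint8_t *buf, std::size_t len, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < iterations; ++n) {
        swapLoop(buf, len);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}
```

Comparing that figure with the time one 256-byte SPI transfer takes shows quickly whether the swap matters at all.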


                                SimonSchroeder
                                wrote on last edited by
                                #15

                                @kkoehne said in Most performant byte reordering:

                                You see that there are special #ifdefs for SSSE3, AVX2, and SSE2. And because Thiago (the Qt Core maintainer) works for Intel, I think it's most likely rather optimized, at least on the x86/x64 architectures ;)

                                There are #ifdefs distinguishing between different platforms, so the version is chosen at compile time, not at runtime. I am not sure for which platform Qt is precompiled (and most people will use a precompiled version of Qt). Most definitely you will not get AVX2. For that extra bit of performance (if there is any) you would need to compile Qt yourself accordingly. And this totally depends on which processors (and how old) you target.

                                Also note that the source code only has SIMD implementations for x86. For other processors, like ARM, there is no optimization.
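To illustrate what compile-time selection means, here is a minimal sketch. It is not Qt's code; `swapWords32` is a hypothetical name. The SSSE3 path only exists in the binary if the compiler was told to target SSSE3 (for example with -mssse3 on GCC), otherwise only the scalar loop is built.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#if defined(__SSSE3__)
#include <tmmintrin.h>  // _mm_shuffle_epi8
#endif

// Reverse the bytes of every 32-bit word in place.
void swapWords32(std::uint8_t *data, std::size_t len)
{
    std::size_t i = 0;
#if defined(__SSSE3__)
    // Shuffle mask: output byte 0 takes input byte 3, byte 1 takes byte 2, and so on per word.
    const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3);
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(data + i));
        v = _mm_shuffle_epi8(v, mask);
        _mm_storeu_si128(reinterpret_cast<__m128i *>(data + i), v);
    }
#endif
    // Scalar path: the only path without SSSE3, and the tail handler with it.
    for (; i + 4 <= len; i += 4) {
        std::swap(data[i + 0], data[i + 3]);
        std::swap(data[i + 1], data[i + 2]);
    }
}
```

The same source therefore produces different machine code depending on the build flags, which is why a precompiled library cannot simply pick the fastest variant for the CPU it happens to run on.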
