Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Get Qt Extensions
  • Unsolved
Collapse
Brand Logo
  1. Home
  2. Qt Development
  3. General and Desktop
  4. Cache locality and QString
Forum Updated to NodeBB v4.3 + New Features

Cache locality and QString

Scheduled Pinned Locked Moved Unsolved General and Desktop
15 Posts 6 Posters 774 Views 3 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • A Offline
    A Offline
    AndyBrice
    wrote on 11 Jun 2025, 16:13 last edited by
    #1

    I have a data table, which is effectively a QList< QList< QString > >, where is row of data is QList< QString >. I am looking into ways to speed up writing to and reading rows of data. What I ideally need is for the data for consecutive QStrings in a list to have cache locality. So that fetching the data for 2 adjacent QStrings would mean the second QString's data stands a good chance of already being in cache after the first string is read.

    I guess I could use something like QString::fromRawData() and manage all the memory myself. But I lose the benefits of reference counting and it looks quite ugly and error prone.

    Better might to be have some sort of custom memory allocator. But I don't see any way to do that.

    Has anyone tried to do something like this?

    1 Reply Last reply
    0
    • K Offline
      K Offline
      Kent-Dorfman
      wrote on 14 Jun 2025, 07:35 last edited by
      #2

      I'm not aware of any QT containers providing for custom allocators in the way that the STL containers do. Maybe someone else knows better...however, if you're really to the point of considering a custom allocator solution then I'd strongly suggest takign a step back and reconsidering your overall design. custom heap management is complex and would probably just end up being a bandaid on a larger algorithmic problem. I think to get any real useful assistance you'd need to explain in detail about your data table: expected size, how it is populated and how it is accessed, etc.

      1 Reply Last reply
      3
      • C Offline
        C Offline
        ChrisW67
        wrote on 15 Jun 2025, 04:21 last edited by
        #3

        QString is an implicitly shared class, which is achieved through a private object held as a pointer in the QString. Even if the QString objects were consecutive in memory, in general the string data they contain is heap-allocated elsewhere.

        If you have a large number of QStrings of a small number of shared variants you may get a degree caching efficiency where the shared data blocks are cached.

        A 1 Reply Last reply 15 Jun 2025, 15:04
        3
        • C ChrisW67
          15 Jun 2025, 04:21

          QString is an implicitly shared class, which is achieved through a private object held as a pointer in the QString. Even if the QString objects were consecutive in memory, in general the string data they contain is heap-allocated elsewhere.

          If you have a large number of QStrings of a small number of shared variants you may get a degree caching efficiency where the shared data blocks are cached.

          A Offline
          A Offline
          AndyBrice
          wrote on 15 Jun 2025, 15:04 last edited by
          #4

          @ChrisW67 said in Cache locality and QString:

          QString is an implicitly shared class, which is achieved through a private object held as a pointer in the QString. Even if the QString objects were consecutive in memory, in general the string data they contain is heap-allocated elsewhere.

          Yes, I understand that.

          @ChrisW67 said in Cache locality and QString:

          If you have a large number of QStrings of a small number of shared variants you may get a degree caching efficiency where the shared data blocks are cached.

          I try to do that.

          I am just investigating whether there is any way to be more efficient about allocating the memory for QStrings (and the data they reference) likely to be read at the same time, for less memory allocations and better cache locality.

          I guess you could allocate a row as a single QString (rather than a QList< QString> ) and then make each cell a QStringView into the QString. But that doesn't work well if you want to change one of the QStrings.

          1 Reply Last reply
          0
          • S Offline
            S Offline
            SGaist
            Lifetime Qt Champion
            wrote on 15 Jun 2025, 19:39 last edited by
            #5

            Hi,

            Another point: how big are your strings ?
            I remember work from a couple of years ago regarding SSO (Small String Optimization) that would allow not to allocate extra data on the heap but I am currently failing to find the references.

            Interested in AI ? www.idiap.ch
            Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

            A 1 Reply Last reply 16 Jun 2025, 15:32
            0
            • G Offline
              G Offline
              GrecKo
              Qt Champions 2018
              wrote on 15 Jun 2025, 22:34 last edited by
              #6

              Isn't cache locality while still being able to write to your strings a bit contradictory?
              Unless you won't resize the strings (or have a larger than needed capacity buffer for each).

              A 1 Reply Last reply 16 Jun 2025, 15:35
              0
              • S SGaist
                15 Jun 2025, 19:39

                Hi,

                Another point: how big are your strings ?
                I remember work from a couple of years ago regarding SSO (Small String Optimization) that would allow not to allocate extra data on the heap but I am currently failing to find the references.

                A Offline
                A Offline
                AndyBrice
                wrote on 16 Jun 2025, 15:32 last edited by
                #7

                @SGaist said in Cache locality and QString:

                Another point: how big are your strings ?

                Mostly 30 characters or less. But a table might have millions of rows and thousands of columns.

                1 Reply Last reply
                0
                • G GrecKo
                  15 Jun 2025, 22:34

                  Isn't cache locality while still being able to write to your strings a bit contradictory?
                  Unless you won't resize the strings (or have a larger than needed capacity buffer for each).

                  A Offline
                  A Offline
                  AndyBrice
                  wrote on 16 Jun 2025, 15:35 last edited by
                  #8

                  @GrecKo said in Cache locality and QString:

                  Isn't cache locality while still being able to write to your strings a bit contradictory?

                  Yes, a bit. ;0)

                  Sometimes you only want to read values from a data table, in which case cache locality is important. Other times you need to modify arbitrary values in the table, in which case cache locality is not really an issue.

                  1 Reply Last reply
                  0
                  • S Offline
                    S Offline
                    SGaist
                    Lifetime Qt Champion
                    wrote on 16 Jun 2025, 18:14 last edited by
                    #9

                    Out of curiosity, what kind of data are you handling to have that much rows/cols ?

                    Interested in AI ? www.idiap.ch
                    Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

                    1 Reply Last reply
                    0
                    • A Offline
                      A Offline
                      AndyBrice
                      wrote on 16 Jun 2025, 19:35 last edited by
                      #10

                      It is data wrangling software for re-shaping, re-formatting, merging, cleaning etc Excel, CSV, JSON, XML etc:

                      https://www.easydatatransform.com/

                      It it pretty fast already. But a bit of extra performance never hurts. ;0)

                      1 Reply Last reply
                      0
                      • K Offline
                        K Offline
                        Kent-Dorfman
                        wrote on 16 Jun 2025, 22:58 last edited by Kent-Dorfman
                        #11

                        So essentially a two-dimensional dynamically sized array of strings?

                        Were I in your shoes, and having known the relatively short, but maximum allowable, length of the strings then I would opt for a vector of fixed sized char[] entries. That way you ARE allowing for cache hits on row-major adjacent columns of data. The second you go for a dynamically allocated string resource you throw away any guarantees of cache availability of adjacent cells.

                        A 1 Reply Last reply 17 Jun 2025, 08:23
                        1
                        • K Kent-Dorfman
                          16 Jun 2025, 22:58

                          So essentially a two-dimensional dynamically sized array of strings?

                          Were I in your shoes, and having known the relatively short, but maximum allowable, length of the strings then I would opt for a vector of fixed sized char[] entries. That way you ARE allowing for cache hits on row-major adjacent columns of data. The second you go for a dynamically allocated string resource you throw away any guarantees of cache availability of adjacent cells.

                          A Offline
                          A Offline
                          AndyBrice
                          wrote on 17 Jun 2025, 08:23 last edited by
                          #12

                          @Kent-Dorfman said in Cache locality and QString:

                          So essentially a two-dimensional dynamically sized array of strings?

                          Yes.

                          @Kent-Dorfman said in Cache locality and QString:

                          I would opt for a vector of fixed sized char[] entries.

                          I am reading in CSV files, Excel file etc, so the strings can be any length at all.

                          I can scan the entire file to look for the longest string. But that comes with it's own issues.

                          @Kent-Dorfman said in Cache locality and QString:

                          The second you go for a dynamically allocated string resource you throw away any guarantees of cache availability of adjacent cells.

                          Agreed. But even getting SOME cache hits would improve performance.

                          Also, if you are creating a million QString s in one go, it seems a bit inefficient to do a million separate memory allocations (assuming that is what QString does).

                          JonBJ 1 Reply Last reply 17 Jun 2025, 08:47
                          0
                          • A AndyBrice
                            17 Jun 2025, 08:23

                            @Kent-Dorfman said in Cache locality and QString:

                            So essentially a two-dimensional dynamically sized array of strings?

                            Yes.

                            @Kent-Dorfman said in Cache locality and QString:

                            I would opt for a vector of fixed sized char[] entries.

                            I am reading in CSV files, Excel file etc, so the strings can be any length at all.

                            I can scan the entire file to look for the longest string. But that comes with it's own issues.

                            @Kent-Dorfman said in Cache locality and QString:

                            The second you go for a dynamically allocated string resource you throw away any guarantees of cache availability of adjacent cells.

                            Agreed. But even getting SOME cache hits would improve performance.

                            Also, if you are creating a million QString s in one go, it seems a bit inefficient to do a million separate memory allocations (assuming that is what QString does).

                            JonBJ Offline
                            JonBJ Offline
                            JonB
                            wrote on 17 Jun 2025, 08:47 last edited by JonB
                            #13

                            @AndyBrice
                            Then you really need to do your own "memory allocation". Of course separate memory allocations for many QStrings will not (guaranteed) lead to some huge contiguous memory layout. Nor do I know of any other memory allocator which would guarantee to lay out many separate allocation of variable lengths as consecutive.

                            I really wonder just how much real-time improvement you would see even if the memory was contiguous? You would need to try your own memory allocation to compare how much difference it really makes in practice, with everything else going on in your code.

                            1 Reply Last reply
                            0
                            • A Offline
                              A Offline
                              AndyBrice
                              wrote on 17 Jun 2025, 10:22 last edited by
                              #14

                              Ok, thanks for the feedback. It looks like there is no straightforward ways to improve performance, while keeping the flexibility I need.

                              K 1 Reply Last reply 17 Jun 2025, 19:18
                              0
                              • A AndyBrice
                                17 Jun 2025, 10:22

                                Ok, thanks for the feedback. It looks like there is no straightforward ways to improve performance, while keeping the flexibility I need.

                                K Offline
                                K Offline
                                Kent-Dorfman
                                wrote on 17 Jun 2025, 19:18 last edited by
                                #15

                                @AndyBrice said in Cache locality and QString:

                                Ok, thanks for the feedback. It looks like there is no straightforward ways to improve performance, while keeping the flexibility I need.

                                Correct. The optimization will come at a cost of working only when on a predictable subset of real world data. Because you've stated that you need a generalizaed solution, the optmization tricks wont work reliably.

                                If you can assign hard limitations to your dataset...THEN you can consider what kinds of optimiztions make sense.

                                1 Reply Last reply
                                0

                                1/15

                                11 Jun 2025, 16:13

                                • Login

                                • Login or register to search.
                                1 out of 15
                                • First post
                                  1/15
                                  Last post
                                0
                                • Categories
                                • Recent
                                • Tags
                                • Popular
                                • Users
                                • Groups
                                • Search
                                • Get Qt Extensions
                                • Unsolved