
Cache locality and QString

15 Posts 6 Posters 314 Views 2 Watching
    GrecKo (Qt Champions 2018) wrote (#6):

    Isn't cache locality while still being able to write to your strings a bit contradictory?
    Unless you never resize the strings (or keep a larger-than-needed capacity buffer for each).

    • SGaist wrote:

      Hi,

      Another point: how big are your strings?
      I remember work from a couple of years ago on SSO (Small String Optimization), which avoids allocating extra data on the heap for short strings, but I am currently failing to find the references.
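
As a rough illustration of the SSO idea mentioned above, the sketch below checks whether a standard-library string keeps its characters inside the string object itself or in a separate heap block. It uses std::string only because the check is easy to write there; it makes no claim about how QString stores its data. The helper name storedInline is made up for this example, and the length at which a string stops fitting inline is implementation-defined.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: returns true when the character data lives inside the
// std::string object itself (small string optimization) rather than in a
// separately allocated heap block.
bool storedInline(const std::string& s) {
    const char* objBegin = reinterpret_cast<const char*>(&s);
    const char* objEnd = objBegin + sizeof(std::string);
    // std::less gives a total order even for pointers into unrelated objects.
    std::less<const char*> before;
    return !before(s.data(), objBegin) && before(s.data(), objEnd);
}
```

On the mainstream standard libraries a three-character string such as "abc" is reported as inline and a 200-character string is not; the exact cut-off between the two differs between implementations.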

      AndyBrice wrote (#7):

      @SGaist said in Cache locality and QString:

      Another point: how big are your strings?

      Mostly 30 characters or less. But a table might have millions of rows and thousands of columns.

        AndyBrice wrote (#8):

        @GrecKo said in Cache locality and QString:

        Isn't cache locality while still being able to write to your strings a bit contradictory?

        Yes, a bit. ;0)

        Sometimes you only want to read values from a data table, in which case cache locality is important. Other times you need to modify arbitrary values in the table, in which case cache locality is not really an issue.
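
One way to reconcile the two modes described above is sketched here, purely as an assumption about how it could be done: keep the original cell text packed in one read-only buffer for fast scans, and store modified cells in a separate map that is consulted first. OverlayColumn and its members are hypothetical names, and std::u16string stands in for QString so the example compiles without Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical read-mostly column: original text is packed contiguously,
// edits live in a side table so the packed buffer never has to be resized.
class OverlayColumn {
public:
    explicit OverlayColumn(const std::vector<std::u16string>& cells) {
        offsets_.reserve(cells.size() + 1);
        offsets_.push_back(0);
        for (const std::u16string& cell : cells) {
            packed_ += cell;
            offsets_.push_back(packed_.size());
        }
    }

    // Returns the edited value if there is one, else the packed original.
    // The view stays valid until the same cell is set again.
    std::u16string_view get(std::size_t i) const {
        assert(i + 1 < offsets_.size());
        auto it = edits_.find(i);
        if (it != edits_.end()) return it->second;
        return std::u16string_view(packed_).substr(offsets_[i],
                                                   offsets_[i + 1] - offsets_[i]);
    }

    // Writing never touches the packed buffer, so any length is accepted.
    void set(std::size_t i, std::u16string text) {
        assert(i + 1 < offsets_.size());
        edits_[i] = std::move(text);
    }

    std::size_t editedCount() const { return edits_.size(); }

private:
    std::u16string packed_;
    std::vector<std::size_t> offsets_;
    std::unordered_map<std::size_t, std::u16string> edits_;
};
```

The trade-off is that every read pays for a hash lookup, which is cheap while few cells are edited but erodes the locality benefit as the edit table grows; repacking after a batch of edits would be one way to recover it.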

          SGaist (Lifetime Qt Champion) wrote (#9):

          Out of curiosity, what kind of data are you handling that has so many rows and columns?

            AndyBrice wrote (#10):

            It is data wrangling software for re-shaping, re-formatting, merging, cleaning, etc. Excel, CSV, JSON, XML, etc. files:

            https://www.easydatatransform.com/

            It is pretty fast already. But a bit of extra performance never hurts. ;0)

              Kent-Dorfman wrote (#11, last edited by Kent-Dorfman):

              So essentially a two-dimensional dynamically sized array of strings?

              Were I in your shoes, and knowing the relatively short maximum allowable length of the strings, I would opt for a vector of fixed-size char[] entries. That way you ARE allowing for cache hits on adjacent columns in a row-major layout. The second you go for a dynamically allocated string resource, you throw away any guarantee that adjacent cells are in cache.
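
A minimal sketch of the fixed-size-entry idea, under the assumption that a hard maximum length can be chosen up front: every cell stores its characters inline, and the whole table is one row-major vector, so neighbouring columns of a row sit next to each other in memory. FixedCell, FixedTable and the limit kMaxCellChars are names invented for this example; the value 30 only echoes the "mostly 30 characters or less" figure from earlier in the thread.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical hard limit on cell length.
constexpr std::size_t kMaxCellChars = 30;

// One cell: a length byte plus inline characters, no heap pointer.
struct FixedCell {
    unsigned char length = 0;
    std::array<char, kMaxCellChars> chars{};
};

class FixedTable {
public:
    FixedTable(std::size_t rows, std::size_t cols)
        : cols_(cols), cells_(rows * cols) {}

    // Returns false and leaves the cell unchanged if the text does not fit.
    bool set(std::size_t row, std::size_t col, std::string_view text) {
        if (text.size() > kMaxCellChars) return false;
        FixedCell& cell = cells_[index(row, col)];
        std::copy(text.begin(), text.end(), cell.chars.begin());
        cell.length = static_cast<unsigned char>(text.size());
        return true;
    }

    // The view points into the table's own storage.
    std::string_view get(std::size_t row, std::size_t col) const {
        const FixedCell& cell = cells_[index(row, col)];
        return std::string_view(cell.chars.data(), cell.length);
    }

private:
    // Row-major index: cells of one row are adjacent in memory.
    std::size_t index(std::size_t row, std::size_t col) const {
        assert(col < cols_ && row * cols_ + col < cells_.size());
        return row * cols_ + col;
    }

    std::size_t cols_;
    std::vector<FixedCell> cells_;
};
```

The cost is visible in set(): anything longer than the limit is rejected, and every cell occupies the full fixed size no matter how short its text is.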

                AndyBrice wrote (#12):

                @Kent-Dorfman said in Cache locality and QString:

                So essentially a two-dimensional dynamically sized array of strings?

                Yes.

                @Kent-Dorfman said in Cache locality and QString:

                I would opt for a vector of fixed sized char[] entries.

                I am reading in CSV files, Excel files, etc., so the strings can be any length at all.

                I can scan the entire file to look for the longest string. But that comes with its own issues.

                @Kent-Dorfman said in Cache locality and QString:

                The second you go for a dynamically allocated string resource you throw away any guarantees of cache availability of adjacent cells.

                Agreed. But even getting SOME cache hits would improve performance.

                Also, if you are creating a million QStrings in one go, it seems a bit inefficient to do a million separate memory allocations (assuming that is what QString does).

                  JonB wrote (#13, last edited by JonB):

                  @AndyBrice
                  Then you really need to do your own "memory allocation". Of course, separate memory allocations for many QStrings are not guaranteed to produce one huge contiguous memory layout. Nor do I know of any other memory allocator that would guarantee to lay out many separate allocations of variable length consecutively.

                  I really wonder just how much real-time improvement you would see even if the memory were contiguous. You would need to try your own memory allocation to compare how much difference it really makes in practice, with everything else going on in your code.
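
To make that comparison concrete, here is a small measurement sketch, offered as one possible setup rather than a proper benchmark: it scans the same cells once as separately allocated strings and once packed into a single buffer with an offset table, and reports a checksum plus the elapsed time for each. All names (ScanResult, PackedCells, pack, scanSeparate, scanPacked) are hypothetical, and std::u16string stands in for QString so the code builds without Qt.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ScanResult {
    std::uint64_t checksum;            // sum of all code units, to keep the loop honest
    std::chrono::nanoseconds elapsed;  // wall-clock time of the scan
};

// Cells packed into one contiguous buffer; cell i spans
// [offsets[i], offsets[i + 1]) in chars.
struct PackedCells {
    std::u16string chars;
    std::vector<std::size_t> offsets{0};
};

PackedCells pack(const std::vector<std::u16string>& cells) {
    PackedCells packed;
    packed.offsets.reserve(cells.size() + 1);
    for (const std::u16string& cell : cells) {
        packed.chars += cell;
        packed.offsets.push_back(packed.chars.size());
    }
    return packed;
}

// Scan with each cell owning its own string storage.
ScanResult scanSeparate(const std::vector<std::u16string>& cells) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (const std::u16string& cell : cells)
        for (char16_t ch : cell) sum += ch;
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

// Same scan over the packed layout, cell by cell.
ScanResult scanPacked(const PackedCells& packed) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i + 1 < packed.offsets.size(); ++i)
        for (std::size_t j = packed.offsets[i]; j < packed.offsets[i + 1]; ++j)
            sum += packed.chars[j];
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

The two checksums must match; the elapsed times are only meaningful with realistic data sizes, an optimized build and repeated runs, and they say nothing about the cost of later edits to the packed layout.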

                    AndyBrice wrote (#14):

                    Ok, thanks for the feedback. It looks like there is no straightforward way to improve performance while keeping the flexibility I need.

                      Kent-Dorfman wrote (#15):

                      @AndyBrice said in Cache locality and QString:

                      Ok, thanks for the feedback. It looks like there is no straightforward way to improve performance while keeping the flexibility I need.

                      Correct. The optimization comes at the cost of working only on a predictable subset of real-world data. Because you've stated that you need a generalized solution, the optimization tricks won't work reliably.

                      If you can assign hard limitations to your dataset... THEN you can consider what kinds of optimizations make sense.
