
Will the function unicode() always return the same address in the lifetime of a QString object?

8 Posts 4 Posters 1.0k Views 3 Watching
  • okellogg wrote (post #1):

    Hi,

    If we define an object of type QString, is it safe to do pointer calculations using the return value from QString::unicode()?

    As an example, see
    https://github.com/KDE/kdevelop/blob/3.5/lib/cppparser/lexer.h#245

    inline const CHARTYPE* offset( int offset ) const {
        return m_source.unicode() + offset;
    }

    inline int getOffset( const QChar* p ) const {
        return int(p - (m_source.unicode()));
    }

    Let's say that usrPtr is initially assigned from m_source.unicode(), then lots of operations are done on m_source, and later getOffset(usrPtr) is called. Will the value returned still be "compatible" with usrPtr? Or is it possible that unicode() returns a different buffer (i.e. a different start address) by then?

      • jeremy_k wrote (post #2):

        From the documentation https://doc.qt.io/qt-6/qstring.html#unicode:

        The result remains valid until the string is modified.

        If the operations only involve const member functions, storing the pointer is fine. Otherwise, there is no guarantee. It's worth noting that there are some functions, such as operator[], that have both const and non-const versions.
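One way to act on this advice is to store integer offsets rather than raw pointers, and to turn an offset back into a pointer only at the moment of use. The sketch below is a hedged illustration, not the kdevelop code: `OffsetLexer`, `pointerAt`, `charAt`, and `append` are hypothetical names, and `std::u16string` stands in for `QString` so that the example builds without Qt. The same pattern would apply to `QString::unicode()` under the documented rule that the pointer is valid only until the string is modified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the lexer from the question.
// Assumption: std::u16string replaces QString so this compiles without Qt.
class OffsetLexer {
public:
    explicit OffsetLexer(std::u16string source) : m_source(std::move(source)) {}

    // Derive a pointer from an offset only when it is needed.
    // The result must not be kept across any modification of m_source.
    const char16_t* pointerAt(std::size_t offset) const {
        assert(offset <= m_source.size());
        return m_source.data() + offset;
    }

    // Look up a character by offset; an offset survives buffer reallocation
    // as long as the text before it has not shifted.
    char16_t charAt(std::size_t offset) const { return m_source.at(offset); }

    // A modifying operation that may reallocate the underlying buffer.
    void append(const std::u16string& text) { m_source += text; }

private:
    std::u16string m_source;
};
```

Callers keep a `std::size_t` position instead of a pointer, so a reallocation inside `append` cannot leave them holding a dangling address.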

        Asking a question about code? http://eel.is/iso-c++/testcase/


            okellogg wrote (post #4):

            @jeremy_k Thanks for your reply.
            I had written

            [...] is it possible that a different buffer (i.e. different start address) may be returned?
            

            I found that on recent Qt versions, two calls to unicode() on the same variable, one near the start of lifetime and another after many string manipulations such as "insert", may in fact return different buffers.
            This change appears to have happened in some version after Qt 5.9: I had tested with versions up to Qt 5.9, going back as far as Qt 4, and did not see this problem.


              jeremy_k wrote (post #5):

              @okellogg said in Will the function unicode() always return the same address in the lifetime of a QString object?:

              I found that on recent Qt versions, two calls to unicode() on the same variable, one near the start of lifetime and another after many string manipulations such as "insert", may in fact return different buffers.
              This change appears to have happened in some version after Qt 5.9 - I had tested with up to Qt 5.9 and did not have this problem, even going back to Qt4.

              While changes in the implementation (and operating system, malloc implementation, other memory allocations in the process, etc) may alter when two calls to QString::unicode() separated by a string modification return different pointers, the possibility is not new.

              • Qt 5.5 QString::unicode()
              • Qt 4.7 QString::unicode()
              • Qt 3.3 QString::unicode()
              • Qt 2.3 QString::unicode()

              They all say the same thing: The result remains valid until the string is modified.
              Code that fails to take this into account risks encountering C++ undefined behavior.

              Approaching this from an implementation standpoint, this should not be a surprise to anybody familiar with realloc() or memory management in general.
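The reallocation effect can be observed with a small experiment. The sketch below is an illustration under stated assumptions: it uses `std::u16string` as a stand-in for `QString` (so it builds without Qt), and `bufferMovedAfterAppend` is a hypothetical helper name. Addresses are converted to integers before the modification so that no invalidated pointer value is ever read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: reports whether appending `extra` characters caused
// the string's character buffer to move to a different address.
// Assumption: std::u16string stands in for QString.
bool bufferMovedAfterAppend(std::u16string& s, std::size_t extra) {
    // Record the address as an integer *before* modifying the string.
    const auto before = reinterpret_cast<std::uintptr_t>(s.data());
    s.append(extra, u'x');  // may exceed capacity and force a reallocation
    const auto after = reinterpret_cast<std::uintptr_t>(s.data());
    return before != after;
}
```

Appending more characters than the current capacity allows forces the string to obtain a larger buffer, at which point any pointer saved earlier refers to storage the string no longer owns. Whether a smaller modification moves the buffer depends on the implementation and allocator, which is why code cannot rely on the address staying the same.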

              • okellogg wrote (post #6):

                Thanks again @jeremy_k for your explanations; they prompted me to make the changes in commit 3041141.


                  kshegunov (Moderators) wrote (post #7):

                  Out of curiosity, any specific reason you're trying to patch up a version that's 12 years old?
                  I'd have gone with the upstream if I were you, and just by glancing through that original piece of code, it looks very fragile ...

                  On an unrelated note, I'd have kept the tokens as a mirror of the original text and would have updated them/changed them based on the input coming in. Keeping a string that gets modified mid-way is a perfect recipe to hit all kinds of nastiness ...
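One possible reading of this suggestion is that each token owns a copy of its text together with its starting offset, so that later edits to the source string cannot invalidate anything the token holds. The sketch below is only an interpretation, not the approach used in kdevelop or umbrello: `Token` and `tokenize` are hypothetical names, the whitespace-only splitting is a deliberate simplification, and `std::u16string` again stands in for `QString`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type: owns a copy of its characters and remembers
// where it started in the source at the time of tokenization.
struct Token {
    std::size_t start;
    std::u16string text;
};

// Simplified tokenizer: splits on single spaces and copies each token's
// characters out of `source`, so tokens never point into its buffer.
std::vector<Token> tokenize(const std::u16string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        if (source[i] == u' ') { ++i; continue; }
        const std::size_t begin = i;
        while (i < source.size() && source[i] != u' ') ++i;
        tokens.push_back({begin, source.substr(begin, i - begin)});
    }
    return tokens;
}
```

The trade-off is extra memory for the copies in exchange for tokens that stay readable regardless of what happens to the source afterwards. The recorded `start` values describe the source as it was when `tokenize` ran and would need updating if text is inserted before them.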

                  Read and abide by the Qt Code of Conduct


                    okellogg wrote (post #8):

                    @kshegunov For discussion, see https://bugs.kde.org/show_bug.cgi?id=338649#c6
                    (I have not yet found the time to analyze whether the kdevelop clang plugin could be usable for umbrello.)
