
How to debug a hard to find memory leak in PyQt



  • I have a PyQt application with a memory leak. It is a translation application that uses QThreads and signals/slots to run each translation on a separate thread, then signals a QTextEdit with the result when the translation finishes. The application works, but it leaks memory with every translation. I haven't been able to reproduce the leak in a smaller code sample outside the full application. My best guess is that it comes from some interaction between my threading logic, Python, Qt memory management, and the underlying libraries I'm using.

    Through trial and error I've found that the leak only occurs when I do my translation inside QThread.run. Just translating repeatedly doesn't leak memory, and I've failed to reproduce the bug by running something else inside my QThread. For a while I thought the problem was that I pass the translation work in as a lambda function, which might hold a reference that keeps the QThread from being cleaned up by the Python garbage collector. That doesn't seem to be the issue, because I can see that its __del__ method is being called. I also tried removing all of the signal/slot connections to the QThread and just printing the output, which still leaks memory.

    I've also tried debugging with tracemalloc and valgrind, neither of which yielded any useful information. I suspect that Qt/Python memory management conflicts in some way with the memory management in my underlying Python libraries, which I don't think are written entirely in Python.

    I also posted on Reddit about this.

    Any suggestions appreciated!
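
For reference, a minimal sketch of the kind of tracemalloc comparison mentioned in this post, using only the standard library. The names `top_growth` and `leaky_work` are hypothetical stand-ins, not code from the application. One caveat that may explain the unhelpful results: tracemalloc only records allocations made through Python's own allocators, so memory allocated directly by a C++ extension would not appear in these statistics.

```python
import tracemalloc

def top_growth(work, repeats=5, limit=3):
    """Run work() repeatedly and return the source lines whose
    Python-level allocations grew the most between two snapshots."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(repeats):
        work()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() sorts the differences with the largest changes first.
    return after.compare_to(before, "lineno")[:limit]

leaked = []

def leaky_work():
    # Hypothetical stand-in for one translation that never frees its buffer.
    leaked.append(bytearray(100_000))

for stat in top_growth(leaky_work):
    print(stat)
```

If the reported growth stays flat while the process's total memory keeps rising, that would point toward allocations outside Python's allocators.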


  • Lifetime Qt Champion

    Hi and welcome to devnet,

    Which versions of Qt and PyQt are you using?
    On which platform?

    Would it be possible to share the code you are using?



  • Hi,

    @SGaist Python 3.8.5, PyQt5==5.15.1, running on Ubuntu 20.04.1 LTS. Here's a link to a version of the code that's leaking memory: https://github.com/argosopentech/argos-translate/tree/db9bac106c17330f0bfb3c708da1d7dc5a016dda

    Thanks



  • I've been able to narrow the problem down to occurring when I create a CTranslate Translator inside of a PyQt QThread that is created from a QWidget. If I remove the CTranslate Translator and do something else that allocates a large amount of memory there is no leak. If I create the CTranslate Translator from the QWidget with no QThread there is also no leak. If I run the QThread outside of a QWidget there is no leak. The leak only happens with the combo of all three.

    My best guess is that there is either a bug, or a misuse on my part, in the combination of Python/Qt/CTranslate memory management. Python uses automatic reference-counting memory management, while Qt, in native C++, uses parent-based ownership. On top of that, CTranslate uses C++ extensions to Python, so there are a lot of places where the problem could be appearing.

    I made an example script demonstrating the leak. To run it you need a CTranslate model and need to provide a path to it in the script. Here's a Google Drive link where you can download a package for my project; if you extract it (it's just a renamed .zip archive), it has a CTranslate model at /model. When this script runs it leaks ~5GB of memory.

    I also posted this on the OpenNMT Forum.
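
A minimal sketch of how growth like the ~5GB reported here could be tracked per iteration, assuming a Unix platform where the standard `resource` module is available (on Linux `ru_maxrss` is reported in kilobytes). The helper names and `leaky_work` are hypothetical. Unlike tracemalloc, this looks at the whole process, so it also reflects memory allocated by native extensions.

```python
import resource

def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def peak_growth(work, repeats):
    """Call work() repeatedly and record how far the peak RSS has
    risen above the starting value after each call."""
    baseline = peak_rss()
    growth = []
    for _ in range(repeats):
        work()
        growth.append(peak_rss() - baseline)
    return growth

hoard = []

def leaky_work():
    # Hypothetical stand-in for a step that keeps memory alive.
    hoard.append(b"x" * 10_000_000)

samples = peak_growth(leaky_work, 5)
print(samples)
```

Because this is a peak value, it can only rise. A leak shows up as a steady climb across iterations rather than a plateau after the first few calls.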



  • @argosopentech
    Since this seems to be a PyQt5 issue, have you tried the folks over at https://riverbankcomputing.com/mailman/listinfo/pyqt, by joining that mailing list and asking your question? The author of PyQt5 is there. I don't know whether they will take the time to examine your code/issue, but it's where I would try.



  • @JonB Thanks for the suggestion, I just emailed the list.



  • @argosopentech
    I have seen your post. I don't know how people there will react to you only referring to this post here --- they like the information in their own forum posts. If it were me I might post a follow-up (i.e. Reply to all) there yourself in which you include the first, second & third of your posts here above.



  • @JonB Thanks for the suggestion, I just replied with more information.



  • @argosopentech
    I see that you have just received a reply from the mailing list :) And that guy is the PyQt5 author, like I said, and probably knows exactly what he is talking about! :) He is not very chatty, just terse and to the point. You just have to act as best you can on what he tells you.



  • @JonB thanks for the help, I'm looking at his suggestions now. Here's his response for anyone curious:

    I don't see how the above can be expected to work, no matter what the
    run() method is doing. Your WorkerThread objects are likely to be
    garbage collected before they are finished and the __del__ won't protect
    the thread.

    Try making the GUIWindow the parent of the WorkerThreads and see if that
    makes a difference.

    Phil
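
The lifetime concern in this reply can be illustrated without Qt. This is only a sketch using plain Python objects and `weakref`; `Worker` is a hypothetical stand-in and not a real QThread. It relies on CPython's reference counting, where an object is freed as soon as its last reference disappears.

```python
import weakref

class Worker:
    """Stand-in for a thread-like object; not a real QThread."""

def start_without_reference():
    worker = Worker()
    # The only strong reference is the local variable, which
    # disappears when this function returns.
    return weakref.ref(worker)

kept_workers = []

def start_with_reference():
    worker = Worker()
    # Something longer-lived holds on to the object.
    kept_workers.append(worker)
    return weakref.ref(worker)

dropped = start_without_reference()
kept = start_with_reference()

print("unreferenced worker freed:", dropped() is None)
print("referenced worker alive:", kept() is not None)
```

A QThread created as a local variable with no parent and no stored reference is in the first situation. Giving it a parent, as suggested above, hands ownership to Qt instead.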



  • My understanding of what he's saying is that, because the QThread is managed by Qt, blocking in the Python __del__ method isn't going to protect the QThread from being deleted while it is still in use. That didn't seem to be the problem I was having: I've been getting memory leaks, not crashes from QThreads being deleted prematurely, though maybe the threads getting cleaned up early is causing other memory to be leaked. I based my original code on a tutorial on PyQt QThreads that uses the __del__ method this way, but it makes sense that it isn't a great way of doing things.

    I tried making the QMainWindow the parent of the QThread as he suggested, but that didn't seem to fix the problem. I also connected the QThread's finished signal to its deleteLater slot, following the example of subclassing QThread in Qt's documentation.

    from PyQt5.QtWidgets import QMainWindow, QApplication
    from PyQt5.QtCore import QThread
    import ctranslate2

    class WorkerThread(QThread):
        def run(self):
            # Creating the Translator inside the thread is where the leak was observed.
            translator = ctranslate2.Translator('/path/to/ctranslate/model')

    class GUIWindow(QMainWindow):
        def translate(self):
            # The window is the parent, so Qt owns the thread object.
            new_worker_thread = WorkerThread(self)
            # Schedule the thread object for deletion once run() has returned.
            new_worker_thread.finished.connect(new_worker_thread.deleteLater)
            new_worker_thread.start()

    app = QApplication([])
    main_window = GUIWindow()
    main_window.show()

    # Start 120 worker threads, each creating its own Translator.
    for i in range(120):
        print(i)
        main_window.translate()

    app.exec_()

    I'm going to look into his suggestion more, but I'm not sure it solves the issue. The other example in the Qt documentation puts the work in a worker object and passes it to a QThread using moveToThread, so I'm going to try structuring the threading like that and see what happens.



  • @argosopentech said in How to debug a hard to find memory leak in PyQt:

    like he suggested but that didn't seem to fix the problem

    I suggest you go back to him with this result then, and ask him very politely to have a look and see if he can suggest anything else. Throw yourself at his mercy, nicely :) (But do try anything else you can think of from his suggestion before doing so.)


  • Lifetime Qt Champion

    Do you really need to create a new translator each time ?

    I am wondering whether you should reconsider your architecture.

    It looks like you could make use of QtConcurrent to manage translation tasks rather than doing your own thread management.
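
A sketch of the "let a task manager own the thread" idea. QtConcurrent is a C++ API and may not be usable from PyQt5, so this example swaps in Python's standard `concurrent.futures` instead. `FakeTranslator` and `TranslationService` are hypothetical names; the real translator class would take its place. The single worker thread creates the translator once and reuses it for every request.

```python
from concurrent.futures import ThreadPoolExecutor

class FakeTranslator:
    """Hypothetical stand-in for an expensive-to-create translator."""
    instances_created = 0

    def __init__(self):
        FakeTranslator.instances_created += 1

    def translate(self, text):
        return text.upper()

class TranslationService:
    """Runs translations on one long-lived worker thread."""

    def __init__(self):
        # One worker: tasks run in order and share one translator.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._translator = None

    def _translate(self, text):
        # Runs on the worker thread; the translator is created lazily, once.
        if self._translator is None:
            self._translator = FakeTranslator()
        return self._translator.translate(text)

    def submit(self, text):
        """Queue a translation and return a Future for its result."""
        return self._executor.submit(self._translate, text)

    def shutdown(self):
        self._executor.shutdown(wait=True)

service = TranslationService()
futures = [service.submit(word) for word in ("hello", "world")]
results = [future.result() for future in futures]
service.shutdown()
print(results)
```

In a GUI the result would still have to be handed back to the main thread, for example by emitting a signal from a done-callback, rather than by blocking on `result()`.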



  • @JonB Thanks, that's what I was thinking too. I just wanted to post here first to make sure I wasn't missing something obvious or misunderstanding him before I sent another email.

    @SGaist This was something that was mentioned on the OpenNMT forum thread I started too. It seems like not making a new Translator every time would be good for performance too. I'm going to try this and I think it should at least drastically slow down my memory leak. However, since users can switch between translations some will still need to be garbage collected so I'd like to figure out what's causing this to leak.


  • Lifetime Qt Champion

    Switching to a different language should be part of your API rather than a constraint. That way you can reload or replace the translator when appropriate.



  • @SGaist I just added reuse of the same CTranslate Translator, and this seems to prevent the memory leak from becoming a problem. Right now I save every Translator that has been used, so I never have to create one more than once. This prevents them from being leaked, but ideally I'd like to keep only the most recently used one, so that if someone does a large number of different translations without restarting the application they don't all have to be kept in memory. My concern is that if I did that, whatever has been causing this memory leak would also cause the discarded Translator objects to be leaked.

    Saving the Translators seems to mostly work around the problem but there does seem to be either something wrong with the way I was using PyQt/CTranslate in the example above or a bug in one of them.
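
A sketch of the "keep only the most recently used translators" idea from this post, using a small bounded cache built on `collections.OrderedDict`. `TranslatorCache` and `fake_load` are hypothetical names; in the real application the loader would create the actual translator for a model path. Note that evicting an entry only drops the Python reference. Whether the underlying native memory is actually released is exactly the open question in this thread.

```python
from collections import OrderedDict

class TranslatorCache:
    """Keeps at most max_size translators, evicting the least recently used."""

    def __init__(self, load, max_size=1):
        self._load = load          # callable that builds a translator for a key
        self._max_size = max_size
        self._cache = OrderedDict()

    def get(self, key):
        if key in self._cache:
            # Mark as most recently used and reuse the existing instance.
            self._cache.move_to_end(key)
            return self._cache[key]
        translator = self._load(key)
        self._cache[key] = translator
        # Drop the oldest entries once the cache is over its limit.
        while len(self._cache) > self._max_size:
            self._cache.popitem(last=False)
        return translator

    def __len__(self):
        return len(self._cache)

loads = []

def fake_load(model_path):
    # Hypothetical loader; records each time a model would be loaded.
    loads.append(model_path)
    return object()

cache = TranslatorCache(fake_load, max_size=2)
cache.get("en_es")
cache.get("en_es")   # reused, no second load
cache.get("en_fr")
cache.get("en_de")   # evicts "en_es"
print(loads, len(cache))
```

Watching process memory while switching between more language pairs than the cache holds would show whether evicted translators are really freed.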

