QTextDocument and Multithreading



  • Hi all!

    I'm currently working on an app that parses specific JSON files and generates rich text from them. The problem is that the JSON files can easily reach 5000+ lines of text, so the parsing of the classes takes considerable time.

    To speed up the process I'm trying to use multithreading, but I'm having some difficulties...

    First I thought I could use multiple threads to create QTextBlocks containing the text, and then build the QTextDocument using them, but the QTextBlock objects are read only.

    I then tried to pre-insert empty QTextBlocks and then edit them in multiple threads, each thread being in charge of a range of blocks, but my application crashes with a lot of "Cannot create children for a parent in a different thread" messages.

    Then, I tried to use threads to create the lines already formatted using HTML. It worked, but I had problems with some formatting options in HTML (https://forum.qt.io/topic/64153/qtextcursor-and-css-problem-with-text-indent-property/13), which I can't solve since, apparently, it's a bug in the framework.

    So, my question this time is: Is there a way to do multithreaded editing of a single QTextDocument object?


  • Qt Champions 2016

    @Tyras
    Hello again,
    So you've obviously discovered the hard way that you shouldn't thread non-thread-safe code. Joking aside, I envision the following possible solution for your problem:

    You parse your JSON in your worker object(s) (I'm assuming the low-level API here) and emit a signal from it on each ready block. The signal should have something like this as a signature:

    class DocumentParserWorker : public QObject
    {
        Q_OBJECT
        
        // ...
        
    signals:
        void newBlockReady(QTextBlockFormat format, QTextCharFormat charFormat);
        
    public slots:
        void parseJson(...)
        {
            // ... Parse the json and build up the QTextBlockFormat and QTextCharFormat
            // When there's a new block to be inserted:
            emit newBlockReady(format, charFormat);
        }
    };
    

    So the "builder" would look similarly to this:

    class DocumentBuilder : public QObject
    {
        Q_OBJECT 
    
    public:
        DocumentBuilder(QTextDocument * doc)
            : document(doc)
        {
        }
    
    public slots:
        void addBlock(QTextBlockFormat format, QTextCharFormat charFormat)
        {
            if (!document)
                return;    // Someone deleted the document in the meantime
            
            // So we have everything we need to insert the block
            QTextCursor cursor(document);
            cursor.movePosition(QTextCursor::End);    // A new cursor starts at the beginning of the document
            cursor.insertBlock(format, charFormat);
        }
    
    private:
        QPointer<QTextDocument> document;
    };
    

    You could then connect the signal to your main thread's builder object and insert the block as requested (you can treat other elements similarly):

    int main(int argc, char ** argv)
    {
        QApplication app(argc, argv);
        
        QTextEdit textEdit;
        DocumentBuilder builder(textEdit.document());
        
        QThread workerThread(&app);
        workerThread.start();
        
        DocumentParserWorker * worker = new DocumentParserWorker;
        worker->moveToThread(&workerThread);
        
        // Some standard connects for the threading
        QObject::connect(&workerThread, SIGNAL(finished()), worker, SLOT(deleteLater()));
        QObject::connect(&app, SIGNAL(aboutToQuit()), &workerThread, SLOT(quit()));
        
        // Connect your parser to the builder
        QObject::connect(worker, SIGNAL(newBlockReady(QTextBlockFormat, QTextCharFormat)), &builder, SLOT(addBlock(QTextBlockFormat, QTextCharFormat)));
        
        // Show the text edit widget
        textEdit.show();
        
        // Pass some processing to the worker thread (can be done with a signal connected to the parseJson slot, however for this example this seems easier to write)
        QMetaObject::invokeMethod(worker, "parseJson", Qt::QueuedConnection, Q_ARG(...), Q_ARG(...));
        
        // Start the event loop
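        const int exitCode = QApplication::exec();
        workerThread.wait();    // aboutToQuit() asked the thread to quit; wait until it has actually finished
        return exitCode;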
    }
    

    Additional note:
    You might need to register QTextBlockFormat and QTextCharFormat with the meta-type system, using qRegisterMetaType<QTextBlockFormat>() and qRegisterMetaType<QTextCharFormat>(), before you can pass them through queued (cross-thread) connections. The registration is usually done in main().

    Kind regards.



  • @kshegunov

    You parse your JSON in your worker object(s) (I'm assuming the low-level API here) and emit a signal from it on each ready block. The signal should have something like this as a signature

    I can't really do it the way you suggested because the order of the lines must be preserved. What I tried to do at first was to preallocate a QVector which would contain all the lines, since I know the number of lines before parsing them, and pass the QVector and the index range to the worker thread.

    The problem is, QTextBlockFormat and QTextFormat only store the format rules, not the actual text. The class that stores the formatted text block is QTextBlock, but its objects are read-only.


  • Qt Champions 2016

    @Tyras

    I can't really do it the way you suggested because the order of the lines must be preserved.

    How many workers do you have parsing a single JSON file?



  • @kshegunov

    How many workers do you have parsing a single JSON file?

    The number of threads the processor has.


  • Qt Champions 2016

    @Tyras
    Then aggregate the block formats and texts in your own class (implicit sharing should work) and add an integer member for the line at which the parsing started. Emit this object from your threads instead of the formats, and insert everything into the document only when the whole procedure has finished. You could queue your objects in the builder for later processing if they're not the next pending text fragment. Does this make sense?

    Alternatively, build up text fragments, label them with their order of occurrence, and insert them in the slot that builds the document.
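
    The queueing idea above can be sketched without any Qt types. This is only a minimal illustration, not a drop-in implementation: ParsedChunk and OrderedAssembler are hypothetical names, plain std::string lines stand in for the formatted text, and the sketch assumes that every chunk knows the line it started at and that the first chunk starts at line 0. Chunks that arrive early are parked until the chunk before them has been appended.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the aggregated class: the parsed lines plus
// the source line at which this chunk started.
struct ParsedChunk {
    int startLine;
    std::vector<std::string> lines;
};

// Collects chunks in whatever order the workers finish and releases them
// strictly in line order. Meant to be used from one thread only (the builder).
class OrderedAssembler {
public:
    void accept(ParsedChunk chunk) {
        const int start = chunk.startLine;
        pending_.emplace(start, std::move(chunk));
        flush();
    }

    const std::vector<std::string>& output() const { return output_; }
    std::size_t pendingCount() const { return pending_.size(); }

private:
    // Append every parked chunk that begins exactly at the next expected line.
    void flush() {
        for (auto it = pending_.find(nextLine_); it != pending_.end();
             it = pending_.find(nextLine_)) {
            nextLine_ += static_cast<int>(it->second.lines.size());
            for (auto& line : it->second.lines)
                output_.push_back(std::move(line));
            pending_.erase(it);
        }
    }

    int nextLine_ = 0;                    // first line not yet appended
    std::map<int, ParsedChunk> pending_;  // early chunks, keyed by start line
    std::vector<std::string> output_;     // lines in their original order
};
```

    In the Qt version, accept() would be the body of the builder's slot, and appending to output_ would become the insertion into the QTextDocument.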

    Kind regards.



  • That... could actually work!

    Gonna try as soon as I get time, and report back



  • @kshegunov

    The idea was very good, but it was impractical because I can (and usually do) have different formatting within the same line (for example, some text is bold and some is not). But it gave me a hint of how to implement it.

    I found out that I can convert QTextDocument objects to QTextDocumentFragment objects, and then insert the latter into a new QTextDocument. So now, in each thread, I create a QTextDocument, fill it with the text and send it to the builder thread.

    void JsonParser::_workerFinished()
    {
    	if(workerPool.activeThreadCount())
    		return;
    
    	QTextDocument *doc = new QTextDocument();
    	QTextCursor cursor(doc);
    
    	while(!logFrags->isEmpty())
    	{
    		QTextDocument *fragDoc = logFrags->first()->clone(this);
    		QTextDocumentFragment frag(fragDoc);
    
    		cursor.insertFragment(frag);
    		delete logFrags->takeFirst();
    	}
    
    	delete logFrags;
    
    	emit processingFinished(doc);
    }
    

    It works, but the line

    QTextDocument *fragDoc = logFrags->first()->clone(this);
    

    outputs

    QObject: Cannot create children for a parent that is in a different thread.
    (Parent is QTextDocument(0x810cf2bd70), parent's thread is QThread(0x810ce03ec0), current thread is QThread(0x810923faa0)
    

    I really don't understand why, since the line in question isn't supposed to write anything to the QTextDocument. And what's even stranger: it only happens in the loop's first iteration.

    Any hints about what's happening?

    *EDIT: My Bad, I just noticed you suggested using text fragments. Thanks!


  • Qt Champions 2016

    @Tyras

    I really don't understand why, since the line in question isn't supposed to write anything to the QTextDocument.

    This usually indicates a mismatch between QObject instances, i.e. you're trying to create an object in one thread that has a parent having affinity for another. A QObject hierarchy must live in a single thread; you can't have a parent in one thread and a child in another. That's why you "push" your worker objects into another thread. One more thing to note is that you can "push" a QObject to another thread, provided it has no parent (its children move along with it), but you can't "pull" it out of a thread. That's why I was suggesting to either use the formats, text and so on in your aggregated class or use text fragments, because none of those classes derive from QObject and they can be passed around easily.

    Kind regards.



  • @kshegunov

    you're trying to create an object in one thread that has a parent having affinity for another.

    That's exactly what I don't understand. QTextDocument::clone() was supposed to be like a copy constructor: create a new object, read the cloned one's contents and write into the new one. It's supposed to create children for the new object, not the original one.


  • Lifetime Qt Champion

    Hi,

    What you are currently doing is giving a parent to that new QTextDocument, you should rather not give the parent parameter and then move your cloned QTextDocument to your current thread with moveToThread.


  • Qt Champions 2016

    @Tyras
    I'm not quite sure what is where in your snippet, but @SGaist's comment looks to be on the right track, so I suggest following his advice.



  • @SGaist

    you should rather not give the parent parameter and then move your cloned QTextDocument to your current thread with moveToThread.

    Should I just ignore the warning, and just move to the current thread, then? It's the clone method that gives the warning (confirmed it in debug).


  • Qt Champions 2016

    @Tyras
    No.
    If logFrags is a QVector<QTextDocument *>, then this:

    logFrags->first()->clone(this)
    

    will parent the clone to your parser object, which lives in your parser thread. If that vector comes from another thread, this causes issues.
    Something like this should work:

    QTextDocument * fragDoc = logFrags->first()->clone();
    fragDoc->moveToThread(QThread::currentThread());
    

    Kind regards.



  • @kshegunov
    Just tried your code.

    QTextDocument * fragDoc = logFrags->first()->clone();
    

    outputs:

    QObject: Cannot create children for a parent that is in a different thread.
    (Parent is QTextDocument(0x4b9ae26640), parent's thread is QThread(0x4b9acaa630), current thread is QThread(0x4b97a007c0)
    

  • Qt Champions 2016

    Yes, I'm missing something it seems. Why would you want to clone the objects anyway, can you just push them into the current thread?

    QTextDocument * fragDoc = logFrags->first();
    fragDoc->moveToThread(QThread::currentThread());
    


  • @kshegunov

    Why would you want to clone the objects anyway, can you just push them into the current thread?

    Because I need to push it in the thread that created it... but, well, since the target thread is a singleton, it won't be so ugly.


  • Qt Champions 2016

    @Tyras

    since the target thread is a singleton

    I don't understand this. Do you mean that the QThread object is a singleton?



  • @kshegunov
    Sorry, I meant that the instance of the class that receives the QTextDocument is a singleton.


  • Qt Champions 2016

    @Tyras
    You shouldn't have singletons in the first place, much less QObject derived singletons.
    That being said, you'll have to tell me where the instances in the logFrags vector (if it's a vector) are created and where the JsonParser object resides, and by "where" I mean in which thread.

    Additionally my previous comment:

    Yes, I'm missing something it seems. Why would you want to clone the objects anyway, can you just push them into the current thread?

    Is absolutely wrong, since I was suggesting you try to "pull" an object from another thread, which is not possible.

    Kind regards.



  • @kshegunov

    You shouldn't have singletons in the first place, much less QObject derived singletons.

    I don't really understand why. Since the class is a controller, I should never have more than one instance of it in my application, and I should be able to access this single instance from all the code... besides, singletons are a design pattern... And it is QObject-derived because it must live in its own thread.

    But, that aside, I'll just have the worker threads push the QTextDocument objects to JsonParser threads after they finish them.

    Thanks for the help!


  • Qt Champions 2016

    @Tyras

    I don't really understand why. Since the class is a controller, I should never have more than one instance of it in my application, and I should be able to access this single instance from all the code... besides, singletons are a design pattern... And it is QObject-derived because it must live in its own thread.

    You shouldn't have singletons because:

    1. A singleton is not a real object, it's a facade for a global variable
    2. A singleton created on the stack is initialized before main, so anything that actually depends on things done in main() as QObject does, may or may not work.
    3. A singleton that's constructed on the heap often is simply left undeleted - a memory leak. C++ is not Java, it's the programmer's job to clean the memory up.
    4. A singleton that's created on first use in the heap requires special measures to be taken, so the construction is thread safe.
    5. A singleton created on the stack can't guarantee order of initialization (whence point 2 derives). If you have more than one the loader will initialize them depending on its mood!
    6. A singleton that's created on the heap can't guarantee order of initialization ever!
    7. A singleton is a global shared public resource in your application that promotes coupling, it actually couples every one of the classes that decide to use it.
    8. A singleton is not thread-safe by design, and can't be reentrant as there is only one.

    Why your worker object should not be a singleton:

    1. Because you can have many worker objects in different threads.
    2. Because if you want only one, you create only one! (And I just can't stress this enough)
    3. Because the signal/slot mechanism is supposed to decouple your components, not the other way around.
    4. Because object hierarchies work terribly with singletons; which object is the root is simply undefined.
    5. The fact that something is called a "design pattern" in some book, doesn't mean you should use it.
    6. Because something is possible, doesn't mean you should do it.
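
    Point 2 of this list ("if you want only one, you create only one") can be sketched in plain C++. This is just an illustration under assumptions: Controller and Worker are hypothetical names, the classes are not QObjects, and nothing here is thread-safe. The controller is an ordinary class, and every object that needs it receives a reference through its constructor instead of reaching for a global accessor.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical controller: an ordinary class with no static instance() accessor.
class Controller {
public:
    void report(std::string message) { log_.push_back(std::move(message)); }
    const std::vector<std::string>& log() const { return log_; }

private:
    std::vector<std::string> log_;
};

// Hypothetical worker: it is handed the controller it should talk to,
// so the dependency is visible in the constructor signature.
class Worker {
public:
    Worker(std::string name, Controller& controller)
        : name_(std::move(name)), controller_(controller) {}

    void run() { controller_.report(name_ + " finished"); }

private:
    std::string name_;
    Controller& controller_;  // not owned; must outlive the worker
};
```

    The single Controller would be created once (for example in main()) and passed to each Worker; in the Qt version the reference would typically be replaced by signal/slot connections.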

    Have you seen singletons actually implemented in Qt? Yes, there are static functions, yes there are static variables (if you skim through the code) there's a lot of them actually. But have you seen a real life singleton in this enormously big library? I have not!

    Simply apply some common sense! Create your nice worker object, connect its signals and slots, connect the cleanup routines and just rock its bloody world!

    QObject::deleteLater is actually a slot and this is not at all a coincidence, neither is the QApplication::aboutToQuit signal a fling, nor is the QObject::destroyed(QObject *) signal an error.
    My advice is: forget the "singleton design pattern" and enter the great world of OOP, yes a singleton breaks almost every rule in OOP.

    Man, sometimes I just want to cry ... then I get angry, and then I surrender. I must be dying a little with each of these posts and I firmly believe I've given at least several of these impassioned preaches ...



  • @kshegunov
    I understand and respect your opinion on the matter, but... I don't agree with most of your points. I don't want to make a big discussion of it (mainly because it's really outside the scope of the post), but...

    let's see:

    A singleton is not a real object, it's a facade for a global variable

    Well, yeah. I would say it's a (much) more elegant way to create a "global variable", since it ensures that only one will be created.

    A singleton created on the stack is initialized before main, so anything that actually depends on things done in main() as QObject does, may or may not work.

    Well, the way I learned to implement singletons in C++ uses the heap.

    A singleton that's constructed on the heap often is simply left undeleted - a memory leak. C++ is not Java, it's the programmer's job to clean the memory up.

    Well, I'm not seeing a problem here. I don't mind putting a few deletes at the end of my main, or even connecting some signals.

    A singleton that's created on first use in the heap requires special measures to be taken, so the construction is thread safe.

    Again, not seeing a problem here, just a need to take some caution when coding.

    A singleton created on the stack can't guarantee order of initialization (whence point 2 derives). If you have more than one the loader will initialize them depending on its mood!

    Again, I don't create them on the stack.

    A singleton that's created on the heap can't guarantee order of initialization ever!

    Not really sure what you mean here.

    A singleton is a global shared public resource in your application that promotes coupling, it actually couples every one of the classes that decide to use it.

    Sometimes you need that coupling. You need to center the processing at one point in the code. That's what controllers are all about.

    A singleton is not thread-safe by design, and can't be reentrant as there is only one.

    Not really. For data classes, you're right. But controllers can be thread-safe as long as their attributes are read-only.

    Why your worker object should not be a singleton

    My worker object is not a singleton. My controller object (the one that creates the worker objects) is.

    The fact that something is called a "design pattern" in some book, doesn't mean you should use it.

    But it means that it has its uses.

