QXmlQuery setFocus consuming memory wildly on a large XML file



  • Kind of hair-pulling on this one. With QXmlQuery in Qt 5.11, I've tried calling setFocus with a URL, and setting the document URL through setQuery, and both always gave me an error, so instead I'm reading the contents of the file directly and setting the focus to a string:

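    #include <QFile>
    #include <QTextStream>
    #include <QXmlQuery>   // needs QT += xmlpatterns

    // Read the whole file into a QString, one line at a time.
    QFile fileRead(filePath);
    QString fileContents;
    if (fileRead.open(QIODevice::ReadOnly))
    {
        QTextStream qStream(&fileRead);
        qStream.setCodec("UTF-8");
        while (!qStream.atEnd())
        {
            // readLine() strips the line ending, so add it back.
            QString line = qStream.readLine();
            fileContents += line + "\n";
        }
        fileRead.close();
    }

    if (!fileContents.isEmpty())
    {
        QXmlQuery query;
        // Memory use takes off here on the large file.
        query.setFocus(fileContents);
    }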
    

    I've left out what I do after setting focus because it doesn't matter. Even if I completely comment that out, the issue still happens.

    On most XML files I process, it's fine. There's one large file I try to read and pass in (200k+ lines and about 50 MB in size), and it absolutely chokes on it as soon as setFocus is called. It starts wildly consuming memory and I have to shut it down. I would just not deal with a file that large if I had any choice in the matter, but I don't have control over how large the files are.

    It's also concerning to me how out of control an issue it is. I'd expect an error, not whatever this is. This seems like a recursion it's stuck on or something.

    Any ideas as to what's going wrong? Maybe some way I'm misusing QXmlQuery? Or maybe it's just not designed to handle XML of that size?

    I don't know if it's relevant, but I'm doing this within a reimplemented bool className::event(QEvent *event) function, while handling an intercepted tooltip event (if (event->type() == QEvent::ToolTip)). Maybe I'll try moving it somewhere else later just to be sure, but it's hard to imagine how that would factor in when every other file I try is fine.

    Edit: Tried doing query.setFocus(QUrl::fromLocalFile(filePath)); and letting it ride out, and it peaked at around 450 MB of memory in Task Manager before stopping and releasing the memory. That seems like less than when I was doing it via string (the string version seemed like it was going to keep growing forever), but it still seems like a lot of memory to allocate when the file itself is around 50 MB. Either I don't understand something about RAM vs. storage or that's way off where it should be.
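
    One piece of the RAM vs. storage gap can be estimated up front: QString stores text as UTF-16, so a file that is mostly ASCII roughly doubles in size as soon as it is read into a QString, before QXmlQuery builds anything from it. Below is a minimal sketch of that arithmetic in plain C++ (no Qt needed). The function name utf16BytesForUtf8 is made up for illustration, and it assumes well-formed UTF-8 input. It says nothing about what QXmlQuery allocates internally.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to hold UTF-8 text as UTF-16 (2 bytes per code unit).
// Hypothetical helper: assumes `utf8` is well-formed; it counts, it does not validate.
std::size_t utf16BytesForUtf8(const std::string& utf8)
{
    std::size_t codeUnits = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) == 0x80)
            continue;                        // continuation byte, already counted
        codeUnits += (byte >= 0xF0) ? 2 : 1; // 4-byte sequences need a surrogate pair
    }
    return codeUnits * 2;
}
```

    By this count, 50 MB of ASCII text takes about 100 MB as a string, and the line-by-line += loop may briefly need more while the string is reallocated as it grows.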

    Edit 2: I think I may have figured out that it has to do with something called Document Projection (or rather, the lack of it). My best guess is that QXmlQuery does not use Document Projection. See:

    In most XQuery processors, a large percent of the time needed to query a large XML file is spent parsing the file and creating an in-memory representation that can be queried. The in-memory representation may be many times the size of the original XML file, and Java VM out of memory errors may occur at run time. When your query addresses an XML file using fn:doc(), DataDirect XQuery® uses a technique known as Document Projection, which allows it to create only the part of a document needed by the query.
    https://www.progress.com/tutorials/xquery/performance-and-optimization

    This paper is probably where the technique comes from, at least in part (I have not read it yet): https://www.cs.rutgers.edu/~amelie/papers/2003/xmlprojection.pdf

    If QXmlQuery does use it, then maybe the way I'm calling it is forcing it to allocate everything instead of allowing document projection.
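
    To make the idea concrete, here is a toy sketch of what projection buys you: stream through the input once and keep only the parts a query would need, so memory tracks the kept parts rather than the whole document. This is plain C++ and is not how QXmlQuery or DataDirect XQuery work internally. The names projectElementText and projectElementTextFromString are hypothetical, and the scanner is not a real XML parser: it ignores comments, CDATA, and entities, and it assumes the wanted elements are not nested inside each other.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Toy projection: read the stream once and keep only the text found inside
// elements named `tag`. Everything else is dropped as soon as it is read.
std::vector<std::string> projectElementText(std::istream& in, const std::string& tag)
{
    assert(!tag.empty());
    const std::string endTag = "/" + tag;
    std::vector<std::string> kept;
    std::string markup;     // contents of the <...> currently being read
    std::string text;       // text of the wanted element currently open
    bool inMarkup = false;
    bool inWanted = false;
    char c;
    while (in.get(c)) {
        if (c == '<') {
            inMarkup = true;
            markup.clear();
        } else if (c == '>' && inMarkup) {
            inMarkup = false;
            // Element name is the markup up to the first space or '/'.
            const std::string name = markup.substr(0, markup.find_first_of(" \t\r\n/"));
            if (markup == endTag) {
                if (inWanted)
                    kept.push_back(text);
                inWanted = false;
            } else if (name == tag && markup.back() != '/') {
                inWanted = true;    // start tag (self-closing tags are skipped)
                text.clear();
            }
        } else if (inMarkup) {
            markup += c;
        } else if (inWanted) {
            text += c;              // only text inside a wanted element is kept
        }
    }
    return kept;
}

// Convenience wrapper for small in-memory inputs, e.g. in tests.
std::vector<std::string> projectElementTextFromString(const std::string& xml,
                                                      const std::string& tag)
{
    std::istringstream in(xml);
    return projectElementText(in, tag);
}
```

    With a std::ifstream as the input, the only memory that grows with the file is the vector of kept strings. A processor that builds the full tree before querying has no such bound.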

