High CPU usage for large XML - QXmlDefaultHandler



  • We have a Qt C++ desktop application.
    Our first and foremost requirement is to retrieve a large amount of data from a server (in XML format), parse it, and generate a tree view from it.
    We have observed that parsing a 65MB XML file takes around 45 seconds. During that time CPU usage is around 25% and the application appears as Not Responding, even in Task Manager.

    We need our application to perform well. We have also tried QXmlStreamReader, but its CPU usage is similar and it takes even longer: around 60 seconds for the same 65MB XML.

    Is there any other way to parse such XML with Qt without losing performance?

    Regards
    Chaula


  • Moderators

    @Chaula
    Well, parsing takes as long as it takes.
    Also it heavily depends on how you process the data.
    But doing the parsing in a separate thread can't be a bad idea.

    Also, for big XML files you should always use QXmlStreamReader; otherwise it is very likely that you will run out of memory and your application will crash.

    Additionally, you could populate the tree dynamically by adding the nodes one by one or in batches, and letting the model trigger its update signals.


  • Qt Champions 2016

    @Chaula

    Hello,

    We have observed that parsing of 65MB XML takes around 45 seconds and in this duration CPU usage shows around 25%

    For files that big, I'd consider using some kind of binary format instead of XML in the first place. EBML comes to mind.

    and application seems to be in Not Responding state, even in Task manager.

    If you do the parsing from the main thread, then this is perfectly normal, as it blocks the GUI. Perhaps you could move the parsing to its own thread and signal the model from there. If performance is really critical (i.e. you can't wait a minute or so), I'd also think about a clever scheme to parse the file in parts, for example a separate thread parsing each top-level element. However, I currently can't think of a way to do this directly with a text file.
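    The hand-off from a parser thread to the GUI thread can be sketched in plain standard C++ with a mutex-protected queue of batches. This is a swapped-in technique, not Qt's own mechanism: in a Qt application, a signal emitted from the worker and received through a queued connection delivers the same hand-off via the GUI thread's event loop. BatchQueue, produceInBatches and startWorker are hypothetical names, and plain strings stand in for parsed nodes:

```cpp
#include <cstddef>
#include <deque>
#include <iterator>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical buffer between a parser thread (producer) and the GUI thread (consumer).
class BatchQueue {
public:
    void push(std::vector<std::string> batch) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_batches.push_back(std::move(batch));
    }
    // Takes every batch queued so far; holds the lock only for the move.
    std::vector<std::vector<std::string>> drain() {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::vector<std::vector<std::string>> out(
            std::make_move_iterator(m_batches.begin()),
            std::make_move_iterator(m_batches.end()));
        m_batches.clear();
        return out;
    }
private:
    std::mutex m_mutex;
    std::deque<std::vector<std::string>> m_batches;
};

// Worker body: groups `nodes` (stand-ins for parser output) into batches.
void produceInBatches(const std::vector<std::string> &nodes, std::size_t batchSize, BatchQueue &queue)
{
    std::vector<std::string> batch;
    for (const auto &node : nodes) {
        batch.push_back(node);
        if (batch.size() == batchSize) {
            queue.push(std::move(batch));
            batch.clear();  // moved-from vector: reset to a known empty state
        }
    }
    if (!batch.empty())
        queue.push(std::move(batch));  // last, possibly smaller batch
}

// Runs the producer on its own thread; the caller must join() the result.
std::thread startWorker(std::vector<std::string> nodes, std::size_t batchSize, BatchQueue &queue)
{
    return std::thread([nodes = std::move(nodes), batchSize, &queue] {
        produceInBatches(nodes, batchSize, queue);
    });
}
```

    The GUI thread would call drain() periodically (for example from a QTimer) and insert each batch into the model, so that only the GUI thread ever touches the model.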

    Kind regards.


  • Qt Champions 2016

    Hi
    Just to be sure I understand:

    When you say parse, does that include traversing the tree and doing something with the data?

    Not just the actual parsing the XML parser does to build the DOM?

    The reason I ask is that I tested with pugixml and it parses a 60.6 MB XML file in 623 milliseconds, which is very far from 45 seconds.

