High CPU usage for large XML - QXmlDefaultHandler

Chaula

We have a Qt C++ desktop application.
Our first and foremost requirement is to retrieve a large amount of data (in XML format) from a server, parse it, and generate a tree view from it.
We have observed that parsing a 65 MB XML file takes around 45 seconds. During that time CPU usage is around 25%, and the application shows as "Not Responding", even in Task Manager.

We need our application to perform well. We have also tried QXmlStreamReader, but its CPU usage is similar and it takes even longer: around 60 seconds for the same 65 MB XML file.

Is there any other way we can parse such XML using Qt without losing performance?

Regards
Chaula

raven-worx

@Chaula
Well, the parsing takes as long as it takes.
It also depends heavily on how you process the data.
But doing the parsing in a separate thread can't be a bad idea.

Also, for big XML files you should always use QXmlStreamReader; otherwise you are quite likely to run out of memory and your application will crash.

Additionally, you could populate the tree dynamically by adding the nodes one by one or in batches, letting the model emit its update signals as it goes.

kshegunov

@Chaula

Hello,

We have observed that parsing of 65MB XML takes around 45 seconds and in this duration CPU usage shows around 25%

For files that big, I'd consider using some kind of binary format instead of XML in the first place. EBML comes to mind.

and application seems to be in Not Responding state, even in Task manager.

If you do the parsing from the main thread, then this is perfectly normal, as it blocks the GUI. You might move the parsing to its own thread and signal the model from there. If performance is really critical (i.e. you can't wait a minute or so), I'd also think about a clever scheme to parse the file in parts, for example a separate thread parsing each top-level element under the root. However, I currently can't think of a way of doing this directly with a text file.

Kind regards.

mrjj

Hi
Just to be sure I understand:

When you say "parse", does that include traversing the tree and doing something with the data,

and not just the actual parsing that the XML parser does to build the DOM?

The reason I ask is that I tested with pugixml, and it parses a 60.6 MB XML file in 623 milliseconds. That is very far from 45 seconds.