Compare two XML files
-
wrote on 3 Oct 2012, 15:18 last edited by
Hello
Can you help me with a general algorithm for comparing two XML files while ignoring node order?
-
wrote on 4 Oct 2012, 07:58 last edited by
So, you mean to do something like this?
@
<someXml>
<A>
<A2/>
<A1/>
</A>
<B>
<B1/>
<B2/>
</B>
</someXml>
@
compared with
@
<someXml>
<B>
<B2/>
<B1/>
</B>
<A>
<A1/>
<A2/>
</A>
</someXml>
@
And say that the files are the same?
How big are the documents we're talking about? And do you only need "same"/"different", or do you also need to identify these differences?
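If only a "same"/"different" answer is needed, one possible approach is to compare a canonical form of each tree in which siblings are sorted. The following is a minimal sketch under that assumption, not a definitive implementation. The `Element` struct and the functions `canonicalForm` and `sameIgnoringOrder` are hypothetical names invented for illustration; text content is ignored and attribute values are not escaped.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal element type; a real tool would fill it from a parser.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;  // std::map keeps keys sorted
    std::vector<Element> children;
};

// Serializes an element so that sibling order does not affect the result.
std::string canonicalForm(const Element &e) {
    std::string out = "<" + e.name;
    for (const auto &attr : e.attributes)
        out += " " + attr.first + "=\"" + attr.second + "\"";
    out += ">";

    // Serialize every child first, then sort the serialized strings.
    std::vector<std::string> parts;
    for (const Element &child : e.children)
        parts.push_back(canonicalForm(child));
    std::sort(parts.begin(), parts.end());

    for (const std::string &part : parts)
        out += part;
    return out + "</" + e.name + ">";
}

// Two trees are treated as the same when their canonical forms are equal.
bool sameIgnoringOrder(const Element &a, const Element &b) {
    return canonicalForm(a) == canonicalForm(b);
}
```

This only answers whether the documents match. It does not say where they differ.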
-
wrote on 4 Oct 2012, 08:06 last edited by
Yes. I need to identify the differences.
Nodes may be missing in one of the files,
and nodes may have attributes, which may also be missing or differ.
-
wrote on 4 Oct 2012, 08:08 last edited by
How big are the files you are talking about? Would having a representation of each of them in memory be an option?
-
wrote on 4 Oct 2012, 08:12 last edited by
The files are not big, no more than 100 KB. So I guess I can use QDomDocument etc. for it, but I'm not sure about the algorithm.
-
wrote on 4 Oct 2012, 08:23 last edited by
To explain further: I need to create a tool something like this:
!http://s17.a-img.com/images/shots/DiffDog2010r3_XML_comparison.gif(diff)!
but simpler, of course, and with the differing nodes highlighted.
-
wrote on 4 Oct 2012, 08:42 last edited by
Basically, I'd say you'd need to build a tree representation of each of your XML files. Then, you iterate over tree A recursively, trying to find a match for each node in tree B. The tree would contain everything, down to the attributes.
You can remove every end-node (node without children) from both trees if a match is found. After iterating over the tree, you're left with two trees that only contain those nodes that are not in the other tree: the difference. Note that for this to work, you will have to iterate as deep as you can get before you start deleting. If you can't get to the same depth on both, you have a difference and there is nothing to delete.
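The pruning idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `TreeNode` and `pruneMatches` are hypothetical names, attributes are assumed to be stored as child nodes with labels such as `@id=1`, and matching is greedy (only the first sibling with the same label is tried), so repeated sibling names may need a smarter pairing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree node. Attributes are assumed to be stored as child
// nodes (for example with the label "@id=1"), so the tree holds everything.
struct TreeNode {
    std::string label;
    std::vector<TreeNode> children;
};

// Removes matched end-nodes from both trees, depth first.
// Whatever is left in a or b afterwards has no counterpart in the other tree.
void pruneMatches(TreeNode &a, TreeNode &b) {
    for (std::size_t i = 0; i < a.children.size();) {
        bool removed = false;
        for (std::size_t j = 0; j < b.children.size(); ++j) {
            if (a.children[i].label != b.children[j].label)
                continue;
            // Go as deep as possible before deleting anything.
            pruneMatches(a.children[i], b.children[j]);
            // Delete only if both sides ended up without children.
            if (a.children[i].children.empty() && b.children[j].children.empty()) {
                a.children.erase(a.children.begin() + i);
                b.children.erase(b.children.begin() + j);
                removed = true;
            }
            break;  // greedy: only the first candidate with this label is tried
        }
        if (!removed)
            ++i;  // keep the unmatched node and move on to the next sibling
    }
}
```

After the call, a parent that is left without children in one tree but still has children in the other marks the place where the other file has extra nodes.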
To search efficiently, I think QDomDocument may not be the ideal in-memory representation, because it is hard to search for nodes. Instead, I would consider a custom data structure, or alternatively build an index of the QDomDocument first so you can quickly retrieve nodes from it based on a path.
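A path index could look something like the sketch below. It uses a plain custom tree instead of QDomDocument to stay self-contained. `XmlNode`, `PathIndex` and `indexByPath` are hypothetical names, and the path format (`/someXml/A/A1`) is an assumption chosen for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical custom node type used in place of QDomDocument.
struct XmlNode {
    std::string name;
    std::vector<XmlNode> children;
};

// Maps a path such as "/someXml/A/A1" to every node found at that path.
// A multimap is used because sibling names may repeat.
using PathIndex = std::multimap<std::string, const XmlNode *>;

// Walks the tree once and records each node under its full path.
// The stored pointers stay valid only while the tree is not modified.
void indexByPath(const XmlNode &node, const std::string &parentPath, PathIndex &index) {
    const std::string path = parentPath + "/" + node.name;
    index.insert({path, &node});
    for (const XmlNode &child : node.children)
        indexByPath(child, path, index);
}
```

With an index built for the second file, each node of the first file can be looked up by its path with `index.equal_range(path)` instead of scanning all siblings.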
Note that a Google search on "compare two trees algorithm" returns quite a number of useful-looking results.
-
wrote on 4 Oct 2012, 08:52 last edited by
Thanks, I will try it.