QXmlSchemaValidator's validate method takes too long to complete
-
Hi!
I'm developing an application that needs to parse an xml file as safely as possible. To do it, I try to validate the said file using a schema (an xsd file).
In fact, I have to do this for two different kinds of XML files, each with its own validating schema. So I started by writing the code to parse and validate the first one. It works perfectly.
Then I duplicated the code used to deal with the first type of xml files to adapt it to do the same with the second type.
It works, but the validate() method of QXmlSchemaValidator takes about 30 seconds to finish. That is unacceptable: this code is called by a GUI-based application, and I can't have my users wait that long just to check that an input file is correct.
This is really weird. I insist: the code to validate and parse the two kinds of files is almost identical (there are just a few cosmetic changes affecting the text of the error messages). In the first case everything works like a charm, while in the second one I get this awful delay.
My guess is that, for some reason, the schema defining the second type of XML files has some kind of problem. However, I designed it using XMLSpy, and this tool says my schema is OK.
I have prepared a full working example so that those who could help me can reproduce the problem. Below I include the whole set of .hpp and .cpp files, the .pro file, a sample input XML file and the validating schema.
If you run the example, the problem appears when calling the method "parse" in class "workflow_parser". There, when the "validate" method (which belongs to QXmlSchemaValidator) is called, the program stalls for a while and then finishes correctly.
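To confirm that the stall really comes from validate() and not from opening or parsing the file, the call can be timed. This is a minimal sketch in standard C++ only; time_call is a hypothetical helper written for this post, not a Qt function.

```cpp
#include <chrono>
#include <functional>

// Runs `call` once and returns the elapsed wall-clock time in milliseconds.
// `time_call` is a made-up helper name; it does not come from Qt.
long long time_call(const std::function<void()>& call)
{
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Inside parse(), the validation would be wrapped as `time_call([&] { ok = validator.validate(&file, QUrl::fromLocalFile(file.fileName())); })`, with `ok` being a bool declared beforehand, and the returned number of milliseconds printed with cout.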
Any ideas? My setup:
Windows 10, up to date.
Qt 5.15 LTS, using the C++ compiler included with Microsoft Visual Studio 2015.
The files:
test_workflow_reader.pro
QT -= gui
QT += core xml xmlpatterns

CONFIG += console
CONFIG -= app_bundle

# Flags for the compiler.
win32: CONFIG += c++11
unix:!macx: QMAKE_CXXFLAGS += "-std=c++11"

SOURCES += \
    main.cpp \
    workflow_parser.cpp \
    workflow_parser_message_handler.cpp

HEADERS += \
    workflow_parser.hpp \
    workflow_parser_message_handler.hpp \
    workflow_structures.hpp
workflow_parser.hpp
/** \file workflow_parser.hpp
    \brief Parse workflow XML files.
*/

#ifndef WORKFLOW_PARSER_HPP
#define WORKFLOW_PARSER_HPP

#include <QDomDocument>
#include <QFile>
#include <QtXmlPatterns>
#include <QString>

#include <vector>
#include <set>
#include <sstream>
#include <iostream>

#include "workflow_structures.hpp"
#include "workflow_parser_message_handler.hpp"

using namespace std;

/// \brief Parse workflow XML files.
class workflow_parser
{
  public:

    /// \brief Return the list of detected errors.
    /**
      \return The list of detected errors. Note that the list will be
              empty if parse() returned true.
     */
    vector<string> error_list (void);

    /// \brief Parse a workflow, loading its contents in a
    ///        workflow structure.
    /**
      \param[in]  filename The name of the XML file with the definition
                  of the workflow.
      \param[out] wf The workflow, once parsed and verified, is loaded
                  into this structure.
      \return True if the file was parsed (and validated, when a schema
              is set); false otherwise, in which case error_list()
              describes the problem.
     */
    bool parse (QString& filename, WFWorkflow& wf);

    /// \brief Set the path to the validating schema.
    /**
      \param[in] path_to_schema
      \return True if the schema is set, false otherwise.

      Call this method if a validating schema is available. Setting a
      schema means that the parse() method will be safer to use, since
      not only the syntactic validity will be checked; the structure of
      the xml file to read will be compared with the one defined by the
      schema.
     */
    bool set_schema (QString& path_to_schema);

    /// \brief Constructor.
    workflow_parser (void);

  protected:

    /// \brief The list of detected errors.
    vector<string> error_list_;

    /// \brief Flag stating whether a validating schema is available.
    bool got_schema_;

    /// \brief The validating schema.
    QXmlSchema schema_;
};

#endif // WORKFLOW_PARSER_HPP
workflow_parser_message_handler.hpp
/** \file workflow_parser_message_handler.hpp
    \brief Error handler for XML validation against the workflow schema.
*/

#ifndef WORKFLOW_PARSER_MESSAGE_HANDLER_HPP
#define WORKFLOW_PARSER_MESSAGE_HANDLER_HPP

#include <QXmlStreamReader>
#include <QAbstractMessageHandler>
#include <QString>

#include <vector>
#include <string>

using namespace std;

/// \brief Handle schema validation errors for workflow XML files.
class workflow_parser_message_handler : public QAbstractMessageHandler
{
  public:

    /**
      \brief Return the text column number where an error was detected
             for error number "at".
      \param[in] at The number of the error sought.
      \return The text column number where an error was detected for
              error number "at".
     */
    int error_column (size_t at) const;

    /**
      \brief Return the text line number where an error was detected
             for error number "at".
      \param[in] at The number of the error sought.
      \return The text line number where an error was detected for
              error number "at".
     */
    int error_line (size_t at) const;

    /**
      \brief Return the textual description of error number "at".
      \param[in] at The number of the error sought.
      \return The textual description of error number "at".
     */
    string error_message (size_t at) const;

    /**
      \brief Total number of errors detected.
      \return The total number of errors detected.
     */
    size_t error_total (void) const;

    /**
      \brief Constructor.
     */
    workflow_parser_message_handler (void);

  protected:

    /**
      \brief Re-implementation of parent class method. Takes care of
             receiving and handling the incoming error messages.
      \param[in] type Kind of error.
      \param[in] description Textual description of the error.
      \param[in] identifier Identifier.
      \param[in] sourceLocation Location (such as line and column in
                 the parsed text file) of the error.
     */
    virtual void handleMessage (      QtMsgType        type,
                                const QString&         description,
                                const QUrl&            identifier,
                                const QSourceLocation& sourceLocation);

  private:

    /// \brief Array storing the error column numbers.
    vector<int> error_column_;

    /// \brief Array storing the textual description of the errors.
    vector<string> error_description_;

    /// \brief Array storing the error line numbers.
    vector<int> error_line_;
};

#endif // WORKFLOW_PARSER_MESSAGE_HANDLER_HPP
workflow_structures.hpp
#ifndef WORKFLOW_STRUCTURES_HPP
#define WORKFLOW_STRUCTURES_HPP

#include <string>  // WFNode and WFWorkflow use std::string.
#include <vector>

using namespace std;

enum ConnectionType {RepoEndPoint, TaskEndPoint};

/**
  \brief Coordinates (in the scene space) of the center
         of a task or repository.
 */
struct WFCoordinate
{
  int x; /**< X scene coordinate. */
  int y; /**< Y scene coordinate. */
};

/**
  \brief Defines a connection endpoint.
 */
struct WFEndPoint
{
  ConnectionType type;        /**< Type of endpoint (Task, Repository). */
  int            endpoint_id; /**< Numerical identifier of the endpoint. */
  int            slot;        /**< Ordinal stating the input / output slot. */
};

/**
  \brief Defines a single connection.
 */
struct WFConnection
{
  WFEndPoint from; /**< Definition of the point where the connection starts. */
  WFEndPoint to;   /**< Definition of the point where the connection ends. */

  bool operator<(const WFConnection& rhs) const
  {
    {
      // First, we sort using the type of endpoint.

      if (from.type < rhs.from.type) return true;
      if (from.type > rhs.from.type) return false;

      if (to.type < rhs.to.type) return true;
      if (to.type > rhs.to.type) return false;

      //
      // At this point, all types are equal. Then, our decision
      // will be based on the endpoint's ids.
      //

      if (from.endpoint_id < rhs.from.endpoint_id) return true;
      if (from.endpoint_id > rhs.from.endpoint_id) return false;

      if (to.endpoint_id < rhs.to.endpoint_id) return true;
      if (to.endpoint_id > rhs.to.endpoint_id) return false;

      //
      // At this point, all endpoints id are equal. We'll then
      // rely on the connection's slots.
      //

      if (from.slot < rhs.from.slot) return true;
      if (from.slot > rhs.from.slot) return false;

      if (to.slot < rhs.to.slot) return true;
      if (to.slot > rhs.to.slot) return false;

      //
      // All components in both structs are equal. This means
      // that the first one is NOT less than the second one.
      //

      return false;
    }
  }
};

/**
  \brief Defines a task or repository.
 */
struct WFNode
{
  string       id;           /**< String (not unique) identifier of the
                                  task or repository. */
  int          numerical_id; /**< Numerical (unique) identifier of the
                                  task or repository. */
  WFCoordinate pos;          /**< Coordinates (in the scene space) of the
                                  center of the task or repository. */
};

/**
  \brief Defines a complete workflow.
 */
struct WFWorkflow
{
  string               id;           /**< String identifier of the workflow. */
  string               description;  /**< Description (preferably short) of
                                          the workflow. */
  string               toolkit_id;   /**< Identifier of the toolkit on which
                                          the workflow relies. */
  int                  last_repo_id; /**< Last used numerical identifier used
                                          by the workflow editor to guarantee
                                          that no two repos have the same
                                          string id. */
  vector<WFNode>       repos;        /**< The repositories included in the
                                          workflow. */
  vector<WFNode>       tasks;        /**< The tasks included in the workflow. */
  vector<WFConnection> connections;  /**< The connections between tasks /
                                          repositories. */
};

#endif // WORKFLOW_STRUCTURES_HPP
workflow_parser.cpp
/** \file workflow_parser.cpp
    \brief Implementation file for workflow_parser.hpp.
*/

#include "workflow_parser.hpp"

vector<string>
workflow_parser::
error_list
(void)
{
  {
    return error_list_;
  }
}

bool
workflow_parser::
parse
(QString&    filename,
 WFWorkflow& wf)
{
  {
    cout << "Inside workflow_parser::parse" << endl;

    // If we've got a validating schema, try to validate our XML document.

    if (got_schema_)
    {
      cout << "We have a schema to validate the workflow." << endl;
      cout << "  The workflow's file name is: " << filename.toStdString() << endl;

      QFile file(filename);
      if(!file.open(QIODevice::ReadOnly | QIODevice::Text))
      {
        error_list_.push_back("Unable to open the workflow file '" + filename.toStdString() + "'");
        return false;
      }

      cout << "I've just opened the file with the workflow" << endl;

      workflow_parser_message_handler message_handler;
      QXmlSchemaValidator validator(schema_);
      validator.setMessageHandler(&message_handler);

      cout << "Message handler and schema validator are ready. Starting to validate!" << endl;

      if (!validator.validate(&file, QUrl::fromLocalFile(file.fileName())))
      {
        string message;

        message = "'" + filename.toStdString() + "' is not a valid workflow XML definition file.";
        error_list_.push_back(message);
        message = "  These are the errors detected:";
        error_list_.push_back(message);

        for (size_t i = 0; i < message_handler.error_total(); i++)
        {
          string       scolumn;
          string       sline;
          stringstream ss1;
          stringstream ss2;

          ss1 << message_handler.error_line(i);
          sline = ss1.str();

          ss2 << message_handler.error_column(i);
          scolumn = ss2.str();

          message = "  " + message_handler.error_message(i) +
                    "(line " + sline + ", column " + scolumn + ")";
          error_list_.push_back(message);
        }

        file.close();
        return false;
      }

      cout << "Workflow validated" << endl;
      file.close();
      cout << "Workflow file closed." << endl;
    }

    // Open the actual file with the XML data.

    cout << "About to open again the xml file." << endl;
    cout << "  Its name was and is: " << filename.toStdString() << endl;

    QFile file(filename);
    if(!file.open(QIODevice::ReadOnly | QIODevice::Text))
    {
      error_list_.push_back("Unable to open the workflow file '" + filename.toStdString() + "'");
      return false;
    }

    cout << "Parsing..." << endl;

    // Parse our input XML file.

    QDomDocument document;
    int          column;
    QString      error;
    int          line;
    QString      message;

    if(!document.setContent(&file, &error, &line, &column))
    {
      // Unable to load the input file.

      file.close();

      message = "  " + error + ". Line: " + QString::number(line) +
                ". Column: " + QString::number(column);
      error_list_.push_back("Error parsing '" + filename.toStdString() + "'");
      error_list_.push_back(message.toStdString());
      return false;
    }

    // Get the document's root element

    QDomElement root = document.firstChildElement();

    // Get the identifier and description of the toolkit.

    QDomElement element;

    element         = root.firstChildElement("id");
    wf.id           = element.text().toUpper().toStdString();

    element         = root.firstChildElement("description");
    wf.description  = element.text().toStdString();

    element         = root.firstChildElement("toolkit_id");
    wf.toolkit_id   = element.text().toStdString();

    element         = root.firstChildElement("last_repository_id");
    wf.last_repo_id = element.text().trimmed().toInt();

    // Get the list of repositories, if any.

    wf.repos.clear();

    QDomNodeList nodes;

    element = root.firstChildElement("repositories");
    if (!element.isNull())
    {
      nodes = element.elementsByTagName("repository");

      for(int i = 0; i < nodes.count(); i++)
      {
        QDomNode elm = nodes.at(i);
        if(elm.isElement())
        {
          QDomElement e            = elm.toElement();
          QDomElement id           = e.firstChildElement("id");
          QDomElement numerical_id = e.firstChildElement("numerical_id");
          QDomElement pos          = e.firstChildElement("position");
          QDomElement x            = pos.firstChildElement("x");
          QDomElement y            = pos.firstChildElement("y");

          WFNode rep;

          rep.id           = id.text().toUpper().toStdString();
          rep.numerical_id = numerical_id.text().toInt();
          rep.pos.x        = x.text().toInt();
          rep.pos.y        = y.text().toInt();

          wf.repos.push_back(rep);
        }
      }
    }

    // Get the list of tasks, if any.

    wf.tasks.clear();

    element = root.firstChildElement("tasks");
    if (!element.isNull())
    {
      nodes = element.elementsByTagName("task");

      for(int i = 0; i < nodes.count(); i++)
      {
        QDomNode elm = nodes.at(i);
        if(elm.isElement())
        {
          QDomElement e            = elm.toElement();
          QDomElement id           = e.firstChildElement("id");
          QDomElement numerical_id = e.firstChildElement("numerical_id");
          QDomElement pos          = e.firstChildElement("position");
          QDomElement x            = pos.firstChildElement("x");
          QDomElement y            = pos.firstChildElement("y");

          WFNode tsk;

          tsk.id           = id.text().toUpper().toStdString();
          tsk.numerical_id = numerical_id.text().toInt();
          tsk.pos.x        = x.text().toInt();
          tsk.pos.y        = y.text().toInt();

          wf.tasks.push_back(tsk);
        }
      }
    }

    // Get the list of connections, if any.

    wf.connections.clear();

    element = root.firstChildElement("connections");
    if (!element.isNull())
    {
      nodes = element.elementsByTagName("connection");

      for(int i = 0; i < nodes.count(); i++)
      {
        QDomNode elm = nodes.at(i);
        if(elm.isElement())
        {
          WFConnection conn;
          QString      node_type;

          QDomElement e    = elm.toElement();
          QDomElement from = e.firstChildElement("from");
          QDomElement to   = e.firstChildElement("to");

          QDomElement type = from.firstChildElement("type");
          QDomElement nid  = from.firstChildElement("numerical_id");
          QDomElement pos  = from.firstChildElement("position");

          node_type = type.text().toUpper();
          if (node_type == "REPOSITORY") conn.from.type = RepoEndPoint;
          else                           conn.from.type = TaskEndPoint;

          conn.from.endpoint_id = nid.text().toInt();
          conn.from.slot        = pos.text().toInt();

          type = to.firstChildElement("type");
          nid  = to.firstChildElement("numerical_id");
          pos  = to.firstChildElement("position");

          node_type = type.text().toUpper();
          if (node_type == "REPOSITORY") conn.to.type = RepoEndPoint;
          else                           conn.to.type = TaskEndPoint;

          conn.to.endpoint_id = nid.text().toInt();
          conn.to.slot        = pos.text().toInt();

          wf.connections.push_back(conn);
        }
      }
    }

    // Close the input file.

    file.close();
    cout << "Workflow file parsed and closed" << endl;

    // That's all

    cout << "Leaving workflow_parser::parse" << endl;
    return true;
  }
}

bool
workflow_parser::
set_schema
(QString& path_to_schema)
{
  {
    cout << "Inside workflow_parser::set_schema" << endl;
    cout << "Trying to set file: " << path_to_schema.toStdString()
         << " as the validating schema." << endl;

    QUrl schema_url;

    // Transform the path to the schema file to a valid URL.

    schema_url = QUrl::fromLocalFile(path_to_schema);

    // Try to load the schema.

    schema_.load(schema_url);
    if (!schema_.isValid())
    {
      string message;

      message = "Unable to set the validating schema at '" +
                schema_url.toString().toStdString() + "'";
      error_list_.push_back(message);
      got_schema_ = false;
      return false;
    }

    // We've got a validating schema!

    got_schema_ = true;

    // That's all.

    cout << "Done. Leaving workflow_parser::set_schema" << endl;
    return true;
  }
}

workflow_parser::
workflow_parser
(void)
{
  {
    got_schema_ = false;
  }
}
workflow_parser_message_handler.cpp
/** \file workflow_parser_message_handler.cpp
    \brief Implementation file for workflow_parser_message_handler.hpp.
*/

#include "workflow_parser_message_handler.hpp"

int
workflow_parser_message_handler::
error_column
(size_t at)
const
{
  {
    return error_column_[at];
  }
}

int
workflow_parser_message_handler::
error_line
(size_t at)
const
{
  {
    return error_line_[at];
  }
}

string
workflow_parser_message_handler::
error_message
(size_t at)
const
{
  {
    return error_description_[at];
  }
}

size_t
workflow_parser_message_handler::
error_total
(void)
const
{
  {
    return error_description_.size();
  }
}

void
workflow_parser_message_handler::
handleMessage
(      QtMsgType        type,
 const QString&         description,
 const QUrl&            identifier,
 const QSourceLocation& sourceLocation)
{
  {
    Q_UNUSED(type);
    Q_UNUSED(identifier);

    QXmlStreamReader xml(description);
    QString          text;

    while (!xml.atEnd())
      if (xml.readNext() == QXmlStreamReader::Characters)
        text += xml.text();

    // Copy the error data to our internal members.

    error_description_.push_back(text.toStdString());
    error_column_.push_back(sourceLocation.column());
    error_line_.push_back(sourceLocation.line());
  }
}

workflow_parser_message_handler::
workflow_parser_message_handler
(void) : QAbstractMessageHandler(nullptr)
{
  {
    error_column_.clear();
    error_description_.clear();
    error_line_.clear();
  }
}
main.cpp
#include <QtCore>
#include <QDebug>

#include <string>
#include <vector>
#include <iostream>

#include "workflow_structures.hpp"
#include "workflow_parser.hpp"

using namespace std;

int main(int argc, char *argv[])
{
  QString         filename;
  QString         schema;
  bool            status;
  WFWorkflow      wf;
  workflow_parser wfp;

  // QCore application is needed to make possible the validation of schemas.

  QCoreApplication a(argc, argv);

  // The names of the input file and the validating schema.

  filename = "./my_workflow.xml";
  schema   = "./workflow.xsd";

  // Tell our parser that we have a schema to validate our input files.

  status = wfp.set_schema(schema);
  if (!status)
  {
    vector<string> errors = wfp.error_list();
    for (size_t i = 0; i < errors.size(); i++)
      cout << errors[i] << endl;
    return 1;
  }

  // Let's parse our input file.

  status = wfp.parse (filename, wf);
  if (!status)
  {
    vector<string> errors = wfp.error_list();
    for (size_t i = 0; i < errors.size(); i++)
      cout << errors[i] << endl;
    return 1;
  }

  // That's all.

  cout << "Success!" << endl;
  return 0;
}
The sample xml file (my_workflow.xml)
<?xml version="1.0" encoding="ISO-8859-1"?>
<workflow xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="workflow.xsd">
  <id>tururu</id>
  <description>Tururu vs. tarara</description>
  <toolkit_id>INT2PNG</toolkit_id>
  <last_repository_id>2</last_repository_id>
  <repositories>
    <repository>
      <id>REPO_2</id>
      <numerical_id>1</numerical_id>
      <position>
        <x>3030</x>
        <y>2798</y>
      </position>
    </repository>
    <repository>
      <id>REPO_1</id>
      <numerical_id>3</numerical_id>
      <position>
        <x>2809</x>
        <y>2372</y>
      </position>
    </repository>
  </repositories>
  <tasks>
    <task>
      <id>INTER2JPEG</id>
      <numerical_id>2</numerical_id>
      <position>
        <x>2797</x>
        <y>2530</y>
      </position>
    </task>
    <task>
      <id>JPEG2PNG</id>
      <numerical_id>0</numerical_id>
      <position>
        <x>2582</x>
        <y>2676</y>
      </position>
    </task>
  </tasks>
  <connections>
    <connection>
      <from>
        <type>repository</type>
        <numerical_id>3</numerical_id>
        <position>0</position>
      </from>
      <to>
        <type>task</type>
        <numerical_id>2</numerical_id>
        <position>0</position>
      </to>
    </connection>
    <connection>
      <from>
        <type>task</type>
        <numerical_id>0</numerical_id>
        <position>0</position>
      </from>
      <to>
        <type>repository</type>
        <numerical_id>1</numerical_id>
        <position>0</position>
      </to>
    </connection>
    <connection>
      <from>
        <type>task</type>
        <numerical_id>0</numerical_id>
        <position>1</position>
      </from>
      <to>
        <type>repository</type>
        <numerical_id>1</numerical_id>
        <position>0</position>
      </to>
    </connection>
    <connection>
      <from>
        <type>task</type>
        <numerical_id>2</numerical_id>
        <position>1</position>
      </from>
      <to>
        <type>repository</type>
        <numerical_id>1</numerical_id>
        <position>0</position>
      </to>
    </connection>
    <connection>
      <from>
        <type>task</type>
        <numerical_id>2</numerical_id>
        <position>0</position>
      </from>
      <to>
        <type>task</type>
        <numerical_id>0</numerical_id>
        <position>0</position>
      </to>
    </connection>
  </connections>
</workflow>
And, finally, the validating schema, that is, workflow.xsd
<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSPY v2004 rel. 2 U (http://www.xmlspy.com) by XMLSPY 2004 Professional Ed. Release 2, Installed Multi for 10 users (SOFTWARE AG) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="workflow">
    <xs:annotation>
      <xs:documentation>Comment describing your root element</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:all>
        <xs:element name="id" type="xs:token"/>
        <xs:element name="description" type="xs:token"/>
        <xs:element name="toolkit_id" type="xs:token"/>
        <xs:element name="last_repository_id" type="xs:nonNegativeInteger"/>
        <xs:element name="repositories" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="repository" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:all>
                    <xs:element name="id" type="xs:token"/>
                    <xs:element name="numerical_id" type="xs:nonNegativeInteger"/>
                    <xs:element name="position">
                      <xs:complexType>
                        <xs:all>
                          <xs:element name="x" type="xs:decimal"/>
                          <xs:element name="y" type="xs:decimal"/>
                        </xs:all>
                      </xs:complexType>
                    </xs:element>
                  </xs:all>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="tasks" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="task" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:all>
                    <xs:element name="id" type="xs:token"/>
                    <xs:element name="numerical_id" type="xs:nonNegativeInteger"/>
                    <xs:element name="position">
                      <xs:complexType>
                        <xs:all>
                          <xs:element name="x" type="xs:decimal"/>
                          <xs:element name="y" type="xs:decimal"/>
                        </xs:all>
                      </xs:complexType>
                    </xs:element>
                  </xs:all>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="connections" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="connection" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:all>
                    <xs:element name="from">
                      <xs:complexType>
                        <xs:all>
                          <xs:element name="type">
                            <xs:simpleType>
                              <xs:restriction base="xs:token">
                                <xs:enumeration value="repository"/>
                                <xs:enumeration value="REPOSITORY"/>
                                <xs:enumeration value="task"/>
                                <xs:enumeration value="TASK"/>
                              </xs:restriction>
                            </xs:simpleType>
                          </xs:element>
                          <xs:element name="numerical_id" type="xs:nonNegativeInteger"/>
                          <xs:element name="position" type="xs:integer"/>
                        </xs:all>
                      </xs:complexType>
                    </xs:element>
                    <xs:element name="to">
                      <xs:complexType>
                        <xs:all>
                          <xs:element name="type">
                            <xs:simpleType>
                              <xs:restriction base="xs:token">
                                <xs:enumeration value="repository"/>
                                <xs:enumeration value="REPOSITORY"/>
                                <xs:enumeration value="task"/>
                                <xs:enumeration value="TASK"/>
                              </xs:restriction>
                            </xs:simpleType>
                          </xs:element>
                          <xs:element name="numerical_id" type="xs:nonNegativeInteger"/>
                          <xs:element name="position" type="xs:integer"/>
                        </xs:all>
                      </xs:complexType>
                    </xs:element>
                  </xs:all>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
To help you understand the structure of my data (as described in the schema above), I include a picture with a graphical representation of that schema. It has been produced using XMLSpy. There, dotted rectangles stand for optional elements.
Thanks for your help!
-
Just a couple of extra comments... weird ones, by the way.
The first-level node, "workflow", is defined as an "xs:all" group of elements (the ones at its right in the image I posted). That is, this definition says that the elements making up "workflow" must all be present (except for the optional ones), but the order in which these sub-nodes appear does not matter.
If I replace the "xs:all" connector with an "xs:sequence" one, that is, stating that the order in which the sub-elements appear does matter, then the delay I talked about disappears and the validation takes much less than a second.
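To make the difference between the two connectors concrete, here is a minimal sketch in plain C++ (no Qt) of what each one accepts for the children of "workflow". It only illustrates the semantics just described, under the assumption that every child appears at most once. It says nothing about how QXmlSchemaValidator works internally, and ChildDecl, accepts_all, accepts_sequence and workflow_decls are names made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One declared child element: its name and whether it may be omitted.
// (Hypothetical type, only for this illustration.)
struct ChildDecl {
    std::string name;
    bool        optional;
};

// Children of <workflow> in the order they are declared in workflow.xsd.
std::vector<ChildDecl> workflow_decls()
{
    return {
        {"id", false}, {"description", false}, {"toolkit_id", false},
        {"last_repository_id", false},
        {"repositories", true}, {"tasks", true}, {"connections", true},
    };
}

// xs:all: any order, each child at most once, mandatory children present.
bool accepts_all(const std::vector<ChildDecl>& decls,
                 const std::vector<std::string>& children)
{
    std::vector<bool> seen(decls.size(), false);
    for (const std::string& child : children) {
        auto it = std::find_if(decls.begin(), decls.end(),
            [&](const ChildDecl& d) { return d.name == child; });
        if (it == decls.end()) return false;   // undeclared element
        const std::size_t idx = static_cast<std::size_t>(it - decls.begin());
        if (seen[idx]) return false;           // element repeated
        seen[idx] = true;
    }
    for (std::size_t i = 0; i < decls.size(); ++i)
        if (!seen[i] && !decls[i].optional) return false;  // mandatory missing
    return true;
}

// xs:sequence: children must follow the declared order; optional ones
// may be skipped.
bool accepts_sequence(const std::vector<ChildDecl>& decls,
                      const std::vector<std::string>& children)
{
    std::size_t next = 0;  // index of the next child still to be matched
    for (const ChildDecl& d : decls) {
        if (next < children.size() && children[next] == d.name)
            ++next;                            // declared element found in place
        else if (!d.optional)
            return false;                      // mandatory element missing here
    }
    return next == children.size();            // no leftover children
}
```

With the child order used in my_workflow.xml both checks pass, which is why switching to xs:sequence does not invalidate the sample file. A file that lists, say, "tasks" before "id" passes only the xs:all check.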
More tests: if I keep the xs:all connector but make the optional elements "repositories", "tasks" and "connections" mandatory, then I still have a delay, but a much shorter one (maybe a couple of seconds), which is what happens with my first kind of XML files!
Fortunately, I can replace the xs:all connector with xs:sequence because, for me, it is no problem to require that the sub-elements appear in a specific order. Therefore, I can say that MY problem is solved. However, for those who cannot perform this replacement, I would say that the validate() method has a serious problem when at least one sub-element of an xs:all group is optional.
I don't know if this issue has been detected / solved in Qt 6, but I prefer to report it here just in case someone at Qt decides to take a look.
Thanks!
-
@bleriot13 said in QXmlSchemaValidator's validate method takes too long to complete:
I don't know if this issue has been detected / solved in Qt 6
Be aware: QXmlSchemaValidator (and QtXmlPatterns) was removed in Qt 6!