[Moved] Problem using QXmlStreamReader to parse html tag
-
Hi, I want to use QXmlStreamReader to parse an HTML file and get the content of every <p> tag. Some <p> tags contain a nested <em> tag, and in those cases I can't get the full <p> content: QXmlStreamReader doesn't seem to read the <em> tag.
Here is the HTML file:
@
<?xml version='1.0' encoding='utf-8'?>
<html>
<body class="calibre">
<hr class="calibre6" id="calibre_pb_6"/><h3 id="calibre_toc_7" class="calibre7">CHAPTER VI</h3>
<p class="calibre4">PREPARING TO BE A SAILOR</p>
<p class="calibre4">"Take you for an old fraud," replied the unabashed first mate of the <em class="calibre5">Fancy</em>. "Of course you would be bankrupted, as you ought to have been long ago, if you gave fifty dollars on every turnip that is brought in; but you could well afford to advance a hundred on this watch, and you know it."</p>
<p class="calibre4">"Veil, I tell you; I gifs t'venty-fife."</p>
<p class="calibre4">[Illustration: "'VELL, I TELL YOU. I GIFS YOU TVENTY-FIFE'"]</p>
<p class="calibre4">"Fifty," said Bonny, firmly.</p>
</body>
</html>
@Here is the code that parses the <p> and <em> tags. What is wrong with it?
@
QString CParseEpubHtml::ParseHtml( QString filePath)
{
    QFile pTmpFile(filePath);
    if(!pTmpFile.open(QIODevice::ReadOnly))
    {
        qWarning("Error opening file");
        // return -1;
    }
    QXmlStreamReader xmlReader(&pTmpFile);
    xmlReader.setDevice(&pTmpFile);
    while(!xmlReader.atEnd() && !xmlReader.hasError())
    {
        xmlReader.readNext();
        if(xmlReader.isStartElement())
        {
            if(xmlReader.name()=="p")
            {
                m_ReadContent += xmlReader.readElementText();
            }
        }
        if(xmlReader.name()=="em")
        {
            xmlReader.readNext();
            m_ReadContent += xmlReader.readElementText();
        }
        if(xmlReader.isEndElement())
        {
            if(xmlReader.name()=="p")
                m_ReadContent += "\n";
            if(xmlReader.name()=="html")
                break;
        }
    }
    return m_ReadContent;
}@
Another question: does TextEdit in QML support CSS? I want to show the text in the TextEdit with the same styling it has in the HTML.
Thank you for your reply!
Best regards.
-
There is a hierarchy flaw within your loop: the check for "em" sits outside the isStartElement() block, so it also runs on end elements.
Try this:
@
while(!xmlReader.atEnd() && !xmlReader.hasError())
{
    xmlReader.readNext();
    if(xmlReader.isStartElement())
    {
        if(xmlReader.name()=="p")
        {
            m_ReadContent += xmlReader.readElementText();
        }
        if(xmlReader.name()=="em")
        {
            xmlReader.readNext();
            m_ReadContent += xmlReader.readElementText();
        }
    }
    if(xmlReader.isEndElement())
    {
        if(xmlReader.name()=="p")
            m_ReadContent += "\n";
        if(xmlReader.name()=="html")
            break;
    }
}@
How about writing a separate post regarding CSS? It might get lost where it is now.
-
Maybe instead of TextEdit you could use "QWebView":http://developer.qt.nokia.com/doc/qt-4.7/qwebview.html. That way you can edit your content inside the web view like a "WYSIWYG editor":http://labs.qt.nokia.com/2009/03/12/wysiwyg-html-editor/ by setting the content to editable: @htmlView->page()->setContentEditable(true);@ As a bonus, you can also parse your HTML document using "QWebElement":http://developer.qt.nokia.com/doc/qt-4.7/qwebelement.html with CSS-like selectors.