Python / PyQt / Qt performance

mvidelgauz · wrote on 30 Mar 2013, 12:41

Hello,

I am not sure if my problem related to Python (I am using 2.7) or PyQt or Qt itself

I am developing log4j log file viewer. My program loads log file and parses each line to build model for QTreeView. In parallel it collects statistics about some field values. This is done using map <field_value, boolean>

Now, I have 3 test files - test1.log, test2.log, test3.log. test2.log is actually concatenation of test1.log. test3.log is also concatenation of test1.log 3 times. So set of values and thus size of maps must be the same in all 3 cases:

test1.log - ~ 50 MB and 383377 records
test2.log - ~ 100 MB and 383377 x 2 records
test3.log - ~ 150 MB and 383377 x 3 records

Now the problem:

test1 and test2 are loaded in 1-2 second and pythonw process takes ~ 35 MB of RAM (which is already strange given that only file content is 50 or even 100 MB, but that isn't my question, although might trigger a clue)

test3 loads several minutes and pythonw process takes almost 3 GB of RAM

Does anyone have a clue what could be a reason for such performance hit?

Thank you!

SGaist · wrote on 30 Mar 2013, 22:09

Hi,

Without seeing any code it's very difficult to help. We don't know how you load the log files, if you load all three files together etc...

mvidelgauz · wrote on 23 Apr 2013, 08:42

Sorry for late reply, I was fully busy with a different project for all this time.

I will of course show the code but I want first to understand what parts of it is really relevant, otherwise is going to be quite big. So I want to begin with top-level description of the flow:

Reading text file into lines array: lines = input.readlines() (then lines = [line. rstrip() for line in lines]

Parsing each line: for line in lines parser.parseLine(line). Method parseLine() returns array of fields, which is saved as member (in C++ terms) of 'record' (instance of class LogRecord). This record is added to array 'self.records'

Iterating over array of records to create instance of 'TreeItem' class for each record (you may ask why do I need two classes 'LogRecord' and 'TreeItem' - to have kind of "clean" model with records independent of TreeView specific requirements, 'TreeItem' implements method needed only for TreeView representation of the model, such as child/parent relationship)

I profiled project with eric4 IDE and got interesting results:
Method 'loadLogFile' of window class that has just two lines:
self.model.loadLogFile(filePath)
self.mainRecordsView.adjustSectionsWidth()
took ~21 seconds
while method 'loadLogFile' of model (flow described above) took only 9 seconds and 'adjustSectionsWidth()' took another ~12 seconds.
adjustSectionsWidth() just calls 'sizeHintForColumn' of QTreeView 6 times (and then 'resizeSection' of header). So it looks like 'sizeHintForColumn' is quite expensive method (actually, doesn't surprise me), I will try to use some other ways to estimate header section widths.

Besides that I think 9 seconds for model's 'loadLogFile' on i7 with 8GB RAM is also too much. I am now trying to drill down this method with profiler

I will really appreciate any suggestions. And what parts of code should I show?

Thank you!

SGaist · wrote on 23 Apr 2013, 10:15

Do you use a custom model (i think so) ? If so, how do you propagate the fact that your data has arrived ? Once per item ? Once when you loaded all the data ?

How does adjustSectionsWidth work ? Does it parse all data or only the visible part ?

In what kind of structure do you keep your data ? What's it's access speed ?

Hope these questions help

loadLogFile and adjustSectionsWidth would be interesting to see i think

mvidelgauz · wrote on 23 Apr 2013, 20:28

Answers for your questions:

Yes, custom model as you correctly guessed )))

Of course once when data arrived. Here is the code of loadLogFile() of 'LogModel' class:
(currently split into other method calls to collect profile data, I will paste here code of these methods also):

@
def loadLogFile(self, fileName):
self.load_readLines(fileName)

    self.beginResetModel()

    self.doInit()

    self.load_parseLines()
    self.lines = None

    self.setupModelData()

    self.endResetModel()

@

current code of 'adjustSectionsWidth' of 'RecordsView', which is derived from QTreeView:

@
def adjustSectionsWidth(self):
#TODO: reimplement adjustSectionsWidth

    header = self.header()

    #Tempral hard-coded widths:
    header.resizeSection(1, 112)
    header.resizeSection(2, 67)
    header.resizeSection(3, 121)
    header.resizeSection(4, 227)
    header.resizeSection(5, 42)
    header.resizeSection(6, 303)
    return

    #original code:
    fieldLogicalIndex = 1 #not 0 because we dont't want to touch leftmost "extra" section
    while fieldLogicalIndex &lt; header.count():
        #resize section to fit hint
        hint = self.sizeHintForColumn(fieldLogicalIndex)
        hint += 3 #TODO: margin pixels, find better way to estimate this number
        sectionSize = header.sectionSize(fieldLogicalIndex)
        if sectionSize &lt; hint:
            sectionSize = hint
            header.resizeSection(fieldLogicalIndex,  sectionSize)
        fieldLogicalIndex += 1

@

Now, to drill down 'loadLogFile' let's see profile data:
!http://i.imgur.com/d23AMSe.png(ProfileData1)!

As you can see most of the time is spent on QTreeView methods 'endResetModel' and 'hasIndex'

The number of calls ~260000 looks strange to me given that number of records is 383377

'rowCount' and 'data' of my model also took some time. rowCount() code I grabbed from some QTreeView's model sample on the web (it took > 3 seconds):
@
def rowCount(self, parent=QtCore.QModelIndex()):
if parent.column() > 0:
return 0

    if not parent.isValid():
        p_Item = self.rootItem
    else:
        p_Item = parent.internalPointer()

    count = p_Item.childCount()
    return count

@
you can see at screenshot that childCount() took < 1 second

What direction would you suggest to go now?

Thank you again!