[solved] Advise On Reading Lots Of Records From A File
-
I hope someone here can offer a trick or a few words of wisdom on how best to optimize reading lines from a file. I have been using QIODevice::readLine(), but for a file approaching 230K lines it seems kind of slow. Does readLine() do any caching? Should I try to do my own caching with QIODevice::readAll()? With readAll(), how do I then parse the big QByteArray to get sub-QByteArrays terminated on EOL? Or perhaps I should drop down and use POSIX calls for reading large files? One more thing concerning Qt:
In looking at the 5.x documentation there is a virtual method:
QIODevice::readLineData(char *data, qint64 maxSize)
This is in fact called by QIODevice::readLine(), and the documentation has the following:
"Buffered devices can improve the performance of readLine() by reimplementing this function"
I guess my question is how I can reimplement this function to optimize for reading one line at a time. Thanks in advance for any help or suggestions. (Example code is always welcome!)
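For what it's worth, here is a minimal standard-C++ sketch of the "read everything, then split in memory" approach mentioned above (the function names and the missing-final-newline handling are my own, not from any Qt API):

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Split a buffer into lines on '\n', tolerating a missing final newline.
std::vector<std::string> splitLines(const std::string &buf) {
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (start < buf.size()) {
        std::string::size_type end = buf.find('\n', start);
        if (end == std::string::npos) {
            lines.push_back(buf.substr(start)); // last line has no '\n'
            break;
        }
        lines.push_back(buf.substr(start, end - start));
        start = end + 1;
    }
    return lines;
}

// Read an entire file into memory in one bulk operation, then parse it.
std::vector<std::string> readAllLines(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream ss;
    ss << in.rdbuf(); // single streambuf copy of the whole file
    return splitLines(ss.str());
}
```

The in-memory split itself is cheap; as the rest of this thread suggests, the per-line processing is usually where the time goes.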
-
"seems kind of slow" is quite a broad term ;-) Are you using the code from the snippet in "QFile docs":http://qt-project.org/doc/qt-5/QFile.html?
You can do readAll() and then split the result using "\n" as the delimiter (if you have set the QFile::Text flag first), although I am not sure you will gain anything by doing so.
-
Slow being like 25 seconds to load. Yes, I read it in a loop until EOF and process each line. readAll() should be faster, as it reads the whole file in one or two big chunks; then I can parse in memory, which has got to be very fast. My question is: once I have the QByteArray from readAll(), how do I parse it with the "\n" delimiter as you suggested? Do I create a QStringList?
Thanks for taking the time to reply
-david
-
Ah, so I suspect your line processing is eating that time. But I might be wrong, of course.
Here:
@
QByteArray allData;
// ...
QStringList lines(QString(allData).split("\n"));
foreach (const QString &line, lines) {
    // process each line
}
@
Should work. It might be that QByteArray also has a split() method, I am not sure.
-
[SOLVED] Thanks for your suggestions. I tried both ways: readAll() followed by parsing lines out of the big byte array, and reading one line at a time. There was almost no difference (readAll() was actually a few hundred milliseconds slower on average for 100K records).
For my file of 208,000 records, the load took 119 seconds. Clearly the time is in the processing, so I will see if there is a way to speed that up.
-
Right, happy coding! :-)