Some sort of Memory Leak with QtConcurrent::mapped?
-
Hey guys,
I have an application where I need to generate a QList<float> from a CSV file in an efficient and fast way. Choosing a file in the UI triggers a second thread that starts the CSV parsing. In our projects the files can contain millions of observations, so memory management is very important.
How I do it: I read the whole CSV file into a QByteArray, turn it into a QString, split the individual values into a QStringList, and convert the strings to floats asynchronously with QtConcurrent::mapped and a QFutureWatcher (to show the user the progress in a progress bar). The result should be presented as a QList<float>.
E.g. a file with 15 million values is about 200MB. The program uses about 2GB of memory at peak, and after the concurrent tasks have finished it still takes 1GB, and I can't quite understand why. 15 million values à 4 bytes = 60MB, so nearly 1GB is not freed. So, do you know why?
@
float DataSetImportWorker::stringToFloat(const QString &str)
{
    return str.toFloat();
}

void DataSetImportWorker::importCSV(const QString &fileName, bool firstLineNames)
{
    QFile file(fileName, this);
    if (!file.exists()) {
        emit error("File doesn't exist!");
        return;
    }
    if (!file.open(QFile::ReadOnly)) {
        emit error("File couldn't be opened!");
        return;
    }

    QByteArray byteContent = file.readAll();
    QString content(byteContent);
    int lineIndex = content.indexOf("\n");
    if (firstLineNames) {
        m_dataSet.names = content.mid(0, lineIndex).split(QRegExp("([ \t,;])"));
    }
    QStringList fields = content.mid(lineIndex+1, content.size()).split(QRegExp("([ \n\t,;])"));
    const QFuture<float> future = QtConcurrent::mapped(fields, &stringToFloat);
    m_watcher.setFuture(future);
    file.close();
}

void DataSetImportWorker::conversionFinished()
{
    m_dataSet.data = m_watcher.future().results();
    emit finished(m_dataSet);
}
@
-
My guess is that this is not a memory leak but a memory fragmentation problem. Allocating 15 million QStrings the size of a few bytes can do that. Have you checked with some memory leak detection tool like valgrind or vld to see if anything actually leaks?
Anyway, I don't think your approach is very good. Any benefits of QtConcurrent are probably overshadowed by the split and the millions of allocations it causes, and the memory requirement is tremendous.
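To put rough numbers on that memory requirement, here is a back-of-the-envelope sketch in plain C++. It only relies on the fact that QString stores 2 bytes per character; the struct, the function name and the per-string overhead parameter are made up for illustration, and the real overhead depends on your Qt version and allocator.

```cpp
#include <cstdint>

// Rough sizes of the buffers that are alive at the same time while
// split() runs in importCSV(). All names here are hypothetical.
struct ImportEstimate {
    std::uint64_t rawBytes;     // QByteArray returned by readAll()
    std::uint64_t utf16Bytes;   // QString content, 2 bytes per character
    std::uint64_t midCopyBytes; // temporary copy made by content.mid(...)
    std::uint64_t fieldBytes;   // all the small strings produced by split()

    std::uint64_t total() const {
        return rawBytes + utf16Bytes + midCopyBytes + fieldBytes;
    }
};

// assumedPerStringOverhead is a guess for header plus allocator
// bookkeeping per small string; it is an assumption, not a measured value.
ImportEstimate estimatePeak(std::uint64_t fileBytes,
                            std::uint64_t valueCount,
                            std::uint64_t assumedPerStringOverhead)
{
    ImportEstimate e;
    e.rawBytes = fileBytes;
    e.utf16Bytes = 2 * fileBytes;
    // Ignores the header line, which is tiny compared to the data.
    e.midCopyBytes = 2 * fileBytes;
    // One separator per value is dropped by split(); the rest is payload.
    const std::uint64_t payloadChars =
        fileBytes > valueCount ? fileBytes - valueCount : 0;
    e.fieldBytes = 2 * payloadChars + valueCount * assumedPerStringOverhead;
    return e;
}
```

With the numbers from the question (200,000,000 bytes, 15,000,000 values) and an assumed 32 bytes of overhead per string, total() comes to 1,850,000,000 bytes, which is in the region of the 2GB peak you observed.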
I would allocate a few strings up front (e.g. as many as there are cores/CPUs available, or some small multiple of that), call reserve() on them with the expected size of a single data point, and, instead of splitting the string, read the data points one by one from it into these pre-allocated strings. Then run the concurrent conversion on that chunk and proceed to read in the next one.
The amount of memory would be just the initial string size plus those few buffer strings. The savings in time and memory from avoiding millions of small allocations and deallocations should be substantial, likely orders of magnitude compared to what you have now.
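Something along these lines, as a minimal sketch rather than a finished implementation. It is written in plain standard C++ (std::string_view and std::strtof instead of QString and toFloat) so it compiles without Qt, and the names isSeparator, parseFloats and chunkRanges are made up for illustration. Note that, unlike your split(), it skips runs of consecutive separators instead of producing empty fields.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Same separator set as the QRegExp in the question: space, \n, \t, ',' and ';'.
inline bool isSeparator(char c)
{
    return c == ' ' || c == '\n' || c == '\t' || c == ',' || c == ';';
}

// Scan the text once and convert each field, reusing a single small buffer
// instead of creating one string per value.
std::vector<float> parseFloats(std::string_view text)
{
    std::vector<float> out;
    std::string field;
    field.reserve(32); // assumed upper bound for one data point; grows if needed

    std::size_t pos = 0;
    while (pos < text.size()) {
        while (pos < text.size() && isSeparator(text[pos]))
            ++pos; // skip separators
        const std::size_t start = pos;
        while (pos < text.size() && !isSeparator(text[pos]))
            ++pos; // find the end of the field
        if (pos == start)
            break; // only separators were left
        field.assign(text.data() + start, pos - start);
        out.push_back(std::strtof(field.c_str(), nullptr));
    }
    return out;
}

// Cut the text into at most `parts` ranges that end on a separator (or at the
// end of the text), so that no number is split between two chunks.
std::vector<std::pair<std::size_t, std::size_t>>
chunkRanges(std::string_view text, std::size_t parts)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (parts == 0)
        parts = 1;
    const std::size_t step = text.size() / parts + 1;
    std::size_t begin = 0;
    while (begin < text.size()) {
        std::size_t end = std::min(begin + step, text.size());
        while (end < text.size() && !isSeparator(text[end]))
            ++end; // extend the chunk to the next separator
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

Each range returned by chunkRanges can be passed to parseFloats via text.substr(begin, end - begin), and the ranges are independent of each other, so each one could be handed to a separate worker. The only large allocations left are the input text and the result vector.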