Big SQLITE query select crashes

stormracer

Hello,
i'm very new to QT. I develop an application to display scientific data. I need to select only a subset of this data to create an image. For example i need to select all values which are over a threshold.
My idea is to write all this data to an sql database, because it's easy so select the interesting values. I choose an sqlite database for first.

With small amount of data, all is fine. But test with huge data are not successful.

For example i insert 167351352 rows.
But if i try to get this back it crashes with an:

segfault at 4 ip 00007f892445e1dd sp 00007ffd37b0cb50 error 6 in libQt5Sql.so.5.7.0[7f892443c000+43000]

after selecting 67106384 rows.

If i try to read this data with python it works. Therefor the database file is not corrupt.

This code is for reading the data back:

query.prepare("SELECT TOF FROM Neutrons WHERE X BETWEEN :xl AND :xr AND Y BETWEEN :yu AND :yd;");

query.bindValue(":xl", this->roi_xl);
query.bindValue(":xr", this->roi_xr);
query.bindValue(":yd", this->roi_yd);
query.bindValue(":yu", this->roi_yu);

bool ok = query.exec();
qDebug() <<  "Status: " << ok;
int c = 0;

while (query.next()) {
    c++;
}

The variable c is only for counting and the debugger shows the value 67106384 for c when it crashes.

Is there something wrong? This program crashes on two different pc at exactly the same place. The Database file has a size from 2,9 GB (2.914.419.712 Bytes). One special point i noticed is, when i divide the number of selected rows with 2^16 the result is nearly 1024, which is a magic number (pagesize or so).

My platform is Ubuntu 14.04 x64
Qt Creator 4.0.2 Based on Qt 5.7.0 (GCC 4.9.1 20140922 (Red Hat 4.9.1-10), 64 bit)

In python is a function called fetchmany(10000000) which querys a block of data. Is there something similar in QT?

Greetings
Christian

SGaist

Hi and welcome to devnet,

Can you check the memory consumption you have when running your application ?

Do you mean you call fetchmany repeatedly to get all the data ?

stormracer

Hi,

if i run my program with the Valgrind Analyser from QT Creator i get the problem "Invalid write of size 4" reported.
I've uploaded the reported messages as an image here Klick to Image.

The program need about 1.4GB memory in the Ubuntu task manager.

With Python i call the fetchmany function in a loop. It seems to be better for python to use this function. It returns a list with the given number of elements.

Christian

kshegunov

@stormracer
Hi,

get the problem "Invalid write of size 4" reported

We got that from:

segfault at 4 ip 00007f892445e1dd sp 00007ffd37b0cb50 error 6 in libQt5Sql.so.5.7.0[7f892443c000+43000]

The question @SGaist asked was rather to ascertain if you have enough memory to actually hold the dataset.

The program need about 1.4GB memory in the Ubuntu task manager.

How much do you have in total?

On a related note I'm not convinced you've chosen a very good way of storing your data. I don't expect querying a, or actually the data retrieval from a, 180 million rows table to be very fast.

Kind regards.

stormracer

Hello,

sorry for the missing information. My PC has 12GB Memory and i tested it with a new booted system and about 9GB of free memory. More than enough i think. During the execution of the query there is no significant increase in memory consumption.

For a very small test i use c code and the system sqlite3 library. With this program i'm able to get all data out of the database. It took about 40sec.

Do you know a better way to manage this data? For example i want to save graphic pixel with a color value. Therefor i created a table with x,y,color columns. Now i need all colors which are in a specific range and display them in a histogram. With sql it's easy to select only the necessary values.

Greetings,
Christian

SGaist

Do you mean you are trying to get the histogram from an image ?

kshegunov

@stormracer
Hello,

Do you know a better way to manage this data? For example i want to save graphic pixel with a color value. Therefor i created a table with x,y,color columns. Now i need all colors which are in a specific range and display them in a histogram. With sql it's easy to select only the necessary values.

Even with the most naive vector of structures and full fledged sorting/searching in it you'd most probably get better performance. You have plenty of memory, and the standard library provides a bunch of numerical algorithms, so I don't see a good reason to put all the data in a database, unless it's for permanent storage. Plus with QVector (or the other containers) you can easily have multithreaded code working on your dataset (the QtConcurrent framework).

Kind regards.

stormracer

@SGaist said:

Do you mean you are trying to get the histogram from an image ?

It's not a real image. It like data from a sensor which serves Coordinates in x and y and additional a specific value.

One part of the app is to display an 2D histogram with the coordinates. Than i select an region of interest in the 2D histogram and plot a 1D histogram with the third value which are in the limits of the selected coordinates.

@kshegunov, my development pc has this amount of memory. But for other pc i can't rely on this and i will get larger data files later. Therefor i search for a solution for managing large amount of data. My idea for sql was that database systems are made for efficient data storing and searching. I choose sqlite because it doesn't need a server in the background.

Kind regards
Christian

kshegunov

@stormracer
Hi,

my development pc has this amount of memory. But for other pc i can't rely on this and i will get larger data files later. Therefor i search for a solution for managing large amount of data. My idea for sql was that database systems are made for efficient data storing and searching. I choose sqlite because it doesn't need a server in the background.

Yes, somewhat, for general purpose data. Seeing you're searching for a set of particles in space (2D space?), and assuming you're wanting to mostly search and extract from the dataset not modifying it, I'd consider some sort of HDD swapped (or database backed) k-d tree. From my experience SQL databases behave rather poorly for scientific data, especially when the datasets are big. Also you could try getting a stripped down b-tree implementation (basically a database without the bells and whistles) that you can use for storage of the data and do the indexing/searching by yourself. Or the most basic approach is to work with the data as you'd do normally, but swap the dataset instead of holding it in memory (e.g. QCache comes to mind here).

PS.
Depending on how exactly you want to build the histogram you could also simply discretize the space (if I'm correct that you're working with floating point numbers for coordinates).