Either `QFileInfo` caching does not work, or its methods are just unreasonably (very) slow
`QFileInfo` methods that return the file time are very slow.

Sorting with `std::sort` and the comparator

    [](const QFileInfo& a, const QFileInfo& b) { return a.lastModified() < b.lastModified(); }

takes up to 500+ times longer than it should. It's sick.
This is a widely used code, by the way: https://github.com/search?q=sort+QFileInfo+lastModified&type=code
For example, sorting 30000 files takes 3700-4000 ms, while if I store the `lastModified()` value (`QDateTime`) in my custom data struct (creating the list takes 100 ms), sorting it takes 2 ms.

Even if I run the sort of `QList<QFileInfo>` again, it takes 3300 ms, although the `lastModified()` value of each `QFileInfo` must be cached and the list is already sorted.
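To see why a slow getter hurts so much inside a comparator, it may help to count how often the comparator actually calls it. The following is a minimal standalone sketch in plain C++ (no Qt). `Entry`, `slowKey`, `makeEntries` and `countKeyCallsDuringSort` are hypothetical names, and `slowKey` merely stands in for a getter such as `lastModified()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a file entry; `stamp` plays the role of the mtime.
struct Entry {
    long long stamp;
};

// Counts how many times the "expensive" getter is invoked.
static std::size_t g_keyCalls = 0;

// Stand-in for a getter like lastModified(): cheap here, but counted.
long long slowKey(const Entry& e) {
    ++g_keyCalls;
    return e.stamp;
}

// Sorts a copy of `entries` by calling slowKey() inside the comparator
// and returns the number of getter calls the sort needed.
std::size_t countKeyCallsDuringSort(std::vector<Entry> entries) {
    g_keyCalls = 0;
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return slowKey(a) < slowKey(b); });
    // Sanity check that does not go through the counted getter.
    assert(std::is_sorted(entries.begin(), entries.end(),
                          [](const Entry& a, const Entry& b) { return a.stamp < b.stamp; }));
    return g_keyCalls;
}

// Builds n entries with deterministic pseudo-random stamps (simple LCG).
std::vector<Entry> makeEntries(std::size_t n) {
    std::vector<Entry> entries;
    entries.reserve(n);
    unsigned long long state = 12345;
    for (std::size_t i = 0; i < n; ++i) {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        entries.push_back(Entry{static_cast<long long>(state >> 33)});
    }
    return entries;
}
```

Calling `countKeyCallsDuringSort(makeEntries(30000))` returns a count far above 30000, because a comparison sort needs on the order of n log n comparisons and each comparison calls the getter twice. Any fixed per-call cost in the getter is multiplied accordingly, whether or not the underlying value is cached.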
`QFileInfo` should cache the OS API response, so calling `setCaching(true)` is unnecessary (even if I call it, nothing changes).
So either caching does not work, or the problem lies somewhere else.
BTW, I find the new `stat()` method useless.
- Again, it does not help with this problem.
- Why does it return `void`? I am sure it should work the same way as in other languages: it should return a data struct similar to the one returned by https://en.wikipedia.org/wiki/Stat_(system_call) plus `birth_time` (like it does in Node.js).
Okay, here is the code:
    #include <QDateTime>
    #include <QDebug>
    #include <QDir>
    #include <QElapsedTimer>
    #include <QFileInfo>
    #include <QList>
    #include <QMap>
    #include <QString>
    #include <algorithm>

    // Simple named stopwatch: start(name), then elapsed(name) prints and returns ms.
    class Timer {
    private:
        inline static QMap<QString, QElapsedTimer> map;
    public:
        static void start(QString name) {
            QElapsedTimer timer;
            timer.start();
            Timer::map.insert(name, timer);
        }
        static int elapsed(QString name) {
            if (!Timer::map.contains(name)) {
                qDebug().noquote() << "[timer][" + name + "]: Not found.";
                return -1; // no such timer was started
            }
            int time = Timer::map.take(name).elapsed();
            qDebug().noquote() << "[timer][" + name + "]:" << time << "ms";
            return time;
        }
    };

    // Plain struct that stores the values read from QFileInfo once.
    class MyFileInfo {
    public:
        QString path;
        QDateTime mtime;
        qint64 size;
        MyFileInfo(QFileInfo &fileInfo) {
            this->path = fileInfo.filePath();
            this->mtime = fileInfo.lastModified();
            this->size = fileInfo.size();
        }
    };

    void bencQFileInfo(QList<QFileInfo> &fileInfoList) {
        Timer::start("sort(QList<QFileInfo>)");
        std::sort(fileInfoList.begin(), fileInfoList.end(),
            [](const QFileInfo& a, const QFileInfo& b) {
                return a.lastModified() < b.lastModified();
            }
        );
        Timer::elapsed("sort(QList<QFileInfo>)");
    }

    void bencMyFileInfo(QList<MyFileInfo> &myFileInfoList) {
        Timer::start("sort(QList<MyFileInfo>)");
        std::sort(myFileInfoList.begin(), myFileInfoList.end(),
            [](const MyFileInfo& a, const MyFileInfo& b) {
                return a.mtime < b.mtime;
            }
        );
        Timer::elapsed("sort(QList<MyFileInfo>)");
    }

    // Get QFileInfo list sorted by names (the default OS sorting)
    QList<QFileInfo> getFileInfos(QDir &dir) {
        Timer::start("entryInfoList");
        QList<QFileInfo> fileInfoList = dir.entryInfoList();
        Timer::elapsed("entryInfoList");
        return fileInfoList;
    }

    int main() {
        QString path = "C:\\HERE_IS_THE_FOLDER";
        QDir dir(path);
        if (!dir.exists()) {
            qDebug() << "DIR DOES NOT EXIST:" << path;
        }
        dir.setFilter(QDir::Files | QDir::Hidden | QDir::NoDotAndDotDot);

        QList<QFileInfo> fileInfoList;

        // 120 ms
        fileInfoList = getFileInfos(dir);
        // 3700-4000 ms
        bencQFileInfo(fileInfoList);
        // 3300 ms (sorting of the already sorted list)
        bencQFileInfo(fileInfoList);

        // 0 ms (at least it works well)
        fileInfoList = getFileInfos(dir);
        // 90-100 ms
        Timer::start("myFileInfoList");
        QList<MyFileInfo> myFileInfoList;
        for (QFileInfo &fileInfo : fileInfoList) {
            myFileInfoList << MyFileInfo(fileInfo);
        }
        Timer::elapsed("myFileInfoList");
        // 2-6 ms
        bencMyFileInfo(myFileInfoList);
        // 1-2 ms (sorting of the already sorted list)
        bencMyFileInfo(myFileInfoList);

        qDebug() << "--------------------";

        // 0 ms
        fileInfoList = getFileInfos(dir);
        // 1080 ms (just consumes a lot of time, does nothing)
        Timer::start("stat");
        for (QFileInfo &fileInfo : fileInfoList) {
            fileInfo.stat();
            // fileInfo.setCaching(true); // Adds 30-40 ms, does not change anything (since it's the default value).
        }
        Timer::elapsed("stat");

        // 90-95 ms
        Timer::start("lastModified");
        for (QFileInfo &fileInfo : fileInfoList) {
            fileInfo.lastModified();
        }
        Timer::elapsed("lastModified");

        // 180-185 ms
        Timer::start("lastModified x2");
        for (QFileInfo &fileInfo : fileInfoList) {
            fileInfo.lastModified();
            fileInfo.lastModified();
        }
        Timer::elapsed("lastModified x2");

        // 270-305 ms
        Timer::start("lastModified x2, birthTime");
        for (QFileInfo &fileInfo : fileInfoList) {
            fileInfo.lastModified();
            fileInfo.lastModified();
            fileInfo.birthTime();
        }
        Timer::elapsed("lastModified x2, birthTime");
    }
The example output:
    [timer][entryInfoList]: 115 ms
    [timer][sort(QList<QFileInfo>)]: 3922 ms
    [timer][sort(QList<QFileInfo>)]: 3324 ms
    [timer][entryInfoList]: 0 ms
    [timer][myFileInfoList]: 94 ms
    [timer][sort(QList<MyFileInfo>)]: 3 ms
    [timer][sort(QList<MyFileInfo>)]: 1 ms
    --------------------
    [timer][entryInfoList]: 0 ms
    [timer][stat]: 1086 ms
    [timer][lastModified]: 92 ms
    [timer][lastModified x2]: 183 ms
    [timer][lastModified x2, birthTime]: 270 ms
Upd: Windows 10, Qt 6.2.
> `QFileInfo` should cache the OS API response, so calling `setCaching(true)` is unnecessary (even if I call it, nothing changes).

You have no idea how much any "OS API" caching of file information might or might not achieve time-wise.

> So either caching does not work, or the problem lies somewhere else.

One possibility might be if `QFileInfo` caching has a limit on how many files it retains in cache. You talk about 30,000 files. That might exceed the cache size and render it ineffective. If you bring it down to, say, 100 files, how do the timings compare?

> BTW, I find the new `stat()` method useless.

And what would that be? You don't say anything about what version of Qt you are using.

Might be relevant if the OP stated this, maybe this is a Qt6 issue?

> Why does it return `void`? I am sure it should work the same way as in other languages: it should return a data struct similar to the one returned by

Why should it? It is a class instance method; it fills the necessary class members, which can then be accessed through the Qt class's member interface instead of some potentially OS-dependent structure for which Qt would have to introduce support. Seems fine to me. You seem to be pretty angry.
The documentation says:

> When caching is enabled, QFileInfo reads the file information from the file system the first time it's needed, but generally not later.

> Caching is enabled by default.

It clearly says that it requests the data only once, on first demand. That is the default behaviour.

    for (QFileInfo &fileInfo : fileInfoList) { fileInfo.lastModified(); }

So the expected behaviour is that a second run of this code takes about 0 ms, but no, it takes 90 ms on each run.
There is no note about any limits.

Qt 6.2. If it were Qt 5.x, the code would not compile.
Okay, 100 files:

    [timer][entryInfoList]: 0 ms
    [timer][sort(QList<QFileInfo>)]: 4 ms
    [timer][sort(QList<QFileInfo>)]: 3 ms
    [timer][entryInfoList]: 0 ms
    [timer][myFileInfoList]: 0 ms
    [timer][sort(QList<MyFileInfo>)]: 0 ms
    [timer][sort(QList<MyFileInfo>)]: 0 ms
    --------------------
    [timer][entryInfoList]: 0 ms
    [timer][stat]: 1 ms
    [timer][lastModified]: 0 ms
    [timer][lastModified x2]: 0 ms
    [timer][lastModified x2, birthTime]: 0 ms

4 ms to sort, and 3 ms to sort the already sorted list, while it should take "0" ms.

200:

    [timer][entryInfoList]: 1 ms
    [timer][sort(QList<QFileInfo>)]: 10 ms
    [timer][sort(QList<QFileInfo>)]: 8 ms
    [timer][entryInfoList]: 0 ms
    [timer][myFileInfoList]: 0 ms
    [timer][sort(QList<MyFileInfo>)]: 0 ms
    [timer][sort(QList<MyFileInfo>)]: 0 ms
    --------------------
    [timer][entryInfoList]: 0 ms
    [timer][stat]: 4 ms
    [timer][lastModified]: 0 ms
    [timer][lastModified x2]: 1 ms
    [timer][lastModified x2, birthTime]: 1 ms

Here

    for (QFileInfo &fileInfo : fileInfoList) { fileInfo.lastModified(); }

takes 0 ms. However,

    for (QFileInfo &fileInfo : fileInfoList) { fileInfo.lastModified(); fileInfo.lastModified(); }

takes 1 ms.

For 400 files: ... ms and 2 ms.

If there is a "limit", it is ...
QFileInfo::stat() does what it should and I don't see why it should return some kind of a structure because the structure is the QFileInfo instance.
I can reproduce the non-caching of the file times, also on Linux. I need some time to see what's going on.
> QFileInfo::stat() does what it should and I don't see why it should return some kind of a structure because the structure is the QFileInfo instance.

...? It looks like Qt just ignores that data (although it could return it), so it would be useful if `stat()` returned it.

It works that way, for example:
- In Node.js: https://nodejs.org/api/fs.html#fsstatpath-options-callback
- In Python: https://docs.python.org/3/library/os.html#os.stat
> uid, gid

Yes: https://doc.qt.io/qt-5/qfileinfo.html#groupId and https://doc.qt.io/qt-5/qfileinfo.html#ownerId
The rest is too OS-specific, as @JonB already said.
OK, I think the caching works; it's the QDateTime creation. You can prove that by adding

    for (QFileInfo &fileInfo : fileInfoList) { fileInfo.setCaching(false); }

after you have gathered the file information.
Looks like it's the QDateTime conversion from UTC to local time here. This is done after the cache and is therefore executed every time. Removing `toLocalTime()` gives very impressive results:

    with .toLocalTime()
    [timer][sort(QList<MyFileInfo>)]: 35 ms
    [timer][sort(QList<MyFileInfo>)]: 30 ms
    [timer][sort(QList<QFileInfo>)]: 176 ms
    [timer][sort(QList<QFileInfo>)]: 169 ms

    without .toLocalTime()
    [timer][sort(QList<MyFileInfo>)]: 3 ms
    [timer][sort(QList<MyFileInfo>)]: 2 ms
    [timer][sort(QList<QFileInfo>)]: 5 ms
    [timer][sort(QList<QFileInfo>)]: 3 ms
Source code: https://code.woboq.org/qt5/qtbase/src/corelib/io/qfileinfo.cpp.html#_ZNK9QFileInfo8fileTimeEN11QFileDevice8FileTimeE
QDateTime::toLocalTime() calls some OS time zone functions, and one of these calls getenv(), which is the bottleneck here on Linux. I think it's a similar thing on Windows.
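The pattern described above, a cached raw value plus a conversion that runs on every call, can be modelled without Qt. The sketch below is only an illustration under that assumption and is not Qt's actual implementation; `CachedStamp`, `fetchCount` and `convertCount` are hypothetical names.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical model: the raw UTC timestamp is cached after the first
// fetch, but the conversion to local time is repeated on every call.
class CachedStamp {
public:
    explicit CachedStamp(std::time_t utcSeconds) : source_(utcSeconds) {}

    // Returns the broken-down local time; converts on every call.
    std::tm localTime() {
        const std::time_t raw = utc();   // hits the cache after the first call
        ++convertCount;                  // the conversion itself is never cached
        const std::tm* local = std::localtime(&raw);
        assert(local != nullptr);
        return *local;
    }

    // Returns the cached raw value without any time zone conversion.
    std::time_t utc() {
        if (!cached_) {                  // the "file system" is queried only once
            cachedUtc_ = source_;
            cached_ = true;
            ++fetchCount;
        }
        return cachedUtc_;
    }

    int fetchCount = 0;    // how often the underlying source was read
    int convertCount = 0;  // how often the local-time conversion ran

private:
    std::time_t source_;          // stands in for the OS query result
    bool cached_ = false;         // set once the raw value has been read
    std::time_t cachedUtc_ = 0;   // raw value, valid when cached_ is true
};
```

After three calls to `localTime()` on one object, `fetchCount` is 1 while `convertCount` is 3. In this model the cache does its job, yet the per-call cost stays, which matches the timings above where the only difference is the `toLocalTime()` call.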
Most likely it is.
Since the caching code is pretty simple*: https://github.com/qt/qtbase/blob/f29566c5a41c127eacaf13f3dbfe4624e55bc83f/src/corelib/io/qfileinfo.cpp#L191-L218
It checks whether there is a value in the array and, if there is, returns it, so the performance should be the same as when I read it from `MyFileInfo`.

*Also, it caches ONLY the requested time, so if I need all 4 time values (M, B, C, A) I have to ask the OS 4 times, while most likely a single OS call returns all 4 times at once. That is not optimal in terms of performance.

Using the `stat()` method is overkill, since it is slower than calling the 4 methods (`lastModified()` and the others). Currently, with that 30000-file directory, `stat()` takes about 1080 ms, `lastModified()` alone about 90 ms, and all 4 times 360 ms.
I looked at the code that @Christian-Ehrlicher has been looking at. Unfortunately, you cannot influence or replace the conversion to local time that happens each time you call one of the file time functions, and it uses internal Qt calls you cannot access, so it's not possible to write a drop-in replacement.

So unless you want to wait for a possible fix (which I think would mean changing quite some code, something they may not be prepared to do), if this speed issue is a blocker for you, the question is whether you are prepared to use your own structure, like the one you show, so that you can hold the converted times for the files.
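The "own structure" workaround can also be written as a generic helper that computes the key once per element and then sorts only the cheap keys. This is a plain C++ sketch rather than Qt code; `sortByCachedKey` and `keyOf` are hypothetical names, and `keyOf` stands in for an expensive getter such as `lastModified()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Sorts `items` by a key that is computed exactly once per element.
template <typename T, typename KeyFn>
void sortByCachedKey(std::vector<T>& items, KeyFn keyOf) {
    using Key = typename std::decay<decltype(keyOf(items.front()))>::type;

    // Decorate: pair every element's key with its original index.
    std::vector<std::pair<Key, std::size_t>> keyed;
    keyed.reserve(items.size());
    for (std::size_t i = 0; i < items.size(); ++i) {
        keyed.emplace_back(keyOf(items[i]), i);
    }
    assert(keyed.size() == items.size());

    // Sort the (key, index) pairs; keyOf() is not called again from here on.
    std::sort(keyed.begin(), keyed.end());

    // Undecorate: rebuild the list in sorted order.
    std::vector<T> sorted;
    sorted.reserve(items.size());
    for (const auto& entry : keyed) {
        sorted.push_back(std::move(items[entry.second]));
    }
    items = std::move(sorted);
}
```

In the thread's scenario the element type would presumably be `QFileInfo` and `keyOf` a lambda returning `lastModified()`. That mapping is an assumption and is untested here. The trade-off is one extra vector of keys in exchange for calling the getter n times instead of once or twice per comparison.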
> *Also it caches ONLY the requested time, so if I need all 4 time values (M, B, C, A) I need to ask OS 4 times, while most likely the OS call returns all 4 times once. It's not optimal in terms of performance.

You should really look at and understand the code before ranting about something. You're wrong: https://code.qt.io/cgit/qt/qtbase.git/tree/src/corelib/io/qfilesystemengine_unix.cpp#n377

> Most likely it is.

Not only most likely, see my numbers.
Here again are the numbers for 30k files: with QDateTime::toLocalTime() 1.3 s, without 0.07 s.

Here is the bug report: https://bugreports.qt.io/browse/QTBUG-100349
It looks like you ran the code from the bug report in debug mode.
When I run your code in debug mode:

    getFileInfos: 62
    #files: 30000
    myFileInfoList 93
    bencMyFileInfo 19
    bencQFileInfo 2925

The difference is about 154 times (2925 / 19).

But when I run it in release mode:

    getFileInfos: 64
    #files: 30000
    myFileInfoList 93
    bencMyFileInfo 2
    bencQFileInfo 2879

the `MyFileInfo` sorting is 1439 times faster.
> It looks you run the code from the bug report in the debug mode.

I don't see why it should matter: the problem was identified and the testcase properly shows it.

Quite old, but in Qt 6.6 `QFileInfo::lastModified()` gained a new `QTimeZone` parameter, so you can pass `QTimeZone::UTC` to avoid the conversion to local time.
See also https://codereview.qt-project.org/c/qt/qtbase/+/437009
Only for `lastModified()`? The problem exists with the other times too: the other file time methods are affected by this issue as well.
https://bugreports.qt.io/browse/QTBUG-100349 states:

> Should be fixed in Qt 6.6 https://codereview.qt-project.org/c/qt/qtbase/+/437009

> You can now pass a QTimeZone to any of the QFileInfo file times related methods to specify which time zone returned times should be in

So it should apply to the other file time methods too, but that awaits Qt 6.6.
