[Solved] Duplicate finder
-
Don't you think computing a hash is a bit of overkill for detecting duplicate files?
Just read the files' contents into memory blocks and compare them with memcmp. If the files may be large, it would be wiser to compare them block by block rather than whole files at once.
(Upd.: function name corrected.)
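For what it's worth, the block-by-block comparison could be sketched roughly like this in plain C++. The function name `sameContents` and the 64 KiB default block size are assumptions made up for this example, not something from the original post.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true when both files can be opened and
// their contents are byte-identical. The block size is an arbitrary choice.
bool sameContents(const std::string &pathA, const std::string &pathB,
                  std::size_t blockSize = 64 * 1024)
{
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b)
        return false; // unreadable files are treated as "not duplicates"

    std::vector<char> bufA(blockSize), bufB(blockSize);
    for (;;) {
        a.read(bufA.data(), static_cast<std::streamsize>(blockSize));
        b.read(bufB.data(), static_cast<std::streamsize>(blockSize));
        const std::streamsize gotA = a.gcount();
        const std::streamsize gotB = b.gcount();

        // Different amounts read means the files differ in length.
        if (gotA != gotB)
            return false;
        // Both streams are exhausted and every earlier block matched.
        if (gotA == 0)
            return true;
        // Compare only the bytes that were actually read.
        if (std::memcmp(bufA.data(), bufB.data(), static_cast<std::size_t>(gotA)) != 0)
            return false;
    }
}
```

Comparing the file sizes first would let you skip most non-duplicates without reading any data at all.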
-
What do you think, is this correct?
@
while (it.hasNext())
{
    it.next();
    if (it.peekPrevious().key() == it.peekNext().key())
        std::cout << it.peekPrevious().value() << "="
                  << it.peekNext().value() << std::endl;
}
@
-
No, it is not correct.
It only compares adjacent entries in your container.
If your container is a map (QMap or QHash) and you use insert() to populate it, then you will get no duplicates at all: every key occurs only once, so the keys at different positions are all distinct.
You must use insertMulti() and values() to get a list of all entries with the same hash value. Or use QMultiMap/QMultiHash with the aforementioned methods.
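To illustrate the idea of fetching all entries that share a key, here is a small sketch that uses `std::unordered_multimap` from the standard library as a stand-in for QMultiHash. The helper name `valuesFor` is made up for this example; in Qt you would call values(key) on the container instead.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns every value stored under `key`,
// roughly the role that values(key) plays for a Qt multi hash.
std::vector<int> valuesFor(const std::unordered_multimap<std::string, int> &hash,
                           const std::string &key)
{
    std::vector<int> result;
    // equal_range yields the [first, second) span of entries whose key equals `key`.
    const auto range = hash.equal_range(key);
    for (auto it = range.first; it != range.second; ++it)
        result.push_back(it->second);
    return result;
}
```

A key that yields more than one value is a candidate group of duplicates.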
-
Yes, you're right, thanks
-
I can't understand why it's not working :(
@
while (it.hasNext())
{
    it.next();
    if (it.key() == it.peekNext().key()) {
        std::cout << "i've got you" << std::endl;
    }
}
@
PS: I've used insertMulti() to add the items to the hash, as you said.
-
Maybe like this?
@
int compare_flag;
while (it.hasNext()) {
    it.next();
    compare_flag = QString::compare(it.key(), it.peekNext().key(), Qt::CaseSensitive);
    if (compare_flag == 0) {
        std::cout << "i've got you" << std::endl;
    }
}
@
-
The keys in a (hash) map are always distinct. You will never find two identical keys, so your comparison will never be true.
And even if you had identical keys in your container, you would only find them if they were adjacent in the list.
But I'm starting to get a feeling of déjà vu...
To make things clearer for us: you do have a multi hash/multi map. What do you put in there, and what do you expect to come out?
-
@QHash<QString,int> FilesHash;@
The QString key is the file's MD5 hash.
The int value is just the file's number.
On output I want to see the names of the similar files.
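One possible way to get from that description to the desired output is to group the file names by digest and keep only the groups with more than one member. The sketch below uses standard containers instead of QHash, assumes the digests have already been computed elsewhere, and the name `duplicateGroups` is invented for this example.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: takes (digest, file name) pairs and returns only
// the groups of names that share a digest with at least one other file.
std::vector<std::vector<std::string>> duplicateGroups(
    const std::vector<std::pair<std::string, std::string>> &digestAndName)
{
    // Collect all names under their digest; names keep their input order.
    std::map<std::string, std::vector<std::string>> byDigest;
    for (const auto &entry : digestAndName)
        byDigest[entry.first].push_back(entry.second);

    // A digest seen only once cannot belong to a duplicate.
    std::vector<std::vector<std::string>> groups;
    for (const auto &item : byDigest)
        if (item.second.size() > 1)
            groups.push_back(item.second);
    return groups;
}
```

Each returned group can then be printed as one line of matching file names.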
-
OK, let's clear things up step by step. It seems you should get comfortable with the concept of a map.
A map (QHash is one) stores values associated with keys. Every key exists only once in the map. I have written that several times, so let's prove it:
@
QHash<QString, int> myHash;
myHash.insert("abc", 2);
myHash.insert("def", 3);
myHash.insert("abc", 5);
qDebug() << "hash keys:" << myHash.keys();

QHash<QString, int> myMultiHash;
myMultiHash.insertMulti("abc", 2);
myMultiHash.insertMulti("def", 3);
myMultiHash.insertMulti("abc", 5);
qDebug() << "multi hash keys:" << myMultiHash.keys();
@
What will the output be?
What will happen if you compare every key with every other?
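If Qt is not at hand, the same experiment can be approximated with the standard library. This is only an analogy: assignment through `operator[]` on `std::unordered_map` stands in for the replacing behaviour of insert(), and `std::unordered_multimap` stands in for insertMulti(). The two function names are made up for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Entries left after storing abc=2, def=3, abc=5 with replace semantics.
std::size_t replacingInsertCount()
{
    std::unordered_map<std::string, int> hash;
    hash["abc"] = 2;
    hash["def"] = 3;
    hash["abc"] = 5; // overwrites the value 2; there is still one "abc" entry
    return hash.size();
}

// Entries left after storing the same three pairs in a multimap.
std::size_t multiInsertCount()
{
    std::unordered_multimap<std::string, int> multiHash;
    multiHash.emplace("abc", 2);
    multiHash.emplace("def", 3);
    multiHash.emplace("abc", 5); // kept alongside the first "abc" entry
    return multiHash.size();
}
```

The first container ends up with two entries and the second with three, so only the second can ever contain two equal keys.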
-
insertMulti() allows you to store items with identical keys
-
without overwriting them
-
what do you think about it?
@
bool ok;
QHashIterator<QString,int> it(FilesHash);
QHashIterator<QString,int> begin(FilesHash);
QHashIterator<QString,int> end(FilesHash);
while (it.hasNext()) {
    it.next();
    begin = qLowerBound(FilesHash.begin(), FilesHash.end(), it.key());
    end = qUpperBound(begin, FilesHash.end(), it.key());
    iter = begin;
    while (iter != end) {
        if (*i=*it) {
            ok = true;
        } else { ok = false; }
    }
}
@
-
Why can't I do it like this?
@
QHashIterator<QString,int> iter(FilesHash);
while (it.hasNext()) {
    it.next();
    iter = qBinaryFind(FilesHash.begin(), FilesHash.end(), it.key());
}
@
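A likely reason, as far as I can tell: binary search (qBinaryFind, qLowerBound, qUpperBound) expects a range that is sorted by the thing being searched for, while a QHash keeps its items in arbitrary order. The result is also an STL-style iterator, which is a different type from the Java-style QHashIterator. A rough standard-library sketch of what binary search needs, with all names (`Entry`, `ByDigest`, `filesWithDigest`) made up for this example:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>; // (digest, file number)

// Comparator that looks only at the digest; both argument orders are
// needed because std::equal_range compares in both directions.
struct ByDigest {
    bool operator()(const Entry &e, const std::string &key) const { return e.first < key; }
    bool operator()(const std::string &key, const Entry &e) const { return key < e.first; }
};

// Returns the file numbers whose digest equals `key`.
// `entries` is taken by value so the caller's order stays untouched.
std::vector<int> filesWithDigest(std::vector<Entry> entries, const std::string &key)
{
    // Binary search is only valid on a range sorted by the searched field.
    std::sort(entries.begin(), entries.end(),
              [](const Entry &a, const Entry &b) { return a.first < b.first; });

    const auto range = std::equal_range(entries.begin(), entries.end(), key, ByDigest{});
    std::vector<int> result;
    for (auto it = range.first; it != range.second; ++it)
        result.push_back(it->second);
    return result;
}
```

With a hash container there is no need for any of this, since looking up all values for a key is what the container already does.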
-
please note that the problem has been solved :)