How to remove elements from a QStringList by comparing its members with each other

VRonin

What if files Test2 and Test5 are identical and the others are unique?
Do you want to compare just adjacent files in the list?

Ayush Gupta

I need to compare each file in the list with every other file. If two files have the same contents, the one whose name is less in a string comparison should be removed.

Ayush Gupta

@VRonin Can you help with this?

mrjj

@Ayush-Gupta

Hi
How big is each file?
Can they all fit in memory at the same time?

What is the data inside?

Often one can use a checksum to see if files are the same, but it depends on the data: sometimes two different sequences of letters have the same checksum even though we would logically say they are not the same.

Ayush Gupta

@mrjj
I already have the code to check whether two files are the same.
My problem is that I am not able to filter the names out of the QStringList. I tried using indices as well.
Suppose there are 6 files in the list. I need to compare each file with the others (the code I have does that).
Then, if the contents are equal, I need to compare the file names: if fileName1 > fileName2, delete fileName2 from the list.

jsulm

@Ayush-Gupta https://doc.qt.io/qt-5/qlist.html#erase
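The linked `erase` returns an iterator to the element after the removed one, which is what makes it usable inside a loop. A minimal sketch of that idiom, using `std::vector<std::string>` as a stand-in for `QStringList` (its `erase` follows the same convention); `removeMatching` and its predicate are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: removes every element for which shouldRemove returns true.
void removeMatching(std::vector<std::string>& list,
                    const std::function<bool(const std::string&)>& shouldRemove)
{
    for (auto it = list.begin(); it != list.end(); ) {
        if (shouldRemove(*it))
            it = list.erase(it);   // erase returns the iterator following the removed element
        else
            ++it;                  // advance only when nothing was erased
    }
}
```

Note that `list.end()` is re-evaluated on every pass; an end iterator cached before the loop would be stale after the first erase.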

mrjj

Hi
So if fileName1 and fileName2 are the same, you must delete the file with the highest number in its name?

Ayush Gupta

@mrjj yes

mrjj

@Ayush-Gupta

Are your files only numbered from 1 to 9 at most?

    QString filename1 = "somename1";
    QString filename2 = "somename2";

    // The last character of each name is its number
    int fno1 = filename1.right(1).toInt();
    int fno2 = filename2.right(1).toInt();

    if (fno1 < fno2) ....

Otherwise you need a better way to extract the number.
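For names that can end in more than one digit, `right(1)` is not enough. One possible extraction is sketched below with `std::string` in place of `QString`. `trailingNumber` is a hypothetical helper; returning -1 for names without trailing digits, and names without an extension (as in "somename1"), are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns the number formed by the trailing digits of name,
// or -1 if the name does not end in a digit.
// Assumes the digit run is short enough to fit in an int (std::stoi throws otherwise).
int trailingNumber(const std::string& name)
{
    std::size_t pos = name.size();
    // Walk backwards over the trailing digits.
    while (pos > 0 && std::isdigit(static_cast<unsigned char>(name[pos - 1])))
        --pos;
    if (pos == name.size())
        return -1;                       // no trailing digits at all
    return std::stoi(name.substr(pos));  // convert the digit run, e.g. "12" -> 12
}
```

Comparing the extracted numbers rather than the names avoids the usual trap where "somename10" sorts before "somename2" as a string.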

VRonin

Something like this?

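    // QStringList fileList;                             // file paths (QFile itself cannot be copied into a container)
    // bool equalFiles(const QString&, const QString&);  // true if both files have the same contents
    // Indices are used instead of cached iterators, which become invalid after a removal.
    for (int i = 0; i < fileList.size(); ) {
        bool removeI = false;
        for (int j = i + 1; j < fileList.size() && !removeI; ) {
            if (equalFiles(fileList.at(i), fileList.at(j))) {
                if (fileList.at(i) > fileList.at(j))
                    fileList.removeAt(j);  // j has the lesser name; the next element moves into index j
                else
                    removeI = true;        // i has the lesser name; remove it after the inner loop
            } else {
                ++j;
            }
        }
        if (removeI)
            fileList.removeAt(i);
        else
            ++i;
    }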

Chris Kawa

@VRonin That's a brute-force algorithm that needs on the order of N² comparisons of file contents. That's horrible.

@Ayush-Gupta Create a [hash]->[filename] map. Read the files in sorted order and calculate a hash of each one's contents (MD5, SHA-1 or whatever works for you). Put each file name in the map under that hash, and your resulting map will contain only file names with unique contents.
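A rough sketch of that map, written with standard C++ so it stands alone. Several things here are assumptions: `uniqueByContent` and `ContentReader` are hypothetical names, `std::hash` is only a stand-in for a real digest such as MD5 or SHA-1 (in Qt, `QCryptographicHash` could compute one), and the lesser name is taken to be the one that gets dropped, as in the original request.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical callback that returns the contents of the named file.
using ContentReader = std::function<std::string(const std::string&)>;

// Returns the names whose contents are unique, keeping the greater name of each duplicate set.
std::vector<std::string> uniqueByContent(std::vector<std::string> names,
                                         const ContentReader& readContents)
{
    std::sort(names.begin(), names.end());       // process in sorted order

    std::map<std::size_t, std::string> byHash;   // [hash] -> [filename]
    for (const auto& name : names) {
        // Stand-in hash; a cryptographic digest makes accidental collisions far less likely.
        const std::size_t h = std::hash<std::string>{}(readContents(name));
        byHash[h] = name;                        // a later (greater) name overwrites a duplicate
    }

    std::vector<std::string> result;
    for (const auto& entry : byHash)
        result.push_back(entry.second);
    std::sort(result.begin(), result.end());
    return result;
}
```

Each file is read and hashed exactly once, so the cost grows linearly with the number of files. To keep the lesser name instead, `byHash.emplace(h, name)` could replace the assignment, since `emplace` leaves an existing entry untouched.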

Christian Ehrlicher

@Chris-Kawa Before creating the hash I would compare the file sizes (and hope that they all differ :) )

Chris Kawa

@Christian-Ehrlicher Good idea. You could use the size as initial hash value and only do a "full" hash for hash conflict resolution (caching the result of course).
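The size-first idea could look roughly like this. It is a sketch under assumptions: `FileSource` and `uniqueBySizeThenHash` are hypothetical names, the two callbacks stand in for whatever reports a file's size and contents (in Qt, `QFileInfo::size()` could supply the former), `std::hash` again replaces a real digest, and the greater name of each duplicate set is the one kept.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical callbacks for querying a file by name.
struct FileSource {
    std::function<std::uintmax_t(const std::string&)> size;      // cheap: no content is read
    std::function<std::string(const std::string&)> contents;     // expensive: reads the file
};

std::vector<std::string> uniqueBySizeThenHash(std::vector<std::string> names,
                                              const FileSource& src)
{
    std::sort(names.begin(), names.end());

    // First pass: group names by file size only.
    std::map<std::uintmax_t, std::vector<std::string>> bySize;
    for (const auto& name : names)
        bySize[src.size(name)].push_back(name);

    std::vector<std::string> result;
    for (const auto& sizeGroup : bySize) {
        const std::vector<std::string>& group = sizeGroup.second;
        if (group.size() == 1) {                 // unique size: contents cannot match any other file
            result.push_back(group.front());
            continue;
        }
        // Size conflict: only now hash the contents, once per file in the group.
        std::map<std::size_t, std::string> byHash;
        for (const auto& name : group)
            byHash[std::hash<std::string>{}(src.contents(name))] = name;
        for (const auto& entry : byHash)
            result.push_back(entry.second);
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

Because every file is hashed at most once, and only inside a group that shares its size, no separate hash cache is needed in this form.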

Ayush Gupta

@Chris-Kawa Thanks