[Solved] Duplicate finder

    alex.dadaev
    wrote on 24 Jan 2011, 18:12
    #1

    What should my starting point be for a program that finds files with similar content but different names?

      DenisKormalev
      wrote on 24 Jan 2011, 18:30
      #2

      If you want to find identical files, a hash such as MD5 or SHA-1 will help you. If you want to find files that differ a bit but are quite similar overall, you need something more complex, such as the Levenshtein distance (if you compare plain-text files) or some diff analysis (e.g. if the diff is smaller than 10% of the smaller file's size, they are similar enough).
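A minimal sketch of the Levenshtein approach, written in standard C++ so it needs no Qt. It assumes both files are plain text already loaded into std::string. levenshtein() is the textbook two-row dynamic-programming algorithm; similarEnough() is a hypothetical helper that applies the "less than 10% of the smaller size" rule of thumb from the post, with the threshold as a parameter.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic dynamic-programming Levenshtein distance: the minimum number of
// single-character insertions, deletions and substitutions turning a into b.
// Only two rows are kept, so memory is O(|b|) while time is O(|a| * |b|).
std::size_t levenshtein(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;                        // distance from "" to b[0..j)
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;                         // distance from a[0..i) to ""
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1,          // deletion
                               cur[j - 1] + 1,       // insertion
                               prev[j - 1] + cost}); // substitution
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical helper: treats two texts as "similar enough" when the edit
// distance is below maxRatio (default 10%) of the shorter text's length.
bool similarEnough(const std::string& a, const std::string& b,
                   double maxRatio = 0.10)
{
    const std::size_t shorter = std::min(a.size(), b.size());
    if (shorter == 0)
        return a == b;  // avoid dividing by zero; only two empty texts match
    return static_cast<double>(levenshtein(a, b)) < maxRatio * shorter;
}
```

The cost grows with the product of the two lengths, so for large files it is probably more practical to compare lines or chunks rather than single characters.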

        goetz
        wrote on 24 Jan 2011, 18:50
        #3

        Moved to the Brainstorm forum, as it has nothing to do with Qt programming as such (I initially moved it to C++ Gurus, but it's not C++ related either, sorry).

        http://www.catb.org/~esr/faqs/smart-questions.html

          DenisKormalev
          wrote on 24 Jan 2011, 18:51
          #4

          Volker, I was also wondering where it should be moved, but forgot about Brainstorm and left it for someone with a better idea. Thanks.

            alex.dadaev
            wrote on 24 Jan 2011, 18:59
            #5

            Maybe I've misunderstood something, but how can MD5 or SHA-1 help me find duplicates? I thought they were cryptographic algorithms for government use :)

              goetz
              wrote on 24 Jan 2011, 19:06
              #6

              If two hashes are equal, it is very likely that the two files they were computed from have bitwise-identical content. This is not guaranteed: the set of all possible file contents is unbounded, whereas the number of possible hash values is finite, so there cannot be a one-to-one mapping between them and collisions must exist. You can store the hashes in a database and search it for duplicates. For "similar" files you'll have to use the Levenshtein distance or other means, as Denis stated.
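Given that caveat, one common approach is to treat equal hashes only as candidates and confirm them with a byte-for-byte comparison. The sketch below does that in standard C++17 (std::filesystem and std::ifstream instead of Qt, an assumption made to keep it self-contained); sameContent() is a hypothetical name. It compares file sizes first, since files of different sizes can never be duplicates, and then reads both files in 4 KiB chunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Confirms that two files whose hashes matched really have identical bytes.
// Equal hashes only make equality very likely, so this rules out collisions.
bool sameContent(const std::filesystem::path& p1,
                 const std::filesystem::path& p2)
{
    std::error_code ec1, ec2;
    const auto size1 = std::filesystem::file_size(p1, ec1);
    const auto size2 = std::filesystem::file_size(p2, ec2);
    if (ec1 || ec2 || size1 != size2)
        return false;  // unreadable, or different sizes: cannot be duplicates

    std::ifstream f1(p1, std::ios::binary), f2(p2, std::ios::binary);
    if (!f1 || !f2)
        return false;

    std::vector<char> buf1(4096), buf2(4096);
    while (f1 && f2) {
        f1.read(buf1.data(), static_cast<std::streamsize>(buf1.size()));
        f2.read(buf2.data(), static_cast<std::streamsize>(buf2.size()));
        const std::streamsize n1 = f1.gcount(), n2 = f2.gcount();
        // Compare exactly the bytes read in this round (the last chunk may be short).
        if (n1 != n2 ||
            !std::equal(buf1.begin(), buf1.begin() + n1, buf2.begin()))
            return false;
    }
    return true;
}
```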

                alex.dadaev
                wrote on 24 Jan 2011, 19:28
                #7

                But how can I store file info in a hash?

                  DenisKormalev
                  wrote on 24 Jan 2011, 19:30
                  #8

                  You pass its contents through a hash function, and at the end you have its hash.
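To make "pass its contents through a hash function" concrete, here is a sketch in standard C++17. The standard library has no MD5 or SHA-1, so it substitutes FNV-1a, a simple non-cryptographic hash; this swap is an assumption for illustration, and in Qt you would use QCryptographicHash instead. The names Fnv1a64 and hashFile() are hypothetical; addData()/result()/reset() only mirror the naming QCryptographicHash uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>
#include <vector>

// FNV-1a (64-bit): a simple NON-cryptographic hash, used here only as a
// dependency-free stand-in for MD5/SHA-1. It shows the streaming pattern:
// feed the data in chunks, then read the final value.
class Fnv1a64 {
public:
    void addData(const char* data, std::size_t len)
    {
        for (std::size_t i = 0; i < len; ++i) {
            h_ ^= static_cast<unsigned char>(data[i]);
            h_ *= 0x100000001b3ULL;  // 64-bit FNV prime
        }
    }
    std::uint64_t result() const { return h_; }
    void reset() { h_ = 0xcbf29ce484222325ULL; }  // 64-bit FNV offset basis

private:
    std::uint64_t h_ = 0xcbf29ce484222325ULL;
};

// Hypothetical helper: hashes a whole file in 4 KiB chunks, so even large
// files never have to fit in memory. Returns std::nullopt on I/O errors.
std::optional<std::uint64_t> hashFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return std::nullopt;
    Fnv1a64 hasher;
    std::vector<char> buf(4096);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        hasher.addData(buf.data(), static_cast<std::size_t>(in.gcount()));
    }
    if (in.bad())
        return std::nullopt;
    return hasher.result();
}
```

Feeding "hello " and then "world" gives the same result as feeding "hello world" at once, which is what makes chunked reading possible. FNV-1a is fine for spotting candidates, but unlike SHA-1 it offers no resistance to deliberately constructed collisions.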

                    goetz
                    wrote on 24 Jan 2011, 19:43
                    #9

                    [quote author="alex.dadaev" date="1295897294"]But how can i store file info in a hash ?[/quote]

                    You cannot. This "web page":http://lmgtfy.com/?q=hash+function has some explanations for you.

                      alex.dadaev
                      wrote on 24 Jan 2011, 20:25
                      #10

                      Okay :)

                        alex.dadaev
                        wrote on 24 Jan 2011, 20:50
                        #11

                        Is there any way to use QHash methods with QCryptographicHash?
                        I'd like to build a comparison table for the files I hash.

                          tobias.hunger
                          wrote on 24 Jan 2011, 20:57
                          #12

                          QHash is a hash table, a data structure optimized for random access based on a key value.

                          QCryptographicHash is used to calculate cryptographic hash values from input data. They are completely different things :-)

                          So no, you cannot use QHash's methods in QCryptographicHash. Just use the result of a QCryptographicHash as a key in a QHash and you should be set. Make sure to reset() the QCryptographicHash object whenever you are done with a file; otherwise identical files will not get identical hash values, since the second one will still have all the data of the first one "prepended".
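Tobias's recipe (hash each file, use the result as a key in a hash table, reset the hasher between files) might look like this as a self-contained sketch. It uses std::unordered_map in place of QHash and a small FNV-1a hasher in place of QCryptographicHash; both swaps, and the names StreamHasher, groupByHash() and duplicateGroups(), are illustrative assumptions. File contents are passed in memory for brevity; a real tool would read them from disk in chunks.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal streaming hasher (FNV-1a, 64-bit). NON-cryptographic stand-in for
// QCryptographicHash, used only to keep this example self-contained.
struct StreamHasher {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    void addData(const std::string& chunk)
    {
        for (unsigned char c : chunk) {
            h ^= c;
            h *= 0x100000001b3ULL;
        }
    }
    void reset() { h = 0xcbf29ce484222325ULL; }
    std::uint64_t result() const { return h; }
};

using File = std::pair<std::string, std::string>;  // (name, contents)

// Groups file names by content hash: the hash is the key of the hash table,
// the value is every file name that produced it. One hasher is reused, so it
// must be reset before each file; otherwise file N's hash would also cover
// the bytes of files 1..N-1.
std::unordered_map<std::uint64_t, std::vector<std::string>>
groupByHash(const std::vector<File>& files)
{
    std::unordered_map<std::uint64_t, std::vector<std::string>> table;
    StreamHasher hasher;
    for (const auto& [name, contents] : files) {
        hasher.reset();             // start from a clean state for every file
        hasher.addData(contents);
        table[hasher.result()].push_back(name);
    }
    return table;
}

// Keeps only the buckets with more than one file: the duplicate candidates.
std::vector<std::vector<std::string>>
duplicateGroups(const std::vector<File>& files)
{
    std::vector<std::vector<std::string>> groups;
    for (auto& entry : groupByHash(files))
        if (entry.second.size() > 1)
            groups.push_back(std::move(entry.second));
    return groups;
}
```

Buckets holding more than one name are duplicate candidates. Dropping the reset() call reproduces the problem described in the post: c.txt would then be hashed together with everything before it and would no longer match a.txt.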

                            goetz
                            wrote on 24 Jan 2011, 21:12
                            #13

                            Yes, that's possible. You can use the following function as a start for your project:

                            @
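                            #include <QByteArray>
                            #include <QCryptographicHash>
                            #include <QFile>
                            #include <QString>

                            // Returns the hex-encoded SHA-1 digest of the file's contents,
                            // or a null QString if the file cannot be opened or read.
                            QString getSha1HashFromFile( const QString &fn )
                            {
                                const qint64 bufferSize = 4096; // read the file in 4 KiB chunks
                                QCryptographicHash ch( QCryptographicHash::Sha1 );
                                QFile file( fn );
                                if( !file.open( QIODevice::ReadOnly ) )
                                    return QString();

                                while( !file.atEnd() ) {
                                    const QByteArray chunk = file.read( bufferSize );
                                    if( chunk.isEmpty() ) // read error: don't loop forever
                                        return QString();
                                    ch.addData( chunk );
                                }
                                file.close();
                                return QString::fromLatin1( ch.result().toHex() ); // result() is a method
                            }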
                            @

                              alex.dadaev
                              wrote on 25 Jan 2011, 11:44
                              #14

                              How can I make a QString from a QFileInfoList?
                              Is there any way to do that?

                                giesbert
                                wrote on 25 Jan 2011, 12:19
                                #15

                                QFileInfo::path() ???

                                Nokia Certified Qt Specialist.
                                Programming Is Like Sex: One mistake and you have to support it for the rest of your life. (Michael Sinz)

                                  goetz
                                  wrote on 25 Jan 2011, 12:19
                                  #16

                                  No.

                                  QFileInfoList is a typedef for QList<QFileInfo>.

                                  You know how many properties a QFileInfo object describing a single file has, don't you?

                                  If you want a single string from this big bunch of information, you will have to construct it yourself.

                                    giesbert
                                    wrote on 25 Jan 2011, 12:30
                                    #17

                                    OK, I forgot to add the iteration with a for(...) loop.

                                      alex.dadaev
                                      wrote on 25 Jan 2011, 12:35
                                      #18

                                      But how can I construct the path of a single file?

                                        goetz
                                        wrote on 25 Jan 2011, 12:41
                                        #19

                                        Read the docs on "QFileInfo":http://doc.qt.nokia.com/stable/qfileinfo.html - we did it too. Everything you need is documented there. Yes, it takes some 5 minutes to read it all through, but if you're too lazy we can't help you. If you have concrete questions or problems with any of the methods, ask them.

                                          giesbert
                                          wrote on 25 Jan 2011, 12:41
                                          #20

                                          If you read the documentation, you would find it ...

                                          @
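                                          QFileInfoList list; // e.g. the result of QDir::entryInfoList()
                                          for( int i = 0; i < list.size(); ++i )
                                          {
                                              // at() gives read-only access without detaching the list
                                              QString filePath = list.at( i ).absoluteFilePath();
                                              // ... use filePath here, e.g. pass it to getSha1HashFromFile()
                                          }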
                                          @
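For comparison, a sketch of the same step in standard C++17 rather than Qt (an assumption made so the example has no dependencies): listFiles() is a hypothetical helper that walks a directory tree with std::filesystem::recursive_directory_iterator and returns the absolute path of every regular file, much like calling absoluteFilePath() on each entry of a QFileInfoList. Directories it may not read are skipped, and other errors end the walk instead of throwing.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Recursively collects the absolute paths of all regular files below root.
std::vector<std::string> listFiles(const std::filesystem::path& root)
{
    namespace fs = std::filesystem;
    std::vector<std::string> paths;
    std::error_code ec;
    // Make the root absolute once; every entry path then inherits that prefix.
    const fs::path absRoot = fs::absolute(root, ec);
    if (ec)
        return paths;
    fs::recursive_directory_iterator it(
        absRoot, fs::directory_options::skip_permission_denied, ec);
    if (ec)
        return paths;  // root missing or not a directory
    for (; it != fs::recursive_directory_iterator(); it.increment(ec)) {
        if (ec)
            break;     // stop on an iteration error instead of throwing
        std::error_code typeEc;
        if (it->is_regular_file(typeEc))  // skip directories, sockets, etc.
            paths.push_back(it->path().string());
    }
    return paths;
}
```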
