Find the hash value of an Image?



  • Hi all,

    I am trying to compare two images efficiently using Qt/C++. How can I find the hash of an image?

    I found the class QCryptographicHash, but I am not sure whether it can be used for this. If it can, how do I convert an image, say a QImage, to a QByteArray?

    Is there anything similar available in C++?

    Thank you.



  • Hi, check this:

    @
    QImage image(...);
    QByteArray byteArray;
    QBuffer buffer(&byteArray);
    buffer.open(QIODevice::WriteOnly);  // open the buffer for writing before saving
    image.save(&buffer, "PNG");         // re-encode the image as PNG into byteArray
    QString pictureBase64 = QString::fromUtf8(byteArray.toBase64().data());
    MainWindow::QMD5(pictureBase64);
    @

    MD5:
    @
    QString MainWindow::QMD5(QString data)
    {
        QCryptographicHash md5(QCryptographicHash::Md5);
        md5.addData(data.toUtf8());
        return md5.result().toHex().constData();  // hex string of the MD5 digest
    }
    @



  • Hey, thanks, will try it.


  • Moderators

    Shine: That seems rather wasteful to me.

    1. You read a bunch of bytes from disk.
    2. You then decode those according to the image format used.
    3. Afterwards you encode the image again into another bunch of bytes, this time in the PNG format.
    4. Those bytes are then converted into yet another set of bytes (base64 encoding).
    5. That is then converted again (fromUtf8). This conversion only adds \0 bytes in every second place, as the base64 encoding made sure you only have ASCII characters anyway, so you needlessly double the size of the data.
    6. This is then passed on to QMD5 by value, so you actually copy the data.
    7. The method halves the size of the data again by using yet another conversion (toUtf8), stripping out the \0 bytes you added earlier.
    8. Only then do you calculate the hash.

    Why don't you just pass the QByteArray obtained in step 1 and use that to generate the MD5 sum? You can even check the results using the md5sum application of your OS that way.
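
A minimal sketch of the idea above: hash the bytes exactly as they sit on disk, with no decoding or re-encoding in between. Standard C++ ships no MD5, so 64-bit FNV-1a (a simple, non-cryptographic hash) stands in for it here; with Qt one would presumably feed the same bytes to QCryptographicHash instead. The names `fnv1a64` and `hashStream` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <iterator>
#include <string>

// 64-bit FNV-1a over a block of bytes (hypothetical helper, stand-in for MD5).
std::uint64_t fnv1a64(const std::string& bytes)
{
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char byte : bytes) {
        hash ^= byte;
        hash *= 0x100000001b3ULL;                // FNV prime
    }
    return hash;
}

// Hash everything in a stream opened in binary mode, e.g. a std::ifstream.
std::uint64_t hashStream(std::istream& in)
{
    assert(in.good());
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return fnv1a64(bytes);
}
```

Because this is FNV-1a and not MD5, the value will not match what md5sum prints. The point is only that the input to the hash is the untouched file content.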



  • This might be a stupid question, but why would computing two hashes and comparing the hashes be faster than comparing the actual contents of the images? Intuitively, computing the hashes would take much more time than comparing the images. Or am I missing something?



  • I don't think hash-based comparison will be faster either. Assuming that the computational work (and time) required to build the hashes or to compare two blocks of memory is minute compared to the work (and time) needed to transfer the data from disk to memory (which has to be done in both cases), there will be no significant speed-up.

    However, having hashes allows you to cache them. Comparing images whose hashes are known will be significantly faster.
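
The caching idea could look roughly like this. It is only a sketch: `HashCache` and its `Hasher` callback are names invented here, and the callback is assumed to read the file at the given path and return some hash of its contents.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Remembers the hash of each file so it is computed at most once per path.
class HashCache {
public:
    // Hypothetical callback: reads the file at `path` and returns its hash.
    using Hasher = std::function<std::uint64_t(const std::string& path)>;

    explicit HashCache(Hasher hasher) : hasher_(std::move(hasher))
    {
        assert(static_cast<bool>(hasher_));
    }

    std::uint64_t hashFor(const std::string& path)
    {
        auto it = cache_.find(path);
        if (it == cache_.end())
            it = cache_.emplace(path, hasher_(path)).first;  // first request: compute and store
        return it->second;                                   // later requests: lookup only
    }

private:
    Hasher hasher_;
    std::unordered_map<std::string, std::uint64_t> cache_;
};
```

Comparing two known images is then a comparison of two integers. Equal hashes only make equal content very likely, so a byte-wise check can still be run on the few candidates that match.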



  • Tobias Hunger: yes, I think it's wasteful too. It was just a quick solution, influenced by what I am currently working on.
    ludde: he may have a list of hashes, so there is no need to calculate them each time.



  • But if you are comparing against a hash table, you probably do want to decode the image first. However, I do not think cryptographic hashes are suitable for this task at all. harshita has not made clear what the goal of the comparison is, though. If the goal is to compare just the raw bytes (it really does not matter that it is an image), then it would work fine. But if the goal is to see whether you already have this image, then something better is needed. There are techniques that create fingerprints from pictures that allow you to find the same picture in a collection of files, even if that picture has been resized or re-encoded in a different format. Those are not (directly) supported by Qt, though.

    Anyway: harshita, what do you want to achieve exactly?



  • Hi, thank you all.
    My goal is to verify whether two images are similar or not. What Andre said about creating fingerprints is very close to what I want to achieve. Going down to the byte level was an idea to eliminate performance issues.
    If you have a better solution, kindly post it. I am quite new to Qt and image processing, so any help will be appreciated.

    The code posted by Shine doesn't seem to work: it gives the same hash value for all images.

    Thanks again.



  • Hi all, I have written this code to compare two images using their hash values. The byteArray that is supposed to hold the image is empty, so the hash value comes out the same for every image. Is there something wrong in how I load the image? Please suggest.
    @
    #include <QApplication>
    #include <QBuffer>
    #include <QByteArray>
    #include <QCryptographicHash>
    #include <QDebug>
    #include <QImage>
    #include <QString>
    #include "mainwindow.h"  // assumed name of the header that declares MainWindow::QMD5

    int main(int argc, char *argv[])
    {
        QApplication a(argc, argv);
        MainWindow b;

        // load() returns false if the file cannot be read or decoded
        QImage image;
        if (!image.load("C:/Documents and Settings/All Users/Documents/My Pictures/Sample Pictures/Winter.jpg"))
            qDebug() << "could not load first image";
        QByteArray byteArray;
        QBuffer buffer(&byteArray);
        buffer.open(QIODevice::WriteOnly);
        image.save(&buffer, "PNG");
        QString pictureBase64 = QString::fromUtf8(byteArray.toBase64().data());
        QString hash = b.QMD5(pictureBase64);

        QImage image2;
        if (!image2.load("C:/Documents and Settings/All Users/Documents/My Pictures/Sample Pictures/Winter.jpg"))
            qDebug() << "could not load second image";
        QByteArray byteArray2;
        QBuffer buffer2(&byteArray2);  // each buffer writes into its own byte array
        buffer2.open(QIODevice::WriteOnly);
        image2.save(&buffer2, "PNG");
        QString pictureBase642 = QString::fromUtf8(byteArray2.toBase64().data());
        QString hash2 = b.QMD5(pictureBase642);

        if (hash2 == hash)
            qDebug() << "images match";
        else
            qDebug() << "images dont match";
        return a.exec();
    }

    /**********************************/
    QString MainWindow::QMD5(QString data)
    {
        QCryptographicHash md5(QCryptographicHash::Md5);
        md5.addData(data.toUtf8());
        return md5.result().toHex().constData();
    }
    @



  • I merged in your second topic on this issue. Please stick to one topic per issue, and one issue per topic.


  • Moderators

    Wow, you really used that terribly inefficient piece of code:-)



  • [quote author="Tobias Hunger" date="1316759296"]Wow, you really used that terribly inefficient piece of code:-)[/quote]

    One should add that

    [quote author="harshita" date="1316666518"]My goal is to verify whether two images are similar or not. What Andre said about creating fingerprints is very close to what I want to achieve.[/quote]

    isn't achieved in any way.

    Two images are similar if they represent the same visual information, independent of the format, compression and resolution they are stored in. A hash only tells you whether two blocks of binary data are exactly identical, with no connection to the visual information they represent.

    Saving picture.jpg as picture.bmp will result in completely different hashes, but similar images. Changing a single pixel from e.g. "black" to "almost black" will result in completely different hashes, but similar images.

    Hashes (or hash functions) are designed to be as injective as possible and to produce well distributed and non-colliding values. A single bit changed in the input data should (and, for a cryptographic hash, will) change about half of the bits in the hash on average. Slightly different blocks of binary data will not generate slightly different hashes: they will generate completely different hashes. There is no "fuzzy" in hash.

    Image fingerprinting is a science of its own. Some less sophisticated approaches are listed "here":http://stackoverflow.com/questions/596262/image-fingerprint-to-compare-similarity-of-many-images.
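
To give an idea of what a "fuzzy" fingerprint can look like, here is a rough sketch of an average hash, one of the simplest perceptual fingerprints (not necessarily one of the approaches from the linked discussion). It assumes the image has already been decoded and converted to 8-bit grayscale, stored row by row in a plain vector; with Qt the pixels would come from a QImage. The names `averageHash` and `hammingDistance` are made up for this example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit average hash of a grayscale image, one byte per pixel, stored row by row.
std::uint64_t averageHash(const std::vector<std::uint8_t>& gray,
                          std::size_t width, std::size_t height)
{
    assert(width >= 8 && height >= 8 && gray.size() == width * height);

    // 1. Shrink to 8x8 by averaging the pixels that fall into each cell.
    double cells[64];
    double total = 0.0;
    for (std::size_t cy = 0; cy < 8; ++cy) {
        for (std::size_t cx = 0; cx < 8; ++cx) {
            const std::size_t x0 = cx * width / 8,  x1 = (cx + 1) * width / 8;
            const std::size_t y0 = cy * height / 8, y1 = (cy + 1) * height / 8;
            double sum = 0.0;
            for (std::size_t y = y0; y < y1; ++y)
                for (std::size_t x = x0; x < x1; ++x)
                    sum += gray[y * width + x];
            cells[cy * 8 + cx] = sum / static_cast<double>((x1 - x0) * (y1 - y0));
            total += cells[cy * 8 + cx];
        }
    }

    // 2. One bit per cell: set if the cell is brighter than the overall mean.
    const double mean = total / 64.0;
    std::uint64_t hash = 0;
    for (int i = 0; i < 64; ++i)
        if (cells[i] > mean)
            hash |= std::uint64_t{1} << i;
    return hash;
}

// Number of differing bits; small values suggest visually similar images.
std::size_t hammingDistance(std::uint64_t a, std::uint64_t b)
{
    return std::bitset<64>(a ^ b).count();
}
```

Unlike MD5, a small change in the picture flips few or no bits, so two fingerprints are compared by their Hamming distance against a threshold. What threshold counts as "similar" is something one would have to tune on real data.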


  • Moderators

    Lukas: You are right. The first can be solved by doing the image-loading and conversion-to-png that the code is actually doing. So those conversions are not completely useless. The ones to base64, UTF-16 and back still are useless:-)

    The second problem can not be addressed with any hash-based approach. You are right there.



  • Thank you all for bringing me on track... I would really appreciate it if someone could give me a better idea.
    I can use either C++ or Qt.



  • [quote author="harshita" date="1316776020"]
    Thank you all for bringing me on track... I would really appreciate it if someone could give me a better idea.
    I can use either C++ or Qt.

    [/quote]

    Lukas already pointed out a discussion on stackoverflow. Did you read that already?



  • [quote author="Tobias Hunger" date="1316772194"]Lukas: You are right. The first can be solved by doing the image-loading and conversion-to-png that the code is actually doing. So those conversions are not completely useless. [/quote]

    Actually, I would not assume that an image saved in format A and in format B will still be exactly the same after loading and decoding both again, unless it is explicitly clear that the formats are lossless. The format most often used to store pictures (.jpg, also used in the sample code from harshita) is not. So even for that case, it is useless.

    Anyway, I think Lukas pointed at an excellent starting point for this with the Stack Overflow discussion. I am actually quite interested in this myself, as I have some ideas for an application that would need to use this functionality too.
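
On the point about lossy formats: if the decoded pixels cannot be expected to match exactly, one crude alternative is to compare them with a tolerance. The sketch below assumes both images are already decoded to 8-bit grayscale at the same size; `meanAbsDiff`, `roughlyEqual` and the default threshold are invented for illustration and are not a substitute for a real fingerprinting technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mean absolute difference per pixel between two decoded grayscale images of equal size.
double meanAbsDiff(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b)
{
    assert(!a.empty() && a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    return sum / static_cast<double>(a.size());
}

// Treat two images as the same picture when they differ by less than a threshold.
// The default of 2.0 gray levels is an arbitrary value for illustration only.
bool roughlyEqual(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b,
                  double threshold = 2.0)
{
    return !a.empty() && a.size() == b.size() && meanAbsDiff(a, b) < threshold;
}
```

This tolerates small compression artifacts but, unlike a fingerprint, it does not cope with resized or cropped images and needs the full pixel data of both images every time.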

