How to get the MD5 hash of a file in C++?

Solution 1

Here's a straightforward implementation of the md5sum command that computes and displays the MD5 of the file specified on the command line. It needs to be linked against OpenSSL's libcrypto to work (gcc md5.c -o md5 -lcrypto; the MD5 function lives in libcrypto, so -lssl alone may fail to link). It relies on POSIX calls (open, fstat, mmap). It's pure C, but you should be able to adapt it to your C++ application easily enough.

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <openssl/md5.h>

unsigned char result[MD5_DIGEST_LENGTH];

// Print the MD5 sum as hex-digits.
void print_md5_sum(unsigned char* md) {
    int i;
    for(i=0; i <MD5_DIGEST_LENGTH; i++) {
            printf("%02x",md[i]);
    }
}

// Get the size of the file by its file descriptor
unsigned long get_size_by_fd(int fd) {
    struct stat statbuf;
    if(fstat(fd, &statbuf) < 0) exit(-1);
    return statbuf.st_size;
}

int main(int argc, char *argv[]) {
    int file_descript;
    unsigned long file_size;
    char* file_buffer;

    if(argc != 2) { 
            printf("Must specify the file\n");
            exit(-1);
    }
    printf("using file:\t%s\n", argv[1]);

    file_descript = open(argv[1], O_RDONLY);
    if(file_descript < 0) exit(-1);

    file_size = get_size_by_fd(file_descript);
    printf("file size:\t%lu\n", file_size);

    file_buffer = mmap(0, file_size, PROT_READ, MAP_SHARED, file_descript, 0);
    if(file_buffer == MAP_FAILED) exit(-1); // mmap also fails for empty files
    MD5((unsigned char*) file_buffer, file_size, result);
    munmap(file_buffer, file_size);

    print_md5_sum(result);
    printf("  %s\n", argv[1]);

    return 0;
}
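
Because the program above maps the whole file at once, very large files can be a problem on 32-bit platforms. One alternative is to read the file in fixed-size chunks and feed each chunk to an incremental hash. The following is only a sketch of the reading half: for_each_chunk and its sink parameter are hypothetical names, not part of OpenSSL or of the answer above, and the default chunk size is an arbitrary assumption.

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not from any library): reads `path` in fixed-size
// chunks and passes each chunk to `sink`. Returns true only if the whole
// file was read up to end-of-file.
bool for_each_chunk(const std::string& path,
                    const std::function<void(const char*, std::size_t)>& sink,
                    std::size_t chunk_size = 64 * 1024)
{
    if (chunk_size == 0) return false;      // a zero-sized buffer would never reach EOF

    std::ifstream in(path, std::ios::binary);
    if (!in) return false;                  // file could not be opened

    std::vector<char> buffer(chunk_size);
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();
        if (got > 0) sink(buffer.data(), static_cast<std::size_t>(got));
    }
    return in.eof();                        // false if the loop stopped on a read error
}
```

With OpenSSL, the sink would typically call MD5_Update on an MD5_CTX that was set up with MD5_Init, followed by a single MD5_Final after the last chunk. Only one buffer's worth of the file is in memory at any time.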

Solution 2

You can implement the MD5 algorithm yourself (examples are all over the web), or you can link against the OpenSSL libraries and use OpenSSL's digest functions. Here's an example (using Qt's QByteArray) that gets the MD5 of a byte array:

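#include <openssl/md5.h>
#include <QByteArray>

QByteArray AESWrapper::md5(const QByteArray& data) {
    // Write into a local buffer: when the last argument is NULL, MD5()
    // returns a pointer to a static array, which is not thread-safe.
    unsigned char hash[MD5_DIGEST_LENGTH];
    MD5((const unsigned char*)data.constData(), data.length(), hash);
    return QByteArray((const char*)hash, MD5_DIGEST_LENGTH);
}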
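
For the first option mentioned in this answer, implementing MD5 yourself, here is a compact sketch that follows the algorithm described in RFC 1321. The function name md5_hex is a hypothetical choice, the sine-derived constants are computed at runtime rather than taken from a table, and the whole input is padded in memory, so treat it as an illustration for small inputs rather than a tuned implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper name; returns the MD5 digest of `input` as 32 hex characters.
std::string md5_hex(const std::string& input)
{
    // Per-step left-rotation amounts (four rounds of sixteen steps).
    static const std::uint32_t S[64] = {
        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
        5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 };

    // K[i] = floor(2^32 * |sin(i + 1)|), with i + 1 in radians.
    std::uint32_t K[64];
    for (int i = 0; i < 64; ++i)
        K[i] = static_cast<std::uint32_t>(
            std::floor(std::fabs(std::sin(i + 1.0)) * 4294967296.0));

    std::uint32_t a0 = 0x67452301, b0 = 0xefcdab89, c0 = 0x98badcfe, d0 = 0x10325476;

    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length (64-bit little-endian).
    std::vector<std::uint8_t> msg(input.begin(), input.end());
    const std::uint64_t bit_len = static_cast<std::uint64_t>(input.size()) * 8;
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0);
    for (int i = 0; i < 8; ++i)
        msg.push_back(static_cast<std::uint8_t>(bit_len >> (8 * i)));

    for (std::size_t off = 0; off < msg.size(); off += 64) {
        // Split the 64-byte block into sixteen little-endian 32-bit words.
        std::uint32_t M[16];
        for (int i = 0; i < 16; ++i) {
            const std::size_t p = off + 4 * static_cast<std::size_t>(i);
            M[i] = static_cast<std::uint32_t>(msg[p])
                 | (static_cast<std::uint32_t>(msg[p + 1]) << 8)
                 | (static_cast<std::uint32_t>(msg[p + 2]) << 16)
                 | (static_cast<std::uint32_t>(msg[p + 3]) << 24);
        }

        std::uint32_t A = a0, B = b0, C = c0, D = d0;
        for (int i = 0; i < 64; ++i) {
            std::uint32_t F;
            int g;
            if (i < 16)      { F = (B & C) | (~B & D); g = i; }
            else if (i < 32) { F = (D & B) | (~D & C); g = (5 * i + 1) % 16; }
            else if (i < 48) { F = B ^ C ^ D;          g = (3 * i + 5) % 16; }
            else             { F = C ^ (B | ~D);       g = (7 * i) % 16; }
            F = F + A + K[i] + M[g];
            A = D; D = C; C = B;
            B = B + ((F << S[i]) | (F >> (32 - S[i])));   // rotate left by S[i]
        }
        a0 += A; b0 += B; c0 += C; d0 += D;
    }

    // The digest is a0, b0, c0, d0 written out byte by byte in little-endian order.
    const std::uint32_t words[4] = { a0, b0, c0, d0 };
    char hex[33];
    for (int i = 0; i < 16; ++i)
        std::snprintf(hex + 2 * i, 3, "%02x",
                      static_cast<unsigned>((words[i / 4] >> (8 * (i % 4))) & 0xffu));
    return std::string(hex, 32);
}
```

A quick sanity check is to compare the result for a few known inputs against md5sum; the empty string should hash to d41d8cd98f00b204e9800998ecf8427e.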

Solution 3

For anyone redirected from "https://stackoverflow.com/questions/4393017/md5-implementation-in-c" because it's been incorrectly labelled a duplicate.

The example located here works:

http://www.zedwood.com/article/cpp-md5-function

If you are compiling in VC++2010 then you will need to change his main.cpp to this:

#include <iostream> //for std::cout
#include <string> //for std::string
#include "MD5.h"

using std::cout; using std::endl;

int main(int argc, char *argv[])
{
    std::string Temp =  md5("The quick brown fox jumps over the lazy dog");
    cout << Temp.c_str() << endl;

    return 0;
}

To answer the question on this page, which is about files rather than strings, you will have to change the MD5 class slightly so that it accepts a char * array and a length instead of a std::string.

EDIT:

Since it apparently wasn't clear how to modify the MD5 library, a full VC++2010 solution that accepts char * input is available here for your convenience:

https://github.com/alm4096/MD5-Hash-Example-VS

A bit of an explanation is here:

#include <iostream> //for std::cout
#include <string> //for std::string
#include <fstream> //for std::ifstream
#include "MD5.h"

using std::cout; using std::endl;

int main(int argc, char *argv[])
{
    //Start opening your file
    std::ifstream inBigArrayfile;
    inBigArrayfile.open ("Data.dat", std::ios::binary | std::ios::in);
    if (!inBigArrayfile) return 1; //file could not be opened

    //Find length of file
    inBigArrayfile.seekg (0, std::ios::end);
    long Length = inBigArrayfile.tellg();
    inBigArrayfile.seekg (0, std::ios::beg);    

    //read in the data from your file
    char * InFileData = new char[Length];
    inBigArrayfile.read(InFileData,Length);

    //Calculate MD5 hash
    std::string Temp =  md5(InFileData,Length);
    cout << Temp.c_str() << endl;

    //Clean up
    delete [] InFileData;

    return 0;
}

I have simply added the following into the MD5 library:

MD5.cpp:

MD5::MD5(char * Input, long length)
{
  init();
  update(Input, length);
  finalize();
}

MD5.h:

std::string md5(char * Input, long length);

Solution 4

#include <QCryptographicHash>
#include <QDebug>
#include <QFile>

QFile file("bigimage.jpg");

if (file.open(QIODevice::ReadOnly))
{
    // readAll() loads the whole file into memory before hashing
    QByteArray fileData = file.readAll();

    QByteArray hashData = QCryptographicHash::hash(fileData,QCryptographicHash::Md5); // or QCryptographicHash::Sha1
    qDebug() << hashData.toHex();  // 0e0c2180dfd784dd84423b00af86e2fc
}

Solution 5

I needed to do this just now and required a cross-platform solution that was suitable for C++11, Boost and OpenSSL. I took D'Nabre's solution (the mmap-based program in Solution 1) as a starting point and boiled it down to the following:

#include <openssl/md5.h>
#include <iomanip>
#include <sstream>
#include <string>
#include <boost/iostreams/device/mapped_file.hpp>

const std::string md5_from_file(const std::string& path)
{
    unsigned char result[MD5_DIGEST_LENGTH];
    boost::iostreams::mapped_file_source src(path);
    MD5((unsigned char*)src.data(), src.size(), result);

    std::ostringstream sout;
    sout<<std::hex<<std::setfill('0');
    for(auto c: result) sout<<std::setw(2)<<(int)c;

    return sout.str();
}

A quick test executable demonstrates:

#include <cstdlib> // for exit
#include <iostream>

int main(int argc, char *argv[]) {
    if(argc != 2) {
        std::cerr<<"Must specify the file\n";
        exit(-1);
    }
    std::cout<<md5_from_file(argv[1])<<"  "<<argv[1]<<std::endl;
    return 0;
}

Some linking notes:

Linux: -lcrypto -lboost_iostreams
Windows: -DBOOST_ALL_DYN_LINK libeay32.lib ssleay32.lib
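
The hex-formatting loop in md5_from_file can also be pulled out into its own helper, which makes it reusable for other digests. This is a minimal sketch; to_hex is a hypothetical name that does not appear in the answer above. The cast to int matters: streaming an unsigned char directly would print the raw character instead of its numeric value.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats `count` raw bytes as lowercase hex, two digits per byte.
std::string to_hex(const unsigned char* bytes, std::size_t count)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < count; ++i)
        out << std::setw(2) << static_cast<int>(bytes[i]);  // setw applies to the next item only
    return out.str();
}
```

With this in place, the tail of md5_from_file could be reduced to a single return of to_hex(result, MD5_DIGEST_LENGTH).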

Author: user145586

Updated on July 05, 2022

Comments

  • user145586
    user145586 almost 2 years

    I've the file path. How can I get the MD5 hash of it?

    • warren
      warren over 14 years
      @silky - not really a helpful comment :) ..implementing MD5 from scratch is a really good way to get exposure to cryptographic algorithms and protocols, and since it's "known", you can instantly verify your code is right vs md5sum or similar
    • bobobobo
      bobobobo over 13 years
      @Noon Silk I think for the purpose here of making a unique signature for a file md5 should be adequate.
  • akira
    akira over 14 years
    when using Qt (as you do), i would rather just do return QCryptographicHash::hash(data, QCryptographicHash::Md5); as the body of the function...
  • harry
    harry about 14 years
    When it comes to security-related stuff, never write your own implementation if the stuff out there on the net will suffice. And every single possible implementation of MD4/5 is out there, so there's really no reason to write your own.
  • Chris K
    Chris K about 14 years
    on 32bit platforms, your mmap has a limit as to how large the file can be, though it is an elegant solution to the problem. On 32bit Windows, for example, you couldn't MD5 a DVD with this code.
  • Abyx
    Abyx almost 12 years
    -1, md5 != sha.
  • expert
    expert over 11 years
    @ChrisKaminski you can slide 4GB window of memory-mapped file on 32-bit platform.
  • Bob Miller
    Bob Miller about 11 years
    Excellent answer, it's helped me immensely. However, you don't call munmap afterward. It's no memory leak for you because the program ends immediately afterwards, but if some buffoon like myself copies the code and doesn't put in munmap, we get a memory leak in our program ;) The solution: munmap(file_buffer, file_size);
  • arkon
    arkon about 11 years
    @MahmoudAl-Qudsi Um yes there is, my professor doesn't let me plagiarize code.
  • quickly_now
    quickly_now about 11 years
    Not so good for files that are GB in size :)
  • RajaRaviVarma
    RajaRaviVarma almost 10 years
    For me gcc md5.c -o md5 -lcrypto this worked instead of -lssl on Ubuntu 14.04
  • uliwitness
    uliwitness almost 10 years
    @MahmoudAl-Qudsi When it comes to security-related stuff, never use MD5. MD5 is not a crypto-strength hash.
  • harry
    harry almost 10 years
    @uliwitness md5 was not my idea. It's OK to treat MD5 as a middling-fast non-crypto hash, but I agree that it is utterly broken as a crypto hash (and there are far better in terms of speed and hashing for non-crypto hashes).
  • uliwitness
    uliwitness almost 10 years
    @MahmoudAl-Qudsi I was just referring to your mention of "security-related stuff". The OP only said he wanted to get the MD5 of a file (probably for finding duplicates faster), so when you started talking of security, I thought I'd warn people off this thread if they want to do anything security-related, because MD5 is not it.
  • Brock Hensley
    Brock Hensley almost 9 years
    That is for a string, not a file
  • ALM865
    ALM865 almost 9 years
    Answer modified to include a file
  • user31389
    user31389 over 7 years
    It returns hashes different than other MD5 implementations. For example it hashes empty string to e4c23762ed2823a27e62a64b95c024e7 when it should be d41d8cd98f00b204e9800998ecf8427e. There's a related question there: stackoverflow.com/q/33989390/2436687
  • Timmmm
    Timmmm about 7 years
    Depending on openssl - a huge and gnarly library - for something as simple as MD5 seems like a bad idea to me.
  • D'Nabre
    D'Nabre about 7 years
    This is a really old question that seems to get a bunch of comments, so I just wanted to add some more info. This code was me transcribing MIPS assembly from an assignment I gave students back around 2005 or so. I made sure the C version compiled/linked without error but nothing more. So it's not the most robust; it's just the idea (hence the 32/64-bit stuff). Also, it was over 10 years ago. In 2016+ I'd definitely have looked for a smaller library for md5, but again it was for students originally and I wanted to follow how md5sum (something they are familiar with) works, and it was based on openssl at the time.
  • 463035818_is_not_a_number
    463035818_is_not_a_number almost 7 years
    some of your links are broken
  • Jonas
    Jonas over 6 years
    Can you please update the VC++2010 solution link.
  • ALM865
    ALM865 over 6 years
    links updated to a Git
  • Abdul Ahad
    Abdul Ahad over 6 years
    thank you. if(!exists(boost::filesystem::path(path))) {
  • Maggnetix
    Maggnetix about 6 years
    worked fine for me!
  • Anton K
    Anton K over 5 years
    Why don't you need to free(tmp_hash);? Does tmp_hash become a pointer to something global or what?