How to grab all files in a folder and get their MD5 hash in python?


Solution 1

glob.glob returns a list of matching file paths. Just iterate over that list with a for loop:

import glob
import hashlib

filenames = glob.glob("/root/PycharmProjects/untitled1/*.exe")

for filename in filenames:
    with open(filename, 'rb') as inputfile:
        data = inputfile.read()
        print(filename, hashlib.md5(data).hexdigest())

Note that this reads each file into memory in one go, which can exhaust your memory if the directory contains a large file. It is better to read the file in smaller chunks (adapted here to 1 MiB blocks):

def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(2 ** 20), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

for filename in filenames:
    print(filename, md5(filename))
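
If you want to keep the hashes instead of printing them, the same chunked approach can be wrapped in a helper that returns a dictionary. This is only a sketch: the names md5_of_file and md5_of_folder and the pattern and block_size parameters are my own choices for illustration, not part of glob or hashlib.

```python
import glob
import hashlib
import os


def md5_of_file(path, block_size=2 ** 20):
    """Return the hex MD5 digest of one file, read in block_size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(block_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_folder(folder, pattern="*.exe"):
    """Map each regular file in folder that matches pattern to its hex MD5 digest."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    # Skip directories that happen to match the pattern
    return {path: md5_of_file(path) for path in paths if os.path.isfile(path)}
```

Calling md5_of_folder("/root/PycharmProjects/untitled1") should then give one entry per .exe file, and an empty dictionary if nothing matches.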

Solution 2

I think that, in the end, you're trying to open only one file. The reason is that you take the list returned by glob, convert it to its string representation, and remove the list markers (and only at both ends of the string, since you use strip). With several files, this gives you something like:

file1.exe', 'file2.exe', 'file3.exe

You then pass this string to open, which tries to open a file with exactly that name. In fact, I'm surprised it works at all (unless you have only one file)! You should get a FileNotFoundError (an IOError on Python 2).
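
You can reproduce the effect of the two strip calls without touching the filesystem. In this small sketch the file names are placeholders standing in for whatever glob would return:

```python
# Hypothetical glob results, used only to show what str() and strip() produce
several = ["file1.exe", "file2.exe", "file3.exe"]
single = ["only.exe"]

# Same steps as in the question: stringify the list, then strip both ends
several_name = str(several).strip("[]").strip("''")
single_name = str(single).strip("[]").strip("''")

print(several_name)  # file1.exe', 'file2.exe', 'file3.exe
print(single_name)   # only.exe
```

Only the single-file case ends up as a usable file name, which matches the behaviour you describe.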

What you want to do is iterate over all the files returned by glob.glob:

import glob
import hashlib

# glob.glob already returns a list of paths, so no string conversion is needed
filenames = glob.glob("/root/PycharmProjects/untitled1/*.exe")

for f in filenames:
    with open(f, 'rb') as getmd5:
        data = getmd5.read()
        gethash = hashlib.md5(data).hexdigest()
        # Print the file name next to its hash
        print(f + ": " + gethash)
Author by

Xozu

Updated on June 07, 2022

Comments

  • Xozu
    Xozu almost 2 years

    I'm trying to write some code to get the md5 of every exe file in a folder.

    My problem is that I don't understand how to do it. It works only if the folder contains only one file. This is my code:

    import glob
    import hashlib
    file = glob.glob("/root/PycharmProjects/untitled1/*.exe")
    
    newf = str (file)
    newf2 =  newf.strip( '[]' )
    newf3 = newf2.strip("''")
    
    with open(newf3,'rb') as getmd5:
        data = getmd5.read()
        gethash= hashlib.md5(data).hexdigest()
        print gethash
    

    And I get the result:

    a7f4518aae539254061e45424981e97c
    

    I want to know how I can do this for more than one file in the folder.

  • Xozu
    Xozu about 8 years
    Thank you so much for your help!