How can I generate an MD5 hash in Java?


Solution 1

You need java.security.MessageDigest.

Call MessageDigest.getInstance("MD5") to get an MD5 instance of MessageDigest you can use.

Then compute the hash by doing one of:

  • Feed the entire input as a byte[] and calculate the hash in one operation with md.digest(bytes).
  • Feed the MessageDigest one byte[] chunk at a time by calling md.update(bytes). When you're done adding input bytes, calculate the hash with md.digest().

The byte[] returned by md.digest() is the MD5 hash.
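
A minimal sketch of both options, assuming Java 7+ (for StandardCharsets). The class and method names are made up for illustration. Every Java SE implementation is required to support MD5, so NoSuchAlgorithmException is only wrapped to keep the helper signatures free of checked exceptions. The sketch checks that one call and 10-byte chunks produce the same 16-byte result:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class Md5Chunks {
    // Every Java platform must support MD5, so this should never actually throw.
    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Option 1: hand the whole array to digest(byte[]) in one call.
    static byte[] oneShot(byte[] input) {
        return newMd5().digest(input);
    }

    // Option 2: feed the input in fixed-size chunks via update(), then call digest().
    static byte[] chunked(byte[] input, int chunkSize) {
        MessageDigest md = newMd5();
        for (int offset = 0; offset < input.length; offset += chunkSize) {
            md.update(input, offset, Math.min(chunkSize, input.length - offset));
        }
        return md.digest(); // completes the hash and resets md for reuse
    }

    public static void main(String[] args) {
        byte[] input = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] a = oneShot(input);
        byte[] b = chunked(input, 10);
        System.out.println(a.length);            // 16 (MD5 is a 128-bit hash)
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

The chunked form is what you want when the input does not arrive all at once; the result is identical either way.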

Solution 2

The MessageDigest class can provide you with an instance of the MD5 digest.

When working with strings and the crypto classes, be sure to always specify the encoding you want the byte representation in. If you just use string.getBytes(), it will use the platform default, and not all platforms use the same default.

import java.security.*;

..

byte[] bytesOfMessage = yourString.getBytes("UTF-8");

MessageDigest md = MessageDigest.getInstance("MD5");
byte[] theMD5digest = md.digest(bytesOfMessage);
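
To make the encoding advice concrete, here is a small sketch (class and method names are hypothetical) showing that the same String encoded as UTF-8 and as ISO-8859-1 yields different bytes, and therefore a different MD5, once it contains a non-ASCII character. It uses StandardCharsets (Java 7+), which also avoids the checked UnsupportedEncodingException that getBytes("UTF-8") declares:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class Md5Encoding {
    static byte[] md5(byte[] bytes) {
        try {
            return MessageDigest.getInstance("MD5").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // "café": the last character is not ASCII
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);         // é -> 2 bytes (0xC3 0xA9)
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);  // é -> 1 byte (0xE9)
        System.out.println(utf8.length + " vs " + latin1.length); // 5 vs 4
        // Same String, different bytes, therefore different MD5 hashes.
        System.out.println(Arrays.equals(md5(utf8), md5(latin1))); // false
    }
}
```

For pure ASCII text both encodings produce the same bytes, which is why this kind of bug often goes unnoticed until non-ASCII input shows up.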

If you have a lot of data, take a look at the .update(...) methods, which can be called repeatedly. Then call .digest() to obtain the resulting hash.
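
A sketch of that pattern for data arriving as an InputStream (for example a FileInputStream over a large file). The 8 KiB buffer size and the names are arbitrary choices for illustration, not part of the original answer:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Stream {
    // Reads the stream in 8 KiB blocks so the whole input never has to fit in memory.
    static byte[] md5(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n); // hash only the n bytes actually read
        }
        return md.digest();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : md5(in)) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            System.out.println(hex); // 900150983cd24fb0d6963f7d28e17f72
        }
    }
}
```

Passing only n (not buffer.length) to update() matters: the last read usually fills just part of the buffer.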

Solution 3

If you actually want the answer back as a string as opposed to a byte array, you could always do something like this:

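import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

String plaintext = "your text here";
MessageDigest m = MessageDigest.getInstance("MD5");
// Specify the encoding explicitly; plain getBytes() uses the platform default.
m.update(plaintext.getBytes(StandardCharsets.UTF_8));
byte[] digest = m.digest();
// Signum 1 so the digest bytes are read as a positive (unsigned) number.
BigInteger bigInt = new BigInteger(1, digest);
String hashtext = bigInt.toString(16);
// Now we need to zero pad it if you actually want the full 32 chars.
while (hashtext.length() < 32) {
  hashtext = "0" + hashtext;
}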
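
A shorter variant of the padding step, based on the String.format("%032x", ...) suggestion in the comments, as a sketch. The method name md5Hex is hypothetical, and UTF-8 is assumed as the input encoding. The "0" flag with width 32 makes the Formatter left-pad the BigInteger with zeros, so no loop is needed:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Hex {
    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            // Signum 1 = treat the bytes as an unsigned magnitude;
            // %032x left-pads with zeros to the full 32 hex digits (128 bits).
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex(""));    // d41d8cd98f00b204e9800998ecf8427e
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

Use "%032X" instead if you need uppercase hex digits.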

Solution 4

You might also want to look at the DigestUtils class of the Apache Commons Codec project, which provides very convenient methods to create MD5 or SHA digests.

Solution 5

Found this:

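public String MD5(String input) {
    try {
        java.security.MessageDigest md = java.security.MessageDigest.getInstance("MD5");
        // Encode explicitly; plain getBytes() depends on the platform default charset.
        byte[] array = md.digest(input.getBytes(java.nio.charset.StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < array.length; ++i) {
            // (b & 0xFF) | 0x100 always gives 3 hex digits; substring(1, 3) keeps
            // the last two, so bytes below 0x10 keep their leading zero.
            sb.append(Integer.toHexString((array[i] & 0xFF) | 0x100).substring(1, 3));
        }
        return sb.toString();
    } catch (java.security.NoSuchAlgorithmException e) {
        // Every Java platform is required to support MD5, so this should not happen.
        throw new IllegalStateException(e);
    }
}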

I found this on the site below and take no credit for it, but it's a solution that works! A lot of other code didn't work properly for me; I ended up missing 0s in the hash. This one seems to give the same result as PHP. Source: http://m2tec.be/blog/2010/02/03/java-md5-hex-0093
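
The missing 0s come from converting each byte with a plain Integer.toHexString, which returns a single character for values below 0x10. A small sketch (names hypothetical) contrasting that with the | 0x100 trick used in the snippet; the & 0xFF part undoes Java's sign extension, which would otherwise turn a byte like 0xf0 into "fffffff0":

```java
class HexByte {
    // Naive: drops the leading zero for values below 0x10 (e.g. 0x05 -> "5").
    static String naive(byte b) {
        return Integer.toHexString(b & 0xFF);
    }

    // Trick from the snippet: OR in 0x100 so the result always has 3 hex digits
    // ("1" followed by the two we want), then keep characters 1..2.
    static String padded(byte b) {
        return Integer.toHexString((b & 0xFF) | 0x100).substring(1, 3);
    }

    public static void main(String[] args) {
        byte[] samples = {0x05, 0x3c, (byte) 0xf0};
        for (byte b : samples) {
            // prints "5 vs 05", then "3c vs 3c", then "f0 vs f0"
            System.out.println(naive(b) + " vs " + padded(b));
        }
    }
}
```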

Author: Sigmund Reed

Front-end engineer loving Python & Lisp.

Updated on April 21, 2022

Comments

  • Sigmund Reed
    Sigmund Reed about 2 years

    I have used material from here and a previous forum page to write some code for a program that will automatically calculate the semantic similarity between consecutive sentences across a whole text. Here it is:

    The code for the first part is copy-pasted from the first link; then I have the code below, which I put in after line 245. I removed everything after line 245.

    with open("File_Name", "r") as sentence_file:
        # Read the first two sentences before entering the loop.
        x = sentence_file.readline()
        y = sentence_file.readline()
        while x and y:
            similarity(x, y, True)  # boolean set to False or True
            x = y
            y = sentence_file.readline()
    

    My text file is formatted like this:

    Red alcoholic drink. Fresh orange juice. An English dictionary. The Yellow Wallpaper.

    In the end I want to display all the pairs of consecutive sentences with the similarity next to each pair, like this:

    ["Red alcoholic drink.", "Fresh orange juice.", 0.611],
    
    ["Fresh orange juice.", "An English dictionary.", 0.0]
    
    ["An English dictionary.", "The Yellow Wallpaper.",  0.5]
    
    if norm(vec_1) > 0 and if norm(vec_2) > 0:
        return np.dot(vec_1, vec_2.T) / (np.linalg.norm(vec_1)* np.linalg.norm(vec_2))
     elif norm(vec_1) < 0 and if norm(vec_2) < 0:
        ???Move On???
    
  • Leif Gruenwoldt
    Leif Gruenwoldt almost 12 years
  • rustyx
    rustyx over 9 years
    MD5 might be unsafe as a one-way security feature, but it is still good for generic checksum applications.
  • Admin
    Admin over 7 years
    dict.has_key has been deprecated for nearly a decade now: docs.python.org/3.0/whatsnew/3.0.html#builtins
  • Sigmund Reed
    Sigmund Reed over 7 years
    Sorry, so is that the only problem, and if so, how can I fix it? Probably a stupid question, but I'm really new to Python.
  • Admin
    Admin over 7 years
    My previous comment contained a link. Click on the link. Look at the page contained therein. Read the bullet point about dict.has_key().
  • Admin
    Admin over 7 years
    Hint: what is meant by "dict.has_key() has been deprecated" is that you can no longer call the has_key method on a dictionary. Instead, use the in membership operator. docs.python.org/3/reference/…
  • Sigmund Reed
    Sigmund Reed over 7 years
    Hi, I apologize, but Python is still very new to me. I swapped hypernyms_2.has_key(lcs_candidate): for hypernyms_2.in(lcs_candidate): and it said invalid syntax.
  • Admin
    Admin over 7 years
    That's because in is an operator, not a method. Try lcs_candidate in hypernyms_2
  • Sigmund Reed
    Sigmund Reed over 7 years
    Sorry again, I fixed that stuff (thank you so much), but then I get this. Look in the comments, please.
  • Admin
    Admin over 7 years
    I suspect that's caused by dividing by zero somewhere... Also, cosine similarity is built in to SciPy: docs.scipy.org/doc/scipy/reference/generated/…
  • Sigmund Reed
    Sigmund Reed over 7 years
    What would you suggest to fix that mess? Preferably without using SciPy and sticking to the code I have already.
  • Admin
    Admin over 7 years
    Check to make sure that neither vec_1 nor vec_2 is the zero vector (i.e. has length zero) before calculating the cosine similarity. Just use if/else, i.e. if the norms of the vectors are both positive, then you're good to go; otherwise... well, skip that pair or throw an exception or... do what you want to do.
  • Admin
    Admin over 7 years
    If you don't want to use SciPy to calculate the cosine similarity, then that's fine, too... calculating the dot product and dividing by the product of the norms works as well. Just make sure that both of the norms are positive.
  • Admin
    Admin over 7 years
    Also, it's worth pointing out that you only got a warning, not an exception (i.e. your code kept going). Testing on my end indicates that np.nan (i.e. NumPy's nan value, nan meaning "not a number") would be returned when vec_1 or vec_2 has a norm of zero.
  • Sigmund Reed
    Sigmund Reed over 7 years
    This is going to be really annoying, but I'm a linguistics professor with minimal to no Python experience; how would this be done? I realize how sickening I am, but I can't find any other help on short notice. Also, nothing was returned, not even nan.
  • Admin
    Admin over 7 years
    Well, what do you want to do if you encounter a vector with norm zero when computing the cosine similarities? Throw an error and quit? Silently continue with the next pair (assuming that you're computing these inside some for loop, which may or may not be the case)? That's not a question that I can answer. You have to decide the flow of logic for your code.
  • Admin
    Admin over 7 years
    You can also just let the warnings be thrown and deal with the nan values in the output afterwards.
  • Sigmund Reed
    Sigmund Reed over 7 years
    I tried something in the comments; it is obviously erroneous. I also don't know how to implement it.
  • Admin
    Admin over 7 years
    Norms of vectors are never negative, so your elif norm(vec_1) < 0 and if norm(vec_2) < 0: can just be an else:
  • Admin
    Admin over 7 years
    Also, if norm(vec_1) > 0 and if norm(vec_2) > 0: is invalid syntax. anh.cs.luc.edu/python/hands-on/3.1/handsonHtml/…
  • Admin
    Admin over 7 years
    Incidentally, I don't know what you're using to write your code, but you might want to use an IDE (integrated development environment) or text editor with the ability to point out simple syntax errors. I'd recommend PyCharm: jetbrains.com/pycharm (there's a free and a not-free edition; the free edition will be more than adequate for what you're trying to do).
  • Akshay
    Akshay over 15 years
    Could you point me to some resources where I can read about the relative merits and weaknesses of each?
  • Bombe
    Bombe over 15 years
    “LATIN1” != “ASCII” (or “US-ASCII”). ASCII is a 7-bit character set, Latin1 is an 8-bit character set. They are not the same.
  • Rob
    Rob over 15 years
    In particular, the methods which return "safe" encoded representations of the byte data in string form.
  • Piskvor left the building
    Piskvor left the building over 15 years
    (see joelonsoftware.com/articles/Unicode.html for much better rationale and explanation)
  • Spidey
    Spidey almost 14 years
    @BalusC: Not true, the BigInteger.toString method will return the full number in the base specified. 0x0606 will be printed as 606; only leading zeros are omitted.
  • squiddle
    squiddle over 13 years
    If you use Apache Commons Codec anyway, you can use: commons.apache.org/codec/api-release/org/apache/commons/codec/…
  • David Leppik
    David Leppik about 13 years
    SHA1 is overkill unless you want a cryptographically secure hash, i.e. you don't want the hash to help in reconstructing the original message, nor do you want a clever attacker to create another message which matches the hash. If the original isn't a secret and the hash isn't being used for security, MD5 is fast and easy. For example, Google Web Toolkit uses MD5 hashes in JavaScript URLs (e.g. foo.js?hash=12345).
  • David Leppik
    David Leppik about 13 years
    Minor nitpick: m.reset() isn't necessary right after calling getInstance. More minor: 'your text here' requires double-quotes.
  • bluish
    bluish about 13 years
    I would replace last line with this: String result = Hex.encodeHexString(resultByte);
  • Paŭlo Ebermann
    Paŭlo Ebermann almost 13 years
    You should specify the encoding to be used in getBytes(), otherwise your code will get different results on different platforms/user settings.
  • iuiz
    iuiz almost 13 years
    However, there is no easy way to get the DigestUtils class into your project without adding a ton of libs, or porting the class by hand, which requires at least two more classes.
  • sparkyspider
    sparkyspider over 12 years
    Can't find it in maven repos either. Grrrr.
  • Nick Spacek
    Nick Spacek over 12 years
    Should be in the central Maven repositories, unless I'm going crazy: groupId=commons-codec artifactId=commons-codec version=1.5
  • Jeremy Huiskamp
    Jeremy Huiskamp over 12 years
    What makes you think file integrity is not a security issue?
  • weekens
    weekens about 12 years
    This topic is also useful if you need to convert the resulting bytes to hex string.
  • kriegaex
    kriegaex almost 12 years
    Oh BTW, before anyone except for myself notices how bad my JRE knowledge really is: I just discovered DigestInputStream and DigestOutputStream. I am going to edit my original solution to reflect what I have just learned.
  • Nilzor
    Nilzor over 11 years
    Why does this answer have -1 while the other, shorter and less descriptive answer has +146?
  • mjuarez
    mjuarez over 11 years
    One thing that's not mentioned here, and caught me by surprise. The MessageDigest classes are NOT thread safe. If they're going to be used by different threads, just create a new one, instead of trying to reuse them.
  • Dave.B
    Dave.B over 11 years
    Nice using BigInteger to get a hex value +1
  • kovica
    kovica about 11 years
    I just found out that in some cases this only generates a 31-character MD5 sum, not 32 as it should be.
  • Heshan Perera
    Heshan Perera about 11 years
    @kovica this is because the leading zeros get truncated, if I remember right. String.format("%032x", new BigInteger(1, hash)) should solve this, where 'hash' is the byte[] of the hash.
  • Bombe
    Bombe about 11 years
    It uses multiple methods to mutate its internal state. How can the lack of thread safety be surprising at all?
  • Blaze Tama
    Blaze Tama over 10 years
    @PaŭloEbermann isn't MessageDigest.getInstance("MD5") enough? I tried to add "MD5" in getBytes() but it returned an error.
  • Paŭlo Ebermann
    Paŭlo Ebermann over 10 years
    @BlazeTama "MD5" is not an encoding, it is a message digest algorithm (and not one which should be used in new applications). An encoding is an algorithm pair which transforms bytes to strings and strings to bytes. Examples would be "UTF-8", "US-ASCII", "ISO-8859-1", "UTF-16BE", and similar. Use the same encoding as every other party which calculates a hash of this string, otherwise you'll get different results.
  • Ajax
    Ajax over 10 years
    This is a solid, standalone library with minimal dependencies. Good stuff.
  • alex
    alex about 10 years
    and String.format("%1$032X", big) to have an uppercase format
  • Dan Barowy
    Dan Barowy almost 10 years
    @Bombe: why should we expect to have to know about MessageDigest's internal state?
  • Bombe
    Bombe almost 10 years
    @DanBarowy well, you are mutating it (i.e. calling methods that do not return values but cause other methods to return different values) so until proven otherwise you should always assume that it’s not thread-safe to do so.
  • Richard
    Richard over 9 years
    For an example of the character set... (use UTF-8, that is the best and most compatible in my opinion)... byte[] array = md.digest(md5.getBytes(Charset.forName("UTF-8")));
  • rwitzel
    rwitzel over 9 years
    This is the method that provides the same return value as the MySQL function md5(str). A lot of the other answers did return other values.
  • EpicPandaForce
    EpicPandaForce over 9 years
    This doesn't work right on Android because Android bundles commons-codec 1.2, for which you need this workaround: stackoverflow.com/a/9284092/2413303
  • Gelldur
    Gelldur about 9 years
    This answer has a bug with the charset!
  • Jannick
    Jannick almost 9 years
    This is probably the worst solution as it strips leading zeros.
  • user253751
    user253751 almost 9 years
    @Traubenfuchs MessageDigest allows you to input the data in chunks. That wouldn't be possible with a static method. Although you can argue they should have added one anyway for convenience when you can pass all the data at once.
  • ASA
    ASA almost 9 years
    Makes sense. I guess you wouldn't always want to move around byte arrays with multiple Gigabytes! Still, just let it take a stream.
  • supernova
    supernova almost 9 years
    @HeshanPerera How come you mentioned in your answer "getting a String representation back from an MD5 hash"?! Your code shows logic to convert a String to an MD5 hash. If I am not wrong, MD5 is a one-way algorithm and a hash can't be converted back to the original String.
  • albanx
    albanx over 8 years
    And what if I want the pure string?
  • Kurt Alfred Kluever
    Kurt Alfred Kluever over 8 years
    or using one of the shortcut methods: Hashing.md5().hashString("my string").asBytes();
  • Daniel Kamil Kozar
    Daniel Kamil Kozar over 8 years
    @albanx : there is no such thing as a "pure string", unless you meant the serialized contents of the Java object itself. Please refer to the previously posted link to Joel On Software.
  • albanx
    albanx over 8 years
    @DanielKamilKozar I needed the hex string to save in db. dac2009 has posted the solution for this
  • Nacho
    Nacho about 8 years
    Welcome to StackOverflow, you might want to read how to post an answer before doing so. Give a bit of context explaining why you posted that code and what it does. Also consider taking the time to format your answer so readers can easily understand it.
  • Justin
    Justin about 8 years
    @KurtAlfredKluever don't forget to insert the charset like 'Hashing.md5().hashString("my string", Charsets.UTF_8).asBytes()'
  • bkrish
    bkrish about 8 years
    I found it very useful. It took 15357 ms for a 4.57 GB file, whereas Java's built-in implementation took 19094 ms.
  • kbolino
    kbolino about 8 years
    @Traubenfuchs and what would it do with the bytes that it read from that stream, throw them away?
  • ASA
    ASA about 8 years
    I believe back when I wrote this I was thinking of a "finalized" & "ready for consumption" InputStream that would be fully drained by the static method. Any necessary state would be kept in the method body.
  • Markus Pscheidt
    Markus Pscheidt about 8 years
    Great. It doesn't fall into the trap of cutting leading zeros.
  • Joabe Lucena
    Joabe Lucena over 7 years
    As @CedricSimon said, that's exactly what I was looking for. Upvote here.. Thanks!
  • Sigmund Reed
    Sigmund Reed over 7 years
    Hello, I added the code but I got these errors (written in the question)
  • holmis83
    holmis83 over 7 years
    Only one-liner I've seen that doesn't use an external library.
  • Isuru
    Isuru about 7 years
    Is this Kotlin language?
  • walshie4
    walshie4 almost 7 years
    Unless I'm mistaken, this always returns uppercase, which will not align with MD5s made without using hex. I'm not even really sure it is a true MD5.
  • gildor
    gildor over 6 years
    @Isuru looks like Scala
  • Ilya Serbis
    Ilya Serbis over 6 years
    Actually, it accepts more than just an MD5 byte array (size == 16): you can pass a byte array of any length. It will be converted to an MD5 byte array by means of an MD5 MessageDigest (see the nameUUIDFromBytes() source code).
  • Humphrey
    Humphrey over 6 years
    This was very easy and simple; I would recommend it to all visitors.
  • Humphrey
    Humphrey over 6 years
    Then how do you convert this digest to a string so that we can insert it into MySQL?
  • Fran Marzoa
    Fran Marzoa over 6 years
    This does not answer the question, it's just a couple of links. stackoverflow.com/help/how-to-answer
  • Fran Marzoa
    Fran Marzoa over 6 years
    Beware: this won't work on Android if you're using API level < 19, but you just need to change the second line to md5.update(string.getBytes("UTF-8")); This will add yet another checked exception to handle, though...
  • James
    James about 6 years
    BTW: The performance of this is much better than using BigInteger to create the hex string representation.
  • Hummeling Engineering BV
    Hummeling Engineering BV over 5 years
    Better yet, where possible use yourString.getBytes(StandardCharsets.UTF_8). This prevents handling an UnsupportedEncodingException.
  • tom
    tom about 5 years
    From Java 11 on, you can use hashtext = "0".repeat(32 - hashtext.length()) + hashtext instead of the while, so the editors won't give you a warning that you're doing string concatenation inside a loop.
  • dac2009
    dac2009 almost 5 years
    Since it's not my solution, and I didn't test all scenarios myself, I will leave it unchanged, although I think specifying the encoding etc. is probably a good idea.
  • JGFMK
    JGFMK over 4 years
    This seems far superior. You don't even have to capture as many exceptions either.
  • user1819780
    user1819780 about 4 years
    Instead of m.update(plaintext.getBytes()); I would recommend specifying the encoding. such as m.update(plaintext.getBytes("UTF-8")); getBytes() does not guarantee the encoding and may vary from system to system which may result in different MD5 results between systems for the same String.
  • Arundale Ramanathan
    Arundale Ramanathan almost 4 years
    This was very useful. I was having problems with MessageDigest.getInstance("MD5").
  • Logesh S
    Logesh S almost 3 years
    Worked perfectly for Gravatar's email MD5 hash! Thank you.