How do you get the password hash of a zip file?


Solution 1

Password hashes are not extracted from the file. John the Ripper (JtR) does not really do this. You can download the code for JtR to figure out how it is done. Here is a little article on how to get the so-called "non-hash" info for .zip and .rar files using zip2john and rar2john: http://www.cybercrimetech.com/2014/07/how-to-cracking-zip-and-rar-protected.html

To correct the misnomer, JtR does not actually "retrieve the hash" from the file. It extracts certain information from the file, for example, as documented in the rar2john code:

Output Line Format:
 *
 * For type = 0 for files encrypted with "rar -hp ..." option
 * archive_name:$RAR3$*type*hex(salt)*hex(partial-file-contents):type::::archive_name
 *
 * For type = 1 for files encrypted with "rar -p ..." option
 * archive_name:$RAR3$*type*hex(salt)*hex(crc)*PACK_SIZE*UNP_SIZE*0*archive_name*offset-for-ciphertext*method:type::file_name
 *
 * or
 *
 * archive_name:$RAR3$*type*hex(salt)*hex(crc)*PACK_SIZE*UNP_SIZE*1*hex(full encrypted file)*method:type::file_name

So, as shown above, the "password hash" is not extracted. It is also a mistake to believe that the file is "entirely" encrypted (as suggested by others answering similar questions). Instead, critical unencrypted and encrypted file items, such as the salt, are retrieved to generate a "non-hash". JtR uses these items with various password guesses to test decryption: it runs the zip or rar password hash generation functions on each guess to create a hash, which in turn is used to generate the crypt key values. The generated crypt key values are then tested against a small, extracted, and well-defined portion of the encrypted file.
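To make the type = 0 line format quoted above a bit more concrete, here is a minimal Python sketch that splits such a line into its parts. The function name `parse_rar3_type0` and the example line are made up for illustration (the salt and data bytes are not from a real archive), and the sketch assumes the archive name contains no colon. It is not how JtR itself parses the line.

```python
def parse_rar3_type0(line: str) -> dict:
    """Split a '$RAR3$' type 0 line into its documented fields.

    Hypothetical helper: follows the field layout from the rar2john
    comment, nothing more.
    """
    # archive_name : $RAR3$*type*hex(salt)*hex(partial-file-contents) : rest
    archive_name, blob, _rest = line.split(":", 2)
    tag, type_field, salt_hex, data_hex = blob.split("*")
    if tag != "$RAR3$" or type_field != "0":
        raise ValueError("not a $RAR3$ type 0 line")
    return {
        "archive": archive_name,
        "type": int(type_field),
        "salt": bytes.fromhex(salt_hex),
        "partial_contents": bytes.fromhex(data_hex),
    }


# Synthetic example line; the hex values are placeholders.
example = "secret.rar:$RAR3$*0*0011223344556677*" + "ab" * 16 + ":0::::secret.rar"
info = parse_rar3_type0(example)
print(info["archive"], info["salt"].hex(), len(info["partial_contents"]))
```

Note that nothing in the parsed result is a hash of the password: there is only a salt and a slice of encrypted data to test guesses against.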

So, while JtR is not "extracting a password hash" that can be sent to any ol' password-hash-checker-rainbow-table-lookup-thingy, it is doing the next best thing--extracting critical cracking information. The steps to crack are essentially: 1) a hash is generated from a password guess, 2) a few extra steps are added to check that decryption succeeds or fails (a lot of fails), and 3) repeat. What makes rar cracking so difficult is a different salt for each rar file and, more importantly, the large and variable number of hash iterations that are needed before the decryption test can be performed. The newer zip process is similar, but the iterations are not variable--last I checked--making it somewhat easier.
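The guess-and-test loop for the newer zip process can be sketched in Python as follows. This assumes the WinZip AES scheme, where the key material is derived with PBKDF2-HMAC-SHA1 using a fixed 1000 iterations and the last two derived bytes serve as a password verification value stored next to the salt. The names `derive_winzip_aes` and `surviving_guesses` are hypothetical, the salt is generated on the spot rather than read from an archive, and this is a sketch of the idea, not JtR's implementation.

```python
import hashlib
import os

ITERATIONS = 1000  # fixed iteration count in the WinZip AES scheme
KEY_BYTES = 32     # AES-256; the scheme also allows 16 or 24


def derive_winzip_aes(password: bytes, salt: bytes):
    """Derive (aes_key, hmac_key, verifier) from a password guess."""
    material = hashlib.pbkdf2_hmac(
        "sha1", password, salt, ITERATIONS, 2 * KEY_BYTES + 2
    )
    aes_key = material[:KEY_BYTES]
    hmac_key = material[KEY_BYTES:2 * KEY_BYTES]
    verifier = material[-2:]  # 2-byte password verification value
    return aes_key, hmac_key, verifier


def surviving_guesses(salt: bytes, stored_verifier: bytes, wordlist):
    """Return the guesses whose derived verifier matches the stored one.

    A 2-byte match is only a filter (roughly 1 in 65536 wrong guesses
    passes), so survivors still need a full check such as the HMAC
    authentication code over the encrypted data.
    """
    return [
        guess
        for guess in wordlist
        if derive_winzip_aes(guess, salt)[2] == stored_verifier
    ]


# Simulate what an archiver would store for the password b"hunter2".
salt = os.urandom(16)
stored = derive_winzip_aes(b"hunter2", salt)[2]
print(surviving_guesses(salt, stored, [b"123456", b"password", b"hunter2"]))
```

Because the salt differs per file, the derived values cannot be precomputed once and reused, which is why a generic rainbow-table lookup does not apply here.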

This is the nut of "how it is done" as asked and the answer is "you don't" get the real password hash of a zip file until just before the file is cracked.

The example from the question's CTF exercise is misleading. The given "hash" could have been a simple password hash prepared for the exercise to simplify the cracking process for the student by any ol' cracker OR it could have been a specific zip2john "non-hash" that resulted in a fairly easy password for JtR to guess--short, common, or both. The questioner did not provide a "hash" or "hash file" to verify either way.

Solution 2

Why would there need to be a hash? The zipped file is compressed and then encrypted. This does not require storing a hash in the file because it's not authenticating, it's decrypting. The only thing that may be stored in the file is a salt, depending on the encryption used.

Solution 3

Even though I'm not sure how it's done, John the Ripper (JtR) has a little executable (zip2john) that creates a hash out of a zip file. Since the code is open, you could take a look at how this is extracted. Of course, that assumes the file was encrypted with PKZIP encryption (so it didn't work on files created by WinRAR, for example).

I tried it on simple files that were zipped with 7-Zip using simple passwords, and JtR equipped with a decent wordlist cracked them in milliseconds. The (optionally) freely given wordlists here did the trick.

Cheers

Solution 4

One of the principal developers of John the Ripper sheds some light on how cracking the password of archives works, specifically pkzip2 archives.

This is my enquiry and their reply on the JtR mailing list:

On 2021-06-03 10:28, Matthias Braun wrote:

After reading the description of zip2john's output in the source file (https://github.com/openwall/john/blob/bleeding-jumbo/src/zip2john.c), this is my current assumption: Since the archive doesn't contain the hash of the password used for encrypting, per default, JtR tries a wordlist to decrypt the data stored at the DA position in zip2john's output, runs CRC32 on the result and compares that checksum with the checksum stored at the CR position.

But in the case of cracking the password for a large file, I'd expect decryption to take quite a while, making each password guess costly.

You are basically correct.

How does JtR crack pkzip2 archives?

First of all, zip2john simply picks the smallest file from the archive in order to minimize the work. Sometimes there is no small file though.

Second trick: pkzip archives also include another, smaller, checksum (just 1 or 2 bytes) of a tiny part of the packed data of each file. If that doesn't match, we can early reject. But if it does match (and unfortunately a single byte checksum will randomly match in about 1 of 256 cases) we do have to checksum the whole (potentially large) file.

So third trick in our bag is we collect several such smaller checksums (for up to 8 files), and require all of them to match before going on and calculate a whole-file CRC. If the number of files is smaller, we may end up with less than 8 of course. On a side note, theoretically the files in one archive may use different passwords and that invalidates this trick (in fact we'll probably get false negatives). That is fairly uncommon so by default we assume all have the same password.

So best case for us is an archive with 8 or more files, and at least one small file. Worst case is an archive with only one, huge, file.

magnum
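The early-reject tricks described in the reply can be sketched in Python. The sketch assumes the traditional PKZIP stream cipher ("ZipCrypto"), where each encrypted file starts with a 12-byte encryption header whose last decrypted byte must equal a check byte known from the archive's unencrypted metadata. The names `ZipCryptoKeys` and `passes_check_bytes` are hypothetical, and the eight headers are synthetic data built in the example, not read from a real archive. It illustrates the idea only and is not JtR's code.

```python
import os


def _make_crc_table():
    """Standard CRC-32 table (reflected polynomial 0xEDB88320)."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


class ZipCryptoKeys:
    """Three-word key state of the traditional PKZIP cipher."""

    def __init__(self, password: bytes):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:
            self._update(b)

    @staticmethod
    def _crc(crc: int, b: int) -> int:
        return (crc >> 8) ^ CRC_TABLE[(crc ^ b) & 0xFF]

    def _update(self, plain_byte: int) -> None:
        self.k0 = self._crc(self.k0, plain_byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = self._crc(self.k2, self.k1 >> 24)

    def _stream_byte(self) -> int:
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for p in data:
            out.append(p ^ self._stream_byte())
            self._update(p)  # the state is always fed the plaintext byte
        return bytes(out)

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            p = c ^ self._stream_byte()
            self._update(p)
            out.append(p)
        return bytes(out)


def passes_check_bytes(password: bytes, entries) -> bool:
    """entries: list of (encrypted 12-byte header, expected check byte).

    Only 12 bytes per file are decrypted; the first mismatch rejects
    the guess without touching the (possibly huge) file data.
    """
    for header, check in entries:
        if ZipCryptoKeys(password).decrypt(header)[11] != check:
            return False
    return True


# Build eight synthetic headers as an archiver would for b"secret".
entries = []
for check in range(8):
    plain_header = os.urandom(11) + bytes([check])
    entries.append((ZipCryptoKeys(b"secret").encrypt(plain_header), check))

for guess in (b"123456", b"password", b"secret"):
    print(guess, passes_check_bytes(guess, entries))
```

With a single entry, about 1 in 256 wrong guesses would slip through and force a full decrypt-and-CRC pass. Requiring all eight check bytes to match makes a false positive extremely unlikely, which is the point of collecting checksums from several files.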

Author by

Fallenreaper

Hacker, Programmer, software Dev, Software Engineer.. see my Twitter and website for more details.

Updated on June 05, 2021

Comments

  • Fallenreaper
    Fallenreaper almost 3 years

    I was doing a capture the flag sample, and they gave me the file's hash. I ran it through John The Ripper and I got my password for the sample.

    The thing is, the hash was given to me. Is there a way to find the hash of that file if it weren't given?

    There are two points that need to be resolved to beat a password: The hash and the encryption algorithm.

    I have been practicing with password protected zip files, so I figured it was using zip 2.0 or something or some sort of AES, I can figure that out with the help of Google.

    What I can't find out, however, is how to access the metadata of the zip file to get that hashed data. Since it is a standard, a zipped file doesn't need to carry the algorithm used for storing the password. However, there needs to be some sort of hash.

    Does anyone know how, with this example of a password protected zip file, this is done?

    I'm using a MacBook Pro right now.

  • Fallenreaper
    Fallenreaper about 11 years
    I was at the following website: p0wnlabs.com/free/passwordcracking and was learning some basic CTF material. On that site, they gave the hash of the password for a zip file, and then you needed to figure out the password. After that test, I was curious whether, instead of being given the hash, I could obtain it myself. That way I could circumvent the requirement of being given that information.
  • Eric
    Eric about 11 years
    @Fallenreaper You're misreading the challenge. The challenge is that they want you to find the clear text data that generated the hash. The zip file is simply a proof that will provide you with a prize (the image) if you're successful. They could have just as easily had you type the password into a website for verification. Think of this, if a zip file stored the hash, why not just replace the hash with a hash of a known word?
  • Fallenreaper
    Fallenreaper about 9 years
    So far, I have compiled a list of 23 million passwords as a file to pass into this sort of thing, hoping that the password is in there rather than resorting to brute force. It might make it a bit quicker.
  • Matthias Braun
    Matthias Braun almost 3 years
    While your answer makes it clear that zip2john doesn't produce the password's hash, the article you linked to is under the misapprehension that there is a password hash: "This will give you files that contain the password hashes to be cracked"
  • AtesComp
    AtesComp almost 3 years
    @MatthiasBraun, yes it does, which is why I pointed out that it is a misnomer. The article says so because John the Ripper's original design was to read password hash files (shadow files and other hash collections you could create by hand, see google.com/amp/s/www.cyberciti.biz/faq/…) and "crack" them for the clear-text password. For zip and rar files, the same format is used to hold the so-called "non-hashes" as well.
  • Danish Xavier
    Danish Xavier about 2 years
    What is the advantage of the above method? Why do we need hashes? We can simply open the file in Python using the default zip module, pass in plain-text passwords from a big wordlist file, and if no error happens, the password is cracked. Why go to all the trouble of creating a hash? Am I missing something?
  • somada141
    somada141 about 2 years
    @DanishXavier the reason people prefer the hash approach is that it's far faster to hash a candidate plain-text password and compare the hashes than to run a process such as the one you proposed. I think that, depending on the hardware, you can try billions of candidate passwords per second, which would not be the case when opening and testing the file through code.