Where does Google Chrome save temporary pdf files?

Solution 1

Note: This no longer works because the Chrome disk cache format has changed.


The PDFs are in that directory, all right. Chrome just uses random strings for the filenames. To check, run:

find ~/.cache/google-chrome -type f -exec file {} + | grep PDF

For example:

$ find .cache/google-chrome/ -type f -exec file {} + | grep PDF
.cache/google-chrome/Default/Cache/f_004bf0:       PDF document, version 1.5
.cache/google-chrome/Default/Cache/f_004c01:       PDF document, version 1.4

Solution 2

The Google Chrome cache directory $HOME/.cache/google-chrome/Default/Cache on Linux contains one file per cache entry named <16 char hex>_0 in "simple entry format":

  • 20 Byte SimpleFileHeader
  • key (i.e. the URI)
  • payload (the raw file content i.e. the PDF in our case)
  • SimpleFileEOF record
  • HTTP headers
  • SHA256 of the key (optional)
  • SimpleFileEOF record

You therefore cannot simply use file to determine the file type (it will just detect data), but must search for the PDF header. This will list all PDFs in the cache directory:

grep -Rl '%PDF' "$HOME/.cache/google-chrome/Default/Cache"

Note: This may give you false positives if the string %PDF appears somewhere in a file that isn't a PDF.

Note: If you're not using the default Chrome profile, replace Default with the profile name, e.g. Profile 1.
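
To cut down on the false positives mentioned above, a slightly stricter check can help. The following is a minimal Python sketch, not part of the original answer: it assumes that a cached PDF contains a version header such as %PDF-1.4 followed somewhere later by an %%EOF marker. The names looks_like_cached_pdf and find_pdf_candidates are made up for illustration.

```python
import os
import re

# Matches a PDF version header such as b"%PDF-1.4"; stricter than a bare b"%PDF".
PDF_HEADER = re.compile(rb"%PDF-\d\.\d")


def looks_like_cached_pdf(data: bytes) -> bool:
    """Return True if data has a PDF version header followed later by %%EOF."""
    match = PDF_HEADER.search(data)
    return match is not None and data.find(b"%%EOF", match.end()) != -1


def find_pdf_candidates(cache_dir: str) -> list:
    """Walk cache_dir recursively and return the files that look like cached PDFs."""
    hits = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable or vanished cache entry: skip it
            if looks_like_cached_pdf(data):
                hits.append(path)
    return sorted(hits)
```

For example, find_pdf_candidates(os.path.expanduser("~/.cache/google-chrome/Default/Cache")) would return the matching cache files for the default profile. This still reads each file fully into memory and can still be fooled by files that merely quote both markers, so treat the result as a list of candidates.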

evince will happily read the cache file directly, without having to strip the header.

If you do want to extract the original PDF, save the following script as extractpdf.py:

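import sys

def main(cachefile):
    # Read the whole cache entry (header, key, payload, trailing records).
    with open(cachefile, 'rb') as f:
        s = f.read()
    # The PDF runs from the first '%PDF' to the last '%%EOF' (5 bytes long).
    start, end = s.find(b'%PDF'), s.rfind(b'%%EOF')
    if start == -1 or end < start:
        sys.exit(cachefile + ': no PDF found')
    with open(cachefile + '.pdf', 'wb') as f:
        f.write(s[start:end + 5])

if __name__ == '__main__':
    main(sys.argv[1])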

Then run it as python3 extractpdf.py <cache file>. The extracted PDF is written next to the cache file as <cache file>.pdf.
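
If you would rather process the whole cache in one go and keep the output out of the cache directory, something along these lines might work. This is only a sketch using the same first-%PDF to last-%%EOF slicing as the script above. The helper names slice_pdf and extract_all, and the idea of a separate output directory, are assumptions made for illustration.

```python
import os


def slice_pdf(data: bytes):
    """Return the bytes from the first %PDF to the last %%EOF, or None if absent."""
    start = data.find(b"%PDF")
    end = data.rfind(b"%%EOF")
    if start == -1 or end < start:
        return None
    return data[start:end + len(b"%%EOF")]


def extract_all(cache_dir: str, out_dir: str) -> list:
    """Write one .pdf per matching cache file into out_dir; return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(cache_dir)):
        src = os.path.join(cache_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories such as index folders
        with open(src, "rb") as f:
            pdf = slice_pdf(f.read())
        if pdf is None:
            continue  # no PDF markers in this cache entry
        dst = os.path.join(out_dir, name + ".pdf")
        with open(dst, "wb") as f:
            f.write(pdf)
        written.append(dst)
    return written
```

Calling extract_all with the cache directory and, say, a fresh folder in your home directory leaves the cache itself untouched. Like the grep approach, it can pick up files that only happen to contain both markers.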

Author: Jignesh

Updated on September 18, 2022

Comments

  • Jignesh over 1 year

    I was wondering where Google Chrome saves PDF documents that it opens in the browser itself. I know they get deleted if we don't save them and then close the browser.

    I looked in the Chrome temporary folder at ~/.cache/google-chrome, but couldn't find any PDFs there.

  • Ricky Robinson almost 9 years
    Just wondering... didn't it all use to be stored in the /tmp folder?
  • muru almost 9 years
    @RickyRobinson not in the past couple of years, at least, I think. I've had problems with Chrome filling up students' quotas for at least that long.
  • JamesBB about 7 years
    Does that find command search sub-directories? My ~/.cache/google-chrome/ folder has multiple sub-folders, each with multiple sub-folders, so I was wondering how deep it would search. I didn't find any PDF files, although I had just closed one, so is there anywhere else that Chrome might store files it's had open?
  • muru about 7 years
    @JamesBB as you can see from the output, it went at least 3 subdirectories deep. find recurses by default, unless you used the -maxdepth option.
  • rivu almost 4 years
    Does this method work any more? I tried and all I see in chrome cache are data files. My version is 77.
  • kynan over 3 years
    Nope, this doesn't work any more because the file format of the Chrome cache changed.