Python download without supplying a filename


Solution 1

Here is a complete way to do it in Python 3 when the URL itself contains no filename:

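from urllib.request import urlopen
import os
import shutil

url = "http://cloud.ine.ru/s/JDbPr6W4QXnXKgo/download"
with urlopen(url) as remotefile:
    # The response headers are an email.message.Message, so get_filename()
    # can parse Content-Disposition (the cgi module used for this before is
    # deprecated and was removed in Python 3.13)
    filename = remotefile.info().get_filename()
    if filename is None:
        raise ValueError("the server sent no filename in its headers")
    # Keep only the last path component so the server cannot pick the directory
    filename = os.path.basename(filename)
    # Stream the body that is already open instead of requesting the URL twice
    with open(filename, "wb") as localfile:
        shutil.copyfileobj(remotefile, localfile)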

As a result, you should get a file named cargo_live_animals_parrot.jpg.
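The header parsing in this solution can be tried offline. The sketch below assumes Python 3 and relies on the headers returned by urlopen(...).info() being an email.message.Message subclass, so Message.get_filename() can do the job cgi.parse_header used to do. The helper name filename_from_disposition is hypothetical and the header value is a made-up example:

```python
from email.message import Message


def filename_from_disposition(value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    headers = Message()
    headers["Content-Disposition"] = value
    # get_filename() strips the surrounding quotes; it returns None when
    # neither Content-Disposition nor Content-Type carries a filename
    return headers.get_filename()


print(filename_from_disposition('attachment; filename="cargo_live_animals_parrot.jpg"'))
# prints cargo_live_animals_parrot.jpg
```

get_filename() falls back to the name parameter of a Content-Type header, so a None result means neither header carried a filename.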

Solution 2

Edited after the question was clarified:

urlparse.urlsplit takes the URL you are opening and splits it into its component parts; you can then take the path portion and use its last /-delimited segment as the filename.

import urllib, urlparse

split = urlparse.urlsplit(url)
filename = "/tmp/" + split.path.split("/")[-1]
urllib.urlretrieve(url, filename)
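The same idea in Python 3, where urlparse became urllib.parse and urlretrieve moved to urllib.request. This is a sketch: the helper name filename_from_url is hypothetical, and the unquote() call is an addition so that escapes such as %20 don't end up in the local filename:

```python
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last '/'-delimited segment of the URL path, percent-decoded.

    The query string and fragment are ignored; a path ending in '/' gives ''.
    """
    path = urlsplit(url).path
    return unquote(path.split("/")[-1])


url = ("http://www.mozilla.com/products/download.html?"
       "product=firefox-3.6.3&os=win&lang=en-US")
print(filename_from_url(url))  # prints download.html
```

To save the file, pass the result to urllib.request.urlretrieve, e.g. urlretrieve(url, "/tmp/" + filename_from_url(url)). With the question's URL the name comes out as download.html, the page rather than the installer, which is the problem Solution 4 explains.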

Solution 3

There is urlopen, which creates a file-like object that can be used to read the data without saving it to a local file:

from urllib2 import urlopen

f = urlopen("http://example.com/")
for line in f:
  print len(line)
f.close()

(I'm not really sure if this is what you're asking for.)
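Since the question also asks for a progress report, here is a Python 3 sketch that reads such a file-like object in chunks and reports progress after each one. The function name copy_with_progress, its parameters and the 64 KiB chunk size are assumptions, and an in-memory io.BytesIO stands in for the network response so the example runs offline:

```python
import io


def copy_with_progress(src, dst, total=None, chunk_size=64 * 1024, report=print):
    """Copy a binary file-like object to another, reporting progress per chunk.

    `total` is the expected size in bytes (e.g. from a Content-Length header);
    when it is unknown, only the running byte count is reported.
    """
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        if total:
            report("%d / %d bytes (%.0f%%)" % (done, total, 100.0 * done / total))
        else:
            report("%d bytes" % done)
    return done


# Offline demo: an in-memory "response" stands in for urlopen(...)
src = io.BytesIO(b"x" * 150_000)
dst = io.BytesIO()
copy_with_progress(src, dst, total=150_000)
```

With a real download you would pass the response from urllib.request.urlopen as src and an open(..., "wb") file as dst, taking total from response.headers.get("Content-Length") (converted with int()) when the server sends it. Alternatively, urllib.request.urlretrieve accepts a reporthook callable that it calls with the block number, the block size and the total size.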

Solution 4

The URL you're specifying doesn't refer to a file at all. It redirects to a web page that runs some JavaScript, which in turn causes your web browser to download the file. The actual address (a mirror) that my browser was directed to from the URL in question is:

http://mozilla.mirrors.evolva.ro//firefox/releases/3.6.3/win32/en-US/Firefox%20Setup%203.6.3.exe

I believe there are two ways web servers specify the name of a downloaded file:

  1. The final segment of the URL path
  2. The Content-Disposition header, which can specify some other filename to use

For the file you want to download, I think you only need the last path segment of the URL (but of the actual URL of the file, not of the web page that chooses which mirror to use). For some downloads, however, you need to take the filename from the Content-Disposition header.
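The two naming rules above can be combined into one helper. A minimal Python 3 sketch, assuming the headers object is an email.message.Message (which is what urlopen(...).info() returns) and that url is the final address after any HTTP redirects, e.g. from response.geturl(). The function name, the "download" fallback and the example header value are hypothetical:

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def choose_filename(url, headers, default="download"):
    """Pick a local filename for a download.

    Order of preference: the Content-Disposition filename, then the last
    segment of the URL path, then `default` (a hypothetical placeholder).
    """
    name = headers.get_filename()
    if not name:
        # No header filename: use the last '/'-delimited path segment
        name = unquote(urlsplit(url).path.split("/")[-1])
    # Keep only the final component so a server cannot pick the directory
    name = posixpath.basename(name.replace("\\", "/"))
    if name in ("", ".", ".."):
        return default
    return name


headers = Message()
headers["Content-Disposition"] = 'attachment; filename="Firefox Setup 3.6.3.exe"'
print(choose_filename("http://www.mozilla.com/products/download.html", headers))
# prints Firefox Setup 3.6.3.exe
```

urlopen follows HTTP redirects automatically, but it does not run JavaScript, so a page like the Mozilla download page still has to be replaced by the real file URL by hand.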

Solution 5

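I ended up with:

import os

# Quote the URL: unquoted, the shell would treat each & as a control
# operator and cut the URL off at the first one
os.system("wget -P /tmp 'http://www.mozilla.com/products/download.html?"
          "product=firefox-3.6.3&os=win&lang=en-US'")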
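For comparison, a rough pure-Python stand-in for the wget call that also gives the progress report asked for in the question. This is a sketch under assumptions: download_into and print_progress are hypothetical names, the data is first written to a temporary file in the target directory and renamed once the response headers are known, and urlretrieve is documented as a legacy interface in Python 3 that may become deprecated:

```python
import os
import posixpath
import tempfile
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve


def print_progress(block_num, block_size, total_size):
    """reporthook for urlretrieve: called once before and once per block."""
    done = block_num * block_size
    if total_size > 0:
        done = min(done, total_size)
        print("\r%d / %d bytes" % (done, total_size), end="")
    else:
        print("\r%d bytes" % done, end="")


def download_into(url, directory, reporthook=print_progress):
    """Rough pure-Python stand-in for `wget -P directory url`.

    The name comes from the Content-Disposition header if present, otherwise
    from the last segment of the URL path.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        _, headers = urlretrieve(url, tmp_path, reporthook)
        name = headers.get_filename() or unquote(urlsplit(url).path.split("/")[-1])
        # Keep only the final component so a server cannot pick the directory
        name = posixpath.basename(name.replace("\\", "/"))
        if name in ("", ".", ".."):
            raise ValueError("no usable filename for %s" % url)
        target = os.path.join(directory, name)
        os.replace(tmp_path, target)
    except BaseException:
        # Do not leave a half-written temporary file behind
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

download_into(url, "/tmp") returns the path it wrote; the default hook keeps rewriting one line, so call print() afterwards to end it.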
Author: Samuel Taylor

Updated on June 03, 2022

Comments

  • Samuel Taylor, almost 2 years ago

    How do I download a file with a progress report using Python, but without supplying a filename?

    I have tried urllib.urlretrieve, but it seems I have to supply a filename to save the downloaded file as.

    So for example:

    I don't want to supply this:

    urllib.urlretrieve("http://www.mozilla.com/products/download.html?product=firefox-3.6.3&os=win&lang=en-US", "/tmp/firefox.exe")
    

    just this:

    urllib.urlretrieve("http://www.mozilla.com/products/download.html?product=firefox-3.6.3&os=win&lang=en-US", "/tmp/")
    

    But if I do, I get this error:

    IOError: [Errno 21] Is a directory: '/tmp'
    

    I am also unable to get the filename from some URLs, for example:

    http://www.mozilla.com/products/download.html?product=firefox-3.6.3&os=win&lang=en-US