Basic HTTP file downloading and saving to disk in Python?

Solution 1

A clean way to download a file is:

import urllib

testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.

This example uses the Python 2 urllib library and retrieves the file directly from the source URL.

Solution 2

In Python 3, urllib.URLopener no longer exists; the class was moved into urllib.request and deprecated. Calling it the Python 2 way raises an error like this:

url_opener = urllib.URLopener()
AttributeError: module 'urllib' has no attribute 'URLopener'

So, try:

import urllib.request 
urllib.request.urlretrieve(url, filename)
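
Since the Python 3 documentation lists urlretrieve as part of a legacy interface, here is a minimal sketch of the same download using urllib.request.urlopen and shutil.copyfileobj instead. The function name download and its parameters are illustrative, not part of any library, and the sketch assumes the destination may sit in a folder that does not exist yet.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, destination):
    """Stream the response body at `url` into the file `destination`."""
    destination = Path(destination)
    # Create the target folder if needed, so any save path can be used.
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out_file:
        # copyfileobj copies block by block, so the whole file is never held in memory.
        shutil.copyfileobj(response, out_file)
    return destination
```

A call such as download("http://randomsite.com/file.gz", r"D:\folder\file.gz") would then save into that folder; the raw string (r"...") keeps the Windows backslashes from being read as escape sequences such as \f.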

Solution 3

As mentioned here:

# Python 2 only; in Python 3 the function lives at urllib.request.urlretrieve
import urllib
urllib.urlretrieve("http://randomsite.com/file.gz", "file.gz")

EDIT: If you still want to use requests, take a look at this question or this one.
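
For readers who do want requests, the following is a rough sketch of a streamed download, not a definitive recipe. The function name download_with_requests, the 30-second timeout, and the 8192-byte chunk size are assumptions chosen for illustration.

```python
import requests


def download_with_requests(url, destination, timeout=30, chunk_size=8192):
    """Stream `url` into the file `destination` using requests."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        # Raise an exception on 4xx/5xx responses instead of saving an error page.
        response.raise_for_status()
        with open(destination, "wb") as out_file:
            # An explicit chunk_size avoids the one-byte default of iter_content().
            for chunk in response.iter_content(chunk_size=chunk_size):
                out_file.write(chunk)
    return destination
```

If the download fails, raise_for_status() raises requests.HTTPError, and the HTTP status code is available on the exception as exc.response.status_code.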

Solution 4

Four methods using wget, urllib and requests, written as a Python 2 script and timed with the profile module:

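#!/usr/bin/python
# Python 2 script: the StringIO module and urllib.URLopener are not available
# under these names in Python 3.
import profile
import urllib
from StringIO import StringIO

import requests
import wget
from PIL import Image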


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        # iter_content() defaults to chunk_size=1, so this loop writes one byte per iteration
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')

testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds

testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds

testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds

testwget - 3489 function calls in 0.020 seconds
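
The call counts above are what profile.run prints for each function. As a hedged illustration of where such numbers come from, and of why a one-byte chunk size inflates them, the sketch below profiles an in-memory copy with cProfile (the C implementation of the same profiler interface, swapped in here for speed). The helper names copy_in_chunks and count_calls and the 100,000-byte payload are made up for this example; no network access is involved.

```python
import cProfile
import io
import pstats


def copy_in_chunks(payload, chunk_size):
    """Copy bytes from one in-memory buffer to another, chunk by chunk."""
    source = io.BytesIO(payload)
    target = io.BytesIO()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        target.write(chunk)
    return target.getvalue()


def count_calls(func, *args):
    """Run func under cProfile and return the total number of function calls."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    return pstats.Stats(profiler).total_calls


payload = b"x" * 100_000  # stand-in for a downloaded response body
calls_one_byte = count_calls(copy_in_chunks, payload, 1)
calls_8k = count_calls(copy_in_chunks, payload, 8192)
print(calls_one_byte, calls_8k)
```

The one-byte run makes a read and a write call for every byte, which mirrors how testRequest ends up with millions of calls while the other methods need only a few thousand.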

Solution 5

I use wget.

It is a simple and good library. Here is an example:

import wget

file_url = 'http://johndoe.com/download.zip'

file_name = wget.download(file_url)

The wget module supports both Python 2 and Python 3.
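
Once file.gz is on disk, by whichever method above, it can be decompressed with the standard gzip module. This is a minimal sketch that assumes a plain .gz file wrapping a single file; a .tar.gz archive would need the tarfile module instead. The function name gunzip and both path parameters are illustrative.

```python
import gzip
import shutil


def gunzip(archive_path, output_path):
    """Decompress the .gz file at `archive_path` into `output_path`."""
    with gzip.open(archive_path, "rb") as compressed, open(output_path, "wb") as out_file:
        # Copy block by block so large archives do not have to fit in memory.
        shutil.copyfileobj(compressed, out_file)
    return output_path
```

For example, gunzip("file.gz", "file") would write the decompressed contents next to the archive.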

Author: arvindch

Updated on October 27, 2021

Comments

  • arvindch
    arvindch over 2 years

    I'm new to Python and I've been going through the Q&A on this site, for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.

    Could someone please explain a simple solution to 'Downloading a file through http' and 'Saving it to disk, in Windows', to me?

    I'm not sure how to use shutil and os modules, either.

    The file I want to download is under 500 MB and is a .gz archive file. If someone can explain how to extract the archive and utilise the files in it also, that would be great!

    Here's a partial solution, that I wrote from various answers combined:

    import requests
    import os
    import shutil
    
    global dump
    
    def download_file():
        global dump
        url = "http://randomsite.com/file.gz"
        file = requests.get(url, stream=True)
        dump = file.raw
    
    def save_file():
        global dump
        location = os.path.abspath("D:\folder\file.gz")
        with open("file.gz", 'wb') as location:
            shutil.copyfileobj(dump, location)
        del dump
    

    Could someone point out errors (beginner level) and explain any easier methods to do this?

    Thanks!

    • Charlie Parker
      Charlie Parker almost 3 years
      Note: if you are running the download from PyCharm, it may not be obvious which folder is the "current folder".
  • arvindch
    arvindch over 10 years
    urllib will work, however, many people seem to recommend the use of requests over urllib. Why's that?
  • arvindch
    arvindch over 10 years
    Ok, thanks! But is there a way to get it working through requests?
  • dparpyani
    dparpyani over 10 years
    requests is extremely helpful compared to urllib when working with a REST API. Unless, you are looking to do a lot more, this should be good.
  • arvindch
    arvindch over 10 years
    Ok, now I've read the links you've provided for requests usage. I'm confused about how to declare the file path, for saving the download. How do I use os and shutil for this?
  • John Lapoya
    John Lapoya over 10 years
    Any possibility to save in /myfolder/file.gz ?
  • Flash
    Flash about 10 years
    For Python 3: import urllib.request; urllib.request.urlretrieve(url, filename)
  • Dharmit
    Dharmit almost 10 years
    No better possibility than trying it yourself, maybe? :) I could successfully do testfile.retrieve("http://example.com/example.rpm", "/tmp/test.rpm").
  • Aashish Thite
    Aashish Thite over 9 years
    I am not able to extract the http status code with this if the download fails
  • Arash Saidi
    Arash Saidi over 9 years
    @Dharmit Is there a way to close that file? I mean, I want to download a file, do something to it, then delete it. However, when I try to delete it with os.remove(path/file) I get error: no such file or directory
  • mateor
    mateor over 8 years
    This should probably be urllib.urlretrieve or urllib.URLopener().retrieve, unclear which you meant here.
  • Azeezah M
    Azeezah M almost 8 years
    Why do you import csv if you're just naming a file?
  • Admin
    Admin over 7 years
    @ArashSaidi A little bit late to the party, but when you open it you could try using (in this case) testfile.close() to close the file before deleting. When I tested, though, I didn't get the same error.
  • MichielB
    MichielB over 7 years
    This is deprecated since Python 3.3, and the urllib.request.urlretrieve solution (see answer below) is the 'modern' way
  • Estefy
    Estefy almost 7 years
    What is the best way to add a username and password to this code? tks
  • Qudit
    Qudit over 6 years
    urlretrieve is part of the legacy interface and the Python 3 docs state that it may be deprecated in the future.
  • Abdelhak
    Abdelhak almost 6 years
    How did you get the number of function calls?
  • carte blanche
    carte blanche over 5 years
    I am looking for the same thing: how do I add a username and password? How do I authenticate?
  • wowkin2
    wowkin2 over 4 years
    Weird... Why nobody votes for this answer when Python 2 became deprecated and only this solution should work properly...
  • Yechiel K
    Yechiel K over 4 years
    Agreed! I was pulling my hair over the earlier solutions. Wish I could upvote 200 times!
  • Michael Schnerring
    Michael Schnerring almost 4 years
    Don't you have to req.iter_content()? Or use the req.raw file object? See this
  • DaWe
    DaWe almost 4 years
    No, it just works, haven't you tried? @MichaelSchnerring
  • Charlie Parker
    Charlie Parker almost 3 years
    get this error: AttributeError: module 'urllib' has no attribute 'URLopener'
  • Charlie Parker
    Charlie Parker almost 3 years
    doesn't work for me: AttributeError: module 'urllib' has no attribute 'urlretrieve'
  • Charlie Parker
    Charlie Parker almost 3 years
    How do I indicate which folder/path to save the contents of the URL to?