List all files in an online directory with Python?


Since you're trying to download a bunch of things at once, start by looking for a site index or a webpage that neatly lists everything you want to download. The mobile version of a website is usually lighter than the desktop version and easier to scrape.
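HTTP itself has no "list this directory" request. A URL like http://cdn.primarygames.com/ only shows a file list if the server generates one (for example Apache's mod_autoindex or nginx's autoindex) or if someone published an index page. When such a page exists, the approach is the same for any of them: collect the links and keep the ones you want. Below is a minimal sketch that uses only the standard library's html.parser. It assumes the files appear as ordinary <a href> links, and collect_links is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees (tag/attr names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def collect_links(html, base_url, suffix=None):
    """Return absolute URLs of all links in html.

    If suffix is given (e.g. '.swf'), keep only URLs whose path ends with it,
    case-insensitively. Order is preserved and duplicates are dropped.
    """
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    # Resolve relative hrefs against the page they came from
    urls = (urljoin(base_url, href) for href in parser.hrefs)
    if suffix is not None:
        urls = (u for u in urls if urlparse(u).path.lower().endswith(suffix.lower()))
    return list(dict.fromkeys(urls))
```

For example, after `import urllib.request`, `collect_links(urllib.request.urlopen(index_url).read().decode('utf-8', 'replace'), index_url, '.swf')` would return every .swf linked from index_url. If the server returns an error page instead of a listing, there is simply nothing to collect.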

This website has exactly what you're looking for: All Games (http://www.primarygames.com/mobile/category/all/).

From there it's quite simple: just extract all of the game page links. I use BeautifulSoup and requests to do this:

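import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

games_url = 'http://www.primarygames.com/mobile/category/all/'

def get_all_games():
    # Fetch the "All Games" listing and parse it with Python's built-in HTML parser
    response = requests.get(games_url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # The game links live inside <div class="catlist">; urljoin turns each
    # (root-relative) href into an absolute URL
    for a in soup.find('div', {'class': 'catlist'}).find_all('a', href=True):
        yield urljoin(games_url, a['href'])

def download_game(url):
    # You have to do this stuff. I'm lazy and won't do it.
    raise NotImplementedError

if __name__ == '__main__':
    for game in get_all_games():
        download_game(game)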

The rest is up to you. download_game() should download a game given the game's page URL, so you have to find the <object> tag in each page's DOM and pull the .swf URL out of it.
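Here is one possible body for download_game(), offered only as a sketch. It assumes each game page embeds its .swf through an <object data=...>, a <param name="movie" value=...> or an <embed src=...> tag. I haven't checked that against the live site, and most sites no longer serve Flash at all. The sketch uses only the standard library (html.parser and urllib.request), so it needs no extra installs. It also reuses the question's chunked download-with-progress loop, ported to Python 3. The helper names find_swf_url and save_with_progress are hypothetical.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _SwfFinder(HTMLParser):
    """Remembers the first .swf referenced by <object>, <param name="movie"> or <embed>."""

    def __init__(self):
        super().__init__()
        self.swf = None

    def handle_starttag(self, tag, attrs):
        if self.swf is not None:
            return
        a = dict(attrs)
        if tag == 'object':
            candidate = a.get('data')
        elif tag == 'param' and (a.get('name') or '').lower() == 'movie':
            candidate = a.get('value')
        elif tag == 'embed':
            candidate = a.get('src')
        else:
            candidate = None
        if candidate and urlparse(candidate).path.lower().endswith('.swf'):
            self.swf = candidate


def find_swf_url(html, page_url):
    """Return the absolute URL of the first embedded .swf in html, or None."""
    finder = _SwfFinder()
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.swf) if finder.swf else None


def save_with_progress(url, file_name, block_sz=8192):
    """Python 3 port of the question's chunked download loop."""
    with urllib.request.urlopen(url) as u, open(file_name, 'wb') as f:
        file_size = int(u.headers.get('Content-Length') or 0)
        print(f'Downloading: {file_name} Bytes: {file_size}')
        file_size_dl = 0
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break
            f.write(buffer)
            file_size_dl += len(buffer)
            if file_size:
                # \r returns to the start of the line instead of chr(8) backspaces
                print(f'\r{file_size_dl:10d}  [{file_size_dl * 100 / file_size:3.2f}%]',
                      end='', flush=True)
        print()


def download_game(url, dest_dir='.'):
    """Fetch a game page, find its .swf and save it into dest_dir."""
    with urllib.request.urlopen(url) as page:
        html = page.read().decode('utf-8', 'replace')
    swf_url = find_swf_url(html, url)
    if swf_url is None:
        print('No .swf found on', url)
        return None
    file_name = os.path.join(dest_dir, os.path.basename(urlparse(swf_url).path))
    save_with_progress(swf_url, file_name)
    return file_name
```

To use it, replace the placeholder download_game() in the script above with this one. The existing loop `for game in get_all_games(): download_game(game)` stays the same. Because get_all_games() is a generator, `game_urls = list(get_all_games())` keeps the links in a variable if you want to inspect or reuse them.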

Author: Terrii

I am 19 years old, and I have been playing around with the coding world for about 5 to 6 years now. I make applications that do little more than teach me different types of code. I am also a web developer; that is my strong point at the moment, and it's also where I would like to see myself in a few years. Websites I designed and developed: Toms Working Opal Mine, http://tomsworkingopalmine.com; Matt's Computing, http://mattscomputing.com; New Hackham West Community Centre Website, http://mattscomputing.com/hwcc (will be hosted at http://hwcc.net when completed). Matthew Allen.

Updated on June 04, 2022

Comments

  • Terrii
    Terrii almost 2 years

    Hello, I was just wondering: I'm trying to create a Python application that downloads files from the internet, but at the moment it only downloads one file whose name I already know. Is there any way I can get a list of the files in an online directory and download them? I'll show you my code for downloading one file at a time, just so you know a bit about what I want to do.

    import urllib2
    
    url = "http://cdn.primarygames.com/taxi.swf"
    
    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)
    
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break
    
        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8)*(len(status)+1)
        print status,
    
    f.close()
    

    So what it does is download taxi.swf from this website, but what I want it to do is download all of the .swf files from that directory ("/") to the computer.

    Is that possible? Thank you so much in advance. -Terrii-

  • Terrii
    Terrii over 11 years
    Thanks, this is what I need, but I have a problem: I don't know how to import BeautifulSoup. I put my file.py into the BeautifulSoup folder, but I don't know what I'm meant to do. Am I meant to install BeautifulSoup?
  • Blender
    Blender over 11 years
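    @Terrii: Learn Python before using it. This is really basic stuff. BeautifulSoup is a third-party module, so you install it (pip install beautifulsoup4) rather than copying your script into its folder.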
  • AHuman
    AHuman over 10 years
    How would it be possible to save the results of the find in a variable? I couldn't figure it out.