Get the code from inspect element using Python

17,896

One way to get at the source code of a web page is to use the Beautiful Soup library. A tutorial of this is shown here. The code from the page is shown below, the comments are mine. This particular code does not work as the contents have changed on the site it uses as an example, but the concept should help you to do what you want to do. Hope it helps.

from bs4 import BeautifulSoup
from urllib2 import urlopen

BASE_URL = "http://www.chicagoreader.com"

def get_category_links(section_url):
    # Put the stuff you see when using Inspect Element in a variable called html.
    html = urlopen(section_url).read()    
    # Parse the stuff.
    soup = BeautifulSoup(html, "lxml")    
    # The next two lines will change depending on what you're looking for. This 
    # line is looking for <dl class="boccat">.  
    boccat = soup.find("dl", "boccat")    
    # This line organizes what is found in the above line into a list of 
    # hrefs (i.e. links). 
    category_links = [BASE_URL + dd.a["href"] for dd in boccat.findAll("dd")]
    return category_links

EDIT 1: The solution above provides a general way to web-scrape, but I agree with the comments to the question. The API is definitely the way to go for this site. Thanks to yuvi for providing it. The API is available at https://github.com/500px/PxMagic.


EDIT 2: There is an example of your question regarding getting links to popular photos. The Python code from the example is pasted below. You will need to install the API library.

import fhp.api.five_hundred_px as f
import fhp.helpers.authentication as authentication
from pprint import pprint
key = authentication.get_consumer_key()
secret = authentication.get_consumer_secret()

client = f.FiveHundredPx(key, secret)
results = client.get_photos(feature='popular')

i = 0
PHOTOS_NEEDED = 2
for photo in results:
    pprint(photo)
    i += 1
    if i == PHOTOS_NEEDED:
        break
Share:
17,896
EspenG
Author by

EspenG

Updated on August 21, 2022

Comments

  • EspenG
    EspenG almost 2 years

    In the Safari browser, I can right-click and select "Inspect Element", and a lot of code appears. Is it possible to get this code using Python? The best solution would be to get a file with the code in it.

    More specifically, I am trying to find the links to the images on this page: http://500px.com/popular. I can see the links from "Inspect Element" and I would like to retrieve them with Python.

  • EspenG
    EspenG almost 10 years
    When I run: html = urlopen("500px.com/popular").read(), I can't find the links.. Is not the same code as if I right-click in my browser and select "Inspect Element".
  • Sven Marnach
    Sven Marnach almost 10 years
    @user3465589: The links are late-loaded by some JavaScript embedded in that page. It will be really difficult to get the links unless you simply use the API. Honestly.
  • gary
    gary almost 10 years
    @user3465589 I copied and pasted code from an example written by the author of the API.
  • EspenG
    EspenG almost 10 years
    Okay! But how do I install the API library..?
  • yuvi
    yuvi almost 10 years
    @user3465589 you don't install an API, you use it (in this case - you need to download and use the client)
  • gary
    gary almost 10 years
    Right, some packages can be installed with pip (i.e. pip install some-package-name) but this API should just be downloaded and stored to a location where Python can see it.