Use Python to collect weather data and write to file


This kind of task is called screen scraping. The code below uses a few string-manipulation routines for very basic cleanup, but you can do a much better job with a tool made for screen scraping, such as Beautiful Soup.

import urllib2
import cookielib

cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))

url = "http://www.wunderground.com/global/stations/54511.html?MR=1"
request = urllib2.Request(url)
page = opener.open(request)

# This is one big string
rawdata = page.read()

# This breaks it up into lines
lines_of_data = rawdata.split('\n')

# This is one line in the raw data that looks interesting.  I'm going to
# filter the raw data based on the "og:title" text.
#
# <meta name="og:title" content="Beijing, Beijing | 31&deg; | Clear" />

# The "if line.find(" bit is the filter. 
special_lines = [line for line in lines_of_data if line.find('og:title')>-1]
print special_lines

# Now we clean up. This is very crude, but you can adapt it to
# exactly what you want to do.
info = special_lines[0].replace('"','').split('content=')[1]
sections = info.split('|')
print sections

Output:

['\t\t<meta name="og:title" content="Beijing, Beijing | 32&deg; | Clear" />']
['Beijing, Beijing ', ' 32&deg; ', ' Clear />']
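
The last element above still carries the trailing " />", and the degree sign is left as an HTML entity. As a rough sketch of a sturdier cleanup, the following uses the standard library's html.parser module (not Beautiful Soup) and is written for Python 3, unlike the Python 2 code above. The names OgTitleParser and parse_og_title are made up for this illustration, and the sample line is the one from the output above.

```python
from html.parser import HTMLParser


class OgTitleParser(HTMLParser):
    """Remembers the content attribute of the first og:title meta tag."""

    def __init__(self):
        super().__init__()
        self.og_title = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entities such as &deg;
        # in attribute values are already decoded by HTMLParser.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "og:title":
            if self.og_title is None:
                self.og_title = attributes.get("content")


def parse_og_title(html_text):
    """Return the og:title content split on '|', or None if it is absent."""
    parser = OgTitleParser()
    parser.feed(html_text)
    if parser.og_title is None:
        return None
    return [part.strip() for part in parser.og_title.split("|")]


sample = '\t\t<meta name="og:title" content="Beijing, Beijing | 32&deg; | Clear" />'
print(parse_og_title(sample))
```

Because the parser works on tags and attributes rather than raw lines, it does not depend on the meta tag sitting on a line of its own.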

Edit: By all means, if the particular website offers a web service that returns structured data such as JSON, as in the answer by Xaranke, use that! Not all websites do, though, so Beautiful Soup can still be very useful.
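
For the "write to file" part of the title, here is a minimal sketch, assuming Python 3, that appends one timestamped row per reading to a CSV file and repeats at a fixed interval. The function names, the fetch_reading callable (which you would write yourself, for example around the scraping code above) and the hourly interval in the usage comment are all assumptions made for illustration.

```python
import csv
import time
from datetime import datetime


def append_reading(csv_path, condition, temperature, timestamp=None):
    """Append one row (timestamp, condition, temperature) to the CSV file."""
    if timestamp is None:
        timestamp = datetime.now()
    # Mode "a" adds to the end of the file instead of overwriting it.
    with open(csv_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([timestamp.isoformat(timespec="seconds"), condition, temperature])


def collect(fetch_reading, csv_path, interval_seconds, total_seconds, sleep=time.sleep):
    """Call fetch_reading() once per interval and log each result."""
    runs = int(total_seconds // interval_seconds)
    for _ in range(runs):
        try:
            condition, temperature = fetch_reading()
            append_reading(csv_path, condition, temperature)
        except Exception as error:
            # One failed download should not end a multi-day run.
            print("skipped one reading:", error)
        sleep(interval_seconds)


# Hypothetical usage: one reading per hour for 5 days.
# collect(my_fetch_function, "weather.csv", 3600, 5 * 24 * 3600)
```

A long-running loop like this stops if the computer restarts, so an alternative is to have the operating system's scheduler (cron, or Task Scheduler on Windows) run a script that takes a single reading each time.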

Author by

Nardrek

Updated on July 27, 2022

Comments

  • Nardrek, almost 2 years ago

    I would like your advice/help on this:

    Create a Python script that:

    Adds the data collected from the website as a new line in a CSV file.

    Rules:

    • The script must run automatically on your computer for 5 days.

    Do you have any advice? :(

    I appreciate your help on this.

    I tried this:

    import urllib2
    import cookielib
    
    cookieJar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
    
    setURL = 'http://www.wunderground.com/global/stations/54511.html?MR=1'
    request = urllib2.Request(setURL)
    response = opener.open(request)
    
    url = "http://www.wunderground.com/global/stations/54511.html?MR=1"
    request = urllib2.Request(url)
    page = opener.open(request)
    
    WeatherData = page.read()                            
    print WeatherData
    

    It prints all the data, but I want to print only:

    Datetime (timestamp of data captured) - Current Condition - Temperature

    • As I said, I need advice, such as:

      • What should I use to complete this task?

      • How can I make the data collection run for several days? I don't know.

    I don't need a full answer to copy and paste; I'm not a fool.

    I want to UNDERSTAND.