Use Python to collect weather data and write to file
This kind of task is called screen scraping. The code I show below uses just a few string-manipulation routines for very basic cleanup, but you can do a much better job with a tool made for screen scraping, like Beautiful Soup.
import urllib2
import cookielib
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
url = "http://www.wunderground.com/global/stations/54511.html?MR=1"
request = urllib2.Request(url)
page = opener.open(request)
# This is one big string
rawdata = page.read()
# This breaks it up into lines
lines_of_data = rawdata.split('\n')
# This is one line in the raw data that looks interesting. I'm going to
# filter the raw data based on the "og:title" text.
#
# <meta name="og:title" content="Beijing, Beijing | 31° | Clear" />
# The "if 'og:title' in line" bit is the filter.
special_lines = [line for line in lines_of_data if 'og:title' in line]
print special_lines
# Now we clean up. This is very crude, but you can refine it to do
# exactly what you want.
info = special_lines[0].replace('"','').split('content=')[1]
sections = info.split('|')
print sections
Output:
['\t\t<meta name="og:title" content="Beijing, Beijing | 32° | Clear" />']
['Beijing, Beijing ', ' 32° ', ' Clear />']
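The trailing ' />' in the last section shows how crude the split-based cleanup is. One possible refinement, sketched here for Python 3 (the code above is Python 2), is to pull out only the text inside content="..." with a regular expression and strip the whitespace around each part. The helper name parse_og_title is hypothetical, and the sketch assumes the tag always has exactly three parts separated by '|'.

```python
import re

def parse_og_title(line):
    """Return (location, temperature, condition) from an og:title meta line.

    Returns None if the line has no content="..." attribute or does not
    split into exactly three '|'-separated parts.
    """
    match = re.search(r'content="([^"]*)"', line)
    if match is None:
        return None
    parts = [part.strip() for part in match.group(1).split('|')]
    if len(parts) != 3:
        return None
    return tuple(parts)

# Sample line copied from the output above.
sample = '\t\t<meta name="og:title" content="Beijing, Beijing | 32° | Clear" />'
print(parse_og_title(sample))
# prints ('Beijing, Beijing', '32°', 'Clear')
```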
Edit: By all means, if the particular website offers a web service that returns JSON, as in the answer by Xaranke, use that! Not all websites do, though, so Beautiful Soup can still be very useful.
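Beautiful Soup is a third-party package, so as a dependency-free stand-in, the same idea of using a real HTML parser instead of string splitting can be sketched with the html.parser module from the Python 3 standard library. This is not Beautiful Soup's API. MetaContentParser is a hypothetical class name, and the sample HTML simply wraps the og:title line shown in the output above.

```python
from html.parser import HTMLParser

class MetaContentParser(HTMLParser):
    """Collect the content attribute of every <meta name="..."> tag."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here by default.
        if tag != 'meta':
            return
        attributes = dict(attrs)
        name = attributes.get('name')
        if name and 'content' in attributes:
            self.meta[name] = attributes['content']

# Assumed sample page; a real page would come from page.read().
sample_html = ('<html><head>'
               '<meta name="og:title" content="Beijing, Beijing | 32° | Clear" />'
               '</head></html>')

parser = MetaContentParser()
parser.feed(sample_html)
print(parser.meta.get('og:title'))
# prints Beijing, Beijing | 32° | Clear
```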
Nardrek
Updated on July 27, 2022
Comments
-
Nardrek almost 2 years
I would like your advice/help on this:
Create a Python script that:
- Every hour, collects the temperature (e.g. 29°C) and current condition (e.g. Clear) from this website: http://www.wunderground.com/global/stations/54511.html
- Create a CSV file with three headers:
- Datetime (timestamp of data captured)
- Current Condition
- Temperature
Add the data collected from the website into a new line of the CSV file.
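The CSV part of the task could be sketched roughly as follows, assuming Python 3. The function name append_reading, the timestamp format, and the file name in the usage note are all illustrative choices, not requirements from the task. The header row is written only when the file does not exist yet or is empty, so every later call just adds one new line.

```python
import csv
import os
from datetime import datetime

HEADERS = ['Datetime', 'Current Condition', 'Temperature']

def append_reading(path, condition, temperature, when=None):
    """Append one reading to the CSV at path, adding headers to a new file."""
    when = when or datetime.now()
    # Decide before opening: append mode would create the file.
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(HEADERS)
        writer.writerow([when.strftime('%Y-%m-%d %H:%M:%S'),
                         condition, temperature])
```

A call such as append_reading('weather.csv', 'Clear', '32°') would then add one timestamped row per collection.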
Rules:
- The script must be running on your computer automatically for 5 days.
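One simple way to meet the "every hour, for 5 days" rule is a loop that runs a task, sleeps for an hour, and stops once five days have elapsed. The sketch below assumes Python 3. run_periodically and its parameters are hypothetical names, and the sleep and clock arguments exist only so the loop can be tried out without actually waiting. An operating-system scheduler (cron, or Task Scheduler on Windows) that starts a short script once an hour is a common alternative to keeping one process alive.

```python
import time

def run_periodically(task, interval_seconds=3600,
                     duration_seconds=5 * 24 * 3600,
                     sleep=time.sleep, clock=time.time):
    """Call task() every interval_seconds until duration_seconds have passed.

    Returns the number of times task() was attempted.
    """
    start = clock()
    runs = 0
    while clock() - start < duration_seconds:
        try:
            task()
        except Exception as error:
            # One failed collection should not end the whole 5-day run.
            print('collection failed:', error)
        runs += 1
        sleep(interval_seconds)
    return runs
```

With the defaults, task would be any function that fetches the page and stores one reading, and it would be attempted 120 times (24 per day for 5 days).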
Do you have some advice? :(
I appreciate your help on this.
I tried this:
import urllib2
import cookielib
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
setURL = 'http://www.wunderground.com/global/stations/54511.html?MR=1'
request = urllib2.Request(setURL)
response = opener.open(request)
url = "http://www.wunderground.com/global/stations/54511.html?MR=1"
request = urllib2.Request(url)
page = opener.open(request)
WeatherData = page.read()
print WeatherData
So it prints all the data, but I want to print only:
Datetime (timestamp of data captured) - Current Condition - Temperature
-
Like I said, I need advice, like:
What should I use to complete this task?
How can I set the data collection to run for a number of days? I don't know.
I don't need a full answer to copy and paste, I'm not a fool...
I want to UNDERSTAND.