Python web crawler with MySQL database
Solution 1
Yes, I know of a few.
Libraries:
https://github.com/djay/transmogrify.webcrawler
http://code.google.com/p/harvestman-crawler/
http://code.activestate.com/pypm/orchid/
open source web crawler
Tutorials:
http://www.example-code.com/python/pythonspider.asp
PS: I don't know whether they use MySQL, because Python projects normally use either SQLite or PostgreSQL. If you want, you could use the libraries I gave you, import the python-mysql module, and do it yourself :D
http://sourceforge.net/projects/mysql-python/
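To make the "import the module and do it" step concrete, here is a minimal sketch using only the standard library for parsing. It assumes the page HTML has already been fetched by whichever crawler you pick. The names `PageParser`, `parse_page`, `save_page`, and the `pages` table with its `url`, `title`, and `description` columns are made up for illustration. `save_page` expects a DB-API connection that uses `%s` placeholders, such as the one returned by `MySQLdb.connect(...)`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect the title, named meta tags, and outgoing links of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}    # meta name (lowercased) -> content
        self.links = []   # absolute URLs taken from <a href="...">
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL so they can be followed.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Return the fields the question asks for as a plain dict."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "meta": parser.meta,
        "links": parser.links,
    }


def save_page(conn, page):
    """Insert one parsed page into the hypothetical `pages` table."""
    cur = conn.cursor()
    # Parameterized query: the driver escapes the values.
    cur.execute(
        "INSERT INTO pages (url, title, description) VALUES (%s, %s, %s)",
        (page["url"], page["title"], page["description"]),
    )
    conn.commit()
    cur.close()
```

A crawler loop would call `parse_page` on each fetched page, pass the result to `save_page`, and queue the entries of `page["links"]` that it has not visited yet. Note that the MySQL-python project linked above targets Python 2; on Python 3 its maintained fork, mysqlclient, provides the same `MySQLdb` module.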
Solution 2
I would suggest using Scrapy, a powerful scraping framework based on Twisted and lxml. It is particularly well suited to the kind of task you want to perform: it features regex-based rules for following links and lets you use either regular expressions or XPath expressions to extract data from the HTML. It also provides what they call "pipelines" to dump data wherever you want.
Scrapy doesn't provide a built-in MySQL pipeline, but someone has written one, on which you could base your own.
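As a rough idea of what such a pipeline could look like, here is a minimal sketch. It is not the pipeline referred to above. The class name `MySQLStorePipeline`, the connection settings, and the `pages` table with `url`, `title`, and `description` columns are all assumptions for illustration. It relies on Scrapy's classic pipeline hooks (`open_spider`, `close_spider`, `process_item`) and on the `MySQLdb` module, and it assumes items behave like dicts.

```python
class MySQLStorePipeline:
    """Hypothetical Scrapy item pipeline that writes each item to MySQL."""

    # Placeholder connection settings; replace them with real values.
    db_settings = {
        "host": "localhost",
        "user": "crawler",
        "password": "secret",
        "database": "crawl",
    }

    def open_spider(self, spider):
        # Imported here so this module still loads where the driver is missing.
        import MySQLdb
        self.conn = MySQLdb.connect(charset="utf8mb4", **self.db_settings)

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        cur = self.conn.cursor()
        # Parameterized insert into the assumed `pages` table.
        cur.execute(
            "INSERT INTO pages (url, title, description) VALUES (%s, %s, %s)",
            (item.get("url"), item.get("title"), item.get("description")),
        )
        self.conn.commit()
        cur.close()
        return item  # hand the item on to any later pipeline
```

To enable it, you would add the class to `ITEM_PIPELINES` in the project settings, for example `{"myproject.pipelines.MySQLStorePipeline": 300}`, where `myproject.pipelines` is a placeholder module path. Committing after every item keeps the sketch simple; batching commits would likely be faster on a large crawl.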
Solution 3
Scrapy is a web crawling and scraping framework that you can extend to insert the selected data into a database.
It's like an inverse of the Django framework.
Callum Whyte
Updated on June 11, 2022
I want to create or find an open source web crawler (spider/bot) written in Python. It must find and follow links, collect meta tags and meta descriptions, the titles of web pages, and the URL of each page, and put all of the data into a MySQL database.
Does anyone know of any open source scripts that could help me? Also, if anyone can give me some pointers as to what I should do then they are more than welcome to.