Python web crawler with MySQL database


Solution 1

Yes, I know of a few.

Libraries:

https://github.com/djay/transmogrify.webcrawler

http://code.google.com/p/harvestman-crawler/

http://code.activestate.com/pypm/orchid/

Open-source web crawler:

http://scrapy.org/

Tutorials:

http://www.example-code.com/python/pythonspider.asp

PS: I don't know whether these use MySQL. Python projects commonly use SQLite (bundled with Python as the sqlite3 module) or PostgreSQL, so if you want MySQL you can combine the libraries above with the MySQL-Python module (MySQLdb) and write the data to the database yourself. :D
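A minimal sketch of the storage side, assuming the MySQLdb module (from mysql-python, or its maintained fork mysqlclient) and a hypothetical pages table. The names ensure_schema and save_page are illustrative, not part of any library. The code only uses DB-API 2.0 connection and cursor methods plus MySQLdb's %s placeholder style.

```python
"""Store crawled pages in MySQL through a DB-API 2.0 driver that uses
%s placeholders (the "format" paramstyle), such as MySQLdb / mysqlclient."""

# Hypothetical table layout; adjust column types and sizes to your data.
CREATE_PAGES_TABLE = """
CREATE TABLE IF NOT EXISTS pages (
    id INT AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(2048) NOT NULL,
    title TEXT,
    meta_description TEXT,
    meta_keywords TEXT
) CHARACTER SET utf8mb4
"""

INSERT_PAGE = (
    "INSERT INTO pages (url, title, meta_description, meta_keywords) "
    "VALUES (%s, %s, %s, %s)"
)


def ensure_schema(conn):
    """Create the (hypothetical) pages table if it does not exist yet."""
    cur = conn.cursor()
    try:
        cur.execute(CREATE_PAGES_TABLE)
    finally:
        cur.close()
    conn.commit()


def save_page(conn, url, title=None, description=None, keywords=None):
    """Insert one crawled page.

    The values go in a separate tuple so the driver escapes them;
    never build the SQL string by hand from scraped text.
    """
    cur = conn.cursor()
    try:
        cur.execute(INSERT_PAGE, (url, title, description, keywords))
    finally:
        cur.close()
    conn.commit()
```

With a live server you would get conn from something like MySQLdb.connect(host="localhost", user="crawler", passwd="secret", db="crawl", charset="utf8mb4"), where the credentials are placeholders. Passing parameters separately matters here because titles and descriptions scraped from the web often contain quotes.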

http://sourceforge.net/projects/mysql-python/

Solution 2

I would suggest using Scrapy, a powerful scraping framework built on Twisted and lxml. It is particularly well suited to the kind of task you want to perform: it supports regex-based rules for following links, and it lets you extract data from the HTML with either regular expressions or XPath expressions. It also provides what it calls "pipelines" for sending the scraped data wherever you want.
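In Scrapy you would pull these fields out with XPath, for example response.xpath("//title/text()") or response.xpath("//meta[@name='description']/@content"). To show what that extraction step involves without installing Scrapy, here is a minimal sketch that swaps in the standard library's html.parser instead (this is not Scrapy's API). The result keys title, description, keywords and links are my own choice.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the <title> text, meta description/keywords and <a href> values."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords") and attrs.get("content") is not None:
                self.meta[name] = attrs["content"].strip()
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_page_info(html):
    """Return the fields the question asks for from one HTML document."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description"),
        "keywords": parser.meta.get("keywords"),
        "links": parser.links,  # raw href values; resolve them with urljoin
    }
```

The links are returned unresolved; a crawler would join them against the page URL before queueing them.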

Scrapy doesn't provide a built-in MySQL pipeline, but someone has written one here, which you could use as the basis for your own.
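Scrapy item pipelines are plain Python classes on which Scrapy calls open_spider, process_item and close_spider, so a MySQL pipeline can be sketched without importing Scrapy at all. This is a hypothetical example, not the pipeline mentioned above. The MYSQL_* setting names, the pages table and the item keys url, title and description are assumptions.

```python
class MySQLPipeline:
    """Sketch of a Scrapy item pipeline that inserts each item into MySQL.

    Hypothetical: the pages table, the item keys and the MYSQL_* settings
    are illustrative names, not anything Scrapy defines.
    """

    insert_sql = (
        "INSERT INTO pages (url, title, meta_description) VALUES (%s, %s, %s)"
    )

    def __init__(self, connect):
        # connect: zero-argument callable returning a DB-API connection
        self._connect = connect
        self.conn = None

    @classmethod
    def from_crawler(cls, crawler):
        """Scrapy uses this to build the pipeline; read DB settings here."""
        settings = crawler.settings

        def connect():
            import MySQLdb  # provided by mysql-python / mysqlclient

            return MySQLdb.connect(
                host=settings.get("MYSQL_HOST", "localhost"),
                user=settings.get("MYSQL_USER", ""),
                passwd=settings.get("MYSQL_PASSWORD", ""),
                db=settings.get("MYSQL_DB", ""),
                charset="utf8mb4",
            )

        return cls(connect)

    def open_spider(self, spider):
        self.conn = self._connect()

    def close_spider(self, spider):
        if self.conn is not None:
            self.conn.close()

    def process_item(self, item, spider):
        cur = self.conn.cursor()
        try:
            cur.execute(
                self.insert_sql,
                (item.get("url"), item.get("title"), item.get("description")),
            )
        finally:
            cur.close()
        self.conn.commit()
        return item  # hand the item on to any later pipeline
```

Injecting the connect callable keeps the database choice out of the class, which also makes it easy to test with a fake connection. To activate it you would list the class in the project's ITEM_PIPELINES setting.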

Solution 3

Scrapy is a web crawling and scraping framework that you can extend to insert the extracted data into a database.
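In practice, extending Scrapy this way means writing an item pipeline and registering it in the project's settings.py. A sketch of the registration, where myproject.pipelines.MySQLPipeline is a hypothetical import path to a pipeline class you would write yourself:

```python
# settings.py of a Scrapy project ("myproject" is a hypothetical name).

# Maps each pipeline's import path to an order value (conventionally 0-1000);
# items pass through pipelines with lower values first.
ITEM_PIPELINES = {
    "myproject.pipelines.MySQLPipeline": 300,
}

# Respect robots.txt; Scrapy's project template already sets this to True.
ROBOTSTXT_OBEY = True
```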

It's like an inverse of the Django framework.

Author: Callum Whyte

Updated on June 11, 2022

Comments

  • Callum Whyte, almost 2 years ago

    I want to create or find an open-source web crawler (spider/bot) written in Python. It must find and follow links, collect each page's meta tags (including the meta description), title and URL, and put all of this data into a MySQL database.

    Does anyone know of any open-source scripts that could help me? Also, any pointers on how I should go about this are more than welcome.
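The core of the crawler described above is a queue of URLs to visit plus a set of URLs already seen. Here is a minimal standard-library sketch (urllib and html.parser, not any of the libraries from the answers). Staying on the start URL's host and stopping after max_pages are my own assumptions, and the fetch_page parameter is a hypothetical hook for swapping in a different downloader.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch(url, timeout=10):
    """Download a page, assuming UTF-8 when no charset is declared."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, max_pages=50, fetch_page=fetch):
    """Breadth-first crawl limited to start_url's host.

    Yields (url, html) pairs; extracting fields and storing them is
    left to the caller.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError, LookupError):
            continue  # network error, bad URL or unknown charset: skip the page
        fetched += 1
        yield url, html
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts before dedup.
            link, _ = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            if parts.scheme in ("http", "https") and parts.netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
```

Each yielded page can then go through a field extractor and a MySQL insert. A real crawler should also honor robots.txt (urllib.robotparser) and pause between requests.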