How to make a Twitter Crawler using Scrapy?


Solution 1

Log in to Twitter and fetch someone's follower page. An example using the Python library Requests:

import requests

# Your Twitter credentials (placeholders)
account = 'your_username'
password = 'your_password'

url = "https://twitter.com/login"
payload = {'session[username_or_email]': account,
           'session[password]': password}
r = requests.post(url, data=payload)

It is better to add browser-like headers to the request so that the Twitter server treats the spider as a regular browser.

# Fill in the values below after checking the headers your browser sends
headers = {
    'Host': 'twitter.com',
    'User-Agent': '...',
    'Accept': '...',
    'Accept-Language': '...',
    'Accept-Encoding': '...',
    'X-Requested-With': '...',
    'Cookie': '...',
    'Connection': '...',
}
url = 'https://twitter.com/%s/followers' % someone
p = requests.get(url, headers=headers)
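One caveat with the snippets above: `requests.post` and `requests.get` are separate calls, so the cookies set by the login response are not reused for the follower request. A `requests.Session` keeps cookies across requests. A minimal sketch, assuming the legacy HTML login flow (the `authenticity_token` hidden CSRF field is an assumption about that form, not something shown above):

```python
import requests

def build_login_payload(account, password, token):
    """Build the form payload for Twitter's legacy login endpoint.

    Field names mirror the snippet above; 'authenticity_token' is a
    hidden CSRF field assumed to be present in the login form.
    """
    return {
        'session[username_or_email]': account,
        'session[password]': password,
        'authenticity_token': token,
    }

def fetch_followers(account, password, token, someone):
    # A Session stores cookies from the login response and sends
    # them automatically on every later request it makes.
    with requests.Session() as s:
        s.post('https://twitter.com/login',
               data=build_login_payload(account, password, token))
        return s.get('https://twitter.com/%s/followers' % someone)
```

`fetch_followers` is a sketch, not a tested login routine; the real page may require extra fields or redirect handling.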

Once you have the page, you can parse it with tools like BeautifulSoup (bs4), Scrapy selectors, or anything else.
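For instance, even the standard library's `html.parser` is enough to pull profile links out of the response. A minimal sketch, assuming a simplified follower page where each follower appears as an `<a href="/username">` link (the real markup is more complex):

```python
from html.parser import HTMLParser

class FollowerLinkParser(HTMLParser):
    """Collect usernames from anchors that look like profile links."""

    def __init__(self):
        super().__init__()
        self.usernames = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href', '')
            # '/foo' with no further slashes looks like a profile path
            if href.startswith('/') and len(href) > 1 and href.count('/') == 1:
                self.usernames.append(href.lstrip('/'))

html = ('<div><a href="/alice">alice</a>'
        '<a href="/bob">bob</a>'
        '<a href="/i/flow">not a profile</a></div>')
parser = FollowerLinkParser()
parser.feed(html)
print(parser.usernames)  # ['alice', 'bob']
```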

Solution 2

I have seen two Scrapy libraries for Twitter so far:

  • scrapy-twitter - uses the Twitter API and gets more data from every tweet
  • TweetScraper - works without the Twitter API, but has a powerful query language
Author by Admin

Updated on June 16, 2022

Comments

  • Admin, almost 2 years

    I have tried using Scrapy to scrape data from websites like Pinterest that do not require a logged-in session, but how do I use Scrapy to scrape and crawl Twitter, since accessing followers and other data requires logging in first?