How to make a Twitter Crawler using Scrapy?


Solution 1

Log in to Twitter and fetch someone's follower page. An example using the Python library Requests:

import requests

# Your Twitter credentials (placeholders)
account = 'your_username'
password = 'your_password'

url = "https://twitter.com/login"
payload = {'session[username_or_email]': account,
           'session[password]': password}
r = requests.post(url, data=payload)

It is better to add browser-like headers to the request so that the Twitter server treats the spider as a regular browser.

# Fill in the values below after checking the headers your browser sends
headers = {
    'Host': 'twitter.com',
    'User-Agent': '...',
    'Accept': '...',
    'Accept-Language': '...',
    'Accept-Encoding': '...',
    'X-Requested-With': '...',
    'Cookie': '...',
    'Connection': '...',
}
url = 'https://twitter.com/%s/followers' % someone
p = requests.get(url, headers=headers)
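One caveat with the snippets above: `requests.post` and `requests.get` are separate calls, so the cookies set by the login response are not reused for the follower request. A `requests.Session` keeps cookies across requests. A minimal sketch, assuming the legacy HTML login flow (the `authenticity_token` hidden CSRF field is an assumption about that form, not something shown above):

```python
import requests

def build_login_payload(account, password, token):
    """Build the form payload for Twitter's legacy login endpoint.

    Field names mirror the snippet above; 'authenticity_token' is a
    hidden CSRF field assumed to be present in the login form.
    """
    return {
        'session[username_or_email]': account,
        'session[password]': password,
        'authenticity_token': token,
    }

def fetch_followers(account, password, token, someone):
    # A Session stores cookies from the login response and sends
    # them automatically on every later request it makes.
    with requests.Session() as s:
        s.post('https://twitter.com/login',
               data=build_login_payload(account, password, token))
        return s.get('https://twitter.com/%s/followers' % someone)
```

`fetch_followers` is a sketch, not a tested login routine; the real page may require extra fields or redirect handling.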

Once you have the page, you can parse it with tools like BeautifulSoup (bs4), Scrapy selectors, or anything else.
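For instance, even the standard library's `html.parser` is enough to pull profile links out of the response. A minimal sketch, assuming a simplified follower page where each follower appears as an `<a href="/username">` link (the real markup is more complex):

```python
from html.parser import HTMLParser

class FollowerLinkParser(HTMLParser):
    """Collect usernames from anchors that look like profile links."""

    def __init__(self):
        super().__init__()
        self.usernames = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href', '')
            # '/foo' with no further slashes looks like a profile path
            if href.startswith('/') and len(href) > 1 and href.count('/') == 1:
                self.usernames.append(href.lstrip('/'))

html = ('<div><a href="/alice">alice</a>'
        '<a href="/bob">bob</a>'
        '<a href="/i/flow">not a profile</a></div>')
parser = FollowerLinkParser()
parser.feed(html)
print(parser.usernames)  # ['alice', 'bob']
```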

Solution 2

I have seen two Scrapy libraries for Twitter so far:

  • scrapy-twitter - uses the Twitter API and gets more data from every tweet
  • TweetScraper - works without the Twitter API, but has a powerful query language
Author by Admin

Updated on June 16, 2022

Comments

  • Admin, almost 2 years

    I have tried using Scrapy to scrape data from websites like Pinterest that do not require a logged-in session, but how do I use Scrapy to scrape and crawl Twitter, since accessing followers and other data requires logging in first?