How to make a Twitter Crawler using Scrapy?
Solution 1
Log in to Twitter and fetch the follower page of someone. An example using the Python library Requests:
import requests

url = "https://twitter.com/login"
payload = {
    'session[username_or_email]': account,   # your Twitter account name
    'session[password]': password,           # your Twitter password
}
r = requests.post(url, data=payload)
It is better to add browser headers to the request so that the Twitter server treats the spider as a regular browser user.
# You need to fill in the values below after checking the headers in your browser
headers = {
    'Host': 'twitter.com',
    'User-Agent': '',         # copy from your browser
    'Accept': '',             # copy from your browser
    'Accept-Language': '',    # copy from your browser
    'Accept-Encoding': '',    # copy from your browser
    'X-Requested-With': '',   # copy from your browser
    'Cookie': '',             # copy from your browser
    'Connection': '',         # copy from your browser
}

url = 'https://twitter.com/%s/followers' % someone
p = requests.get(url, headers=headers)
Then you have the page, and you can parse it with tools such as BeautifulSoup (bs4), Scrapy selectors, or anything similar.
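The two snippets above are usually combined with a requests.Session so that the cookies set during login persist into the follower-page request. A minimal sketch, reusing the form field names shown above; the helper names here are hypothetical, and the headers still have to be copied from your own browser:

```python
# Sketch: log in and fetch a follower page with one shared session.
# Form field names and URLs follow the snippets above; the function
# names are illustrative, not from any library.
import requests

LOGIN_URL = "https://twitter.com/login"


def build_login_payload(account, password):
    """Form data for the login POST, using the field names shown above."""
    return {
        "session[username_or_email]": account,
        "session[password]": password,
    }


def follower_url(user):
    """URL of a user's follower page."""
    return "https://twitter.com/%s/followers" % user


def fetch_followers_page(account, password, user, headers=None):
    # Network call: run only with real credentials and real browser headers.
    with requests.Session() as s:
        # The session stores the login cookies...
        s.post(LOGIN_URL, data=build_login_payload(account, password),
               headers=headers)
        # ...and sends them automatically on the next request.
        return s.get(follower_url(user), headers=headers)
```

The point of the Session object is that you never have to copy cookies between the POST and the GET by hand.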
Solution 2
I have seen two Scrapy libraries for Twitter so far:
- scrapy-twitter - uses Twitter API and gets more data from every tweet
- TweetScraper - does not use the Twitter API, but has a powerful query language
Author: Admin
Updated on June 16, 2022

Comments
- Admin, almost 2 years ago:
I have used Scrapy to scrape data from websites like Pinterest that do not require a logged-in session. But how do I use Scrapy to scrape and crawl Twitter, since accessing followers and other data requires logging in first?