Filter Twitter feeds only by language
Solution 1
You can't (without special access). Streaming all tweets unfiltered requires a connection to the firehose, which Twitter grants only for specific use cases. Honestly, the firehose isn't really necessary: proper use of track can get you more tweets than you know what to do with.
Try using something like this:
stream.filter(languages=["en"], track=["a", "the", "i", "you", "u"]) # etc
Filtering by words like that will get you many, many tweets. If you want real data for the most-used words, check out this article from Time: The 500 Most Frequently Used Words on Twitter. You can use up to 400 keywords, but a list that broad will likely hit the streaming limit of 1% of all tweets at any given moment. If your track parameter matches 60% of all tweets at a given time, you will still only get 1% (which is a LOT of tweets).
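The keyword limit and the 1% cap can be checked with a few lines of plain Python before connecting. This is only a sketch, assuming the 400-keyword and 1% figures quoted above still apply; `build_track_terms` and `delivered_share` are hypothetical helper names, not part of Tweepy, and the lowercasing assumes keyword matching is case-insensitive.

```python
MAX_TRACK_TERMS = 400   # keyword limit quoted above (assumed still current)
STREAM_CAP = 0.01       # the 1% delivery cap quoted above


def build_track_terms(words, limit=MAX_TRACK_TERMS):
    """Lowercase, de-duplicate (keeping first-seen order) and cap a keyword list."""
    seen = set()
    terms = []
    for word in words:
        term = word.strip().lower()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms[:limit]


def delivered_share(matched_share, cap=STREAM_CAP):
    """Share of all tweets delivered when the filter matches `matched_share` of them."""
    return min(matched_share, cap)


terms = build_track_terms(["a", "The", "the", "i", "you", "u", " "])
print(terms)                   # ['a', 'the', 'i', 'you', 'u']
print(delivered_share(0.60))   # 0.01
print(delivered_share(0.004))  # 0.004
```

The resulting list could then be passed as the track argument of stream.filter.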
Solution 2
Try the lang='en' param in Cursor(), e.g.
tweepy.Cursor(.. lang='en')
Solution 3
Instead of requesting filtered tweets directly, you can fetch tweets in all languages and filter them afterwards:
tweets = api.search("python")
for tweet in tweets:
    # keep only tweets that Twitter tagged as English
    if tweet.lang == "en":
        print(tweet.text)
        # Do the stuff here
Hope it helps.
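The same after-the-fact filter can be wrapped in a small generator so it works on any iterable of results. A minimal sketch, using stand-in objects instead of real API results; `only_language` is a hypothetical helper name, and it assumes each result exposes a `lang` attribute as in the loop above.

```python
from types import SimpleNamespace


def only_language(tweets, lang="en"):
    """Yield the tweets whose `lang` attribute equals `lang`."""
    for tweet in tweets:
        # getattr guards against objects that carry no language tag at all
        if getattr(tweet, "lang", None) == lang:
            yield tweet


# Stand-in objects; real results would come from the search call above.
sample = [
    SimpleNamespace(lang="en", text="hello python"),
    SimpleNamespace(lang="ja", text="konnichiwa python"),
    SimpleNamespace(lang="en", text="python rocks"),
]
english = [t.text for t in only_language(sample)]
print(english)  # ['hello python', 'python rocks']
```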
Solution 4
You can see the arguments for the filter method in the GitHub code https://github.com/tweepy/tweepy/blob/master/tweepy/streaming.py
Pass languages as an array of ISO 639-1 codes.
The arguments are:
filter(self, follow=None, track=None, is_async=False, locations=None,
       stall_warnings=False, languages=None, encoding='utf8', filter_level=None):
So to filter by language, just write:
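import json

from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener  # Tweepy 3.x

class Listener(StreamListener):
    # Called with each raw JSON message; overriding it bypasses on_status.
    def on_data(self, data):
        j = json.loads(data)
        t = {
            'screenName': j['user']['screen_name'],
            'text': j['text']
        }
        print(t)
        return True

    # Only reached through the default on_data, so unused while on_data is overridden.
    def on_status(self, status):
        print(status.text)

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit and wait_on_rate_limit_notify are tweepy.API options, not Stream options
stream = Stream(auth=auth, listener=Listener())
stream.filter(track=['Trump'], languages=["en", "fr", "es"])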
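The on_data handler above indexes j['user'] directly, which raises KeyError if the stream sends a message that is not a tweet (for example a limit notice). A hedged sketch of a more defensive parser that could be called from on_data; `extract_tweet` is a hypothetical helper name, and the sample payloads are made up for illustration.

```python
import json


def extract_tweet(raw):
    """Return {'screenName', 'text'} for a tweet payload, or None for anything else."""
    try:
        payload = json.loads(raw)
    except ValueError:
        return None  # malformed or empty line
    if not isinstance(payload, dict):
        return None
    user = payload.get("user")
    if not isinstance(user, dict) or "text" not in payload:
        return None  # a notice such as {"limit": {...}} rather than a tweet
    return {"screenName": user.get("screen_name"), "text": payload["text"]}


tweet_line = '{"user": {"screen_name": "alice"}, "text": "bonjour", "lang": "fr"}'
limit_line = '{"limit": {"track": 42}}'
print(extract_tweet(tweet_line))  # {'screenName': 'alice', 'text': 'bonjour'}
print(extract_tweet(limit_line))  # None
```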
Solution 5
Tweepy's search can fetch tweets in a specific language. Use an ISO 639-1 code as the value of the lang parameter. The following code fetches tweets with their full text in the specified language (English in this example):
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Tweepy 3.x name; Tweepy 4.x renamed this method to search_tweets
tweets = api.search(q=keywordtosearch, lang='en', count=100, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)
    # add your code
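As the comments below point out, newer Tweepy releases expose this call as search_tweets instead of search. One possible way to support both is to look the method up at run time. This is a sketch only: `pick_search_method` is a hypothetical helper, and the two api objects are stand-ins, not real tweepy.API instances.

```python
from types import SimpleNamespace


def pick_search_method(api):
    """Return api.search_tweets if it exists, else fall back to api.search."""
    method = getattr(api, "search_tweets", None)
    if method is None:
        method = getattr(api, "search", None)
    if method is None:
        raise AttributeError("API object has neither 'search_tweets' nor 'search'")
    return method


# Stand-ins for tweepy.API objects from a newer and an older release.
new_api = SimpleNamespace(search_tweets=lambda **kw: ("search_tweets", kw["lang"]))
old_api = SimpleNamespace(search=lambda **kw: ("search", kw["lang"]))

new_result = pick_search_method(new_api)(q="python", lang="en")
old_result = pick_search_method(old_api)(q="python", lang="en")
print(new_result)  # ('search_tweets', 'en')
print(old_result)  # ('search', 'en')
```

With a real client, the returned method would be called with the same q, lang, count and tweet_mode arguments as above.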
Sudo
A Computer Engineer interested in big data, machine learning, data mining, cluster-computing.
Updated on July 17, 2022
Comments
-
Sudo almost 2 years
I am using the Tweepy API to extract Twitter feeds. I want to extract all Twitter feeds of a specific language only. The language filter works only if a track filter is provided. The following code returns a 406 error:
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
stream.filter(languages=["en"])
How can I extract all the tweets from certain language using Tweepy?
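The 406 is consistent with the streaming filter endpoint requiring at least one of follow, track or locations, with languages only narrowing what those match. A small guard can catch this before any request is made. A minimal sketch under that assumption; `check_filter_args` is a hypothetical helper, not a Tweepy function.

```python
def check_filter_args(follow=None, track=None, locations=None, languages=None):
    """Return the non-empty filter arguments, or raise if languages is used alone."""
    if not any((follow, track, locations)):
        raise ValueError(
            "filter() needs at least one of follow, track or locations; "
            "languages only narrows what those match"
        )
    candidates = {
        "follow": follow,
        "track": track,
        "locations": locations,
        "languages": languages,
    }
    # Drop the arguments that were left empty
    return {name: value for name, value in candidates.items() if value}


try:
    check_filter_args(languages=["en"])
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

kwargs = check_filter_args(track=["the", "a"], languages=["en"])
print(kwargs)  # {'track': ['the', 'a'], 'languages': ['en']}
```

The returned dictionary could be unpacked into the call, as in stream.filter(**kwargs).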
-
tktktk0711 over 6 years
Have you solved this issue? I also want to extract tweets in a specific language, such as Japanese.
-
Krishh about 8 years
Thanks!! I was typing stream.filter(lang=["en"], track=["something"])
-
Maged Saeed over 2 years
Tried your code, got AttributeError: 'API' object has no attribute 'search'
-
Bruno Gois over 2 years
Replace search with search_tweets