Tweepy Streaming - Stop collecting tweets at x amount
You need to add a counter to your class in __init__, and then increment it inside on_status. While the counter is below 20, on_status inserts the record into the collection and returns True; once the limit is reached it returns False, which stops the stream. This could be done as shown below:
def __init__(self, api=None):
    super(StdOutListener, self).__init__()
    self.num_tweets = 0

def on_status(self, status):
    record = {'Text': status.text, 'Created At': status.created_at}
    print record  # See Tweepy documentation to learn how to access other fields
    self.num_tweets += 1
    if self.num_tweets < 20:
        collection.insert(record)
        return True
    else:
        return False
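
A minimal, self-contained sketch of the same counter pattern, runnable without Tweepy or MongoDB. FakeStreamListener, FakeStatus, run_fake_stream, max_tweets, and the in-memory stored list are stand-ins invented for illustration and are not part of Tweepy's API; in the real listener the base class would be StreamListener and the list would be the MongoDB collection. Note that the snippet above increments the counter before testing < 20, so it stores 19 records; this sketch tests the limit after storing, so exactly max_tweets records are kept. It is written for Python 3 but uses only syntax that also works in Python 2.

```python
class FakeStreamListener(object):
    """Hypothetical stand-in for Tweepy's StreamListener (illustration only)."""

    def __init__(self, api=None):
        self.api = api


class FakeStatus(object):
    """Hypothetical stand-in for a status carrying the two fields used above."""

    def __init__(self, text, created_at):
        self.text = text
        self.created_at = created_at


class CountingListener(FakeStreamListener):
    def __init__(self, max_tweets, api=None):
        # Call the base __init__ so attributes it sets (such as api) exist.
        super(CountingListener, self).__init__(api)
        self.max_tweets = max_tweets
        self.num_tweets = 0
        self.stored = []  # stands in for the MongoDB collection

    def on_status(self, status):
        record = {'Text': status.text, 'Created At': status.created_at}
        self.stored.append(record)
        self.num_tweets += 1
        # True keeps the stream going; False asks it to stop.
        return self.num_tweets < self.max_tweets


def run_fake_stream(listener, statuses):
    """Feed statuses to the listener until on_status returns False."""
    for status in statuses:
        if listener.on_status(status) is False:
            break


listener = CountingListener(max_tweets=20)
run_fake_stream(listener, (FakeStatus('tweet %d' % i, i) for i in range(100)))
print(len(listener.stored))  # 20
```

Passing the limit as a constructor argument, rather than hard-coding 20, is a design choice made here so the same listener can be reused with a different cutoff.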
AngryWhopper
Updated on June 13, 2022
Comments
-
AngryWhopper almost 2 years
I'm looking to have the Tweepy Streaming API stop pulling in tweets after I have stored x # of tweets in MongoDB.
I have tried IF and WHILE statements with counters inside the class definition, but cannot get it to stop at a certain X amount. This is a real head-banger for me. I found this link here: https://groups.google.com/forum/#!topic/tweepy/5IGlu2Qiug4 but my efforts to replicate this have failed. It always tells me that __init__ needs an additional argument. I believe we have our Tweepy auth set up differently, so it is not apples to apples.
Any thoughts?
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json, time, sys
import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

class StdOutListener(StreamListener):
    def on_status(self, status):
        text = status.text
        created = status.created_at
        record = {'Text': text, 'Created At': created}
        print record  # See Tweepy documentation to learn how to access other fields
        collection.insert(record)

    def on_error(self, status):
        print 'Error on status', status

    def on_limit(self, status):
        print 'Limit threshold exceeded', status

    def on_timeout(self, status):
        print 'Stream disconnected; continuing...'

stream = Stream(auth, StdOutListener())
stream.filter(track=['tv'])
-
AngryWhopper over 10 years
Adding the __init__ gives me this error: "'StdOutListener' object has no attribute 'api'" i.imgur.com/Z2N3hCB.png I am not sure what adding that has to do with the api?
-
Nat Dempkowski over 10 years
Sorry about that, you also need to add a call to the __init__ of the base class. I updated the code above, but it is as simple as adding the line super(StdOutListener, self).__init__() to the definition of __init__.
-
alko over 10 years
To reduce errors in the future, I'd make the __init__ definition match StreamListener's, def __init__(self, api=None):, and call the base-class __init__ with the api param.
-
AngryWhopper over 10 years
Thanks, this worked! So for my understanding, why did this need a call back to the base class __init__? When I don't call it, but add api=None, it gives the "no attribute api" error. Is the purpose of the super __init__ to call back to the base class that DOES have an api attribute?
-
Mona Jalal almost 8 years
In tweepy I get this error: NameError: global name 'StdOutListener' is not defined. How should I use count in __init__?
-
Nat Dempkowski almost 8 years
That line is just trying to call super on the class you're initializing. You can change StdOutListener to whatever you're calling your StreamListener subclass. E.g. the code from my response should go inside the StdOutListener class.
-
Hamman Samuel almost 8 years
@AngryWhopper I've been dealing with this error for a few hours now and wouldn't have thought the __init__() method would have caused it! +1 for catching that
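
On the question raised in the comments about why the base-class call is needed: a subclass that defines its own __init__ replaces the parent's, so any attribute the parent sets there is never created unless the parent's __init__ is called explicitly. The sketch below illustrates this with a hypothetical BaseListener standing in for StreamListener; it assumes, as the "no attribute 'api'" error in the thread suggests, that the base class assigns self.api in its __init__. WithoutSuper and WithSuper are likewise names made up for the example.

```python
class BaseListener(object):
    """Hypothetical stand-in for StreamListener; assumed to set self.api."""

    def __init__(self, api=None):
        self.api = api


class WithoutSuper(BaseListener):
    def __init__(self, api=None):
        # Overrides BaseListener.__init__ without calling it,
        # so self.api is never assigned.
        self.num_tweets = 0


class WithSuper(BaseListener):
    def __init__(self, api=None):
        # Runs BaseListener.__init__ first, which assigns self.api.
        super(WithSuper, self).__init__(api)
        self.num_tweets = 0


print(hasattr(WithoutSuper(), 'api'))  # False
print(hasattr(WithSuper(), 'api'))     # True
```

Accepting api=None in the subclass signature only declares a parameter; it is the call to the base __init__ that actually stores it on the instance.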