Tweepy Streaming - Stop collecting tweets at x amount


Add a counter to your class in __init__ and increment it inside on_status. While the counter is below 20, on_status inserts the record into the collection and returns True; once the limit is reached it returns False, which stops the stream. This could be done as shown below:

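# Both methods go inside the StdOutListener class
def __init__(self, api=None):
    # Run the base class __init__ so that self.api gets set up
    super(StdOutListener, self).__init__(api)
    self.num_tweets = 0  # number of tweets stored so far

def on_status(self, status):
    record = {'Text': status.text, 'Created At': status.created_at}
    print(record)  # See Tweepy documentation to learn how to access other fields
    collection.insert(record)  # on PyMongo 3+, use collection.insert_one(record)
    self.num_tweets += 1
    # Count after storing so that exactly 20 records are kept;
    # returning False tells Tweepy to disconnect the stream
    return self.num_tweets < 20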
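
The stop condition can be tried out without a Twitter connection or a database. The sketch below is only an illustration: FakeStreamListener, LimitedListener and run_fake_stream are hypothetical stand-ins rather than Tweepy classes, a plain list takes the place of the MongoDB collection, and the limit is passed in as a parameter instead of being hard-coded. It assumes, as the answer relies on, that the stream stops delivering statuses as soon as on_status returns False.

```python
class FakeStreamListener(object):
    """Hypothetical stand-in for Tweepy's StreamListener base class."""

    def __init__(self, api=None):
        self.api = api


class LimitedListener(FakeStreamListener):
    """Stores statuses until `limit` of them have been kept, then asks to stop."""

    def __init__(self, limit, api=None):
        super(LimitedListener, self).__init__(api)
        self.limit = limit
        self.num_tweets = 0
        self.stored = []  # stands in for the MongoDB collection

    def on_status(self, status):
        self.stored.append({'Text': status})
        self.num_tweets += 1
        # False means "disconnect", True means "keep streaming"
        return self.num_tweets < self.limit


def run_fake_stream(listener, statuses):
    """Hypothetical driver: feeds statuses until the listener returns False."""
    delivered = 0
    for status in statuses:
        delivered += 1
        if listener.on_status(status) is False:
            break
    return delivered


listener = LimitedListener(limit=20)
delivered = run_fake_stream(listener, ('tweet %d' % i for i in range(1000)))
print(delivered, len(listener.stored))
```

Because the record is stored before the counter is compared with the limit, the number of stored records equals the limit, and the driver never asks for another status after that.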
Author: AngryWhopper

Updated on June 13, 2022

Comments

  • AngryWhopper
    AngryWhopper almost 2 years

    I'm looking to have the Tweepy Streaming API stop pulling in tweets after I have stored x # of tweets in MongoDB.

    I have tried IF and WHILE statements with counters inside the class definition, but cannot get it to stop at a certain X amount. This is a real head-banger for me. I found this link here: https://groups.google.com/forum/#!topic/tweepy/5IGlu2Qiug4 but my efforts to replicate this have failed. It always tells me that __init__ needs an additional argument. I believe we have our Tweepy auth set up differently, so it is not apples to apples.

    Any thoughts?

    from tweepy.streaming import StreamListener
    from tweepy import OAuthHandler
    from tweepy import Stream
    import json, time, sys
    
    import tweepy
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    
    class StdOutListener(StreamListener):
    
        def on_status(self, status):
            text = status.text
            created = status.created_at
            record = {'Text': text, 'Created At': created}
            print record #See Tweepy documentation to learn how to access other fields
            collection.insert(record)  
    
    
        def on_error(self, status):
            print 'Error on status', status
    
        def on_limit(self, status):
            print 'Limit threshold exceeded', status
    
        def on_timeout(self, status):
            print 'Stream disconnected; continuing...'
    
    
    stream = Stream(auth, StdOutListener())
    stream.filter(track=['tv'])
    
  • AngryWhopper
    AngryWhopper over 10 years
    Adding the __init__ gives me this error: "'StdOutListener' object has no attribute 'api'" (i.imgur.com/Z2N3hCB.png). I am not sure what adding that has to do with the api?
  • Nat Dempkowski
    Nat Dempkowski over 10 years
    Sorry about that, you also need to add a call to the __init__ of the base class. I updated the code above, but it is as simple as adding the line super(StdOutListener, self).__init__() to the definition of __init__.
  • alko
    alko over 10 years
    To reduce errors in the future, I'd rather make the __init__ definition match StreamListener's: def __init__(self, api=None):, and call the base __init__ with the api param.
  • AngryWhopper
    AngryWhopper over 10 years
    Thanks, this worked! So for my understanding, why did this need a call back to the base class init? When I don't call it, but added api=None, it gives the "no attribute api" error. Is the purpose of the super init to call back to the base class that DOES have an api attribute?
  • Mona Jalal
    Mona Jalal almost 8 years
    In tweepy I get this error: NameError: global name 'StdOutListener' is not defined. How should I use count in __init__?
  • Nat Dempkowski
    Nat Dempkowski almost 8 years
    That line is just trying to call super on the class you're initializing. You can change StdOutListener to whatever you're calling your StreamListener subclass, i.e. the code from my response should go inside the StdOutListener class.
  • Hamman Samuel
    Hamman Samuel almost 8 years
    @AngryWhopper I've been dealing with this error for a few hours now and wouldn't have thought the __init__() method would have caused it! +1 for catching that
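
On the question of why the call back to the base class is needed: a subclass that defines its own __init__ replaces the base one, so anything the base __init__ would have set is missing unless it is called explicitly. The sketch below uses a hypothetical Base class in place of StreamListener and assumes the base class assigns self.api in its own __init__, which is what the "no attribute 'api'" error in the comments suggests.

```python
class Base(object):
    """Hypothetical stand-in for a base class that sets self.api in __init__."""

    def __init__(self, api=None):
        self.api = api


class WithoutSuper(Base):
    def __init__(self, api=None):
        # Base.__init__ never runs here, so self.api is never created
        self.num_tweets = 0


class WithSuper(Base):
    def __init__(self, api=None):
        # Runs Base.__init__, which is the code that creates self.api
        super(WithSuper, self).__init__(api)
        self.num_tweets = 0


print(hasattr(WithoutSuper(), 'api'))
print(hasattr(WithSuper(), 'api'))
```

Accepting api=None in the subclass signature alone does not help, because the parameter is simply ignored unless it is passed on to the base class.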