How to get large list of followers Tweepy

Solution 1

I actually figured it out, so I'll post the solution here just for reference.

import tweepy
import time

key1 = "..."
key2 = "..."
key3 = "..."
key4 = "..."

accountvar = raw_input("Account name: ")

auth = tweepy.OAuthHandler(key1, key2)
auth.set_access_token(key3, key4)

api = tweepy.API(auth)

users = tweepy.Cursor(api.followers, screen_name=accountvar).items()

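while True:
    try:
        user = next(users)
    except tweepy.TweepError:
        # Most likely the rate limit: wait out the 15-minute window, then retry.
        # Retrying via the loop keeps StopIteration handled on the retry too.
        time.sleep(60*15)
        continue
    except StopIteration:
        break
    print("@" + user.screen_name)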

This pauses for 15 minutes after every 300 names and then continues, which keeps it within Twitter's rate limit. This will obviously take ages for large accounts, but as Leb mentioned:

The twitter API only allows 100 users to be searched for at a time...[so] what you'll need to do is iterate through each 100 users but staying within the rate limit.

You basically just have to leave the program running if you want the next set. I don't know why mine is giving 300 at a time instead of 100, although my earlier program was giving me 100 at a time.

Hope this helps anyone else that had the same problem as me, and shoutout to Leb for reminding me to focus on the rate limit.
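
For anyone who wants to try the wait-and-retry loop without touching Twitter, here is a minimal sketch. FakeFollowers and RateLimited are made-up stand-ins (assumptions, not tweepy classes) for the cursor and for tweepy.TweepError, and the sleep function is injected so nothing really waits.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the error tweepy raises on a rate limit."""


class FakeFollowers:
    """Hypothetical iterator that raises RateLimited after every `per_window` items."""

    def __init__(self, names, per_window):
        self._names = list(names)
        self._per_window = per_window
        self._served = 0
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._names):
            raise StopIteration
        if self._served == self._per_window:
            self._served = 0  # pretend the window resets once the caller has waited
            raise RateLimited()
        name = self._names[self._pos]
        self._pos += 1
        self._served += 1
        return name


def drain(iterator, wait_seconds, sleep=time.sleep):
    """Collect every item, sleeping and retrying whenever the limit is hit."""
    collected = []
    while True:
        try:
            item = next(iterator)
        except RateLimited:
            sleep(wait_seconds)
            continue  # retry through the same try block
        except StopIteration:
            break
        collected.append(item)
    return collected


waits = []  # records the requested sleeps instead of actually sleeping
names = drain(FakeFollowers(["a", "b", "c", "d", "e"], per_window=2),
              wait_seconds=15 * 60, sleep=waits.append)
print(names)
print(len(waits))
```

Because the retry goes back through the loop, running out of followers right after a wait ends the loop cleanly instead of raising.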

Solution 2

To extend upon this:

You can harvest 3,000 users per 15 minutes by adding a count parameter:

users = tweepy.Cursor(api.followers, screen_name=accountvar, count=200).items()

This will call the Twitter API 15 times as per your version, but rather than the default count=20, each API call will return 200 (i.e. you get 3000 rather than 300).
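
To make the arithmetic concrete, here is a tiny sketch. It assumes the limit of 15 requests per 15-minute window described above still applies; followers_per_window is a made-up helper name, not part of tweepy.

```python
REQUESTS_PER_WINDOW = 15  # assumed limit for followers/list, as described above


def followers_per_window(count):
    """Followers returned in one window when each request asks for `count` users."""
    return REQUESTS_PER_WINDOW * count


print(followers_per_window(20))   # default count
print(followers_per_window(200))  # count=200
```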

Solution 3

Twitter provides two ways to fetch followers:

  1. Fetching the full followers list (using followers/list in the Twitter API or api.followers in tweepy). Alec and mataxu have covered this approach in their answers. The rate limit here allows at most 200 * 15 = 3000 followers in every 15-minute window.
  2. The second approach involves two stages:
    a) Fetching only the follower ids first (using followers/ids in the Twitter API or api.followers_ids in tweepy). You can get 5000 * 15 = 75K follower ids in each 15-minute window.

    b) Looking up their usernames or other data (using users/lookup in the Twitter API or api.lookup_users in tweepy). This is rate limited to about 100 * 180 = 18K lookups in each 15-minute window.

Considering the rate limits, the second approach returns follower data about 6 times faster than the first (18K versus 3K users per window). Below is code that uses the second approach:

#First, Make sure you have set wait_on_rate_limit to True while connecting through Tweepy
api = tweepy.API(auth, wait_on_rate_limit=True,wait_on_rate_limit_notify=True)

#Below code will request for 5000 follower ids in one request and therefore will give 75K ids in every 15 minute window (as 15 requests could be made in each window).
followerids =[]
for user in tweepy.Cursor(api.followers_ids, screen_name=accountvar,count=5000).items():
    followerids.append(user)    
print (len(followerids))

#The function below looks up ids 100 at a time, which allows 18K lookups in each 15-minute window
def get_usernames(userids, api):
    fullusers = []
    u_count = len(userids)
    print(u_count)
    try:
        #Step through the ids in slices of 100 so that an empty batch is never sent
        for start in range(0, u_count, 100):
            fullusers.extend(
                api.lookup_users(user_ids=userids[start:start + 100])
            )
    except tweepy.TweepError:
        import traceback
        traceback.print_exc()
        print('Something went wrong, returning the users fetched so far...')
    return fullusers

#Call the function with the list of follower ids and the tweepy api connection
fullusers = get_usernames(followerids, api)

Hope this helps. A similar approach can be used to fetch friends' details by using api.friends_ids in place of api.followers_ids.
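
As a rough way to compare the two approaches, the sketch below counts how many 15-minute windows each would need for the 500k-follower account from the question. It assumes the per-request sizes and request limits quoted in this answer still hold and ignores network time; windows_needed is a made-up helper, not a tweepy function.

```python
import math

WINDOW_MINUTES = 15


def windows_needed(total, per_request, requests_per_window):
    """Number of rate-limit windows needed to fetch `total` items."""
    return math.ceil(total / (per_request * requests_per_window))


total = 500000  # account size mentioned in the question

list_windows = windows_needed(total, 200, 15)     # followers/list
ids_windows = windows_needed(total, 5000, 15)     # followers/ids
lookup_windows = windows_needed(total, 100, 180)  # users/lookup

print(list_windows, ids_windows, lookup_windows)
print(list_windows * WINDOW_MINUTES)                   # minutes, first approach
print((ids_windows + lookup_windows) * WINDOW_MINUTES) # minutes, stages run back to back
```

With these figures the lookup stage is the bottleneck of the second approach (28 windows against 167 for followers/list), which is where the roughly 6x figure comes from. Running the two stages one after the other, as the code above does, adds the 7 windows for the ids.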

Solution 4

The twitter API only allows 100 users to be searched for at a time. That's why no matter how many you input to it you'll get 100. The followers_ids call is giving you the correct number of users, but you're being limited by GET users/lookup.

What you'll need to do is iterate through each 100 users but staying within the rate limit.
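
A minimal sketch of that batching, kept separate from Twitter so it can be run as is. The lookup argument is a hypothetical callable standing in for something like api.lookup_users(user_ids=batch); fake_lookup below only records the batch sizes.

```python
def chunks(items, size=100):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_all(user_ids, lookup):
    """Call `lookup` once per batch of up to 100 ids and merge the results."""
    users = []
    for batch in chunks(user_ids, 100):
        users.extend(lookup(batch))
    return users


batch_sizes = []


def fake_lookup(batch):
    """Stand-in for the real lookup: records the batch size, returns dummy names."""
    batch_sizes.append(len(batch))
    return ["user%d" % uid for uid in batch]


result = lookup_all(list(range(250)), fake_lookup)
print(batch_sizes)
print(len(result))
```

Each call stays at or under the 100-id cap, so the "Too many terms specified in query" error from the question should not come up. The rate limit on the number of calls still has to be respected separately.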

Author: mataxu

Updated on July 04, 2022

Comments

  • mataxu
    mataxu almost 2 years

    I'm trying to use Tweepy to get the full list of followers from an account with around 500k followers. I have code that gives me the usernames for smaller accounts, like under 100 followers, but if I try one with even 110 followers, it doesn't work. Any help figuring out how to make it work with larger numbers is greatly appreciated!

    Here's the code I have right now:

    import tweepy
    import time
    
    key1 = "..."
    key2 = "..."
    key3 = "..."
    key4 = "..."
    
    accountvar = raw_input("Account name: ")
    
    auth = tweepy.OAuthHandler(key1, key2)
    auth.set_access_token(key3, key4)
    
    api = tweepy.API(auth)
    
    ids = []
    for page in tweepy.Cursor(api.followers_ids, screen_name=accountvar).pages():
         ids.extend(page)
         time.sleep(60)
    
    users = api.lookup_users(user_ids=ids)
    for u in users:
         print u.screen_name
    

    The error I keep getting is:

    Traceback (most recent call last):
      File "test.py", line 24, in <module>
        users = api.lookup_users(user_ids=ids)
      File "/Library/Python/2.7/site-packages/tweepy/api.py", line 321, in lookup_users
        return self._lookup_users(post_data=post_data)
      File "/Library/Python/2.7/site-packages/tweepy/binder.py", line 239, in _call
        return method.execute()
      File "/Library/Python/2.7/site-packages/tweepy/binder.py", line 223, in execute
        raise TweepError(error_msg, resp)
    tweepy.error.TweepError: [{u'message': u'Too many terms specified in query.', u'code': 18}]
    

    I've looked at a bunch of other questions about this, but none I could find had a solution that worked for me. If someone has a link to a solution, please send it to me!

  • Leb
    Leb almost 9 years
    I would consider going slightly over 15 minutes to allow for any delay between your computer and the Twitter API. I don't have an actual number, maybe 16? I'm not sure how strict Twitter is with their '15 minutes', would '14:35' be good? Just to be safe and ensure the program is running with the least interference.
  • Tom Cornebize
    Tom Cornebize about 4 years
    Great answer, this approach is much faster!
  • Gabriel
    Gabriel over 3 years
    Apparently this approach no longer works. The call to tweepy.Cursor() in my case returns a Rate limit reached. Sleeping for: 894 after less than 100 requests
  • Gabriel
    Gabriel over 3 years
    Apparently it works if you use: tweepy.Cursor(api.followers_ids, screen_name=accountvar,count=5000).items(5000)
  • shaha
    shaha about 3 years
    If @Gabriel 's comment is true, can you update your solution Himanshu?
  • brienna
    brienna about 3 years
    It works fine for me with .items() instead of .items(5000)
  • Ashfaq Ali Shafin
    Ashfaq Ali Shafin almost 3 years
    Getting an error TweepError: Failed to send request: ('Connection aborted.', OSError("(10054, 'WSAECONNRESET')"))