My python program is running really slow

14,401

Solution 1

Program efficiency typically falls under the 80/20 rule (or what some people call the 90/10 rule, or even the 95/5 rule). That is, the program spends about 80% of its running time in 20% of the code. In other words, there is a good chance that your code has a "bottleneck": a small area of the code that runs slowly, while the rest runs very fast. Your goal is to identify that bottleneck (or bottlenecks), then fix it (them) to run faster.

The best way to do this is to profile your code. That can mean logging the time at which specific actions occur with the logging module, using timeit as a commenter suggested, using one of the built-in profilers, or simply printing the current time at various points in the program. Eventually, you will find the part of the code that takes the most time.
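As a rough illustration of the built-in profiler route, here is a minimal sketch using cProfile and pstats. The functions slow_step, fast_step, main and profile_report are made up for this example, and it is written for Python 3 rather than the 2.7 used in the question. The sleep call only stands in for whatever the real bottleneck turns out to be.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    # Hypothetical stand-in for the bottleneck (for example, a network call).
    time.sleep(0.05)


def fast_step():
    # Hypothetical stand-in for cheap in-memory work.
    return sum(range(1000))


def main():
    for _ in range(3):
        slow_step()
        fast_step()


def profile_report(func, top=5):
    """Run func under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(main))
```

In the printed report, the rows with the largest cumulative time point at the bottleneck; here that should be slow_step and the sleep call underneath it.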

Experience will tell you that I/O (stuff like reading from a disk, or accessing resources over the internet) will take longer than in-memory calculations. My guess as to the problem is that you're using 1 HTTP connection to get a list of streamers, and then one HTTP connection to get the status of that streamer. Let's say that there are 10000 streamers: your program will need to make 10001 HTTP connections before it finishes.

There would be a few ways to fix this if this is indeed the case:

  • See if Twitch.TV has an alternative in their API that lets you retrieve a list of users WITH their streaming mode, so that you don't need to call the API once for each streamer.
  • Cache results. This won't actually help your program run faster the first time it runs, but you might be able to make it so that if it runs a second time within a minute, it can reuse results.
  • Limit your application to only dealing with a few streamers at a time. If there are 10000 streamers, what exactly does your application do that it really needs to look at the mode of all 10000 of them? Perhaps it's better to just grab the top 20, at which point the user can press a key to get the next 20, or close the application. Oftentimes, programming is not just about writing code, but about managing expectations of what your users want. This seems to be a pet project, so there might not be "users", meaning you have free rein to change what the app does.
  • Use multiple connections. Right now, your app makes one connection to the server, waits for the results to come back, parses the results, saves them, then starts on the next connection. This process might take an entire half a second. If there were 250 streamers, running this process for each of them would take a little over two minutes total. However, if you could run four of them at a time, you could potentially reduce your time to roughly 30 seconds total. Check out the multiprocessing module. Keep in mind that some APIs limit how many connections you can make at a time, so hitting them with 50 connections at once might irk them and cause them to forbid you from accessing their API. Use caution here.
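The "multiple connections" idea can be sketched roughly as follows. Note that this uses a thread pool from concurrent.futures instead of the multiprocessing module mentioned above, on the assumption that the work is mostly waiting on the network. The fetch_mode function is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the example targets Python 3.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_mode(name):
    # Hypothetical stand-in for one HTTP request; it sleeps instead of
    # touching the network and always reports "online".
    time.sleep(0.05)
    return name, "online"


def fetch_all(names, max_workers=4):
    """Fetch every streamer's mode, running up to max_workers requests at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input names.
        return list(pool.map(fetch_mode, names))


if __name__ == "__main__":
    names = ["streamer_%d" % i for i in range(8)]
    start = time.time()
    results = fetch_all(names)
    print(len(results), "results in %.2f seconds" % (time.time() - start))
```

Keeping max_workers small is one way to stay on the right side of any rate limits the API may have.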

Solution 2

You are using the wrong tool to parse the JSON data returned by your URL. Use the json library that ships with Python rather than parsing the data with regular expressions. This should also give your program's performance a boost.

Change the regex parser

#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info

To json parser

import json #goes with the other imports at the top of classes.py
self.info = json.loads(self.info) #this parses the JSON data into a Python object
#pick out the name of each stream and return a generator
return (stream['name'] for stream in self.info[u'streams'])
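To show the idea in isolation, here is a small self-contained sketch in Python 3. The SAMPLE payload and the streamer_names helper are invented for illustration; the real API response may be shaped differently, so the keys used here are assumptions.

```python
import json

# Hypothetical payload; the real API response may be shaped differently.
SAMPLE = (
    '{"streams": ['
    '{"name": "alice", "teams": [{"name": "team_a"}]},'
    '{"name": "bob", "teams": []}'
    ']}'
)


def streamer_names(raw):
    """Return the top-level "name" of every entry in the "streams" list."""
    data = json.loads(raw)
    return [stream["name"] for stream in data["streams"]]


if __name__ == "__main__":
    print(streamer_names(SAMPLE))
```

Because the parsed data keeps its nesting, the team names never get mixed up with the streamer names, so the re.sub step that strips them is no longer needed.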

Shay
Author by

Shay

Updated on September 16, 2022

Comments

  • Shay
    Shay over 1 year

    I'm making a program that (at least right now) retrieves stream information from TwitchTV (a streaming platform). This program is meant to educate myself, but when I run it, it takes 2 minutes to print just the name of the streamer.

    I'm using Python 2.7.3 64-bit on Windows 7, if that is important in any way.

    classes.py:

    #imports:
    import urllib2
    import re
    
    #classes:
    class Streamer:
    
        #constructor:
        def __init__(self, name, mode, link):
            self.name = name
            self.mode = mode
            self.link = link
    
    class Information:
    
        #constructor:
        def __init__(self, TWITCH_STREAMS, GAME, STREAMER_INFO):
            self.TWITCH_STREAMS = TWITCH_STREAMS
            self.GAME = GAME
            self.STREAMER_INFO = STREAMER_INFO
    
        def get_game_streamer_names(self):
            "Connects to Twitch.TV API, extracts and returns all streams for a specific game."
    
            #start connection
            self.con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
            self.info = self.con.read()
            self.con.close()
    
            #regular expressions to get all the stream names
            self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
            self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
    
    
            #filter out all "live_user_NAME" values (removing items from a list while looping over it skips entries)
            self.streamers_names = [name for name in self.streamers_names
                                    if not name.startswith("live_user_")]
    
            #end method
            return self.streamers_names
    
        def get_streamer_mode(self, name):
            "Returns a streamer's mode (online/offline)"
    
            #start connection
            self.con = urllib2.urlopen(self.STREAMER_INFO + name)
            self.info = self.con.read()
            self.con.close()
    
            #check if stream is online or offline ("stream":null indicates offline stream)
            if self.info.count('"stream":null') > 0:
                return "offline"
            else:
                return "online"
    

    main.py:

    #imports:
    from classes import *
    
    #consts:
    TWITCH_STREAMS = "https://api.twitch.tv/kraken/streams/?game=" #add the game name at the end of the link (space = "+", eg: Game+Name)
    STREAMER_INFO  = "https://api.twitch.tv/kraken/streams/" #add streamer name at the end of the link
    GAME = "League+of+Legends"
    
    def main():
        #create an information object
        info = Information(TWITCH_STREAMS, GAME, STREAMER_INFO)
    
        streamer_list = [] #create a streamer list
        for name in info.get_game_streamer_names():
            #run for every streamer name, create a streamer object and place it in the list
            mode =  info.get_streamer_mode(name)
            streamer_name = Streamer(name, mode, 'http://twitch.tv/' + name)
            streamer_list.append(streamer_name)
    
        #this line is just to try and print something
        print streamer_list[0].name, streamer_list[0].mode
    
    
    if __name__ == '__main__':
        main()
    

    The program itself works perfectly, it's just really slow.

    Any ideas?

    • lgautier
      lgautier about 11 years
      Find out which part takes the time. The profile or cProfile modules can help.
  • jfs
    jfs about 11 years
    To get an overview of relative time performance: python -m cProfile the_script.py
  • Shay
    Shay about 11 years
    Thank you. I did know it was JSON, I just didn't know about the Python json module. Helped a lot.
  • Shay
    Shay about 11 years
    I found a way to request more than one streamer per request and to limit the number of streamers received. Thanks for your answer, it helped a lot!
  • TimothyAWiseman
    TimothyAWiseman about 11 years
    Profiling should almost always be the first step when facing a performance problem.