Is it possible to reduce RAM consumption when using Selenium GeckoDriver and Firefox?


Solution 1

I discovered how to avoid the memory leak.

I just add

time.sleep(2)

after

file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")

Now Firefox works without consuming lots of RAM.

It is just perfect.

I don't know exactly why it stopped consuming so much memory, but I think memory usage kept growing because the browser didn't have time to finish each driver.get request.

Solution 2

It is not clear from your question what the items within lista are, so the actual url/website can't be checked.

However, it may not be possible to avoid high RAM consumption entirely while accessing the website more than 10k times in a row with the approach you have adopted.

Solution

As you mentioned that when the script accesses this site 2,500 times or so it already consumes 4 GB or more of RAM and stops working, you may introduce a counter and, every 2,000 accesses in the loop, reinitialize the WebDriver and web browser afresh after invoking driver.quit() within a tearDown() method to close and destroy the existing WebDriver and web client instances gracefully, as follows:

driver.quit()  # Python

You can find a detailed discussion in PhantomJS web driver stays in memory
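
The counter-based recycle described above can be sketched as follows. process, make_driver, and visit are hypothetical names introduced here for illustration; in practice make_driver would return a real webdriver.Firefox() instance and visit would do the driver.get() and scraping from the question.

```python
RECYCLE_EVERY = 2000  # reinitialize the browser after this many page loads


def process(items, make_driver, visit):
    # make_driver: callable returning a WebDriver-like object with .quit()
    # visit: callable(driver, item) that loads the page and scrapes it
    driver = make_driver()
    try:
        for i, item in enumerate(items):
            if i > 0 and i % RECYCLE_EVERY == 0:
                driver.quit()           # destroy the old WebDriver/browser
                driver = make_driver()  # fresh instance, baseline footprint
            visit(driver, item)
    finally:
        driver.quit()
```

Note that, as discussed in the comments below, quitting the driver also ends the logged-in browser session, so this trades repeated two-factor logins for bounded memory use.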

In case the GeckoDriver and Firefox processes are still not destroyed and removed, you may need to kill the processes from the tasklist.

  • Python Solution (Cross-Platform):

    import os
    import psutil
    
    PROCNAME = "geckodriver" # or chromedriver or iedriverserver
    for proc in psutil.process_iter():
        # check whether the process name matches
        if proc.name() == PROCNAME:
            proc.kill()
    

You can find a detailed discussion in Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?



Author: fabiobh

Updated on June 04, 2022

Comments

  • fabiobh
    fabiobh almost 2 years

    I use Selenium and Firefox webdriver with python to scrape data from a website.

    But in the code, I need to access this website more than 10k times and it consumes a lot of RAM to do that.

    Usually, when the script has accessed this site 2,500 times, it already consumes 4 GB or more of RAM and stops working.

    Is it possible to reduce RAM consumption without closing the browser session?

    I ask because when I start the script I need to log in manually on the site (two-factor authentication; that code is not shown below), and if I close the browser session I will need to log in to the site again.

    # driver, lista, indice and file2 are initialized earlier (login code omitted)
    for itemLista in lista:
        driver.get("https://mytest.site.com/query/option?opt="+str(itemLista))
    
        isActivated = driver.find_element_by_xpath('//div/table//tr[2]//td[1]')
        activationDate = driver.find_element_by_xpath('//div/table//tr[2]//td[2]')
    
        print(str(isActivated.text))
        print(str(activationDate.text))
    
        indice+=1
        print("numero: "+str(indice))
    
        file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")
    
    #close file
    file2.close()
    
    • r.ook
      r.ook over 5 years
      Maybe instead of keeping file2 open, only open it and write it once per iteration? It seems like the culprit is the growing size of file2 in your buffer.
    • undetected Selenium
      undetected Selenium over 5 years
      Did you consider Headless Firefox or PhantomJS or HTMLUnit browsers as an option?
    • fabiobh
      fabiobh over 5 years
      @DebanjanB I think a headless browser is not an option for me, because when I access the site I need to enter a password. The site is protected by a two-factor password that I receive by email each time I try to access it.
    • Todor Minakov
      Todor Minakov over 5 years
      I'm curious, can you get performance graphs from your OS? One that'll track the browser's process, and your script; it'll help you nail down which is causing the memory usage hike. (I'm mostly curious cause I'd love the see the browser's one :D, its behavior during 10k navigations is very interesting.)
    • Todor Minakov
      Todor Minakov over 5 years
      You could implement a browser recycle option - every X iterations close the browser and the webdriver, and open them again, thus getting their memory footprint back to baseline. X can be 100, 500, 2000 - whatever turns out most useful for you (this "recycle" is an expensive operation, time-wise). This should be done only if the memory leak turns out to be in the browser, not in your script.
    • ewwink
      ewwink over 5 years
      Use Chrome, it uses less memory.
  • fabiobh
    fabiobh over 5 years
    I made the change that you suggested, but unfortunately it didn't fix my problem. The Firefox browser still consumes a lot of RAM. I think even if file2 is still open, it doesn't affect the RAM usage much.
  • r.ook
    r.ook over 5 years
    Odd. Is your driver creating a new instance of the browser each time? On Chrome I notice that in the processes it creates a new chrome.exe but it is quickly killed off and the RAM usage stays in check. Not sure how it works for the Firefox driver. If anything I'd guess the RAM rises at each driver.get()... if so, you could consider creating a new driver and closing it each iteration, but it's probably more time-consuming.
  • Todor Minakov
    Todor Minakov over 5 years
    This solution will not decrease the memory usage - python does not hold in memory the file content that has been written so far; it has a buffer only for what is pending writing - the only thing that's in RAM regarding your concern, and this buffer's default size in most OSes is just a single line. In fact this approach leads to a bad/unexpected result - on every iteration the file is re-created with just the last line; i.e. now the script will not store all values, but just the last one.
  • fabiobh
    fabiobh over 5 years
    @Todor Minakov You are right, only the last line is saved.
  • fabiobh
    fabiobh over 5 years
    @Idlehands When I see the task manager, it only has a single firefox process. I can't create a new driver because I will lose browser session and I need the browser session because I need to put a two-factor password manually each time that I try to run the script. I will try to use Chrome driver to see if I find any difference.
  • Todor Minakov
    Todor Minakov over 5 years
    Don't be overly-optimistic for Chrome - it's going to have the same - if not bigger - mem footprint. Better see what actually is using that much memory; could be your script itself.
  • r.ook
    r.ook over 5 years
    @TodorMinakov Good point, I shouldn't have used w mode. I thought the buffer would take up virtual memory but maybe I'm mistaken. Thanks for pointing it out.
  • fabiobh
    fabiobh over 5 years
    Unfortunately I can't use driver.quit() because it will destroy the web session. I need the web session because when I run the script I need to manually input a two-factor password. If I set the script to use driver.quit() after 2000 times, when I restart the driver I will need to enter the password again. But just like you said, I think there is no other solution to this problem.
  • undetected Selenium
    undetected Selenium over 5 years
    In that case you can invoke driver.close() and forcefully kill the Firefox browser instances.