How to Bypass Google reCAPTCHA while scraping with Requests


Requesting the page through Google Cache, with a Google referer in the request headers, can help you get past the captcha.
Things to note:

  • Don't send more than 2 requests per second, or you may get blocked.
  • The result you receive is a cached copy, so this won't work if you need real-time data.
Example:
import requests

header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "referer": "https://www.google.com/",  # make the request look like it came from Google
}

# Request Google's cached copy of the page instead of the page itself
r = requests.get("http://webcache.googleusercontent.com/search?q=cache:www.naukri.com/jobs-in-andhra-pradesh", headers=header)

This gives:

>>> r.content
[output truncated: 2554 lines of HTML]
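The two caveats above (stay under 2 requests per second, and always go through the cache URL, including for next-page links as the comments below discuss) can be wrapped in two small helpers. This is a minimal sketch: CACHE_PREFIX, cache_url and Throttle are hypothetical names, and it assumes the webcache.googleusercontent.com endpoint from the answer still responds and accepts a percent-encoded query string in the cache: parameter.

```python
import time
from urllib.parse import quote, urlsplit

# Cache endpoint from the answer; assumed (not confirmed) to accept this format.
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"


def cache_url(target: str) -> str:
    """Hypothetical helper: turn a page URL into its Google Cache URL.

    The answer's example drops the scheme (cache:www.naukri.com/...), so the
    scheme is dropped here too. Works for "https://host/path" and "host/path".
    """
    parts = urlsplit(target)
    rest = parts.netloc + parts.path  # netloc is empty for scheme-less input
    if parts.query:
        rest += "?" + parts.query
    # Percent-encode anything that would break the outer q= parameter (assumption).
    return CACHE_PREFIX + quote(rest, safe="/:")


class Throttle:
    """Hypothetical helper: make wait() return at most `per_second` times per second."""

    def __init__(self, per_second: float = 2.0):
        self.min_interval = 1.0 / per_second
        self.last = None

    def wait(self) -> None:
        now = time.monotonic()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                time.sleep(remaining)  # sleep off the rest of the interval
        self.last = time.monotonic()
```

In use, you would create one Throttle(per_second=2) and call its wait() before each requests.get(cache_url(url), headers=header), so that pagination links scraped from a cached page are also fetched through the cache rather than from the live site.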
Author: k monish

Updated on July 24, 2022

Comments

  • k monish, almost 2 years ago

    Python code to request the URL:

    import requests
    # Use a browser User-Agent to work around the blocking issue
    agent = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}
    # Make the request to the link directly
    response = requests.get("https://www.naukri.com/jobs-in-andhra-pradesh", headers=agent)
    print(response.text)
    

    Output when printing the HTML:

    <!DOCTYPE html>
    
    <html>
      <head>
        <title>Naukri reCAPTCHA</title> <!-- this is the title actually returned for the URL I requested -->
        <meta name="robots" content="noindex, nofollow">
            <link rel="stylesheet" href="https://static.naukimg.com/s/4/101/c/common_v62.min.css" />      
            <script src="https://www.google.com/recaptcha/api.js" async defer></script>   
        </head>
    </html>
    
  • k monish, about 4 years ago
    Will bypassing this captcha cause any trouble later on?
  • Joshua Varghese, about 4 years ago
    @kmonish Not unless you leave out the referer and the cache when requesting. Also space your requests out with a time interval.
  • k monish, about 4 years ago
    @JoshuaVarghese When I try to navigate to the next page, it goes back to the old link. Does this approach not allow us to navigate and scrape?
  • Joshua Varghese, about 4 years ago
    @kmonish Did you add the webcache.google.... prefix to the URL when navigating?
  • k monish, about 4 years ago
    @JoshuaVarghese It shows on the search engine that the URL was not found on the server.
  • Joshua Varghese, about 4 years ago
    @kmonish Could you post the URL here in a comment?
  • k monish, about 4 years ago
  • Joshua Varghese, about 4 years ago
    @kmonish It fails because the site wasn't cached.
  • k monish, about 4 years ago
    @JoshuaVarghese I just realized that some links aren't cached the way others are. Is there any other way to resolve this issue?
  • Muhammad Zubair, about 3 years ago
    I have added all these headers, but it still shows the reCAPTCHA sometimes.
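The thread reports two failure modes: a response that is the reCAPTCHA page quoted in the question (title "Naukri reCAPTCHA", plus a script loaded from https://www.google.com/recaptcha/api.js), and cache URLs that fail because the page was never cached. Telling them apart before parsing lets a scraper slow down or skip a page instead of parsing the wrong HTML. This is a minimal sketch: classify_response is a hypothetical helper, the markers come only from the HTML quoted above, and treating HTTP 404 from the cache as "not cached" is an assumption based on the "URL is not found" comment, not documented behaviour.

```python
import re

# Markers taken from the blocked page quoted in the question.
RECAPTCHA_SCRIPT = "google.com/recaptcha/api.js"
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def classify_response(status_code: int, html: str) -> str:
    """Hypothetical helper: return "not_cached", "captcha" or "ok".

    Heuristic only: a 404 is assumed to mean the cache has no copy, and a
    page whose title mentions reCAPTCHA or that loads the reCAPTCHA script
    is assumed to be the captcha interstitial.
    """
    if status_code == 404:
        return "not_cached"
    match = TITLE_RE.search(html)
    title = match.group(1) if match else ""
    if "recaptcha" in title.lower() or RECAPTCHA_SCRIPT in html:
        return "captcha"
    return "ok"
```

With requests, this would be called as classify_response(r.status_code, r.text). On "captcha" a scraper would typically wait longer before the next request; on "not_cached" there is no cached copy to fetch for that URL.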