Scrapy Shell - How to change USER_AGENT


Solution 1

scrapy shell -s USER_AGENT='custom user agent' 'http://www.example.com'
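If the shell is started from a script rather than typed by hand, the command above can be assembled as an argument list, which avoids quoting problems when the user agent contains spaces. This is only a minimal sketch: `build_shell_command` is a hypothetical helper, not part of Scrapy, and it just builds the command without running it.

```python
import shlex


def build_shell_command(url, user_agent):
    """Return the argv list for `scrapy shell` with a USER_AGENT override.

    Hypothetical helper for illustration; it only assembles the command
    from Solution 1 and does not execute anything.
    """
    return ["scrapy", "shell", "-s", f"USER_AGENT={user_agent}", url]


cmd = build_shell_command("http://www.example.com", "custom user agent")

# shlex.join (Python 3.8+) quotes each argument so the resulting line
# can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd)` directly, in which case no shell quoting is involved at all.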

Solution 2

Inside the scrapy shell, you can set the User-Agent in the request headers and pass the request to fetch().

url = 'http://www.example.com'
request = scrapy.Request(url, headers={'User-Agent': 'Mybot'})
fetch(request)
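The same `headers` argument accepts any other header as well, so the approach extends beyond the User-Agent. The sketch below assumes two hypothetical helpers, `build_headers` and `make_request`, and an example `Accept-Language` value; none of them come from Scrapy itself.

```python
def build_headers(user_agent, extra=None):
    """Merge a User-Agent with any additional request headers."""
    headers = {"User-Agent": user_agent}
    headers.update(extra or {})
    return headers


def make_request(url, user_agent, extra=None):
    """Build a scrapy.Request carrying the merged headers."""
    # The shell already exposes `scrapy`; importing here lets the helper
    # be defined in environments where the module is not yet loaded.
    import scrapy

    return scrapy.Request(url, headers=build_headers(user_agent, extra))


headers = build_headers("Mybot", {"Accept-Language": "en-US"})
print(headers)

# Inside the scrapy shell, the request would then be fetched with:
# fetch(make_request("http://www.example.com", "Mybot", {"Accept-Language": "en-US"}))
```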

Author: dfriestedt

Updated on October 20, 2020

Comments

  • dfriestedt over 2 years

    I have a fully functioning scrapy script to extract data from a website. During setup, the target site banned me based on my USER_AGENT information. I subsequently added a RotateUserAgentMiddleware to rotate the USER_AGENT randomly. This works great.

    However, now when I try to use the scrapy shell to test XPath and CSS selectors, I get a 403 error. I'm sure this is because the USER_AGENT of the scrapy shell is defaulting to some value the target site has blacklisted.

    Question: is it possible to fetch a URL in the scrapy shell with a different USER_AGENT than the default?

    fetch('http://www.test') [add something ?? to change USER_AGENT]

    Thx

  • Computer's Guy about 7 years
    Do you know how to also add headers to scrapy shell? Thanks.
  • Ariel almost 6 years
    I got here because I was running the shell from outside the project directory and my settings file was being ignored. Once I changed into the project directory, the custom USER_AGENT setting worked properly, no need to pass any extra parameter to the scrapy shell command.
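
The setting Ariel mentions would normally live in the project's settings.py, which Scrapy loads only when the shell is started from inside the project directory tree (where scrapy.cfg can be found). A minimal fragment, assuming a placeholder user agent string:

```python
# settings.py (project settings module)
# The value below is a placeholder, not a recommended user agent.
USER_AGENT = "Mybot (+http://www.example.com)"
```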
