Scrapy: connection refused


Solution 1

Mission 1: By default Scrapy sends a user agent with 'bot' in it, and some sites block requests based on the user agent.

Try overriding USER_AGENT in settings.py.

E.g.: USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'

Mission 2: Try adding a delay between requests so it looks like a human is sending them.

DOWNLOAD_DELAY = 0.25 
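
Putting Missions 1 and 2 together, a minimal settings.py might look like this (the user-agent string and delay value are only illustrative):

    # settings.py -- illustrative values, adjust for your own project
    # Pretend to be a regular desktop browser instead of the default "scrapybot" UA
    USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'
    # Wait 0.25 seconds between consecutive requests to the same site
    DOWNLOAD_DELAY = 0.25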

Mission 3: If nothing works, install Wireshark and compare the request headers (or POST data) that Scrapy sends with what your browser sends.
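
Short of a full packet capture, you can also check what Scrapy itself is sending from the shell; a quick sketch (the URL is just an example):

    $ scrapy shell http://www.example.com
    ...
    >>> request.headers    # headers Scrapy sent with this fetch
    >>> response.status    # HTTP status the server returned (only if the fetch succeeded)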

Solution 2

There is probably an issue with your network connection.

First of all, check your internet connection.

If you access the net through a proxy server, you need to add a piece of code to your Scrapy project to configure HttpProxyMiddleware (http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware).
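
If a proxy really is the issue, HttpProxyMiddleware (enabled by default) picks the proxy up from the standard http_proxy/https_proxy environment variables, or you can set it per request via request.meta. A rough sketch, where the proxy address is purely hypothetical and the import path matches old 0.x releases (newer versions use scrapy.Spider):

    # Option 1: let HttpProxyMiddleware read the proxy from the environment:
    #   export http_proxy="http://proxy.example.com:8080"
    #
    # Option 2: set the proxy explicitly on each request in your spider:
    from scrapy.spider import BaseSpider   # scrapy.Spider in modern Scrapy
    from scrapy.http import Request

    class ProxiedSpider(BaseSpider):
        name = 'proxied'
        start_urls = ['http://www.google.es']

        def start_requests(self):
            for url in self.start_urls:
                # 'proxy' in meta is honoured by the HTTP downloader
                yield Request(url, meta={'proxy': 'http://proxy.example.com:8080'})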

In any case, try upgrading your Scrapy version.
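
For example, with pip:

    $ pip install --upgrade scrapy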


Comments

  • anders
    anders almost 2 years

    I'm receiving an error when trying to test scrapy installation:

    $ scrapy shell http://www.google.es
    2011-02-16 10:54:46+0100 [scrapy] INFO: Scrapy 0.12.0.2536 started (bot: scrapybot)
    2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
    2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
    2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpProxyMiddleware, HttpCompressionMiddleware, DownloaderStats
    2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
    2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled item pipelines: 
    2011-02-16 10:54:46+0100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
    2011-02-16 10:54:46+0100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
    2011-02-16 10:54:46+0100 [default] INFO: Spider opened
    2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 1 times): Connection was refused by other side: 111: Connection refused.
    2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 2 times): Connection was refused by other side: 111: Connection refused.
    2011-02-16 10:54:47+0100 [default] DEBUG: Discarding <GET http://www.google.es> (failed 3 times): Connection was refused by other side: 111: Connection refused.
    2011-02-16 10:54:47+0100 [default] ERROR: Error downloading <http://www.google.es>: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionRefusedError'>: Connection was refused by other side: 111: Connection refused.
        ]
    2011-02-16 10:54:47+0100 [scrapy] ERROR: Shell error
        Traceback (most recent call last):
        Failure: scrapy.exceptions.IgnoreRequest: Connection was refused by other side: 111: Connection refused.
    
    2011-02-16 10:54:47+0100 [default] INFO: Closing spider (shutdown)
    2011-02-16 10:54:47+0100 [default] INFO: Spider closed (shutdown)
    

    Versions:

    • Scrapy 0.12.0.2536
    • Python 2.6.6
    • OS: Ubuntu 10.10

    EDIT: I can reach it with my browser, wget and telnet google.es 80, and it happens with all sites.

    • Vajk Hermecz
      Vajk Hermecz over 9 years
      Any solution to this? I am also experiencing this when trying to use privoxy proxy with scrapy...
  • Han
    Han about 4 years
    How did you find that your server was blocking non-whitelisted ports?