Scrapy: connection refused
Solution 1
Mission 1: Scrapy sends a user agent with 'bot' in it by default. Sites might block based on the user agent as well.
Try overriding USER_AGENT in settings.py, e.g.:
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'
Mission 2: Try adding a delay between requests, so that it looks like a human is sending them:
DOWNLOAD_DELAY = 0.25
Mission 3: If nothing works, install Wireshark and compare the request headers (or POST data) that Scrapy sends with what your browser sends.
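Missions 1 and 2 can be combined in settings.py; a minimal sketch (the user-agent string and delay value are just example values, not requirements):

```python
# settings.py -- example values only; adjust for your target site

# Mission 1: present a browser-like user agent instead of Scrapy's default bot UA
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1'

# Mission 2: wait between requests so the traffic looks less bot-like
DOWNLOAD_DELAY = 0.25

# Scrapy can also randomize the delay (0.5x to 1.5x of DOWNLOAD_DELAY),
# which makes the timing pattern harder to fingerprint
RANDOMIZE_DOWNLOAD_DELAY = True
```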
Solution 2
There is probably an issue with your network connection.
First of all, check your internet connection.
If you access the net through a proxy server, you need to enable the proxy middleware in your Scrapy project (http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware).
In any case, try upgrading your Scrapy version.
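HttpProxyMiddleware picks up the standard proxy environment variables, so a quick sanity check is to see which of them are set in the shell Scrapy runs from. The helper below (`detect_proxy_env` is a hypothetical name, not part of Scrapy) just inspects the environment:

```python
import os

def detect_proxy_env():
    """Return the proxy-related environment variables that are currently set.

    HttpProxyMiddleware reads these, so if Scrapy gets connection refused
    while your browser works, compare this output with the proxy settings
    your browser is actually using.
    """
    keys = ("http_proxy", "https_proxy", "no_proxy",
            "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")
    return {k: os.environ[k] for k in keys if k in os.environ}

print(detect_proxy_env())
```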
anders
Updated on June 12, 2022
Comments
-
anders almost 2 years
I'm receiving an error when trying to test scrapy installation:
$ scrapy shell http://www.google.es
2011-02-16 10:54:46+0100 [scrapy] INFO: Scrapy 0.12.0.2536 started (bot: scrapybot)
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpProxyMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled item pipelines:
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-02-16 10:54:46+0100 [default] INFO: Spider opened
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 1 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 2 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Discarding <GET http://www.google.es> (failed 3 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] ERROR: Error downloading <http://www.google.es>: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionRefusedError'>: Connection was refused by other side: 111: Connection refused.
]
2011-02-16 10:54:47+0100 [scrapy] ERROR: Shell error
Traceback (most recent call last):
Failure: scrapy.exceptions.IgnoreRequest: Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] INFO: Closing spider (shutdown)
2011-02-16 10:54:47+0100 [default] INFO: Spider closed (shutdown)
Versions:
- Scrapy 0.12.0.2536
- Python 2.6.6
- OS: Ubuntu 10.10
EDIT: I can reach it with my browser, with wget, and with telnet google.es 80, and it happens with all sites.
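The telnet google.es 80 check above can also be done from Python with a short socket probe (`can_connect` is a hypothetical helper written for this thread, not part of Scrapy):

```python
import socket

def can_connect(host, port=80, timeout=5):
    """Attempt a plain TCP connection, like `telnet host port` does.

    Returns True if the TCP handshake succeeds, False on refusal or
    timeout. If this succeeds where Scrapy fails, the problem is above
    the TCP layer (proxy configuration, user-agent blocking, etc.).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```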
-
Vajk Hermecz over 9 years
Any solution to this? I am also experiencing this when trying to use a privoxy proxy with scrapy...
-
Han about 4 years
How did you find that your server was blocking non-whitelisted ports?