Node.js request module getting ETIMEDOUT and ESOCKETTIMEDOUT
Solution 1
Edit: duplicate of https://stackoverflow.com/a/37946324/744276
By default, Node's libuv thread pool has 4 worker threads, and DNS lookups run on it. If your DNS queries take a long time, requests will block on the DNS phase, and the symptom is exactly ESOCKETTIMEDOUT or ETIMEDOUT.
Try increasing your uv thread pool size:
export UV_THREADPOOL_SIZE=128
node ...
or in index.js (or wherever your entry point is), before anything uses the thread pool:
#!/usr/bin/env node
process.env.UV_THREADPOOL_SIZE = 128;
function main() {
...
}
Edit 1: I also wrote a blog post about it.
Edit 2: if queries are non-unique, you may want to use a cache, like nscd.
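If running a system-level cache like nscd is not an option, an in-process cache is another way to keep repeated lookups off the thread pool. The following is only a rough sketch: createCachedLookup and ttlMs are hypothetical names, the 60-second default is an arbitrary assumption, and the cache never evicts stale keys. Node's http.request accepts a lookup option; whether your HTTP client library passes it through is something to check.

```javascript
const dns = require('dns');

// Hypothetical helper: wraps a dns.lookup-style resolver with a small TTL cache.
// `resolver` defaults to dns.lookup; `ttlMs` is an assumed cache lifetime.
function createCachedLookup(resolver = dns.lookup, ttlMs = 60000) {
  const cache = new Map();

  return function cachedLookup(hostname, options, callback) {
    // dns.lookup allows the options argument to be omitted.
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    const key = hostname + '|' + JSON.stringify(options);
    const hit = cache.get(key);

    if (hit && hit.expires > Date.now()) {
      // Serve from cache, but stay asynchronous like the real resolver.
      process.nextTick(() => callback(null, ...hit.args));
      return;
    }

    resolver(hostname, options, (err, ...args) => {
      // Only successful answers are cached; errors are always retried.
      if (!err) {
        cache.set(key, { args, expires: Date.now() + ttlMs });
      }
      callback(err, ...args);
    });
  };
}

// Possible usage (not executed here):
// http.get({ host: 'example.com', lookup: createCachedLookup() }, onResponse);
```

Only unique hostnames reach the resolver, so a crawl that hits the same few hosts repeatedly needs far fewer thread pool slots.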
Solution 2
I found that if there are too many async requests, the ESOCKETTIMEDOUT exception happens on Linux. The workaround I've found is to set these options on request():
agent: false, pool: {maxSockets: 100}
Note that after this change the timeout can be misleading, so you might need to increase it.
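For illustration, here is one way those options could be merged into the per-request options. This is only a sketch: withSocketWorkaround is a hypothetical helper, and the 30000 ms timeout is an arbitrary example value, not a recommendation.

```javascript
// Hypothetical helper: returns a copy of `options` with the workaround applied.
// `timeoutMs` is the (possibly increased) timeout you want to use.
function withSocketWorkaround(options, timeoutMs) {
  return Object.assign({}, options, {
    agent: false,
    pool: { maxSockets: 100 },
    timeout: timeoutMs
  });
}

// 30000 is an assumed value; tune it for your own workload.
const exampleOptions = withSocketWorkaround({ uri: 'http://example.com/' }, 30000);

// Possible usage (not executed here):
// request(exampleOptions, function (error, response, body) { ... });
```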
Jorayen
Updated on July 09, 2022
Comments
-
Jorayen almost 2 years
I'm crawling a lot of links with the request module in parallel, in combination with the async module.
I'm noticing a lot of ETIMEDOUT and ESOCKETTIMEDOUT errors, although the links are reachable and respond fairly quickly in Chrome.
I've limited maxSockets to 2 and the timeout to 10000 in the request options. I'm using async.filterLimit() with a limit of 2 to cut the parallelism down to 2 requests at a time. So I have 2 sockets, 2 requests, and a timeout of 10 seconds to wait for the response headers from the server, yet I still get these errors.
Here's the request configuration I use:
{ ... pool: { maxSockets: 2 }, timeout: 10000 , time: true ... }
Here's the snippet of code I use to fetch links:
var self = this;
async.filterLimit(resources, 2, function(resource, callback) {
  request({ uri: resource.uri }, function (error, response, body) {
    if (!error && response.statusCode === 200) {
      ...
    } else {
      self.emit('error', resource, error);
    }
    callback(...);
  });
}, function(result) {
  callback(null, result);
});
I listened to the error event, and whenever the error code is ETIMEDOUT, the error's connect property is sometimes true and sometimes false, so sometimes it's a connection timeout and sometimes it's not (according to the request docs).
UPDATE: I decided to raise maxSockets to Infinity so that no connection hangs for lack of available sockets:
pool: { maxSockets: Infinity }
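To make the two cases easier to tell apart in logs, the error could be classified before it is emitted. A minimal sketch, assuming the behaviour described in the request docs (ETIMEDOUT with connect === true means the connection itself timed out); classifyTimeout and its return labels are hypothetical, and treating ESOCKETTIMEDOUT as an idle-socket timeout is an assumption.

```javascript
// Hypothetical helper: maps a request error to a short label for logging.
function classifyTimeout(error) {
  if (!error) {
    return 'none';
  }
  if (error.code === 'ETIMEDOUT') {
    // Per the request docs, connect === true marks a connection timeout.
    return error.connect === true ? 'connect-timeout' : 'other-timeout';
  }
  if (error.code === 'ESOCKETTIMEDOUT') {
    // Assumption: the socket was established but then went idle for too long.
    return 'socket-timeout';
  }
  return 'not-a-timeout';
}

// Possible usage inside the error handler (not executed here):
// self.on('error', function (resource, error) {
//   console.log(resource.uri, classifyTimeout(error));
// });
```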
In order to control the bandwidth, I implemented a requestLoop method that handles the request, with maxAttempts and retryDelay parameters to control retries:
async.filterLimit(resources, 10, function(resource, callback) {
  self.requestLoop({ uri: resource.uri }, 100, 5000, function (error, response, body) {
    var fetched = false;
    if (!error) {
      ...
    } else {
      ....
    }
    callback(...);
  });
}, function(result) {
  callback(null, result);
});
Implementation of requestLoop:
requestLoop = function(options, attemptsLeft, retryDelay, callback, lastError) {
  var self = this;
  if (attemptsLeft <= 0) {
    // Out of attempts: report the last recoverable error, if any.
    callback((lastError != null ? lastError : new Error('...')));
  } else {
    request(options, function (error, response, body) {
      var recoverableErrors = ['ESOCKETTIMEDOUT', 'ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED'];
      var e;
      if ((error && _.contains(recoverableErrors, error.code)) ||
          (response && (500 <= response.statusCode && response.statusCode < 600))) {
        // Recoverable failure (network error or 5xx): retry after retryDelay ms.
        e = error ? new Error('...') : new Error('...');
        e.code = error ? error.code : response.statusCode;
        setTimeout((function () {
          self.requestLoop(options, --attemptsLeft, retryDelay, callback, e);
        }), retryDelay);
      } else if (!error && (200 <= response.statusCode && response.statusCode < 300)) {
        // Success.
        callback(null, response, body);
      } else if (error) {
        // Non-recoverable network error.
        e = new Error('...');
        e.code = error.code;
        callback(e);
      } else {
        // Non-recoverable HTTP status.
        e = new Error('...');
        e.code = response.statusCode;
        callback(e);
      }
    });
  }
};
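The same retry idea can be isolated from the request module so it can be exercised without the network. The following is only a sketch, not the code from the crawler: retryRequest and doRequest are hypothetical names, doRequest stands in for request (same callback shape), and the error messages are placeholders.

```javascript
const RECOVERABLE_CODES = ['ESOCKETTIMEDOUT', 'ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED'];

// Hypothetical variant of requestLoop with the request function injected.
// doRequest(options, cb) must call cb(error, response, body) like request does.
function retryRequest(doRequest, options, attemptsLeft, retryDelay, callback, lastError) {
  if (attemptsLeft <= 0) {
    callback(lastError || new Error('no attempts left'));
    return;
  }
  doRequest(options, function (error, response, body) {
    const recoverable = Boolean(error) && RECOVERABLE_CODES.indexOf(error.code) !== -1;
    const serverError = !error && Boolean(response) &&
      response.statusCode >= 500 && response.statusCode < 600;

    if (recoverable || serverError) {
      // Remember why this attempt failed, then try again after retryDelay ms.
      const retryError = error || new Error('server error ' + response.statusCode);
      if (!error) {
        retryError.code = response.statusCode;
      }
      setTimeout(function () {
        retryRequest(doRequest, options, attemptsLeft - 1, retryDelay, callback, retryError);
      }, retryDelay);
    } else if (error) {
      callback(error); // non-recoverable network error
    } else if (response.statusCode >= 200 && response.statusCode < 300) {
      callback(null, response, body);
    } else {
      const statusError = new Error('unexpected status ' + response.statusCode);
      statusError.code = response.statusCode;
      callback(statusError);
    }
  });
}

// Possible usage (not executed here):
// retryRequest(request, { uri: 'http://example.com/' }, 5, 1000, onDone);
```

Passing the request function in as an argument means a fake that fails a fixed number of times can be used to check the retry count.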
So, to sum it up:
- Boosted maxSockets to Infinity to try to overcome socket connection timeout errors
- Implemented a requestLoop method to handle failed requests, with maxAttempts and retryDelay controlling the retries
- Also, there's a maximum number of concurrent requests, set by the number passed to async.filterLimit
I want to note that I've also played with all of the settings here in order to get error-free crawling, but so far these attempts have failed as well.
Still looking for help solving this problem.
UPDATE2: I've decided to drop async.filterLimit and make my own limiting mechanism. I just have 3 variables to help me achieve this:
pendingRequests - a request array which will hold all requests (will explain later)
activeRequests - number of active requests
maxConcurrentRequests - maximum number of allowed concurrent requests
Into the pendingRequests array I push a complex object containing a reference to the requestLoop function, as well as an arguments array containing the arguments to be passed to the loop function:
self.pendingRequests.push({
  "arguments": [{
    uri: resource.uri.toString()
  }, self.maxAttempts, function (error, response, body) {
    if (!error) {
      if (self.policyChecker.isMimeTypeAllowed((response.headers['content-type'] || '').split(';')[0]) &&
          self.policyChecker.isFileSizeAllowed(body)) {
        self.totalBytesFetched += body.length;
        resource.content = self.decodeBuffer(body, response.headers["content-type"] || '', resource);
        callback(null, resource);
      } else {
        self.fetchedUris.splice(self.fetchedUris.indexOf(resource.uri.toString()), 1);
        callback(new Error('Fetch failed because a mime-type is not allowed or file size is bigger than permitted'));
      }
    } else {
      self.fetchedUris.splice(self.fetchedUris.indexOf(resource.uri.toString()), 1);
      callback(error);
    }
    // This request is finished: free its slot and start the next pending one.
    self.activeRequests--;
    self.runRequest();
  }],
  "function": self.requestLoop
});
self.runRequest();
You'll notice the call to runRequest() at the end. This function's job is to manage the requests and fire them when it can, while keeping activeRequests under the limit of maxConcurrentRequests:
var self = this;
process.nextTick(function() {
  var next;
  if (!self.pendingRequests.length || self.activeRequests >= self.maxConcurrentRequests) {
    return;
  }
  self.activeRequests++;
  next = self.pendingRequests.shift();
  next["function"].apply(self, next["arguments"]);
  self.runRequest();
});
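The queueing idea can also be written as a small standalone limiter that knows nothing about the crawler. This is only a sketch under assumed names (createLimiter, push, stats are all hypothetical); unlike the code above, it starts tasks synchronously rather than on process.nextTick.

```javascript
// Hypothetical standalone limiter: runs at most `maxConcurrent` tasks at once.
// Each task receives a `done` function and must call it when it has finished.
function createLimiter(maxConcurrent) {
  const pending = [];
  let active = 0;

  function runNext() {
    // Start queued tasks while there is a free slot.
    while (active < maxConcurrent && pending.length > 0) {
      const task = pending.shift();
      let finished = false;
      active++;
      task(function done() {
        if (finished) {
          return; // ignore accidental double calls
        }
        finished = true;
        active--;
        runNext();
      });
    }
  }

  return {
    push: function (task) {
      pending.push(task);
      runNext();
    },
    stats: function () {
      return { active: active, pending: pending.length };
    }
  };
}

// Possible usage (not executed here):
// const limiter = createLimiter(10);
// limiter.push(function (done) {
//   request({ uri: 'http://example.com/' }, function (error, response, body) {
//     done();
//   });
// });
```

Guarding done against double calls keeps the active counter from drifting below the real number of in-flight requests.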
This should solve any timeout errors. In my testing, though, I've still noticed some timeouts on specific websites I've tested this on. I can't be 100% sure about this, but I think it's due to the website's backing HTTP server limiting each user's requests to a maximum by checking the IP address, and as a result returning some HTTP 400 messages to prevent a possible 'attack' on the server.
-
Jorayen almost 8 years
Thanks for the helpful addition to this topic. I've yet to test what you've suggested, but that's good to know nonetheless. I will also report results as soon as I have some free time to do so :)
-
Eric Rini about 7 years
What I've read about UV_THREADPOOL_SIZE suggests that this is most important for blocking IO (such as disk access) but will not matter for non-blocking IO (such as network access).
-
Christian Ivicevic over 5 years
First of all, this worked for me since I was downloading a few hundred small files which caused this bug. Secondly, what are the downsides of increasing the number of sockets? Will they be automatically closed at some point?
-
Motiejus Jakštys about 5 years
That's correct, except DNS resolution is also blocking, due to how getaddrinfo(3) works.