How is Node.js inherently faster when it still relies on Threads internally?

javascript architecture concurrency node.js

41,092

Solution 1

There are actually a few different things being conflated here. But it starts with the meme that threads are just really hard. So if they're hard, you are more likely, when using threads, to 1) introduce bugs that break things and 2) not use them as efficiently as possible. (2) is the one you're asking about.

Think about one of the examples he gives, where a request comes in and you run some query, and then do something with the results of that. If you write it in a standard procedural way, the code might look like this:

result = query( "select smurfs from some_mushroom" );
// twiddle fingers
go_do_something_with_result( result );

If the request coming in caused you to create a new thread that ran the above code, you'll have a thread sitting there, doing nothing at all while query() is running. (Apache, according to Ryan, ties up one thread per request, whereas nginx outperforms it in the cases he's talking about because it doesn't.)

Now, if you were really clever, you would express the code above in a way where the environment could go off and do something else while you're running the query:

query( statement: "select smurfs from some_mushroom", callback: go_do_something_with_result );

This is basically what node.js is doing. You're basically decorating -- in a way that is convenient because of the language and environment, hence the points about closures -- your code in such a way that the environment can be clever about what runs, and when. In that way, node.js isn't new in the sense that it invented asynchronous I/O (not that anyone claimed anything like this), but it's new in that the way it's expressed is a little different.

Note: when I say that the environment can be clever about what runs and when, specifically what I mean is that the thread it used to start some I/O can now be used to handle some other request, or some computation that can be done in parallel, or start some other parallel I/O. (I'm not certain node is sophisticated enough to start more work for the same request, but you get the idea.)
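To make the callback version above concrete, here is a minimal sketch in JavaScript. The query() function is a hypothetical stand-in that simulates a slow database call with setTimeout; it is not a real driver API. The point is only that one thread accepts all three requests before any result comes back.

```javascript
// Hypothetical stand-in for a database call: it "completes" after 50 ms
// without blocking the calling thread. Not a real driver API.
function query(statement, callback) {
  setTimeout(() => callback(null, `rows for: ${statement}`), 50);
}

const events = [];

function handleRequest(id, done) {
  events.push(`request ${id}: query started`);
  query('select smurfs from some_mushroom', (err, result) => {
    // Runs later, on the same thread, once the result is available.
    events.push(`request ${id}: got ${result}`);
    done();
  });
}

// Three requests are accepted back to back on a single thread.
let pending = 3;
const finished = new Promise((resolve) => {
  for (let id = 1; id <= 3; id++) {
    handleRequest(id, () => {
      if (--pending === 0) resolve(events);
    });
  }
});

// This line runs before any query callback fires.
events.push('all requests accepted, thread is free');

finished.then((log) => console.log(log.join('\n')));
```

In the log, the "thread is free" entry appears before any of the "got rows" entries, which is the gap that a thread-per-request design would spend waiting.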

Solution 2

Note! This is an old answer. While it's still true in the rough outline, some details might have changed because of Node's rapid development in the last few years.

It is using threads because:

The O_NONBLOCK option of open() has no effect on regular files.
There are third-party libraries which don't offer non-blocking IO.

To fake non-blocking IO, threads are necessary: do blocking IO in a separate thread. It is an ugly solution and causes much overhead.

It's even worse on the hardware level:

With DMA the CPU asynchronously offloads IO.
Data is transferred directly between the IO device and the memory.
The kernel wraps this in a synchronous, blocking system call.
Node.js wraps the blocking system call in a thread.

This is just plain stupid and inefficient. But it works at least! We can enjoy Node.js because it hides the ugly and cumbersome details behind an event-driven asynchronous architecture.

Maybe someone will implement O_NONBLOCK for files in the future?...

Edit: I discussed this with a friend and he told me that an alternative to threads is polling with select: specify a timeout of 0 and do IO on the returned file descriptors (now that they are guaranteed not to block).
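The "do blocking IO in a separate thread" trick can be sketched by hand in JavaScript with the worker_threads module. This is only an illustration of the technique, not how Node implements it internally (that happens in native code); readSizeInThread and the temporary file name are made up for the example.

```javascript
const { Worker } = require('worker_threads');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Code run inside the helper thread: a blocking read, then a message back.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const fs = require('fs');
  const data = fs.readFileSync(workerData.file); // blocks only this thread
  parentPort.postMessage(data.length);
`;

// Hypothetical helper: resolves with the file size in bytes.
function readSizeInThread(file) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { file } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Usage: the main thread keeps running while the worker blocks on the read.
const demoFile = path.join(os.tmpdir(), `blocking-io-demo-${process.pid}.txt`);
fs.writeFileSync(demoFile, 'smurfs');
const sizePromise = readSizeInThread(demoFile);
console.log('main thread is not blocked');
sizePromise.then((size) => console.log(`read ${size} bytes`));
```

Starting one worker per read, as this sketch does, is exactly the overhead complained about above; a fixed pool of reusable threads is the usual way to keep that cost bounded.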

Solution 3

I fear I'm "doing the wrong thing" here, if so delete me and I apologize. In particular, I fail to see how I create the neat little annotations that some folks have created. However, I have many concerns/observations to make on this thread.

1) The commented element in the pseudo-code in one of the popular answers

result = query( "select smurfs from some_mushroom" );
// twiddle fingers
go_do_something_with_result( result );

is essentially bogus. If the thread is computing, then it's not twiddling thumbs, it's doing necessary work. If, on the other hand, it's simply waiting for the completion of IO, then it's not using CPU time; the whole point of the thread control infrastructure in the kernel is that the CPU will find something useful to do. The only way to "twiddle your thumbs" as suggested here would be to create a polling loop, and nobody who has coded a real webserver is inept enough to do that.

2) "Threads are hard" only makes sense in the context of data sharing. If you have essentially independent threads, such as is the case when handling independent web requests, then threading is trivially simple: you just code up the linear flow of how to handle one job, and sit pretty knowing that it will handle multiple requests, and each will be effectively independent. Personally, I would venture that for most programmers, learning the closure/callback mechanism is more complex than simply coding the top-to-bottom thread version. (But yes, if you have to communicate between the threads, life gets really hard really fast, but then I'm unconvinced that the closure/callback mechanism really changes that; it just restricts your options, because this approach is still achievable with threads. Anyway, that's a whole other discussion that's really not relevant here.)

3) So far, nobody has presented any real evidence as to why one particular type of context switch would be more or less time consuming than any other type. My experience in creating multi-tasking kernels (on a small scale for embedded controllers, nothing so fancy as a "real" OS) suggests that this would not be the case.

4) All the illustrations that I have seen to date that purport to show how much faster Node is than other webservers are horribly flawed; however, they're flawed in a way that does indirectly illustrate one advantage I would definitely accept for Node (and it's by no means insignificant). Node doesn't look like it needs (nor even permits, actually) tuning. If you have a threaded model, you need to create sufficient threads to handle the expected load. Do this badly, and you'll end up with poor performance. If there are too few threads, then the CPU is idle but unable to accept more requests; create too many threads, and you will waste kernel memory, and in the case of a Java environment, you'll also be wasting main heap memory. Now, for Java, wasting heap is the first, best way to screw up the system's performance, because efficient garbage collection (currently, this might change with G1, but it seems that the jury is still out on that point as of early 2013 at least) depends on having lots of spare heap. So, there's the issue: tune it with too few threads, and you have idle CPUs and poor throughput; tune it with too many, and it bogs down in other ways.

5) There is another way in which I accept the logic of the claim that Node's approach "is faster by design", and that is this. Most thread models use a time-sliced context switch model, layered on top of the more appropriate (value judgement alert :) and more efficient (not a value judgement) preemptive model. This happens for two reasons: first, most programmers don't seem to understand priority preemption, and second, if you learn threading in a Windows environment, the timeslicing is there whether you like it or not (of course, this reinforces the first point; notably, the first versions of Java used priority preemption on Solaris implementations, and timeslicing in Windows. Because most programmers didn't understand and complained that "threading doesn't work in Solaris" they changed the model to timeslice everywhere). Anyway, the bottom line is that timeslicing creates additional (and potentially unnecessary) context switches. Every context switch takes CPU time, and that time is effectively removed from the work that can be done on the real job at hand. However, the amount of time invested in context switching because of timeslicing should not be more than a very small percentage of the overall time, unless something pretty outlandish is happening (and there's no reason I can see to expect that to be the case in a simple webserver). So, yes, the excess context switches involved in timeslicing are inefficient (and these don't happen in kernel threads as a rule, btw) but the difference will be a few percent of throughput, not the kind of whole number factors that are implied in the performance claims that are often made for Node.

Anyway, apologies for that all being long and rambly, but I really feel that so far, the discussion hasn't proved anything, and I would be pleased to hear from someone in either of these situations:

a) a real explanation of why Node should be better (beyond the two scenarios I've outlined above, the first of which (poor tuning) I believe is the real explanation for all the tests I've seen so far). ([edit] actually, the more I think about it, the more I'm wondering if the memory used by vast numbers of stacks might be significant here. The default stack sizes for modern threads tend to be pretty huge, but the memory allocated by a closure-based event system would be only what's needed.)

b) a real benchmark that actually gives a fair chance to the threaded server of choice. At least that way, I'd have to stop believing that the claims are essentially false ;> ([edit] that's probably rather stronger than I intended, but I do feel that the explanations given for performance benefits are incomplete at best, and the benchmarks shown are unreasonable).

Cheers, Toby

Solution 4

What I don't understand is the point that Node.js still is using threads.

Ryan uses threads for the parts that are blocking (most of node.js uses non-blocking IO) because some parts are insanely hard to write in a non-blocking way. But I believe Ryan's wish is to have everything non-blocking. On slide 63 (internal design) you can see that Ryan uses libev (a library that abstracts asynchronous event notification) for the non-blocking event loop. Because of the event loop, node.js needs fewer threads, which reduces context switching, memory consumption, etc.
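As a rough mental model of what an event loop does, here is a toy version in JavaScript. ToyEventLoop is a made-up class for illustration only: a real loop such as libev waits on the operating system for ready sockets and timers, whereas this one just drains an in-memory queue. It does show how one thread interleaves work from several requests.

```javascript
// Toy single-threaded event loop: a queue of callbacks run one at a time.
class ToyEventLoop {
  constructor() {
    this.queue = [];
  }

  // Schedule a callback. In a real loop this would happen when the OS
  // reports that a socket or timer is ready.
  enqueue(callback) {
    this.queue.push(callback);
  }

  // Run callbacks in arrival order until nothing is left.
  run() {
    while (this.queue.length > 0) {
      const callback = this.queue.shift();
      callback();
    }
  }
}

const loop = new ToyEventLoop();
const trace = [];

// Request A does part of its work, then schedules the rest for later.
loop.enqueue(() => {
  trace.push('A: start request');
  loop.enqueue(() => trace.push('A: finish request'));
});
// Request B gets its turn before A finishes.
loop.enqueue(() => trace.push('B: handle request'));

loop.run();
console.log(trace);
```

No callback is ever interrupted by another, so there is no context switch between A and B, only a function return followed by the next function call.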

Solution 5

Threads are used only to deal with functions having no asynchronous facility, like stat().

The stat() function is always blocking, so node.js needs to use a thread to perform the actual call without blocking the main thread (event loop). Potentially, no thread from the thread pool will ever be used if you don't need to call those kinds of functions.
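Here is a small sketch of what that looks like from JavaScript, using Node's fs.stat. The statAsync wrapper and the choice of os.tmpdir() as the target are just for the example. In current Node versions, file system calls like this are queued onto libuv's thread pool, which has 4 threads by default and can be resized with the UV_THREADPOOL_SIZE environment variable, so 50 outstanding calls do not mean 50 threads.

```javascript
const fs = require('fs');
const os = require('os');

const target = os.tmpdir(); // any existing path would do
const order = [];

// Small promise wrapper around the callback-style fs.stat.
function statAsync(path) {
  return new Promise((resolve, reject) => {
    fs.stat(path, (err, stats) => (err ? reject(err) : resolve(stats)));
  });
}

// Start 50 stat() calls at once. They are queued for the thread pool
// rather than each getting a thread of its own.
const allStats = Promise.all(
  Array.from({ length: 50 }, () => statAsync(target))
).then((results) => {
  order.push('all stat callbacks done');
  return results;
});

// The main thread gets here before any stat() result has been delivered.
order.push('main thread continued');

allStats.then((results) => console.log(order, results.length));
```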

Ralph Caraveo

Updated on October 14, 2021

Comments

Ralph Caraveo over 2 years

I just watched the following video: Introduction to Node.js and still don't understand how you get the speed benefits.

Mainly, at one point Ryan Dahl (Node.js' creator) says that Node.js is event-loop based instead of thread-based. Threads are expensive and should only be left to the experts of concurrent programming to be utilized.

Later, he then shows the architecture stack of Node.js which has an underlying C implementation which has its own Thread pool internally. So obviously Node.js developers would never kick off their own threads or use the thread pool directly...they use async call-backs. That much I understand.

What I don't understand is the point that Node.js is still using threads; it's just hiding the implementation. So how is this faster? If 50 people request 50 files (not currently in memory), aren't 50 threads still required?

The only difference being that since it's managed internally the Node.js developer doesn't have to code the threaded details but underneath it's still using the threads to process the IO (blocking) file requests.

So aren't you really just taking one problem (threading) and hiding it while that problem still exists: mainly multiple threads, context switching, dead-locks...etc?

There must be some detail I still do not understand here.
- Pointy over 13 years
  
  I'm inclined to agree with you that the claim is somewhat over-simplified. I believe node's performance advantage boils down to two things: 1) the actual threads are all contained at a fairly low level, and thus remain constrained in size and number, and the thread synchronization is thus simplified; 2) OS-level "switching" via select() is faster than thread context swaps.
- veritas almost 10 years
  
  Please see this stackoverflow.com/questions/24796334/…
Tor Valamo over 13 years

The database thing is more a question of not waiting for the answer while holding up other requests (which may or may not use the database), but rather ask for something and then let it call you when it gets back. I don't think it links them together, as that would be quite difficult to keep track of on response. Also i don't think there's any MySQL interface that lets you hold multiple unbuffered responses on one connection (??)
BGerrissen over 13 years

It's just an abstract example to explain how event loops can offer more efficiency, nodejs does nothing with DB's without extra modules ;)
Tor Valamo over 13 years

Yeah my comment was more towards the 100 queries in a single database roundtrip. :p
Tobias P. over 13 years

Exactly! The performance of node.js isn't due to its event-based loop or some asynchronous IO; the callback system, which drastically minimizes waiting time, is the core of node.js performance.
Ralph Caraveo over 13 years

Okay, I can definitely see how this can increase performance, because it sounds to me like you are able to max out your CPU since there aren't any threads or execution stacks just waiting around for IO to return. So what Ryan has done is effectively find a way to close all the gaps.
jrtipton over 13 years

Yeah, the one thing I'd say is that it's not like he found a way to close the gaps: it's not a new pattern. What's different is that he is using Javascript to let the programmer express their program in a way that is much more convenient for this kind of asynchrony. Possibly a nitpicky detail, but still...
CHAPa almost 13 years

Hi BGerrissen: nice post. So, when a query is executing, will other similar queries "listen" like in the staticFile.X example above? For example, with 100 users running the same query, will only one query be executed while the other 99 listen for the first one? Thanks!
Paul over 12 years

It's also worth pointing out that for a lot of the I/O tasks, Node uses whatever kernel-level async I/O api that's available (epoll, kqueue, /dev/poll, whatever)
Florin Dumitrescu about 11 years

I'm still not sure that I fully understand it. If we consider that inside a web request IO operations are the ones that take most of the time needed to process the request and if for each IO operation a new thread is created, then for 50 requests that come in a very fast succession, we will probably have 50 threads running in parallel and executing their IO part. The difference from standard web servers is that in there the entire request is executed on the thread, while in node.js just its IO part, but that is the part that is taking most of the time and making the thread to wait.
nalply about 11 years

A problem with threads: they need RAM. A very busy server can run up to a few thousand threads. Node.js avoids the threads and is thus more efficient. The efficiency does not come from running code faster. It does not matter if code is run in threads or in an event loop. For the CPU it's the same. But by doing away with threads we save RAM: only one stack instead of a few thousand stacks. And we also save context switches.
SystemParadox almost 11 years

@FlorinDumitrescu, the difference is that 50 IO operations probably don't use 50 threads- they can be queued in a sensible fashion. This is particularly good when left to the kernel, which as Paul points out, is normally the case.
Florin Dumitrescu almost 11 years

@SystemParadox thanks for pointing that out. I actually did some research on the topic lately, and indeed the catch is that asynchronous I/O, when properly implemented at kernel level, does not use threads while performing async I/O operations. Instead the calling thread is released as soon as an I/O operation is started, and a callback is executed when the I/O operation is finished and a thread is available for it. So node.js can run 50 concurrent requests with 50 I/O operations in (almost) parallel using just one thread if the async support for the I/O operations is properly implemented.
7cows almost 11 years

@FlorinDumitrescu And is it properly implemented in Node at this time? Can you provide some links? Thanks.
Florin Dumitrescu almost 11 years

@7cows, my mentioned asynchronous I/O research was made with .Net rather than Node, so from Node's perspective I was talking purely theoretically since I don't have any first hand experience in there. I assume that common operations on common platforms are properly implemented, but I can't give you any specific details.
Andy Dufresne over 10 years

It would be interesting to understand how v8, libeio and libev interact to build up an event loop. Anyone?
levi about 10 years

But node is not doing away with threads. It still uses them internally for the IO tasks, which is what most web requests require.
andy over 9 years

@jrtipton, +1 for it's not a new pattern. What's different is that he is using Javascript to let the programmer express their program in a way that is much more convenient for this kind of asynchrony. After all, select/poll/epoll/kqueue, etc. already exist. And the rule about not doing blocking things in callbacks is the same as when we use select/poll/epoll/kqueue in one process (thread) to process all registered events, since blocking would postpone the processing of other events!
Harish Kayarohanam almost 9 years

@FlorinDumitrescu, I think not all IO operations release threads. Only network IO releases the thread and is taken care of by the IO hardware. But in the case of fs IO it has to take a new thread from the pool to do the work: docs.libuv.org/en/latest/design.html. So I have the same doubt now: will the 50 requests not spawn 50 threads? Isn't the problem of context switching back now? Confused.
Oleksandr Papchenko over 8 years

Also node stores closures of callbacks in RAM, so i can not see where it wins.
binki over 7 years

@levi But nodejs doesn’t use the “one thread per request” sort of thing. It uses an IO threadpool, probably to avoid the complication with using asynchronous IO APIs (and maybe POSIX open() can’t be made non-blocking?). This way, it amortizes any performance hit where the traditional fork()/pthread_create()-on-request model would have to create and destroy threads. And, as mentioned in postscript a), this also amortizes the stack space issue. You can probably serve thousands of requests with, say, 16 IO threads just fine.
binki over 7 years

You’re making it sound like nodejs automatically memoizes function calls or something. Now, because you don’t have to worry about shared memory synchronization in JavaScript’s event loop model, it is easier to cache things in memory safely. But that doesn’t mean nodejs magically does that for you or that this is the type of performance enhancement being asked about.
David Tonhofer over 7 years

"The default stack sizes for modern threads tend to be pretty huge, but the memory allocated by a closure-based event system would be only what's needed" I get the impression these should be of the same order. Closures are not cheap, the runtime will have to keep the whole call tree of the single-threaded application in memory ("emulating stacks" so to say) and will be able to clean up when a leaf of tree gets released as the associated closure gets "resolved". This will include lots of references to on-heap stuff that cannot be garbage collected and will hit performance at clean-up time.
Pacerier about 7 years

What about Windows?
Pacerier about 7 years

@binki, Modern apache derivatives also maintain a thread pool. They don't fire a new thread per request that's just the (very) old way of doing things. You need to compare nodejs with the modern web servers, not the 20 yr old models.
Pacerier about 7 years

@nalply, That makes no sense. The internal threads of nodejs' threadpool also require their own stack space. No difference aside from comparing threads you can see vs threads you can't see.
binki about 7 years

@Pacerier but doesn't it still dedicate a thread (from the pool) to each request? If so, then there is some small OS level overhead for each thread and you need your pool to be big to handle a large number of concurrent requests. In the nodejs model, the IO thread pool can perform work for more concurrent requests than there are threads in the IO pool. And many of the IO operations which have OS level asynchronous APIs might not even need to use that pool at all.
nalply over 6 years

Sorry, no idea. I only know that libuv is the platform-neutral layer for doing asynchronous work. In the beginning of Node there was no libuv. Then it was decided to split off libuv and this made platform-specific code easier. In other words, Windows has its own asynchronous story which might be completely different from Linux, but for us it doesn't matter because libuv does the hard work for us.