Boost Asio single threaded performance

11,879

Solution 1

You might want to read my question from a few years ago, I asked it when first investigating the scalability of Boost.Asio while developing the system software for the Blue Gene/Q supercomputer.

Scaling to 100k or more connections should not be a problem, though you will need to be aware of the obvious resource limitations such as the maximum number of open file descriptors. If you haven't read the seminal C10K paper, I suggest reading it.

After you have implemented your application using a single thread and a single io_service, I suggest investigating a pool of threads invoking io_service::run(), and only then investigate pinning an io_service to a specific thread and/or cpu. There are multiple examples included in the Asio documentation for all three of these designs, and several questions on SO with more information. Be aware that as you introduce multiple threads invoking io_service::run() you may need to implement strands to ensure the handlers have exclusive access to shared data structures.

Solution 2

Using boost::asio you can write single-thread or multi-thread server approximately at same development cost. You can write single-threaded version as first version, then convert it to multithreaded, if needed.

Typically, only bottleneck for boost::asio is that epoll/kqueue reactor is working in a mutex. So, only one thread is doing epoll at same time. This can decrease performance in case when you have multithreaded server, which serves lots and lots very small packets. But, imo it anyway should be faster than just plain-singlethread server.

Now about your task. If you want to just pass messages between connections - i think it must be multithreaded server. The problem is syscalls(recv/send etc). An instruction is very easy think to do for CPU, but any syscall is not very "light" operation (everything is relative, but relative to other jobs in your task). So, with single thread you will get big syscalls overhead, its why i recommend to use multithreaded scheme.

Also, you can separate io_service and make it work as "io_service per thread" idiom. I think this must give best performance, but it has drawback: if one of io_service will get too big queue - other threads will not help it, so some connections may slowdown. On other side, with single io_service - queue overrun can lead to big locking overhead. All you can do - do the both variants and measure bandwidth/latency. It should be not too difficult to implement both variants.

Share:
11,879
pic11
Author by

pic11

I code in C, C++ and Python.

Updated on June 20, 2022

Comments

  • pic11
    pic11 almost 2 years

    I am implementing custom server that needs to maintain very large number (100K or more) of long lived connections. Server simply passes messages between sockets and it doesn't do any serious data processing. Messages are small, but many of them are received/send every second. Reducing latency is one of the goals. I realize that using multiple cores won't improve performance and therefore I decided to run the server in a single thread by calling run_one or poll methods of io_service object. Anyway multi-threaded server would be much harder to implement.

    What are the possible bottlenecks? Syscalls, bandwidth, completion queue / event demultiplexing? I suspect that dispatching handlers may require locking (that is done internally by asio library). Is it possible to disable even queue locking (or any other locking) in boost.asio?

    EDIT: related question. Does syscall performance improve with multiple threads? My feeling is that because syscalls are atomic/synchronized by the kernel adding more threads won't improve speed.