Fastest technique to pass messages between processes on Linux?

Solution 1

I would suggest looking at this also: How to use shared memory with Linux in C.

Basically, I'd drop network protocols such as TCP and UDP when doing IPC on a single machine. They add packet overhead and tie up extra resources (e.g. ports and the loopback interface).

Solution 2

Whilst all the above answers are very good, I think we'd have to discuss what "fastest" is [and does it have to be "fastest", or just "fast enough"?]

For LARGE messages, there is no doubt that shared memory is a very good technique, and very useful in many ways.

However, if the messages are small, there are drawbacks of having to come up with your own message-passing protocol and method of informing the other process that there is a message.

Pipes and named pipes are much easier to use in this case - they behave pretty much like a file: you just write data at the sending side and read it at the receiving side. If the sender writes something, the receiver automatically wakes up. If the pipe is full, the sender blocks. If there is no more data from the sender, the receiver blocks. This means the whole thing can be implemented in fairly few lines of code, with a pretty good guarantee that it will work every time.

Shared memory, on the other hand, relies on some other mechanism to inform the other process that "you have a packet of data to process". Yes, it's very fast if you have LARGE packets of data to copy - but I would be surprised if there is a huge difference compared to a pipe, really. The main benefit is that the receiving side doesn't have to copy the data out of the shared memory - but it also relies on there being enough memory to hold all "in flight" messages, or on the sender being able to hold things back.

I'm not saying "don't use shared memory", I'm just saying that there is no such thing as "one solution that solves all problems 'best'".

To clarify: I would start by implementing a simple method using a pipe or named pipe [depending on which suits the purposes], and measure the performance of that. If a significant time is spent actually copying the data, then I would consider using other methods.

Of course, another consideration is "are we ever going to use two separate machines [or two virtual machines on the same system] to solve this problem?" If so, a network solution is a better choice - even if it's not THE fastest. I've run a local TCP stack on my machines at work for benchmarking purposes and got some 20-30 Gbit/s (2-3 GB/s) with sustained traffic. A raw memcpy within the same process gets around 50-100 Gbit/s (5-10 GB/s), unless the block size is REALLY tiny and fits in the L1 cache. I haven't measured a standard pipe, but I expect it's somewhere roughly in the middle of those two numbers. [These numbers are about right for a range of medium-sized, fairly modern PCs - obviously, on an ARM, MIPS or other embedded-style controller, expect lower numbers for all of these methods.]

Solution 3

The NetOS Systems Research Group at the University of Cambridge, UK has published some (open-source) IPC benchmarks.

Source code is located at https://github.com/avsm/ipc-bench .

Project page: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/ .

Results: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/results.html

The research behind these results has been published: http://anil.recoil.org/papers/drafts/2012-usenix-ipc-draft1.pdf

Solution 4

Check CMA (Cross Memory Attach) and kdbus: https://lwn.net/Articles/466304/

I think the fastest approaches these days are based on AIO. http://www.kegel.com/c10k.html

Solution 5

As you tagged this question with C++, I'd recommend Boost.Interprocess:

Shared memory is the fastest interprocess communication mechanism. The operating system maps a memory segment in the address space of several processes, so that several processes can read and write in that memory segment without calling operating system functions. However, we need some kind of synchronization between processes that read and write shared memory.

Source

One caveat I've found is the portability limitation of its synchronization primitives. Neither OS X nor Windows has a native implementation of interprocess condition variables, for example, so Boost emulates them with spin locks.

Now, if you use a *nix which supports POSIX process-shared primitives, there will be no problem.

Shared memory with synchronization is a good approach when considerable data is involved.


Comments

  • user997112
    user997112 almost 2 years

    What is the fastest technology to send messages between C++ application processes, on Linux? I am vaguely aware that the following techniques are on the table:

    • TCP
    • UDP
    • Sockets
    • Pipes
    • Named pipes
    • Memory-mapped files

Are there any more ways, and which is the fastest?

    • paddy
      paddy over 11 years
      What are the latency requirements for your application?
    • user997112
      user997112 over 11 years
      @paddy basically I will be looking to shave off every nano/microsecond that I can.
  • James Kanze
    James Kanze over 11 years
    The AIO stuff is not the fastest solution for communicating between processes on the same processor. Your second link isn't really anything I'd recommend.
  • user997112
    user997112 over 11 years
@JamesKanze would you be able to elaborate on your points? With regard to c10k, I have often shared your view - but I have seen that URL quoted many times on SO.
  • user997112
    user997112 over 11 years
    My messages will be small in size. However, I would not want to block the sender if the receiver cannot copy. This is because imagine I am sending weather data of the same country- the most recent weather data message will override any remaining messages which are still currently being processed. I do however like the fact you say the receiver will be automatically notified!
  • Mats Petersson
    Mats Petersson over 11 years
There are various ways you'd be able to do that. And it may be simpler to let the receiver look (briefly) at the message it read and say "Well, it's old, so I'll just throw this away" than to fix the messaging system to sort things out. That assumes that your processing on the receiving side is substantial, and that it's relatively easy to send the data. Another way to solve it is to have a two-way system, where the "receiver" says "I'm done, please send the next packet now!", and the sender simply keeps the "most up to date" message at any given time.
  • Sam
    Sam over 11 years
    While I agree with all that, it would depend on how shared memory is used. E.g. one could implement double buffering: The sender continuously dumps data into block A, each time locking a lock and setting an 'avail flag'. The reader(s) could then wait on that lock, turn the buffers and reset that flag, so that they can safely use the most recent data (read only) without copying, while the writer continues to write into block B. Whether the writer should be blocked by another lock or not may be defined according to the type of data processing it does.
  • James Kanze
    James Kanze over 11 years
    @user997112 For anything on the same processor, shared memory beats the alternatives hands down. Between processors, the time differences between asynchronous IO and using separate threads are negligible, and the multithread model is significantly cleaner and easier to develop and maintain. With efficient threading, there's no case where I would chose async IO.
  • Mats Petersson
    Mats Petersson over 11 years
    I agree. I wanted to explain in my answer that there are several ways to solve the same problem, and it all depends on what you are actually trying to achieve which is best, rather than state outright that "one solution is best", because I don't believe that is right. Unless either the data is fairly large, or the processing is very trivial, the actual method to transfer the data is PROBABLY not the biggest stumbling block.
  • Sam
    Sam over 11 years
    Guess, we are in complete agreement, that the OP should show us some details.
  • user997112
    user997112 over 11 years
    It is just a case of message being sent to receiver- receiver begins processing data, then once finished receiver begins processing next piece of data "in the queue". So I am implementing a queuing system, sender "sends" data to the receiver using whatever technique is recommended here. Has that helped provide more info?
  • Sam
    Sam over 11 years
    Yes, so I have following questions: How large are those pieces? Have all pieces to be processed completely and in order?
  • Alex
    Alex over 11 years
People have commented mostly on the size of the messages being exchanged, and on whether you use one or two processors. But I believe a relevant and important issue is the rate of events. If you are processing a very large number of events per second (say hundreds of thousands), then AIO may give you an edge.
  • Davide Berra
    Davide Berra over 7 years
    Linked document is awesome! Thank you
  • lucid_dreamer
    lucid_dreamer about 5 years
    @JamesKanze "and the multithread model is significantly cleaner and easier to develop and maintain" -> I thought unpredictable pre-emption was a con of the threading model, so that it is easier to reason about non-blocking IO solutions....
  • étale-cohomology
    étale-cohomology almost 4 years
    Sadly, Boost is bloated.