multithreading or multiprocessing

Solution 1

Both can be complicated in their own ways.

You can do either. In the grand scheme of things, it might not matter which you choose. What does matter is how well you do them. Therefore:

Do what you are most experienced with. Or, if you're leading a team, do what the team is most experienced with.

---Threading!---

I have done a lot of threaded programming, and I enjoy parts of it, and parts of it I do not enjoy. I've learned a lot, and now can usually write a multi-threaded application without too much pain, but it does have to be written in a very specific way. Namely:

1) It has to be written with very clearly defined data boundaries that are 100% thread safe. Otherwise, whatever can go wrong will go wrong, and it might not be when you have a debugger lying around. Plus, debugging threaded code is like peering into Schrödinger's box: by looking in there, you change the timing, so other threads may or may not have had time to process more.

2) It has to be written with test code that stresses the machine. Many multi-threaded systems only show their bugs when the machines are heavily stressed.

3) There has to be some very smart person who owns the data exchanging code. If there is any way for a shortcut to be made, some developer will probably make it, and you will have an errant bug.

4) There have to be catch-all handlers that will reset the application with a minimum of fuss. This is for production code that breaks because of some threading issue. In short: the show must go on.
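To make rules 1 and 2 a bit more concrete, here is a minimal Python sketch (the original question does not name a language, so Python is my assumption, and `count_words` and `stress_test` are hypothetical names). The workers touch nothing shared except two thread-safe queues, and the stress test hammers the pipeline repeatedly and checks the answer every time.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def count_words(lines, num_workers=4):
    """Count words across `lines` using worker threads.

    The only objects shared between threads are the two queues, which
    are thread safe; each worker keeps its running total in a local.
    """
    inbox = queue.Queue()
    results = queue.Queue()

    def worker():
        total = 0
        while True:
            item = inbox.get()
            if item is _STOP:
                break
            total += len(item.split())
        results.put(total)  # one result per worker

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for line in lines:
        inbox.put(line)
    for _ in threads:
        inbox.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    return sum(results.get() for _ in threads)


def stress_test(rounds=20, lines_per_round=5000):
    """Run the pipeline repeatedly under load and check the result each time."""
    lines = ["a b c"] * lines_per_round
    for _ in range(rounds):
        assert count_words(lines, num_workers=8) == 3 * lines_per_round
    return True
```

A real stress test would also load the machine from outside (other processes, slow disks), since that is when timing bugs tend to show up; this only shows the shape of it.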

---Cross-Process!---

I have less experience with process-based concurrency, but I have recently been doing some cross-process work on Windows (where the IPC is web service calls... WOO!), and it is relatively clean and simple, but I follow some rules here as well. By and large, interprocess communication will be much more error free, because programs handle input from the outside world very well, and those transport mechanisms are usually asynchronous. Anyway...

1) Define clear process boundaries and communication mechanisms. Message/eventing via, oh say, TCP or web services or pipes or whatever is fine, as long as the borders are clear, and there is a lot of validation and error checking code at those borders.

2) Be prepared for bottlenecks. Forgiving code is very important. By this I mean that sometimes you won't be able to write to that pipe. You have to be able to requeue and retry those messages without the application locking up or throwing an exception.

3) There will be a lot more code in general, because transporting data across process boundaries means you have to serialize it in some fashion. This can be a source of problems, especially when you start maintaining and changing that code.
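The three rules above could look roughly like this in Python. This is only a sketch under my own assumptions: the wire format (one JSON object per line with a `text` field), the `Outbox` class, and the injected `send` callable are all hypothetical, not something from a particular library. `decode` is the validation at the border, `encode` is the serialization cost, and `Outbox.flush` requeues whatever could not be written instead of raising.

```python
import json
from collections import deque


def encode(message):
    """Serialize a dict to one newline-terminated JSON line."""
    return (json.dumps(message) + "\n").encode("utf-8")


def decode(raw):
    """Parse and validate one incoming line; return None if it is malformed."""
    try:
        message = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None
    if not isinstance(message, dict) or not isinstance(message.get("text"), str):
        return None
    return message


class Outbox:
    """Queue outgoing messages and retry the ones that could not be sent."""

    def __init__(self, send, max_attempts=3):
        self._send = send              # callable(bytes); raises OSError on failure
        self._max_attempts = max_attempts
        self._pending = deque()        # items are (attempts_so_far, payload)
        self.dropped = []              # payloads that ran out of attempts

    def put(self, message):
        self._pending.append((0, encode(message)))

    def flush(self):
        """Try each pending message once; requeue failures. Returns number sent."""
        sent = 0
        for _ in range(len(self._pending)):
            attempts, payload = self._pending.popleft()
            try:
                self._send(payload)
                sent += 1
            except OSError:
                if attempts + 1 < self._max_attempts:
                    self._pending.append((attempts + 1, payload))
                else:
                    self.dropped.append(payload)
        return sent

    def __len__(self):
        return len(self._pending)
```

In a real daemon, `send` would wrap a write to the pipe or socket, and `flush` would be called from the event loop whenever the transport becomes writable again.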

Hope this helps.

Solution 2

You've left out too many details. Going by what you have stated so far, the choice is irrelevant, and there is nothing inherently more buggy about multithreading than multiprocessing; you're missing why these techniques have such a reputation. If you aren't sharing data, there isn't much of a problem to be had (of course, there may be other issues, but we need details to decide about those). The platform also matters: on UNIX-like operating systems, processes are pretty lightweight anyway.

However, there are other issues to consider. What kind of system(s) will you be running on? You definitely don't want to spawn several processes on a uniprocessor system, as you aren't going to get much benefit, though that depends on other details you could specify. If you describe the nature of the problem you are trying to solve, we can help further.

Solution 3

It depends on which programming language (and which libraries) you want to use. Personally I would choose multithreading, as I know the problems associated with threads (and how to solve them).

Multiprocessing might help you if you want to run the daemon on multiple machines and distribute the load amongst them, but I don't think that that's a major problem here.

Solution 4

Do you need to share data between the instances that is updated so frequently that IPC would be too expensive? In that case multithreading is probably better. Otherwise you have to weigh whether the robustness of separate processes or the ease of thread creation/communication is more important to you.
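As a rough illustration of the frequently updated shared data case, here is a Python sketch (the `Stats` class, the `run` helper, and the per-host counter are hypothetical examples of mine, not something from the question). With threads, every update is a cheap in-memory operation guarded by a lock; with separate processes, each of these updates would have to cross a process boundary in some form.

```python
import threading


class Stats:
    """Per-host message counters shared by all worker threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def record(self, host):
        # The read-modify-write must happen under the lock to stay correct.
        with self._lock:
            self._counts[host] = self._counts.get(host, 0) + 1

    def snapshot(self):
        # Return a copy so callers never see the dict mid-update.
        with self._lock:
            return dict(self._counts)


def run(num_threads=4, updates_per_thread=10000):
    """Have several threads update the same Stats object concurrently."""
    stats = Stats()

    def work():
        for _ in range(updates_per_thread):
            stats.record("host-a")

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats.snapshot()
```

If updates are rare, or each worker can keep its own totals and report them occasionally, the same design works fine across processes and you get the isolation for free.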

Author by

pinto

Updated on August 05, 2022

Comments

  • pinto
    pinto over 1 year

    I am designing a dedicated syslog-processing daemon for Linux that needs to be robust and scalable and I'm debating multithread vs. multiprocess.

    The obvious objection to multithreading is complexity and nasty bugs. Multiple processes may hurt performance because of IPC and context switching.

    "The Art of Unix Programming" discusses this here.

    Would you recommend a process-based system (like Apache) or a multi-threaded approach?

  • Santosh
    Santosh over 10 years
    Thanks! Good points to keep in mind when developing thread-safe code.