Why are malloc() and printf() said as non-reentrant?

22,798

Solution 1

malloc and printf usually use global structures, and employ lock-based synchronization internally. That's why they're not reentrant.

The malloc function could either be thread-safe or thread-unsafe. Both are not reentrant:

  1. Malloc operates on a global heap, and it's possible that two different invocations of malloc that happen at the same time, return the same memory block. (The 2nd malloc call should happen before an address of the chunk is fetched, but the chunk is not marked as unavailable). This violates the postcondition of malloc, so this implementation would not be re-entrant.

  2. To prevent this effect, a thread-safe implementation of malloc would use lock-based synchronization. However, if malloc is called from signal handler, the following situation may happen:

    malloc();            //initial call
      lock(memory_lock); //acquire lock inside malloc implementation
    signal_handler();    //interrupt and process signal
    malloc();            //call malloc() inside signal handler
      lock(memory_lock); //try to acquire lock in malloc implementation
      // DEADLOCK!  We wait for release of memory_lock, but 
      // it won't be released because the original malloc call is interrupted
    

    This situation won't happen when malloc is simply called from different threads. Indeed, the reentrancy concept goes beyond thread-safety and also requires functions to work properly even if one of its invocation never terminates. That's basically the reasoning why any function with locks would be not re-entrant.

The printf function also operated on global data. Any output stream usually employs a global buffer attached to the resource data are sent to (a buffer for terminal, or for a file). The print process is usually a sequence of copying data to buffer and flushing the buffer afterwards. This buffer should be protected by locks in the same way malloc does. Therefore, printf is also non-reentrant.

Solution 2

Let's understand what we mean by re-entrant. A re-entrant function can be invoked before a previous invocation has finished. This might happen if

  • a function is called in a signal handler (or more generally than Unix some interrupt handler) for a signal that was raised during execution of the function
  • a function is called recursively

malloc isn't re-entrant because it is managing several global data structures that track free memory blocks.

printf isn't re-entrant because it modifies a global variable i.e. the content of the FILE* stout.

Solution 3

There are at least three concepts here, all of which are conflated in colloquial language, which might be why you were confused.

  • thread-safe
  • critical section
  • re-entrant

To take the easiest one first: Both malloc and printf are thread-safe. They have been guaranteed to be thread-safe in Standard C since 2011, in POSIX since 2001, and in practice since long before that. What this means is that the following program is guaranteed not to crash or exhibit bad behavior:

#include <pthread.h>
#include <stdio.h>

void *printme(void *msg) {
  while (1)
    printf("%s\r", (char*)msg);
}

int main() {
  pthread_t thr;
  pthread_create(&thr, NULL, printme, "hello");        
  pthread_create(&thr, NULL, printme, "goodbye");        
  pthread_join(thr, NULL);
}

An example of a function which is not thread-safe is strtok. If you call strtok from two different threads simultaneously, the result is undefined behavior — because strtok internally uses a static buffer to keep track of its state. glibc adds strtok_r to fix this problem, and C11 added the same thing (but optionally and under a different name, because Not Invented Here) as strtok_s.

Okay, but doesn't printf use global resources to build its output, too? In fact, what would it even mean to print to stdout from two threads simultaneously? That brings us to the next topic. Obviously printf is going to be a critical section in any program that uses it. Only one thread of execution is allowed to be inside the critical section at once.

At least in POSIX-compliant systems, this is achieved by having printf begin with a call to flockfile(stdout) and end with a call to funlockfile(stdout), which is basically like taking a global mutex associated with stdout.

However, each distinct FILE in the program is allowed to have its own mutex. This means that one thread can call fprintf(f1,...) at the same time that a second thread is in the middle of a call to fprintf(f2,...). There's no race condition here. (Whether your libc actually runs those two calls in parallel is a QoI issue. I don't actually know what glibc does.)

Similarly, malloc is unlikely to be a critical section in any modern system, because modern systems are smart enough to keep one pool of memory for each thread in the system, rather than having all N threads fight over a single pool. (The sbrk system call will still probably be a critical section, but malloc spends very little of its time in sbrk. Or mmap, or whatever the cool kids are using these days.)

Okay, so what does re-entrancy actually mean? Basically, it means that the function can safely be called recursively — the current invocation is "put on hold" while a second invocation runs, and then the first invocation is still able to "pick up where it left off." (Technically this might not be due to a recursive call: the first invocation might be in Thread A, which gets interrupted in the middle by Thread B, which makes the second invocation. But that scenario is just a special case of thread-safety, so we can forget about it in this paragraph.)

Neither printf nor malloc can possibly be called recursively by a single thread, because they are leaf functions (they don't call themselves nor call out to any user-controlled code that could possibly make a recursive call). And, as we saw above, they've been thread-safe against *multi-*threaded re-entrant calls since 2001 (by using locks).

So, whoever told you that printf and malloc were non-reentrant was wrong; what they meant to say was probably that both of them have the potential to be critical sections in your program — bottlenecks where only one thread can get through at a time.


Pedantic note: glibc does provide an extension by which printf can be made to call arbitrary user code, including re-calling itself. This is perfectly safe in all its permutations — at least as far as thread-safety is concerned. (Obviously it opens the door to absolutely insane format-string vulnerabilities.) There are two variants: register_printf_function (which is documented and reasonably sane, but officially "deprecated") and register_printf_specifier (which is almost identical except for one extra undocumented parameter and a total lack of user-facing documentation). I wouldn't recommend either of them, and mention them here merely as an interesting aside.

#include <stdio.h>
#include <printf.h>  // glibc extension

int widget(FILE *fp, const struct printf_info *info, const void *const *args) {
  static int count = 5;
  int w = *((const int *) args[0]);
  printf("boo!");  // direct recursive call
  return fprintf(fp, --count ? "<%W>" : "<%d>", w);  // indirect recursive call
}
int widget_arginfo(const struct printf_info *info, size_t n, int *argtypes) {
  argtypes[0] = PA_INT;
  return 1;
}
int main() {
  register_printf_function('W', widget, widget_arginfo);
  printf("|%W|\n", 42);
}

Solution 4

Most likely because you can't start writing output while another call to printf is still printing it's self. The same goes for memory allocation and deallocation.

Share:
22,798
ultimate cause
Author by

ultimate cause

Starting from February 2002, it is now close to 18 years of experience (all in product organizations) I gathered in different domains of IT industry. During this tenure I got some good opportunities where I earned experience in “concept to code” journeys till final delivery. I also learnt skills on mentoring and technical leadership while being hands-on. In most part of my roles I worked in system side areas of projects. In all my projects I used C/C++ as programming language on UNIX/Linux family (briefly on Windows as well). Currently, I am working as a Sr. Principal Engineer in Dell which I joined in November 2012. Most of my work involves me in enhancing various parts of NetBSD kernel and drivers (NIC, I2C, GPIO etc) to accommodate new features for Dell Networking OS. In this job my special focus goes towards debugging kernel/driver issues, contributing to roadmaps and mentoring my juniors. Additionally, in this role I am Technical Lead to resolve security related issues for various Dell Networking OSes. Therefore, I am recipient of rare opportunity to study security threats and creation of threat-models for network/software security. The roles above demand specific expertise in communication and leadership with technical teams. Other than mentoring/leading another aspect in my role as leader for Security team is to have dialogue with teams outside Dell Networking BU and represent Dell Networking to form security policies for Dell Enterprise Solution Group. Prior to Dell, I worked for Ericsson as Lead Engineer during June 2009 to Nov 2012, in telecom domain on SCP systems. In this role I was in-charge of designing, developing and enhancing an interpreter and few other software modules. This job demanded a good knowledge of multi-processing, shared memory databases and debugging skills. During 2006 and June 2009, I worked for Megasoft in telecom domain for SIP based SoftPBX. In this project I had rare opportunity of conceptualizing, coding and delivering the product. In this project I owned several modules and drove them to delivery stage. This was one of most challenging project which needed expertise in object oriented designing, multi-threading, SIP and other related VoIP protocols.

Updated on July 09, 2022

Comments

  • ultimate cause
    ultimate cause almost 2 years

    In UNIX systems we know malloc() is a non-reentrant function (system call). Why is that?

    Similarly, printf() also is said to be non-reentrant; why?

    I know the definition of re-entrancy, but I wanted to know why it applies to these functions. What prevents them being guaranteed reentrant?

  • ChrisF
    ChrisF over 13 years
    This explains what "re-entrant" is, but not why these functions are non re-entrant.
  • ultimate cause
    ultimate cause over 13 years
    Oh great so. I was foolish for asking this about printf. Thanks. But can't we call malloc() from two different threads simultaneously ?
  • ultimate cause
    ultimate cause over 13 years
    OK, but where can it fail ? Will be good if I get some examples.
  • P Shved
    P Shved over 13 years
    "you can't start writing output while another call to printf is still printing it's self." Why? What bit of printf causes that? It's not obvious. Maybe the result would be Hello, woSOMETHINGELSErld!, but all you asked would be nevertheless printed?
  • JeremyP
    JeremyP over 13 years
    @RIPUNJAY TRIPATHI: It depends on the implementation, but usually malloc is thread safe if you have threading support switched on in your compilation.
  • UnixShadow
    UnixShadow over 13 years
    It is undefined behavior in the C specification, there is not example of where it may fail, cause at the time when writing the C specification this was a non issue (no threads at all). It might work find on some implementations and not at all on others while both implementations may follow the specification.
  • P Shved
    P Shved over 13 years
    So you claim that malloc is not thread-safe? Interesting... (-1) And you also claim that including mutexes makes the function reentrant... Even more interesting! (-2)
  • Bart van Ingen Schenau
    Bart van Ingen Schenau over 13 years
    @RIPUNJAY: It could happen that both calls to malloc return the same pointer, because because both malloc invocations determined that block to be available (the first invocation would be interrupted between determining the block as free and marking it as allocated).
  • ultimate cause
    ultimate cause over 13 years
    @Jeremy: Are thread-safety and reenterancy one and same? I thought they are two different things.
  • ultimate cause
    ultimate cause over 13 years
    @UnixShadow:Sorry there is some problem. If there was NO threads concept at all, why RE-ENTERANCY is linked to that ?
  • janneb
    janneb over 13 years
    @RIPUNJAY: A reentrant function is also thread-safe, but a thread-safe function is not necessarily reentrant. And yes, they are two different things. For instance, on systems which support threads malloc() is typically thread-safe, but not reentrant.
  • ultimate cause
    ultimate cause over 13 years
    @janneb: Thanks :) I knew that.
  • janneb
    janneb over 13 years
    Finally, thank you. I was about to write more or less the same, but your answer appeared just as I was starting to type mine. +1.
  • yadab
    yadab over 13 years
    @deadmg: thread safety and RE-ENTERANCY are different though they are related.
  • Michael Foukarakis
    Michael Foukarakis over 13 years
    Thread-safety is only one of the cases for reentrancy. Another is, as mentioned, signal handlers (and other interrupt contexts).
  • Steve Jessop
    Steve Jessop over 13 years
    "any function with locks would be not re-entrant". I worked on a system where you could flag mutexes to disable signals, either just while waiting on the mutex, or throughout the time the mutex is held. Obviously it was easy to mis-use, and you have to guarantee the function would return, but it was used to access globals from reentrant functions (generally in the kernel). I assume without proof that other kernels have equivalent mechanisms, but also that the standard doesn't want to require such shenanigans where avoidable.
  • janneb
    janneb over 13 years
    @Steve: For instance, in the Linux kernel spinlocks typically disable interrupts when the lock is taken. "Normal" sleeping mutexes OTOH run with interrupts enabled.
  • ryyker
    ryyker over 9 years
    @PavelShved - whether malloc is thread-safe or not is an arguable point. At least offer a reference to help clear the issue. From your own post do you not say it is possible that malloc can be thread unsafe?
  • zwol
    zwol over 7 years
    This answer is almost correct; but it's missing a key concept from POSIX, async signal safety. Application code can execute in the middle of a call to either printf or malloc as a result of an asynchronous signal. Neither function is required to be async-signal-safe, so it is unsafe to call them from the handler for an asynchronous signal. That is what POSIX system programmers mean when they say that printf and malloc are "not reentrant".
  • zwol
    zwol over 7 years
    As of C11, malloc and printf are both required to be thread-safe (and POSIX has required this since 2001). However, it's still not safe to use them from an async signal handler.
  • Jerry Jeremiah
    Jerry Jeremiah over 7 years
    "So, whoever told you that printf and malloc were non-reentrant was wrong" What about signal handlers? With threads the original thread will eventually get the CPU back and continue but with signal handlers nothing happens until the signal handler returns...
  • Quuxplusone
    Quuxplusone over 7 years
    Jerry: When I gave this answer, I was under the impression that "reentrant" and "async-signal-safe" were not synonyms, and that malloc/printf are "reentrant but not async-signal-safe". Actually, I'm still under that impression; but @zwol's comment suggests that there is a significant fraction of the population that believes they are synonyms, and thus that any function that isn't async-signal-safe can't possibly be "reentrant", by definition. Sounds like Jerry is also in that camp. I guess the moral is: when using jargon, one should know one's audience and their assumptions. :)
  • zwol
    zwol over 7 years
    @Quuxplusone I would personally say that "reentrant" is a poorly defined term and that one should avoid using it in favor of "thread safe", "recursive-call safe", or "async signal safe", which have clearer definitions. All functions that are async-signal safe are recursive-call safe and thread-safe, and all functions that are recursive-call safe are thread-safe, but the converses are not true. Recursive call safety is only a relevant category for library functions that call back into user code (qsort for instance) and it's also a term I made up just now, I don't think standards use it.
  • Quuxplusone
    Quuxplusone over 7 years
    @zwol: Etymologically, it sure looks like "reentrant" should be a synonym for "recursive-call-safe," and that's how I've been using it. By the way, int f(void (*g)()) { static int x = 0; ++x; g(); } is an example of a function that is recursive-call-safe but not thread-safe, in that calling f([]{}) simultaneously from two different threads will result in a data race on x but calling f([]{ f([]{}); }) is completely safe. (Sorry for the C++ lambdas. Space is limited here. ;))
  • zwol
    zwol over 7 years
    @Quuxplusone Oh, good point. I tend not to write threaded code so I don't always remember all the traps. OTOH I've seen lots of people use "reentrant" as a synonym for "thread-safe" and maybe not even recognize the possibility of code needing to worry about recursive invocations.
  • tia
    tia almost 7 years
    FYI, reentrant function is not always thread-safe. stackoverflow.com/questions/856823/threadsafe-vs-re-entrant
  • cntswj
    cntswj over 2 years
    Will a reentrant lock make malloc reentrant?