What is the status of POSIX asynchronous I/O (AIO)?


Solution 1

Network I/O is not a priority for AIO because everyone writing POSIX network servers uses an event-based, non-blocking approach. The old-style Java "billions of blocking threads" approach sucks horribly.

Disk write I/O is already buffered, and disk read I/O can be prefetched into the page cache using functions like posix_fadvise. That leaves direct, unbuffered disk I/O as the only useful purpose for AIO.

Direct, unbuffered I/O is only really useful for transactional databases, and those tend to write their own threads or processes to manage their disk I/O.

So, in the end, that leaves POSIX AIO in the position of not serving any useful purpose. Don't use it.

Solution 2

Doing socket I/O efficiently has been solved with kqueue, epoll, I/O completion ports and the like. Doing asynchronous file I/O is something of a latecomer (apart from Windows' overlapped I/O and Solaris' early support for POSIX AIO).

If you're looking to do socket I/O, you're probably better off using one of the mechanisms above.

The main purpose of AIO is hence to solve the problem of asynchronous disk I/O. This is most likely why Mac OS X only supports AIO for regular files, and not sockets (since kqueue does that so much better anyway).

Write operations are typically cached by the kernel and flushed out at a later time, for instance when the drive's read head happens to pass over the location where the block is to be written.

However, for read operations, if you want the kernel to prioritize and order your reads, AIO is really the only option. Here's why the kernel can (theoretically) do that better than any user-level application:

  • The kernel sees all disk I/O, not just your application's disk jobs, and can order them at a global level
  • The kernel (may) know where the disk read head is, and can pick the read jobs you pass on to it in optimal order, to move the head the shortest distance
  • The kernel can take advantage of native command queuing to optimize your read operations further
  • You may be able to issue more read operations per system call using lio_listio() than with readv(), especially if your reads are not (logically) contiguous, saving a tiny bit of system call overhead.
  • Your program might be slightly simpler with AIO since you don't need an extra thread to block in a read or write call.

That said, POSIX AIO has a rather awkward interface. For instance:

  • The only efficient and well-supported means of event callbacks is via signals, which makes AIO hard to use in a library, since it means taking signal numbers from the process-global signal namespace. If your OS doesn't support realtime signals, it also means you have to loop through all your outstanding requests to figure out which one actually finished (this is the case for Mac OS X, for instance, but not Linux). Catching signals in a multi-threaded environment also makes for some tricky restrictions. You typically cannot react to the event inside the signal handler itself; instead you have to post a semaphore, write to a pipe, or use signalfd() (on Linux).
  • aio_suspend() has the same issues as select() does: it doesn't scale well with the number of jobs.
  • lio_listio(), as implemented, has a fairly limited number of jobs you can pass in, and it's not trivial to find this limit in a portable way. You have to call sysconf(_SC_AIO_LISTIO_MAX), which may fail; in that case you can fall back to the AIO_LISTIO_MAX define, which is not necessarily defined either, but then you can use 2, which POSIX guarantees to be supported.

As for real-world applications using POSIX AIO, you could take a look at lighttpd (lighty), which also posted a performance measurement when introducing support.

Most POSIX platforms support POSIX AIO by now (Linux, BSD, Solaris, AIX, Tru64). Windows supports it via its overlapped file I/O. My understanding is that only Solaris, Windows and Linux truly support asynchronous file I/O all the way down to the driver, whereas the other OSes emulate the async I/O with kernel threads. Linux is the exception: its POSIX AIO implementation in glibc emulates async operations with user-level threads, whereas its native async I/O interface (io_submit() etc.) is truly asynchronous all the way down to the driver, assuming the driver supports it.

I believe it's fairly common among OSes not to support POSIX AIO for arbitrary fds, but to restrict it to regular files.

Solution 3

A libtorrent developer provides a report on this: http://blog.libtorrent.org/2012/10/asynchronous-disk-io/

Solution 4

There is aio_write, implemented in glibc: the first call to the aio_read or aio_write function spawns a number of user-mode threads; aio_write or aio_read posts requests to those threads, a thread does a pread/pwrite, and when it is finished the answer is posted back to the waiting caller.

There is also 'real' AIO, supported at the kernel level (you need libaio for that; see the io_submit call http://linux.die.net/man/2/io_submit). It also requires O_DIRECT (which may not be supported by all file systems, though the major ones do support it).

see here:

http://lse.sourceforge.net/io/aio.html

http://linux.die.net/man/2/io_submit

Difference between POSIX AIO and libaio on Linux?

Author: Glyph, founder of the Twisted project and a software developer.

Updated on May 04, 2020

Comments

  • Glyph
    Glyph about 4 years

    There are pages scattered around the web that describe POSIX AIO facilities in varying amounts of detail. None of them are terribly recent. It's not clear what, exactly, they're describing. For example, the "official" (?) web site for Linux kernel asynchronous I/O support here says that sockets don't work, but the "aio.h" manual pages on my Ubuntu 8.04.1 workstation all seem to imply that it works for arbitrary file descriptors. Then there's another project that seems to work at the library layer with even less documentation.

    I'd like to know:

    • What is the purpose of POSIX AIO? Given that the most obvious example of an implementation I can find says it doesn't support sockets, the whole thing seems weird to me. Is it just for async disk I/O? If so, why the hyper-general API? If not, why is disk I/O the first thing that got attacked?
    • Where are there example complete POSIX AIO programs that I can look at?
    • Does anyone actually use it, for real?
    • What platforms support POSIX AIO? What parts of it do they support? Does anyone really support the implied "Any I/O to any FD" that <aio.h> seems to promise?

    The other multiplexing mechanisms available to me are perfectly good, but the random fragments of information floating around out there have made me curious.

  • Alex B
    Alex B over 14 years
    What about reading/writing from network (NFS, Samba) filesystems?
  • n-alexander
    n-alexander over 13 years
    Well, I have several big dumb writers which, if I let them go to cache, will hit dirty_ratio at peaks, blocking everybody else. If I just use direct I/O on them, it is way too slow. If I just had 1 thread I could manage on my own, but it'll be hard to support different I/O priorities in 1 thread. AIO + CFQ would really seem a good combination, if AIO worked
  • Hongli
    Hongli over 13 years
    I disagree. Disk I/O tends to be buffered but it can be blocking. When poll()ing a file FD it always reports that the FD is readable, even when it will block. This makes it impossible to perform non-blocking operations on disk files in an evented manner, unless one uses threads or AIO.
  • Zan Lynx
    Zan Lynx over 13 years
    @Hongli: The DB engines I have seen use a thread or process of their own. Do you have an example of an SQL engine that uses AIO?
  • Matt Joiner
    Matt Joiner over 13 years
    There's really no such thing as asynchronous sockets. Not in the sense that you can dispatch a bunch of writes to a single socket, as ordering is important. For those protocols where ordering is not important, the calls are not blocking...
  • Ben Voigt
    Ben Voigt about 13 years
    @Matt: Order isn't important for datagram sockets. @Zan: async I/O is very nice for prebuffering real-time streaming data, e.g. media players.
  • Ben Voigt
    Ben Voigt about 13 years
    Windows has had OVERLAPPED I/O supporting disk files since Win32 first came out. It's not at all new. And on POSIX, the signal namespace isn't process-global, it's per-thread. Signals are delivered to particular threads (or is aio an exception to that, can't remember for certain?).
  • Arvid
    Arvid about 13 years
    There's no way to specify which thread AIO delivers its signals to. On Linux it seems to mostly deliver them to the thread that issued the aio_*() command, but not always (the only solution I've found to this was to create multiple signalfds). There was a Linux patch on the kernel mailing list a few years ago that would add this, but it never made it in, and it would have been an extension to POSIX. On Mac OS X, signals seem to mostly be delivered to the main thread (in my experience). I don't think POSIX requires a specific behavior; if it does, I would love to see that part of the spec.
  • Marenz
    Marenz about 13 years
    glibc's implementation of aio_read/write uses threads in userland, so not even kernel threads are used here.
  • Jon Watte
    Jon Watte over 11 years
    AIO has nothing to do with "billions of blocking threads." The whole point of AIO is to have a fixed number of I/O service threads.
  • Zan Lynx
    Zan Lynx over 11 years
    @JonWatte: My answer didn't have anything to do with billions of threads. It was explaining that networking applications already use an event-based system in which AIO is useless.
  • Jon Watte
    Jon Watte over 11 years
    It is not true that AIO is useless in event-based systems. You can actually get to zero-copy networking with proper AIO, which you cannot with event-based notification to recv(). Other things may conspire to make this mostly a theoretical limitation, but I think that the lack of proper AIO (a la OVERLAPPED on Windows) is one of the last big holes in Linux.
  • dpn
    dpn over 11 years
    That would be Arvid, who also replied above :)
  • MikeB
    MikeB almost 11 years
    What does "always typically" mean? Writes are cached by the kernel for any method, or when using AIO? Seems like there must be a way to have software be certain the write was successfully completed; otherwise, integrity and transactional goals can't be met.
  • Chris Pacejo
    Chris Pacejo over 10 years
    According to Ted Ts'o in 2010 fadvise is NOT guaranteed to be asynchronous: lkml.indiana.edu/hypermail/linux/kernel/1012.0/01942.html If this is still the case then AIO still has its place.
  • Glyph
    Glyph over 9 years
    Many of the deficiencies of aio_write are covered above, in stackoverflow.com/a/5307557/13564
  • wick
    wick almost 9 years
    Another live example where you can use AIO is nginx. All modes are supported. If you prefer offloading to userland threads, you would normally find it much worse than direct IO, but Linux native AIO is on par with direct IO. The situation when AIO can be substantially beneficial is severe page cache pressure. Conceptual difference between Async and Direct IOs may be seen here ftp.dei.uc.pt/pub/linux/kernel/people/suparna/aio-linux.pdf
  • chmike
    chmike almost 8 years
    Excellent report. Very instructive. Thank you for sharing.
  • EFraim
    EFraim over 6 years
    The answer does not state anything about which platforms actually support what operations on what kinds of descriptors.
  • gacopu
    gacopu almost 5 years
    @Arvid Is it still true today that signals are the only efficient and well-supported means of event callbacks? I'm reading POSIX AIO and found that with aio_return one can easily get the status. Can you explain? Thanks
  • Arvid
    Arvid almost 5 years
    Sure, you always have to call aio_return() once an iocb completes. What I was referring to was being notified of its completion. You can poll aio_return, but it's not going to scale very well. I haven't really looked at the state of AIO lately, so I don't know if it's much improved. It's primarily MacOS that has poor support, where signals are your only option, without any hint about which iocb completed.
  • gacopu
    gacopu almost 5 years
    @Arvid Thank you so much. By the way, how does libtorrent do the IO now(linux)? linux native AIO? normal pwrite/pread? posix AIO?
  • Arvid
    Arvid almost 5 years
    the stable release has a thread pool with preadv() and pwritev() calls. master has a thread pool with reads and writes against memory mapped files. Given where storage is going, I think memory mapped I/O is the future. It's synchronous though, so you still need a thread pool.