How is select() alerted to an fd becoming "ready"?

c linux sockets messaging file-descriptor

51,317

Solution 1

It reports that it's ready by returning.

select waits for events that are typically outside your program's control. In essence, by calling select, your program says "I have nothing to do until ..., please suspend my process".

The condition you specify is a set of events, any of which will wake you up.

For example, if you are downloading something, your loop has to wait for new data to arrive, for a timeout to occur if the transfer is stuck, or for the user to interrupt, which is precisely what select does.
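
A minimal sketch of that single-download wait, assuming a hypothetical helper named `wait_readable` and a millisecond timeout parameter (neither is part of any API). A pipe stands in for the download connection. A caught signal would show up as a return value of -1 with `errno` set to `EINTR`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno is set). */
int wait_readable(int fd, int timeout_ms) {
    assert(fd >= 0 && fd < FD_SETSIZE);  /* an fd_set cannot hold other values */

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The process sleeps here until the kernel has something to report. */
    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc > 0)
        return FD_ISSET(fd, &readfds) ? 1 : 0;
    return rc;  /* 0: timeout, -1: error */
}

/* Demo on a pipe: first nothing to read (timeout), then one byte arrives.
 * Returns 0 if select behaved as described, -1 otherwise. */
int demo_wait_readable(void) {
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int before = wait_readable(p[0], 50);  /* times out */
    ssize_t w = write(p[1], "x", 1);
    int after = wait_readable(p[0], 50);   /* data is waiting */
    close(p[0]);
    close(p[1]);
    return (w == 1 && before == 0 && after == 1) ? 0 : -1;
}
```

Nothing in the program "tells" select about the byte: the write on the other end changes kernel state, and the kernel wakes the sleeping process.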

When you have multiple downloads, data arriving on any of the connections triggers activity in your program (you need to write the data to disk), so you'd pass all download connections to select in the set of file descriptors to watch for "read".
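
Roughly, watching several connections could look like the sketch below. The helper name `wait_any_readable` is made up for illustration, and two pipes play the role of two download connections.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until at least one of the n descriptors is
 * readable. On return, *ready contains only the readable descriptors.
 * Returns the number of ready descriptors, or -1 on error.
 * Note: with n == 0 there is nothing to wake it up, so it waits forever. */
int wait_any_readable(const int *fds, int n, fd_set *ready) {
    int maxfd = -1;
    FD_ZERO(ready);
    for (int i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], ready);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return select(maxfd + 1, ready, NULL, NULL, NULL);
}

/* Demo: data arrives on the second "connection" only.
 * Returns the index of the descriptor that became readable, or -1. */
int demo_two_downloads(void) {
    int a[2], b[2];
    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    int result = -1;
    if (write(b[1], "data", 4) == 4) {
        int fds[2] = { a[0], b[0] };
        fd_set ready;
        if (wait_any_readable(fds, 2, &ready) == 1) {
            for (int i = 0; i < 2; i++)
                if (FD_ISSET(fds[i], &ready))
                    result = i;
        }
    }
    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return result;
}
```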

When you upload data somewhere at the same time, you again use select to see whether the connection currently accepts data. If the other side is on dialup, it will acknowledge data only slowly, so your local send buffer is always full, and any attempt to write more data would block until buffer space is available, or fail. By passing the destination file descriptor to select in the "write" set, we get notified as soon as buffer space is available for sending.
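
To make the "full send buffer" case concrete, here is a sketch in which a pipe with a slow reader stands in for the dial-up peer. `is_writable_now` is a hypothetical helper, and the write end is put into non-blocking mode (my assumption, so that filling the buffer fails with `EAGAIN` rather than hanging the demo).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: zero-timeout check whether fd accepts data right now.
 * Returns 1 if writable, 0 if not, -1 on error. */
int is_writable_now(int fd) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set writefds;
    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    struct timeval zero = { 0, 0 };
    int rc = select(fd + 1, NULL, &writefds, NULL, &zero);
    if (rc > 0)
        return FD_ISSET(fd, &writefds) ? 1 : 0;
    return rc;
}

/* Demo: fill the kernel buffer, observe "not writable", let the peer
 * drain it, observe "writable" again. Returns 0 if that is what happened. */
int demo_full_send_buffer(void) {
    int p[2];
    char chunk[4096] = { 0 };
    if (pipe(p) == -1)
        return -1;
    int flags = fcntl(p[1], F_GETFL);
    if (flags == -1 || fcntl(p[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    long queued = 0;
    ssize_t n;
    while ((n = write(p[1], chunk, sizeof chunk)) > 0)
        queued += n;                           /* fill the buffer */
    int full_errno = errno;                    /* why the last write failed */
    int when_full = is_writable_now(p[1]);

    while (queued > 0 && (n = read(p[0], chunk, sizeof chunk)) > 0)
        queued -= n;                           /* the "peer" catches up */
    int when_drained = is_writable_now(p[1]);

    close(p[0]);
    close(p[1]);
    return ((full_errno == EAGAIN || full_errno == EWOULDBLOCK)
            && when_full == 0 && when_drained == 1) ? 0 : -1;
}
```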

The general idea is that your program becomes event-driven, i.e. it reacts to external events from a common message loop rather than performing sequential operations. You tell the kernel "this is the set of events for which I want to do something", and the kernel gives you the set of events that have occurred. It is fairly common for two events to occur simultaneously. For example, if a TCP acknowledgment was included in a data packet, the same fd can become both readable (data is available) and writeable (acknowledged data has been removed from the send buffer), so you should be prepared to handle all of the events before calling select again.
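
As an illustration of one call reporting several events, the sketch below checks both sets after a single select. The flag names `EV_READ` and `EV_WRITE` and the helper `pending_events` are invented for this example, and a Unix-domain socketpair is used in place of a TCP connection.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

enum { EV_READ = 1, EV_WRITE = 2 };  /* illustrative flag names */

/* Ask about both directions in one call and report every event the
 * kernel returned. Returns a bitmask of EV_* values, or -1 on error. */
int pending_events(int fd) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set readfds, writefds;
    FD_ZERO(&readfds);
    FD_ZERO(&writefds);
    FD_SET(fd, &readfds);
    FD_SET(fd, &writefds);

    struct timeval zero = { 0, 0 };
    if (select(fd + 1, &readfds, &writefds, NULL, &zero) == -1)
        return -1;  /* on error the sets are not meaningful */

    int events = 0;
    if (FD_ISSET(fd, &readfds))
        events |= EV_READ;
    if (FD_ISSET(fd, &writefds))
        events |= EV_WRITE;
    return events;
}

/* Demo: an idle socket is only writeable; once the peer has sent
 * something it is readable and writeable at the same time. */
int demo_both_events(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    int idle = pending_events(sv[0]);
    ssize_t w = write(sv[1], "hi", 2);
    int busy = pending_events(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return (w == 2 && idle == EV_WRITE
            && busy == (EV_READ | EV_WRITE)) ? 0 : -1;
}
```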

One of the finer points is that select basically gives you a promise that one invocation of read or write will not block, without making any guarantee about how much that call will transfer. For example, if one byte of buffer space is available, you can attempt to write 10 bytes, and the kernel will come back and say "I have written 1 byte", so you should be prepared to handle this case as well. A typical approach is to have a buffer "data to be written to this fd", and as long as it is non-empty, the fd is added to the write set, and the "writeable" event is handled by attempting to write all the data currently in the buffer. If the buffer is empty afterwards, fine; if not, just wait for "writeable" again.
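
The "data to be written to this fd" buffer might be sketched as follows. `struct outbuf`, `on_writable` and the buffer size are illustrative choices of mine. The sketch also assumes the descriptor is in non-blocking mode, so that a write larger than the free buffer space returns a short count instead of waiting.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Illustrative pending-output buffer for one descriptor. */
struct outbuf {
    char   data[200000];
    size_t len;             /* bytes still waiting to be written */
};

/* Handle a "writeable" event: write what the kernel accepts, keep the rest.
 * Returns 0 on success (possibly partial), -1 on a real error. */
int on_writable(int fd, struct outbuf *ob) {
    ssize_t n = write(fd, ob->data, ob->len);
    if (n == -1)
        return (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) ? 0 : -1;
    memmove(ob->data, ob->data + n, ob->len - (size_t)n);  /* drop what was sent */
    ob->len -= (size_t)n;
    return 0;
}

/* Demo: push the whole buffer through a pipe that is much smaller than
 * the buffer. Returns the number of bytes that came out the other end. */
long demo_partial_writes(void) {
    static struct outbuf ob;
    char scratch[4096];
    long received = 0;
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int flags = fcntl(p[1], F_GETFL);
    if (flags == -1 || fcntl(p[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(p[0]); close(p[1]);
        return -1;
    }
    memset(ob.data, 'x', sizeof ob.data);
    ob.len = sizeof ob.data;
    int maxfd = p[0] > p[1] ? p[0] : p[1];
    assert(maxfd < FD_SETSIZE);

    while (received < (long)sizeof ob.data) {
        fd_set readfds, writefds;
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);
        FD_SET(p[0], &readfds);
        if (ob.len > 0)                 /* only ask when there is data to send */
            FD_SET(p[1], &writefds);
        if (select(maxfd + 1, &readfds, &writefds, NULL, NULL) == -1)
            break;
        if (FD_ISSET(p[1], &writefds) && on_writable(p[1], &ob) == -1)
            break;
        if (FD_ISSET(p[0], &readfds)) {
            ssize_t n = read(p[0], scratch, sizeof scratch);
            if (n <= 0)
                break;
            received += n;
        }
    }
    close(p[0]); close(p[1]);
    return received;
}
```

Moving the remainder with `memmove` keeps the sketch short; a ring buffer like the one in the example further down avoids the copying.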

The "exceptional" set is seldom used -- it is used for protocols that have out-of-band data where it is possible for the data transfer to block, while other data needs to go through. If your program cannot currently accept data from a "readable" file descriptor (for example, you are downloading, and the disk is full), you do not want to include the descriptor in the "readable" set, because you cannot handle the event and select would immediately return if invoked again. If the receiver includes the fd in the "exceptional" set, and the sender asks its IP stack to send a packet with "urgent" data, the receiver is then woken up, and can decide to discard the unhandled data and resynchronize with the sender. The telnet protocol uses this, for example, for Ctrl-C handling. Unless you are designing a protocol that requires such a feature, you can easily leave this out with no harm.

Obligatory code example:

#include <sys/types.h>
#include <sys/select.h>

#include <unistd.h>

#include <stdbool.h>

static inline int max(int lhs, int rhs) {
    if(lhs > rhs)
        return lhs;
    else
        return rhs;
}

void copy(int from, int to) {
    char buffer[10];
    int readp = 0;
    int writep = 0;
    bool eof = false;
    for(;;) {
        fd_set readfds, writefds;
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);

        int ravail, wavail;
        if(readp < writep) {
            ravail = writep - readp - 1;
            wavail = sizeof buffer - writep;
        }
        else {
            ravail = sizeof buffer - readp;
            wavail = readp - writep;
        }

        if(!eof && ravail)
            FD_SET(from, &readfds);
        if(wavail)
            FD_SET(to, &writefds);
        else if(eof)
            break;
        int rc = select(max(from,to)+1, &readfds, &writefds, NULL, NULL);
        if(rc == -1)
            break;
        if(FD_ISSET(from, &readfds))
        {
            ssize_t nread = read(from, &buffer[readp], ravail);
            if(nread < 1)
                eof = true;  /* end-of-file or error: stop reading, leave readp unchanged */
            else
                readp = readp + nread;
        }
        if(FD_ISSET(to, &writefds))
        {
            ssize_t nwritten = write(to, &buffer[writep], wavail);
            if(nwritten < 1)
                break;
            writep = writep + nwritten;
        }
        if(readp == sizeof buffer && writep != 0)
            readp = 0;
        if(writep == sizeof buffer)
            writep = 0;
    }
}

We attempt to read if we have buffer space available and there was no end-of-file or error on the read side, and we attempt to write if we have data in the buffer; if end-of-file is reached and the buffer is empty, then we are done.

This code behaves clearly suboptimally (it's example code), but you should be able to see that it is acceptable for the kernel to do less than we asked for on both reads and writes, in which case we just go back and say "whenever you're ready", and that we never read or write without asking whether it will block.

Solution 2

From the same man page:

On exit, the sets are modified in place to indicate which file descriptors actually changed status.

So use FD_ISSET() on the sets passed to select to determine which FDs have become ready.
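
One consequence of "modified in place" is that the sets have to be rebuilt before every call. The sketch below (helper and function names are mine) shows a stale set making select overlook data that is already waiting.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Illustrative helper: zero-timeout select on a read set prepared by the caller. */
static int poll_read_set(int nfds, fd_set *readfds) {
    struct timeval zero = { 0, 0 };
    return select(nfds, readfds, NULL, NULL, &zero);
}

/* Returns 0 if the sets behaved as in-out parameters, -1 otherwise. */
int demo_sets_are_rewritten(void) {
    int p[2];
    if (pipe(p) == -1)
        return -1;
    assert(p[0] < FD_SETSIZE);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(p[0], &readfds);

    /* Nothing to read yet: select times out and clears our bit. */
    int rc_idle = poll_read_set(p[0] + 1, &readfds);
    int kept = FD_ISSET(p[0], &readfds) != 0;

    ssize_t w = write(p[1], "x", 1);  /* now data is waiting */

    /* Reusing the set as-is: p[0] is no longer in it, so it is not watched. */
    int rc_stale = poll_read_set(p[0] + 1, &readfds);

    /* Rebuild the set, ask again, then test with FD_ISSET. */
    FD_SET(p[0], &readfds);
    int rc_fresh = poll_read_set(p[0] + 1, &readfds);
    int ready = FD_ISSET(p[0], &readfds) != 0;

    close(p[0]);
    close(p[1]);
    return (w == 1 && rc_idle == 0 && !kept && rc_stale == 0
            && rc_fresh == 1 && ready) ? 0 : -1;
}
```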


Mike


Updated on July 09, 2022

Comments

Mike almost 2 years
I don't know why I'm having a hard time finding this, but I'm looking at some linux code where we're using select() waiting on a file descriptor to report it's ready. From the man page of select:
```
select() and pselect() allow a program to monitor multiple file descriptors,
waiting until one or more of the file descriptors become "ready" for some
class of I/O operation 
```
So, that's great... I call select on some descriptor, give it some time out value and start to wait for the indication to go. How does the file descriptor (or owner of the descriptor) report that it's "ready" such that the select() statement returns?
- Nikolai Fetissov over 11 years
  
  beej.us/guide/bgnet/output/html/multipage/selectman.html
- Mike over 11 years
  
  @NikolaiNFetissov - From your link: "After select() returns, the values in the sets will be changed to show which are ready for reading or writing, and which have exceptions." So what caused the return of select() that told us the socket is ready for reading? That's what I do not understand.
- Nikolai Fetissov over 11 years
  
  When in-kernel network stack detects that there's an event pending on any of the socket descriptors your process is woken up from the wait and select returns. The FD sets are in-out parameters - you tell the kernel what you are interested in, it tells you back what happened.
- Mike over 11 years
  
  @NikolaiNFetissov - So you're saying I open a fd and call select because I want to read something. On the other end of the socket someone has written to that fd and now the kernel tells select to wake me up because it's "ready" to read?
- Nikolai Fetissov over 11 years
  
  Yes, but the main function of select(2) (and poll(2), or epoll(7)) is I/O de-multiplexing - you can wait on multiple sockets and react to events when they come.
- Noah Huppert over 5 years
  
  @NikolaiFetissov's link to the beej.us guide is 404, new link as of 12/29/18: beej.us/guide/bgnet/html/multi/selectman.html
- Optimized Coder over 4 years
  
  @NikolaiFetissov's & noah-huppert's link to beej's guide select manual is 404ing, new link: beej.us/guide/bgnet/html/#select
Mike over 11 years

I must be missing the point here, I'm asking "what caused select() to return the file descriptor is ready" and I'm hearing, "select returns when they're ready, because they're ready". What is the definition of a "ready" socket?
Ignacio Vazquez-Abrams over 11 years

One that can be read from, written to, or has had some other exceptional occurrence happen, depending on which sets it was in. "Those listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block; in particular, a file descriptor is also ready on end-of-file), those in writefds will be watched to see if a write will not block, and those in exceptfds will be watched for exceptions."
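
A small sketch of the end-of-file case from that quote, using a pipe whose writer has gone away (the function name and the one-second safety timeout are my own choices):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns 0 if a descriptor at end-of-file is reported as "ready" and
 * the following read returns 0 bytes, -1 otherwise. */
int demo_eof_is_ready(void) {
    int p[2];
    if (pipe(p) == -1)
        return -1;
    assert(p[0] < FD_SETSIZE);

    close(p[1]);  /* the writer goes away without sending anything */

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(p[0], &readfds);
    struct timeval tv = { 1, 0 };  /* safety net only; not expected to expire */

    /* "Ready" here means "read will not block", not "there is data". */
    int rc = select(p[0] + 1, &readfds, NULL, NULL, &tv);

    char c;
    ssize_t n = (rc == 1) ? read(p[0], &c, 1) : -1;  /* 0 means end-of-file */

    close(p[0]);
    return (rc == 1 && n == 0) ? 0 : -1;
}
```
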
rici over 11 years

You shouldn't actually assume that the read (or write) won't block, since something might happen between the select() returning and the read() or write() being issued. For example, someone else might read the data/fill the pipe. It's more like a signal that the operation has a chance of not blocking. Also, select() will wake up if there is an error condition on the fd, since that would cause an immediate read()/write() to return immediately with the error condition.
Simon Richter over 11 years

If I'm the only one accessing this file descriptor, then I have that guarantee, at least for sockets and pipes. And if someone else accesses my fds, I will get weird interleaved data anyway.
rici over 11 years

Yes, if you have the fd open in only one process, and it's an unnamed pipe or a socket. But not if it is a named pipe. (Technically, if you could somehow know that you were the only process reading/writing the named pipe, it's true, but how do you know?) I raise the point only because it is a classic problem, aka "why does my server freeze at random intervals?"
Simon Richter over 11 years

Yup, for additional safety one could set the fd to nonblocking mode and just treat the resulting EWOULDBLOCK as an error.
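
A sketch of that safety net, with made-up helper names and an arbitrary return code of -2 for "the data was gone". Instead of treating EWOULDBLOCK as fatal, the caller here could also just go back to select; which of the two is right depends on the program.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read after select reported "readable". Returns the byte count
 * (0 = end-of-file), -1 on a real error, or -2 (illustrative code) if
 * the data disappeared between select and read. */
ssize_t read_after_select(int fd, void *buf, size_t len) {
    assert(buf != NULL && len > 0);
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}

/* Demo of the race: a byte arrives, "someone else" consumes it, and our
 * read then returns immediately instead of freezing the program. */
int demo_lost_race(void) {
    int p[2];
    char c;
    if (pipe(p) == -1)
        return -1;
    int ok = set_nonblocking(p[0]) == 0
          && write(p[1], "x", 1) == 1
          && read(p[0], &c, 1) == 1;  /* the competing reader */
    ssize_t n = ok ? read_after_select(p[0], &c, 1) : -1;
    close(p[0]);
    close(p[1]);
    return n == -2 ? 0 : -1;
}
```
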
JustAMartin over 7 years

What if at some point we have nothing to write and writefds is an empty set? I've seen some specifications saying that it is not correct to pass an empty set to select(), so you should check whether any fd has actually been added to writefds and, if not, pass NULL to select() instead of writefds.
Simon Richter over 7 years

@JustAMartin, that shouldn't make a difference. For WinSock, we need to check that at least one fd in any set is set, or it will return an error -- but this can be done by checking maxfd.
Jacob Garby almost 6 years

This is exactly what I was looking for.