Why do we need to call poll_wait in poll?

Solution 1

poll_wait adds your device (represented by the "struct file") to the list of those that can wake the process up.

The idea is that the process can use poll (or select, epoll, etc.) to register a set of file descriptors it wishes to wait on. The kernel calls each driver's poll method, and each one adds itself (via poll_wait) to the waiter list.

Then the core kernel blocks the process in one place. That way, any one of the devices can wake up the process. If your poll method returns non-zero mask bits, those "ready" attributes (readable, writable, etc.) apply right now, and the process does not need to sleep.

So, in pseudo-code, it's roughly like this:

foreach fd:
    find device corresponding to fd
    call device poll function to set up wait queues (with poll_wait) and to collect its "ready-now" mask

while time remaining in timeout and no devices are ready:
    sleep

return from system call (either due to timeout or to ready devices)
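
Seen from user space, the loop above looks roughly like the following. This is a minimal sketch, assuming a POSIX system; wait_readable and demo_pipe are hypothetical helper names, and a pipe stands in for the device. An empty pipe makes poll(2) sleep until the timeout expires, while a pipe with data makes it return immediately with POLLIN set.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);  /* sleeps inside the kernel */

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;                       /* timeout, nothing ready */
    return (pfd.revents & POLLIN) ? 1 : 0;
}

/* Returns 0 if the pipe behaved as expected, -1 otherwise. */
int demo_pipe(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    int before = wait_readable(fds[0], 50);  /* empty pipe: times out */
    ssize_t w = write(fds[1], "x", 1);       /* make the read end ready */
    int after = wait_readable(fds[0], 50);   /* returns without sleeping */

    close(fds[0]);
    close(fds[1]);
    assert(w == 1);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

Calling demo_pipe() from a main function exercises both paths: the timeout case and the "ready now" case.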

Solution 2

The poll system call sleeps if your poll file_operation returns 0

This is what was confusing me.

When you return non-zero, it means that some event has already fired, so the process does not sleep (or, if it was sleeping, it wakes up and returns to user space).

Once you see this, it is clear that something must be tying the process to the wait queue, and that thing is poll_wait.
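
To make that tie concrete, here is a toy user-space model of the registration step. It is only a sketch, not the kernel's implementation: every toy_* name is hypothetical, tasks are plain integers, and toy_wake_up empties the queue as a simplification. It shows the two properties that matter: the poll_wait stand-in only records the waiter and returns at once, and it does nothing when no poll table is passed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8
#define TOY_POLLIN  0x1u

/* Toy stand-in for wait_queue_head_t: ids of tasks waiting on a device. */
struct toy_wait_queue {
    int waiters[MAX_WAITERS];
    size_t count;
};

/* Toy stand-in for poll_table: remembers which task is polling. */
struct toy_poll_table {
    int task_id;
};

/* Toy device: a "data available" flag plus a read wait queue. */
struct toy_device {
    int has_data;
    struct toy_wait_queue inq;
};

/* Toy stand-in for poll_wait(): registers the waiter, never sleeps. */
void toy_poll_wait(struct toy_wait_queue *q, struct toy_poll_table *pt)
{
    assert(q != NULL);
    if (pt != NULL && q->count < MAX_WAITERS)
        q->waiters[q->count++] = pt->task_id;
}

/* Toy driver poll method: register first, then report the current state. */
unsigned int toy_dev_poll(struct toy_device *dev, struct toy_poll_table *pt)
{
    unsigned int mask = 0;

    toy_poll_wait(&dev->inq, pt);
    if (dev->has_data)
        mask |= TOY_POLLIN;
    return mask;
}

/* Toy stand-in for wake_up(): notifies all waiters, returns how many. */
size_t toy_wake_up(struct toy_wait_queue *q)
{
    size_t n = q->count;

    q->count = 0;  /* simplification: the toy queue is emptied here */
    return n;
}
```

In this model, a first call to toy_dev_poll on an idle device returns 0 but leaves the task on dev.inq, which is what lets a later toy_wake_up find someone to notify.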

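Also remember that struct file represents "a connection between a process and an open file", not just a filesystem file. Note that the waiting process is not found through a pid stored in struct file: as far as I can tell from fs/select.c, the poll_table passed to poll_wait carries a callback (__pollwait for poll and select) that queues the calling task on the wait queue you pass in.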

Playing with a minimal runnable example might also help clear things up: https://stackoverflow.com/a/44645336/895245

Author: demonguy

Updated on September 15, 2022

Comments

  • demonguy
    demonguy over 1 year

    In LDD3, I saw this code:

    static unsigned int scull_p_poll(struct file *filp, poll_table *wait)
    {
        struct scull_pipe *dev = filp->private_data;
        unsigned int mask = 0;
    
        /*
         * The buffer is circular; it is considered full
         * if "wp" is right behind "rp" and empty if the
         * two are equal.
         */
        down(&dev->sem);
        poll_wait(filp, &dev->inq,  wait);
        poll_wait(filp, &dev->outq, wait);
        if (dev->rp != dev->wp)
            mask |= POLLIN | POLLRDNORM;    /* readable */
        if (spacefree(dev))
            mask |= POLLOUT | POLLWRNORM;   /* writable */
        up(&dev->sem);
        return mask;
    }
    

    But the book says poll_wait won't wait and will return immediately. Then why do we need to call it? Why can't we just return mask?

  • demonguy
    demonguy almost 9 years
    Then when does the process sleep?
  • demonguy
    demonguy almost 9 years
    You mean the poll call from user space will block the process, right?
  • Gil Hamilton
    Gil Hamilton almost 9 years
    Yes. When you call poll(2) in user space, that goes to a function called "sys_poll" inside the kernel (see fs/select.c in kernel source). Likewise, select(2) => sys_select, etc. All those functions follow more or less the pseudo-code I gave above.
  • EML
    EML about 8 years
    This is completely wrong. poll_wait doesn't 'trigger' at all. It simply adds a wait queue to the poll_table.
  • Kevin Ding
    Kevin Ding about 3 years
    I have a question: what does wait_queue_head_t do? void poll_wait (struct file *, wait_queue_head_t *, poll_table *);
  • Gil Hamilton
    Gil Hamilton about 3 years
    It's a data structure that anchors the head of the queue of "waiting processes" (within this device). When an interrupt delivers data (for read) or frees up space (for write), the device can notify the core kernel that any process waiting on the queue can be awakened. Each such process is then unblocked (scheduled to run) and returns to user space from the select/poll syscall it was blocked in.