Should You Continue Polling a Socket for Readiness After an EAGAIN or EWOULDBLOCK Error?


The GNU C Library documentation describes the meaning of its error codes. EAGAIN/EWOULDBLOCK means "resource temporarily unavailable": the call might succeed if you try again later. A typical example is a non-blocking I/O operation that would otherwise have to block.
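As a minimal sketch of that "temporary" meaning, the snippet below reads from an empty non-blocking pipe, which should fail with EAGAIN/EWOULDBLOCK, and then reads again once data has been written. A pipe stands in for a socket purely for illustration, and the helper names `set_nonblocking` and `demo_eagain_is_temporary` are hypothetical, not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if the first read on an empty non-blocking pipe fails with
 * EAGAIN/EWOULDBLOCK and a second read succeeds once data has arrived;
 * returns 0 otherwise. */
static int demo_eagain_is_temporary(void)
{
    int p[2];
    char buf[16];
    int ok = 0;

    if (pipe(p) == -1)
        return 0;

    if (set_nonblocking(p[0]) == 0) {
        /* Nothing has been written yet, so a blocking read would sleep here. */
        ssize_t n = read(p[0], buf, sizeof buf);
        int would_block =
            (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

        /* Once data is available, the very same call succeeds. */
        if (would_block && write(p[1], "hi", 2) == 2) {
            n = read(p[0], buf, sizeof buf);
            ok = (n == 2);
        }
    }

    close(p[0]);
    close(p[1]);
    return ok;
}
```

The same descriptor goes from "would block" to "readable" without being reopened, which suggests the error describes the state of the descriptor at that moment rather than a permanent failure.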


Author: EdNdee

Updated on September 18, 2022

Comments

  • EdNdee
    EdNdee over 1 year

    I am creating a web crawler with a multiplexed download manager using Linux epoll (Linux 2.6.30.x). I pick links from a database of over 40,000 domains (each domain having between 1 and 2,000 URLs), a total of 250,000 URLs. I multiplex the downloads so that on average I have no more than 2 parallel streams per host (as per the HTTP spec recommendation), and so that I loop over a batch of 10 to 50 hosts at a time. I chose non-blocking sockets and epoll for speed and scalability (I am low on RAM) and for ease of use compared to poll, select, and signal-driven I/O.

    I download the first few hundred URLs very smoothly and rapidly. The trouble is, I keep getting an EAGAIN/EWOULDBLOCK error from certain links (sockets) that otherwise seem ready (i.e. I can open the links in my PC's browser at any point). Even after epolling them repeatedly, expecting their status to change to EPOLLIN, they keep returning EAGAIN/EWOULDBLOCK. These links build up so quickly that I have to stop the whole download.

    What does EAGAIN/EWOULDBLOCK really mean? Is it a permanent status, so that once detected I should remove that socket from any further observation?

    Kindly help.

    • David Schwartz
      David Schwartz about 12 years
      Can you clarify exactly what's happening? Are you getting an epoll read hit or write hit? What operation is returning EAGAIN/EWOULDBLOCK?
    • EdNdee
      EdNdee about 12 years
      I have 3 threads. Thread 1 issues epoll_ctl(epoll_writefd, EPOLL_CTL_ADD,..) and epoll_ctl(epoll_readfd, EPOLL_CTL_ADD,..) for each live host socket (fewer than 50 are active). Thread 2 issues epoll_wait(epoll_writefd,...,-1) to check write readiness; when a socket is ready it sends the actual HTTP request, then issues epoll_ctl(epoll_writefd, EPOLL_CTL_DEL,..) to remove the socket from further write epoll. Thread 3 issues epoll_wait(epoll_readfd,..,-1) to check read readiness; when a socket is ready it downloads the page repeatedly (until error or completion), then issues epoll_ctl(epoll_readfd, EPOLL_CTL_DEL,..) to remove the socket from further read epoll.
    • David Schwartz
      David Schwartz about 12 years
      Okay, so what operation returns EWOULDBLOCK? I think what you're missing is this: If a read operation returns EWOULDBLOCK, you don't want to try to read again until you get another epoll read hit.
    • EdNdee
      EdNdee about 12 years
      Solved! Thanks David. "If a read operation returns EWOULDBLOCK, you don't want to try to read again until you get another epoll read hit" - That's actually quite important because the thread would then block; I hadn't initially figured that out! I appreciate your help.
    • David Schwartz
      David Schwartz about 12 years
      The thread shouldn't block because you should have set the socket non-blocking. (If you want to block, why use epoll? And if you don't want to block, you must set the socket non-blocking.) What will happen, though, is that the thread will spin: it keeps retrying a read that keeps failing with EWOULDBLOCK.
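The advice in the comments above can be sketched roughly as follows: read until the call reports EAGAIN/EWOULDBLOCK, then leave the descriptor registered and return to epoll_wait instead of retrying in a loop. This is only an illustration under stated assumptions, not the asker's actual code. The names `drain_result`, `watch_for_read`, `drain_fd`, and `wait_and_drain` are hypothetical, the epoll instance is assumed to be level-triggered (the default), and real code would process the buffer rather than just counting bytes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum drain_result { DRAIN_WAIT, DRAIN_EOF, DRAIN_ERROR };

/* Make fd non-blocking and register it for read readiness on epfd. */
static int watch_for_read(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Read until the kernel has nothing more to give right now.
 * *total accumulates the number of bytes read. */
static enum drain_result drain_fd(int fd, size_t *total)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            *total += (size_t)n;  /* a crawler would parse or store buf here */
            continue;
        }
        if (n == 0)
            return DRAIN_EOF;     /* peer closed: EPOLL_CTL_DEL and close */
        if (errno == EINTR)
            continue;             /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DRAIN_WAIT;    /* not a failure: wait for the next hit */
        return DRAIN_ERROR;       /* a real error: give up on this fd */
    }
}

/* Wait up to timeout_ms for one readable fd, then drain it.
 * Returns DRAIN_WAIT on timeout as well, since nothing was ready. */
static enum drain_result wait_and_drain(int epfd, int timeout_ms, size_t *total)
{
    struct epoll_event ev;
    int n;

    do {
        n = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);

    if (n == -1)
        return DRAIN_ERROR;
    if (n == 0)
        return DRAIN_WAIT;
    return drain_fd(ev.data.fd, total);
}
```

In this sketch only DRAIN_EOF and DRAIN_ERROR are reasons to remove the descriptor with EPOLL_CTL_DEL. DRAIN_WAIT means the socket stays registered and the thread goes back to epoll_wait, which avoids the spinning described above.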