Writing to a closed, local TCP socket not failing

c linux sockets tcp glibc

27,333

Solution 1

This is what the Linux man page says about write and EPIPE:

   EPIPE  fd is connected to a pipe or socket whose reading end is closed.
          When this happens the writing process will also receive  a  SIG-
          PIPE  signal.  (Thus, the write return value is seen only if the
          program catches, blocks or ignores this signal.)

When Linux is using a pipe or a socketpair, it can and will check the reading end of the pair, as these two functions demonstrate:

void test_socketpair () {
    int pair[2];
    socketpair(PF_LOCAL, SOCK_STREAM, 0, pair);
    close(pair[0]);
    if (send(pair[1], "a", 1, MSG_NOSIGNAL) < 0) perror("send");
}

void test_pipe () {
    int pair[2];
    pipe(pair);
    close(pair[0]);
    signal(SIGPIPE, SIG_IGN);
    if (write(pair[1], "a", 1) < 0) perror("write");
    signal(SIGPIPE, SIG_DFL);
}

Linux is able to do so, because the kernel has innate knowledge about the other end of the pipe or connected pair. However, when using connect, the state about the socket is maintained by the protocol stack. Your test demonstrates this behavior, but below is a program that does it all in a single thread, similar to the two tests above:

int a_sock = socket(PF_INET, SOCK_STREAM, 0);
const int one = 1;
setsockopt(a_sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
struct sockaddr_in a_sin = {0};
a_sin.sin_port = htons(4321);
a_sin.sin_family = AF_INET;
a_sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
bind(a_sock, (struct sockaddr *)&a_sin, sizeof(a_sin));
listen(a_sock, 1);
int c_sock = socket(PF_INET, SOCK_STREAM, 0);
fcntl(c_sock, F_SETFL, fcntl(c_sock, F_GETFL, 0)|O_NONBLOCK);
connect(c_sock, (struct sockaddr *)&a_sin, sizeof(a_sin));
fcntl(c_sock, F_SETFL, fcntl(c_sock, F_GETFL, 0)&~O_NONBLOCK);
struct sockaddr_in s_sin = {0};
socklen_t s_sinlen = sizeof(s_sin);
int s_sock = accept(a_sock, (struct sockaddr *)&s_sin, &s_sinlen);
struct pollfd c_pfd = { c_sock, POLLOUT, 0 };
if (poll(&c_pfd, 1, -1) != 1) perror("poll");
int erropt = -1;
socklen_t errlen = sizeof(erropt);
getsockopt(c_sock, SOL_SOCKET, SO_ERROR, &erropt, &errlen);
if (erropt != 0) { errno = erropt; perror("connect"); }
puts("P|Recv-Q|Send-Q|Local Address|Foreign Address|State|");
char cmd[256];
snprintf(cmd, sizeof(cmd), "netstat -tn | grep ':%hu ' | sed 's/  */|/g'",
         ntohs(s_sin.sin_port));
puts("before close on client"); system(cmd);
close(c_sock);
puts("after close on client"); system(cmd);
if (send(s_sock, "a", 1, MSG_NOSIGNAL) < 0) perror("send");
puts("after send on server"); system(cmd);
puts("end of test");
sleep(5);

If you run the above program, you will get output similar to this:

P|Recv-Q|Send-Q|Local Address|Foreign Address|State|
before close on client
tcp|0|0|127.0.0.1:35790|127.0.0.1:4321|ESTABLISHED|
tcp|0|0|127.0.0.1:4321|127.0.0.1:35790|ESTABLISHED|
after close on client
tcp|0|0|127.0.0.1:35790|127.0.0.1:4321|FIN_WAIT2|
tcp|1|0|127.0.0.1:4321|127.0.0.1:35790|CLOSE_WAIT|
after send on server
end of test

This shows that it took a single write for both sockets to transition to the CLOSED state, which is why they no longer appear in the netstat output. To find out why this occurred, a TCP dump of the transaction can be useful:

16:45:28 127.0.0.1 > 127.0.0.1
 .809578 IP .35790 > .4321: S 1062313174:1062313174(0) win 32792 <mss 16396,sackOK,timestamp 3915671437 0,nop,wscale 7>
 .809715 IP .4321 > .35790: S 1068622806:1068622806(0) ack 1062313175 win 32768 <mss 16396,sackOK,timestamp 3915671437 3915671437,nop,wscale 7>
 .809583 IP .35790 > .4321: . ack 1 win 257 <nop,nop,timestamp 3915671437 3915671437>
 .840364 IP .35790 > .4321: F 1:1(0) ack 1 win 257 <nop,nop,timestamp 3915671468 3915671437>
 .841170 IP .4321 > .35790: . ack 2 win 256 <nop,nop,timestamp 3915671469 3915671468>
 .865792 IP .4321 > .35790: P 1:2(1) ack 2 win 256 <nop,nop,timestamp 3915671493 3915671468>
 .865809 IP .35790 > .4321: R 1062313176:1062313176(0) win 0

The first three lines represent the 3-way handshake. The fourth line is the FIN packet the client sends to the server, and the fifth line is the ACK from the server, acknowledging receipt. The sixth line is the server trying to send 1 byte of data to the client with the PUSH flag set. The final line is the client RESET packet, which causes the TCP state for the connection to be freed, and is why the third netstat command did not result in any output in the test above.

So, the server doesn't know the client will reset the connection until after it tries to send some data to it. The reset happens because the client called close rather than only shutting down its sending direction.

The server cannot know for certain what system call the client has actually issued; it can only follow the TCP state. For example, we could replace the close call with a call to shutdown instead.

//close(c_sock);
shutdown(c_sock, SHUT_WR);

The difference between shutdown and close is that shutdown only governs the state of the connection, while close also governs the state of the file descriptor that represents the socket. A shutdown will not close a socket.

The output will be different with the shutdown change:

P|Recv-Q|Send-Q|Local Address|Foreign Address|State|
before close on client
tcp|0|0|127.0.0.1:4321|127.0.0.1:56355|ESTABLISHED|
tcp|0|0|127.0.0.1:56355|127.0.0.1:4321|ESTABLISHED|
after close on client
tcp|1|0|127.0.0.1:4321|127.0.0.1:56355|CLOSE_WAIT|
tcp|0|0|127.0.0.1:56355|127.0.0.1:4321|FIN_WAIT2|
after send on server
tcp|1|0|127.0.0.1:4321|127.0.0.1:56355|CLOSE_WAIT|
tcp|1|0|127.0.0.1:56355|127.0.0.1:4321|FIN_WAIT2|
end of test

The TCP dump will also show something different:

17:09:18 127.0.0.1 > 127.0.0.1
 .722520 IP .56355 > .4321: S 2558095134:2558095134(0) win 32792 <mss 16396,sackOK,timestamp 3917101399 0,nop,wscale 7>
 .722594 IP .4321 > .56355: S 2563862019:2563862019(0) ack 2558095135 win 32768 <mss 16396,sackOK,timestamp 3917101399 3917101399,nop,wscale 7>
 .722615 IP .56355 > .4321: . ack 1 win 257 <nop,nop,timestamp 3917101399 3917101399>
 .748838 IP .56355 > .4321: F 1:1(0) ack 1 win 257 <nop,nop,timestamp 3917101425 3917101399>
 .748956 IP .4321 > .56355: . ack 2 win 256 <nop,nop,timestamp 3917101426 3917101425>
 .764894 IP .4321 > .56355: P 1:2(1) ack 2 win 256 <nop,nop,timestamp 3917101442 3917101425>
 .764903 IP .56355 > .4321: . ack 2 win 257 <nop,nop,timestamp 3917101442 3917101442>
17:09:23
 .786921 IP .56355 > .4321: R 2:2(0) ack 2 win 257 <nop,nop,timestamp 3917106464 3917101442>

Notice that the reset at the end comes 5 seconds after the last ACK packet. This reset is due to the program exiting without properly closing the sockets. What differs from before is the ACK packet from the client to the server ahead of the reset: it indicates that the client did not use close. In TCP, a FIN really only indicates that the sender has no more data to send. But since a TCP connection is bidirectional, the server that receives the FIN assumes the client can still receive data. In the case above, the client does in fact accept the data.
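The half-close behavior described above can be reproduced in a single process. The following is only a minimal sketch, not part of the original test program: the helper names tcp_loopback_pair and half_close_demo are made up for illustration, and the listener binds to port 0 so the kernel picks a free port (an assumption, the original test used port 4321).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a connected TCP pair over loopback.
 * Returns 0 on success and fills *client and *server; -1 on error. */
static int tcp_loopback_pair(int *client, int *server)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int l = socket(AF_INET, SOCK_STREAM, 0);
    if (l < 0) return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                       /* let the kernel pick a port */

    if (bind(l, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(l, 1) < 0 ||
        getsockname(l, (struct sockaddr *)&sin, &len) < 0) {
        close(l);
        return -1;
    }
    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0) { close(l); return -1; }
    if (connect(*client, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(*client); close(l); return -1;
    }
    *server = accept(l, NULL, NULL);
    close(l);                               /* listener no longer needed */
    if (*server < 0) { close(*client); return -1; }
    return 0;
}

/* Returns 1 if the half-closed client still received the server's byte,
 * 0 if any step behaved differently, -1 on setup failure. */
static int half_close_demo(void)
{
    int c, s;
    char buf = 0;
    if (tcp_loopback_pair(&c, &s) < 0) return -1;

    shutdown(c, SHUT_WR);                   /* client sends FIN, keeps its fd */
    ssize_t eof  = recv(s, &buf, 1, 0);     /* server reads EOF: returns 0 */
    ssize_t sent = send(s, "a", 1, MSG_NOSIGNAL); /* still allowed to send */
    ssize_t got  = recv(c, &buf, 1, 0);     /* client can still receive */

    close(c);
    close(s);
    return eof == 0 && sent == 1 && got == 1 && buf == 'a';
}
```

Calling half_close_demo() from a test main should show the server reading EOF and the client still receiving the byte, which is the situation in the second TCP dump.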

Whether the client uses close or SHUT_WR to issue a FIN, in either case you can detect the arrival of the FIN by polling on the server socket for a readable event. If after calling read the result is 0, then you know the FIN has arrived, and you can do what you wish with that information.

struct pollfd s_pfd = { s_sock, POLLIN|POLLOUT, 0 };
if (poll(&s_pfd, 1, -1) != 1) perror("poll");
if (s_pfd.revents & POLLIN) {
    char c;
    int r;
    while ((r = recv(s_sock, &c, 1, MSG_DONTWAIT)) == 1) {}
    if (r == 0) { /*...FIN received...*/ }
    else if (errno == EAGAIN) { /*...no more data to read for now...*/ }
    else { /*...some other error...*/ perror("recv"); }
}

Now, it is trivially true that if the server issues SHUT_WR with shutdown before it tries to do a write, it will in fact get the EPIPE error.

shutdown(s_sock, SHUT_WR);
if (send(s_sock, "a", 1, MSG_NOSIGNAL) < 0) perror("send");

If, instead, you want the client to indicate an immediate reset to the server, you can force that to happen on most TCP stacks by enabling the linger option, with a linger timeout of 0 prior to calling close.

struct linger lo = { 1, 0 };
setsockopt(c_sock, SOL_SOCKET, SO_LINGER, &lo, sizeof(lo));
close(c_sock);

With the above change, the output of the program becomes:

P|Recv-Q|Send-Q|Local Address|Foreign Address|State|
before close on client
tcp|0|0|127.0.0.1:35043|127.0.0.1:4321|ESTABLISHED|
tcp|0|0|127.0.0.1:4321|127.0.0.1:35043|ESTABLISHED|
after close on client
send: Connection reset by peer
after send on server
end of test

The send gets an immediate error in this case, but it is not EPIPE, it is ECONNRESET. The TCP dump reflects this as well:

17:44:21 127.0.0.1 > 127.0.0.1
 .662163 IP .35043 > .4321: S 498617888:498617888(0) win 32792 <mss 16396,sackOK,timestamp 3919204411 0,nop,wscale 7>
 .662176 IP .4321 > .35043: S 497680435:497680435(0) ack 498617889 win 32768 <mss 16396,sackOK,timestamp 3919204411 3919204411,nop,wscale 7>
 .662184 IP .35043 > .4321: . ack 1 win 257 <nop,nop,timestamp 3919204411 3919204411>
 .691207 IP .35043 > .4321: R 1:1(0) ack 1 win 257 <nop,nop,timestamp 3919204440 3919204411>

The RESET packet comes right after the 3-way handshake completes. However, using this option has its dangers. If the other end has unread data in the socket buffer when the RESET arrives, that data will be purged, causing the data to be lost. Forcing a RESET to be sent is usually used in request/response style protocols. The sender of the request can know there can be no data lost when it receives the entire response to its request. Then, it is safe for the request sender to force a RESET to be sent on the connection.
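For reference, the linger-based abortive close can be wrapped up and checked in one process. This is a rough sketch only: abortive_close and reset_demo are hypothetical names, the listener uses a kernel-chosen port rather than 4321, and the 1-second poll timeout is an arbitrary assumption to give the RST time to be processed.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd abortively: with l_onoff = 1 and l_linger = 0, close() discards
 * unsent data and, for TCP, sends RST instead of FIN. The fd is always
 * closed; -1 is returned if the option could not be set or close failed. */
static int abortive_close(int fd)
{
    struct linger lo;
    lo.l_onoff = 1;
    lo.l_linger = 0;
    int rc = setsockopt(fd, SOL_SOCKET, SO_LINGER, &lo, sizeof(lo));
    if (close(fd) < 0) return -1;
    return rc;
}

/* Returns the errno reported by the server's first send() after the client
 * closed abortively, 0 if the send succeeded, -1 on setup failure. */
static int reset_demo(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int l, c, s, err = 0;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    l = socket(AF_INET, SOCK_STREAM, 0);
    if (l < 0) return -1;
    if (bind(l, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(l, 1) < 0 ||
        getsockname(l, (struct sockaddr *)&sin, &len) < 0) {
        close(l);
        return -1;
    }
    c = socket(AF_INET, SOCK_STREAM, 0);
    if (c < 0) { close(l); return -1; }
    if (connect(c, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(c); close(l); return -1;
    }
    s = accept(l, NULL, NULL);
    close(l);
    if (s < 0) { close(c); return -1; }

    if (abortive_close(c) < 0) { close(s); return -1; }

    /* Wait (up to 1 s) until the server socket reports the reset. */
    struct pollfd pfd = { s, POLLIN, 0 };
    poll(&pfd, 1, 1000);

    if (send(s, "a", 1, MSG_NOSIGNAL) < 0) err = errno;
    close(s);
    return err;
}
```

On a Linux system like the one in the transcript above, the value returned by reset_demo() would be expected to be ECONNRESET, matching the "Connection reset by peer" line; other TCP stacks may behave differently.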

Solution 2

You have two sockets, one for the client and another for the server. Your client is doing the active close. This means TCP's connection termination has been started by the client (a TCP FIN segment has been sent from the client end).

At this stage the client socket is in the FIN_WAIT1 state (moving to FIN_WAIT2 once the server acknowledges the FIN). What is the state of the server socket? It is in the CLOSE_WAIT state, so the server socket is not closed.

The FIN from the server has not been sent yet, because the server application has not closed the socket. At this stage you are writing to the server socket, so you do not get an error.

Now, if you want to see the error, just call close(client_fd) before writing to the socket.

close(client_fd);
printf( "Write result: %d\n", write( client_fd, "123", 3 ) );

Here the server socket is no longer in the CLOSE_WAIT state, so the return value of write is negative, indicating the error. I hope this clarifies things.

Solution 3

After calling write() once (as coded in your example) after the client has close()d the socket, you will get the expected EPIPE and SIGPIPE on any subsequent call to write().

Just try adding another write() to provoke the error:

...
printf( "Errno before: %s\n", strerror( errno ) );
printf( "Write result: %d\n", write( client_fd, "123", 3 ) );
printf( "Errno after:  %s\n", strerror( errno ) );

printf( "Errno before: %s\n", strerror( errno ) );
printf( "Write result: %d\n", write( client_fd, "A", 1 ) );
printf( "Errno after:  %s\n", strerror( errno ) );
...

The output will be:

Accepting
Server sleeping
Client closing its fd... Client exiting.
Errno before: Success
Write result: 3
Errno after:  Success
Errno before: Success
Client status is 0, server status is 13

The output of the last two printf()s is missing as the process terminates due to SIGPIPE being raised by the second call to write(). To avoid the termination of the process, you might like to make the process ignore SIGPIPE.


regularfry

Updated on July 16, 2022

Comments

regularfry almost 2 years

I seem to be having a problem with my sockets. Below, you will see some code which forks a server and a client. The server opens a TCP socket, and the client connects to it and then closes it. Sleeps are used to coordinate the timing. After the client-side close(), the server tries to write() to its own end of the TCP connection. According to the write(2) man page, this should give me a SIGPIPE and an EPIPE errno. However, I don't see this. From the server's point of view, the write to a local, closed socket succeeds, and absent the EPIPE I can't see how the server should be detecting that the client has closed the socket.

In the gap between the client closing its end and the server attempting to write, a call to netstat will show that the connection is in a CLOSE_WAIT/FIN_WAIT2 state, so the server end should definitely be able to reject the write.

For reference, I'm on Debian Squeeze, uname -r is 2.6.39-bpo.2-amd64.

What's going on here?

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/tcp.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>

#include <netdb.h>

#define SERVER_ADDRESS "127.0.0.7"
#define SERVER_PORT 4777


#define myfail_if( test, msg ) do { if((test)){ fprintf(stderr, msg "\n"); exit(1); } } while (0)
#define myfail_unless( test, msg ) myfail_if( !(test), msg )

int connect_client( char *addr, int actual_port )
{
    int client_fd;

    struct addrinfo hint;
    struct addrinfo *ailist, *aip;


    memset( &hint, '\0', sizeof( struct addrinfo ) );
    hint.ai_socktype = SOCK_STREAM;

    myfail_if( getaddrinfo( addr, NULL, &hint, &ailist ) != 0, "getaddrinfo failed." );

    int connected = 0;
    for( aip = ailist; aip; aip = aip->ai_next ) {
        ((struct sockaddr_in *)aip->ai_addr)->sin_port = htons( actual_port );
        client_fd = socket( aip->ai_family, aip->ai_socktype, aip->ai_protocol );

        if( client_fd == -1) { continue; }
        if( connect( client_fd, aip->ai_addr, aip->ai_addrlen) == 0 ) {
            connected = 1;
            break;
        }
        close( client_fd );
    }

    freeaddrinfo( ailist );

    myfail_unless( connected, "Didn't connect." );
    return client_fd;
}


void client(){
    sleep(1);
    int client_fd = connect_client( SERVER_ADDRESS, SERVER_PORT );

    printf("Client closing its fd... ");
    myfail_unless( 0 == close( client_fd ), "close failed" );
    fprintf(stdout, "Client exiting.\n");
    exit(0);
}


int init_server( struct sockaddr * saddr, socklen_t saddr_len )
{
    int sock_fd;

    sock_fd = socket( saddr->sa_family, SOCK_STREAM, 0 );
    if ( sock_fd < 0 ){
        return sock_fd;
    }

    myfail_unless( bind( sock_fd, saddr, saddr_len ) == 0, "Failed to bind." );
    return sock_fd;
}

int start_server( const char * addr, int port )
{
    struct addrinfo *ailist, *aip;
    struct addrinfo hint;
    int sock_fd;

    memset( &hint, '\0', sizeof( struct addrinfo ) );
    hint.ai_socktype = SOCK_STREAM;
    myfail_if( getaddrinfo( addr, NULL, &hint, &ailist ) != 0, "getaddrinfo failed." );

    for( aip = ailist; aip; aip = aip->ai_next ){
        ((struct sockaddr_in *)aip->ai_addr)->sin_port = htons( port );
        sock_fd = init_server( aip->ai_addr, aip->ai_addrlen );
        if ( sock_fd > 0 ){
            break;
        } 
    }
    freeaddrinfo( ailist );

    myfail_unless( listen( sock_fd, 2 ) == 0, "Failed to listen" );
    return sock_fd;
}


int server_accept( int server_fd )
{
    printf("Accepting\n");
    int client_fd = accept( server_fd, NULL, NULL );
    myfail_unless( client_fd > 0, "Failed to accept" );
    return client_fd;
}


void server() {
    int server_fd = start_server(SERVER_ADDRESS, SERVER_PORT);
    int client_fd = server_accept( server_fd );

    printf("Server sleeping\n");
    sleep(60);

    printf( "Errno before: %s\n", strerror( errno ) );
    printf( "Write result: %d\n", write( client_fd, "123", 3 ) );
    printf( "Errno after:  %s\n", strerror( errno ) );

    close( client_fd );
}


int main(void){
    pid_t clientpid;
    pid_t serverpid;

    clientpid = fork();

    if ( clientpid == 0 ) {
        client();
    } else {
        serverpid = fork();

        if ( serverpid == 0 ) {
            server();
        }
        else {
            int clientstatus;
            int serverstatus;

            waitpid( clientpid, &clientstatus, 0 );
            waitpid( serverpid, &serverstatus, 0 );

            printf( "Client status is %d, server status is %d\n", 
                    clientstatus, serverstatus );
        }
    }

    return 0;
}

Per Johansson almost 12 years

You're not setting ai_family = AF_INET, but you assume you get a sockaddr_in returned. That's likely to break sometime in the future.
ChrisH almost 12 years

At the risk of not answering the question, why are you relying on a write() to see if the connection is closed? Have you looked at select() and/or poll()? By blocking on accept() you're always just accepting the first connect to your port, whether it is the connection you want or not.
alk almost 12 years

Have you tried a shutdown() on the client side socket prior to calling close()?
regularfry almost 12 years

@ChrisH: I get a similar result with select(), closing the socket from the client end is invisible to the server. It doesn't cause select() to return the fd in any of the three states.
Per Johansson almost 12 years

@regularfry the socket should be returned in the read set from select, since it gets EOF on read.

regularfry almost 12 years

Well... yes. That's the problem. The way I should be detecting the state of the TCP session is by making the write call and catching the failure. It's precisely that which isn't working.
Eric Y almost 12 years

@regularfry Until you call close(client_fd) the socket will still be valid server side. There should be some kind of TCP timeout you can check for after the write attempt on an open socket.
Per Johansson almost 12 years

This is the correct answer, but I think you could be a bit more clear. The client side can't close down writing on the server side, only reading. The client won't be able to read anything the server writes though. Also, after you call close on the server side, the fd will be invalid. If you're using threads another thread might have claimed it. Calling write after close is a really bad idea.
Tanmoy Bandyopadhyay almost 12 years

Thanks Per, for your observation. I have written the close(client_fd) on the server just to explain the cause of not getting an error. And yes, in this case, to make the client read something, instead of close we need to use shutdown (with SHUT_WR), which will allow the client to read.
regularfry almost 12 years

No, if I add a second read that one fails as expected. And yes, I do get a return value of 3 from the first.
regularfry almost 12 years

@Tanmoy: all that makes sense, except for the fact that if I add a second write after the first, I get the error I expect, and as described in the man page. If what you're describing were the whole story, there'd be no difference between the first and second write(). You say I should call close() before write() to see the error on the server, but I can't see how I'm supposed to know when to do that if not via a failing write().
regularfry almost 12 years

Yes. Why is this necessary? How can I get the error on the first write? The protocol I'm dealing with doesn't really have any room for spurious data in it, and the server side should have enough information to make the first write fail. Besides, just using a second write merely pushes the problem back a stage - what happens when the client disconnects between the first and second writes?
Per Johansson almost 12 years

@regularfry Likely the kernel has some timeout on how long it waits for the other side to shutdown. After that timeout it'll start sending "connection reset" replies which will then make write fail. I.e first write is ok but gets connection reset reply, that's why the second write fails.
Tanmoy Bandyopadhyay almost 12 years

@regularfry, can you please try adding the following lines of code before your writes: int tcp_info_length; struct tcp_info tcp_info; tcp_info_length = sizeof(tcp_info); getsockopt(client_fd, SOL_TCP, TCP_INFO, (void *)&tcp_info, (socklen_t *)&tcp_info_length); The member tcp_info.tcpi_state holds the current TCP state. I am getting the values 8 (CLOSE_WAIT) and 7 (TCP_CLOSE) before and after the first write, respectively. Can you use this information to handle this case? Please see the header file tcp.h in the netinet folder for all the values of the TCP states.
regularfry almost 12 years

That matches what I see. It does seem odd that it should be necessary to do a getsockopt on every write just to be sure the socket wasn't closed, though.
R.. GitHub STOP HELPING ICE almost 12 years

You really should change that close before write to shutdown or something. As Per Johansson said, the code as written is incorrect to show the issue (it's just giving EBADF) and could be dangerous in a multi-threaded program.
sha over 6 years

what an answer. Simply superb!
softwarevamp over 5 years

How do I know the read end of the socket is closed if it connects to a remote server?
jxh over 5 years

@softwarevamp The answer merely illustrates with a program that connects on the same machine. Try the experiment yourself with a remote server and see my answer still holds.
mrdecemberist almost 3 years

Such a great answer. Never knew that's what the linger option did, makes a world of sense now.