Thursday, October 20, 2011

How to forcibly close a socket in TIME_WAIT?

SkyHi @ Thursday, October 20, 2011

Let me elaborate. Transmission Control Protocol (TCP) is designed to be a bidirectional, ordered, and reliable data transmission protocol between two endpoints (programs). In this context, reliable means that TCP will retransmit packets that get lost in transit. TCP guarantees reliability by sending Acknowledgment (ACK) packets back for a single packet or a range of packets received from the peer.
The same goes for control signals such as termination requests and responses. RFC 793 defines the TIME-WAIT state as follows:
TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
See the TCP state diagram (image not preserved).
TCP is a bidirectional communication protocol, so once the connection is established, there is no difference between the client and the server. Either one can call it quits, and both peers need to agree on closing to fully close an established TCP connection.
Let's call the first one to call it quits the active closer, and the other peer the passive closer. When the active closer sends a FIN, its state goes to FIN-WAIT-1. Then it receives an ACK for the FIN it sent and the state goes to FIN-WAIT-2. Once it also receives a FIN from the passive closer, the active closer sends an ACK for that FIN and its state goes to TIME-WAIT. If the passive closer does not receive the ACK for this second FIN, it will retransmit the FIN packet.
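The sequence above can be sketched as a small lookup table. This is only an illustration of the active closer's path as described here, not a TCP implementation: the event labels are made up for the sketch, the final TIME-WAIT to CLOSED step follows RFC 793, and cases such as simultaneous close are left out.

```python
# Transitions for the active closer only, as described in the text above.
# The event names are labels invented for this sketch.
ACTIVE_CLOSE_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",
    ("FIN-WAIT-1", "recv ACK of FIN"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "recv FIN, send ACK"): "TIME-WAIT",
    ("TIME-WAIT", "2MSL timer expires"): "CLOSED",
}


def walk(start, events):
    """Return the list of states visited, starting from `start`."""
    states = [start]
    for event in events:
        # A KeyError here means the event is not valid in the current state.
        states.append(ACTIVE_CLOSE_TRANSITIONS[(states[-1], event)])
    return states


if __name__ == "__main__":
    path = walk(
        "ESTABLISHED",
        ["send FIN", "recv ACK of FIN", "recv FIN, send ACK", "2MSL timer expires"],
    )
    print(" -> ".join(path))
```

Only the side that closes first passes through TIME-WAIT, which is why a server that closes connections itself tends to accumulate sockets in that state.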
RFC 793 sets the TIME-WAIT timeout to twice the Maximum Segment Lifetime, or 2MSL. Since MSL, the maximum time a packet can wander around the Internet, is set to 2 minutes, 2MSL is 4 minutes. Since there is no ACK to an ACK, the active closer can't do anything but wait 4 minutes if it adheres to the TCP/IP protocol correctly, just in case the passive closer has not received the ACK to its FIN (theoretically). Actual implementations often wait less than that; Linux, for example, holds a socket in TIME_WAIT for 60 seconds.
In reality, missing packets are probably rare, and very rare if it's all happening within a LAN or within a single machine.
To answer the question verbatim, How to forcibly close a socket in TIME_WAIT?, I will still stick to my original answer:
/etc/init.d/networking restart
Practically speaking, I would program it so it ignores the TIME-WAIT state using the SO_REUSEADDR option, as WMR mentioned. What exactly does SO_REUSEADDR do?
This socket option tells the kernel that even if this port is busy (in the TIME_WAIT state), go ahead and reuse it anyway. If it is busy, but with another state, you will still get an address already in use error. It is useful if your server has been shut down, and then restarted right away while sockets are still active on its port. You should be aware that if any unexpected data comes in, it may confuse your server, but while this is possible, it is not likely.
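As a minimal sketch of how the option is typically set, here it is in Python (the question does not say which language the program uses, so that choice is an assumption, as are the helper name make_listener and the loopback address). The option has to be set before bind() for it to affect the bind.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket with SO_REUSEADDR set before bind().

    port=0 asks the OS for any free port; pass a fixed port for a real server.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow bind() to succeed even if old connections on this port
    # are still sitting in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


if __name__ == "__main__":
    listener = make_listener()
    print("listening on", listener.getsockname())
    listener.close()
```

This does not remove the TIME_WAIT entries themselves; it only lets a restarted server bind to the port while they expire on their own.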

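As far as I know there is no way to forcibly close the socket outside of writing a better signal handler into your program, but there is a /proc file that affects how quickly TIME_WAIT sockets are cleaned up. The file is

/proc/sys/net/ipv4/tcp_tw_recycle
It is an on/off flag rather than a timeout in seconds: writing 1 enables fast recycling of TIME_WAIT sockets, it does not set the timeout to 1 second. You can enable it by doing this:

echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
However, this page contains a warning about possible reliability issues when setting this variable; in particular, it is known to break connections from clients behind NAT. The option was removed in Linux 4.12, so the file does not exist on newer kernels.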

There is also a related file

/proc/sys/net/ipv4/tcp_tw_reuse
which controls whether TIME_WAIT sockets can be reused for new outgoing connections when that is safe from the protocol's point of view (it does not shorten the timeout itself).
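Before changing either file it is worth checking what the current value is, and whether the file exists at all on your kernel. A small sketch in Python, assuming a Linux-style /proc layout; the helper name read_tcp_setting is made up for illustration, and it returns None instead of failing when the file is missing or unreadable.

```python
from pathlib import Path


def read_tcp_setting(name, base="/proc/sys/net/ipv4"):
    """Return the setting's value as a string, or None if it cannot be read."""
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        # File missing (different OS or kernel version) or not readable.
        return None


if __name__ == "__main__":
    for setting in ("tcp_tw_recycle", "tcp_tw_reuse"):
        print(setting, "=", read_tcp_setting(setting))
```

Reading these files normally does not need root; writing to them does.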

Incidentally, the kernel documentation warns you not to change either of these values without 'advice/requests of technical experts'. Which I am not.

The program must have been written to attempt a binding to port 49200 and then increment by 1 if the port is already in use. Therefore, if you have control of the source code, you could change this behaviour to wait a few seconds and try again on the same port, instead of incrementing.
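A rough sketch of that wait-and-retry behaviour, in Python. The original program's language and structure are unknown, so the function name bind_with_retry, the loopback address, and the attempt count and delay are all assumptions for illustration.

```python
import socket
import time


def bind_with_retry(port, host="127.0.0.1", attempts=5, delay=1.0):
    """Try to bind to the same port several times, sleeping between tries.

    Returns the bound socket, or raises the last OSError if every try fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError as exc:
            # Typically "address already in use"; give the port time to free up.
            sock.close()
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error


if __name__ == "__main__":
    sock = bind_with_retry(49200, attempts=3, delay=2.0)
    print("bound to", sock.getsockname())
    sock.close()
```

With the default TIME_WAIT duration a few seconds of retrying may not be enough on its own, so this is more of a fallback than a replacement for SO_REUSEADDR.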

REFERENCES
http://stackoverflow.com/questions/41602/how-to-forcibly-close-a-socket-in-time-wait