Introduction to TCP Connection Establishment for Software Developers
Last updated on 27 November 2021
Transmission Control Protocol (TCP) provides a reliable, connection-oriented, byte-stream, transport-layer service, and its implementation is quite interesting.
In this article, we'll explore how TCP connection establishment works, how TCP ensures reliability by maintaining connection state, and whether it fits every use case.
But before getting into the internals of the handshake, let's have a look at TCP.
A TCP connection is defined to be a 4-tuple consisting of two IP addresses and two port numbers. Each IP address - port number pair represents an endpoint.
This means a single server can connect to many clients, as long as each client's IP address and port number combination is unique.
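A quick way to see the 4-tuple in practice is a loopback connection with Python's socket module. This is just a sketch: the ports are whatever the OS happens to assign.

```python
import socket

# Set up a loopback listener and connect a client to it, then print
# the 4-tuple that identifies the resulting TCP connection.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())
conn, _ = server.accept()

# (client IP, client port, server IP, server port) -- the 4-tuple
four_tuple = (*client.getsockname(), *conn.getsockname())
print(four_tuple)

client.close(); conn.close(); server.close()
```

Two clients on the same machine would share three of these four values but differ in the client port, which is enough to keep the connections distinct.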
Where does TCP fit in?
Transmission Control Protocol (TCP) is one of the transport layer protocols available to us and it is widely used, for good reasons.
To understand why it is even needed, let's take a look at the protocol stack in the TCP/IP model:
The HTTP request coming from the application layer (e.g., your browser) travels down through all the layers to get sent across the internet. The internet layer handles sending out the little chunks of data known as IP datagrams. The datagrams act as an envelope for the TCP segments, and the job of the IP layer is to carry them across the internet.
Since the IP layer is not aware of the TCP connection, two packets corresponding to the same connection often get sent over different routes. This makes the data transfer over the internet unreliable and gives rise to various issues like duplicate packets, out-of-order packets, packet loss, etc.
TCP handles all these scenarios and provides guaranteed, lossless, in-order delivery of data at the receiving end.
TCP Header & IP datagram
A TCP segment carries all the meta information about the connection in a header. The basic TCP header is 20 bytes (without options), which means 20 bytes of overhead for every packet sent. Let's look at what constitutes a TCP header:
Source and Destination port.
Sequence Number: This identifies the first byte of the data in the segment sent to the receiving TCP.
Acknowledgement Number: This contains the next Sequence Number that the sender of the acknowledgement expects to receive; for a SYN, that means Acknowledgement Number = Sequence Number + 1.
Window Size: This is the number of bytes that the receiving TCP is willing to receive. It is a 16-bit field, limiting the window size to 65,535 bytes. Window Scaling works around this bottleneck.
TCP Checksum: This is mandatorily sent by the sending TCP and verified by the receiving end in order to detect data corruption.
Urgent Pointer: This mechanism in TCP is used to send some specific & urgent data to the other end. It is valid only if the URG field is set.
Other Bit Fields: Two flags in particular are used during the connection establishment process:
- SYN: This bit is turned on in the first segment, at the start of connection establishment phase.
- ACK: It is used when an acknowledgement needs to be sent out. It is on in every segment except the very first SYN.
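As a concrete aside on the checksum field above: TCP uses the standard Internet checksum, a ones'-complement sum of 16-bit words. In real TCP it is computed over a pseudo-header plus the segment, but the core routine over raw bytes can be sketched like this:

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words (RFC 1071 style),
    the algorithm behind the TCP checksum field."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # add next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)     # fold carry back in
    return ~total & 0xFFFF                           # ones' complement
```

A handy property of this checksum: if the receiver runs the same routine over the data with the checksum appended, the result is 0, which is how corruption is detected.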
All this information about the connection is stored in the TCP header. Combining this header with the application data gives us the TCP segment:
But at this stage, we only know the source and destination ports. We also need the source and destination IP addresses to uniquely identify a TCP connection (remember the 4-tuple?). That happens in the next layer (the IP layer) during transmission.
The IP layer simply adds its own header on top of the TCP segment it receives, making it an "IP datagram". These headers are stripped off at the receiving end, in reverse order.
So, the TCP and IP layers collectively make up a unique TCP connection. And we get the "TCP/IP Protocol Suite" ⚡.
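To make the header layout concrete, here is a sketch that packs a basic 20-byte header with Python's struct module. The port numbers and sequence number are made-up illustrative values, and the checksum is left at zero rather than computed:

```python
import struct

def build_tcp_header(src_port, dst_port, seq, ack, flags, window):
    """Pack a basic 20-byte TCP header (no options).
    The checksum is a placeholder here; a real stack fills it in."""
    offset_flags = (5 << 12) | flags   # data offset = 5 words (20 bytes) + flag bits
    return struct.pack("!HHIIHHHH",
                       src_port, dst_port, seq, ack,
                       offset_flags, window,
                       0,              # checksum (placeholder)
                       0)              # urgent pointer

SYN = 0x02                             # SYN flag bit
header = build_tcp_header(49152, 80, seq=1000, ack=0, flags=SYN, window=65535)
print(len(header))  # 20
```

The `!` in the format string forces network (big-endian) byte order, which is how all these fields travel on the wire.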
Connection establishment is started by an active opener (usually the client) who wants to connect to a passive opener (usually a server) and a total of three TCP segments are transferred during the process. The goal of this exercise is to let each end of the connection know that a connection is starting, share some important configurations (aka TCP options) and exchange the Initial Sequence Number (ISN).
Let's take a look at each step more closely:
[Segment 1]: The client sends a SYN segment
The first TCP segment sent by the active opener (or client) contains the following:
- Server's port stored in Destination Port
- SYN bit set in the TCP Flags
- ISN of the client stored in Sequence Number
- Some configuration options stored in TCP options (we'll tackle them next)
[Segment 2]: The server responds with a SYN-ACK segment
The server sends its own SYN segment. It also acknowledges the segment received from the client. It sends a segment with:
- SYN bit turned on
- Sequence Number = ISN(server)
- ACK bit turned on (to acknowledge the segment received from the peer)
- Acknowledgement Number = ISN(client) + 1
[Segment 3]: The client sends a final ACK segment
Finally, the client acknowledges the SYN received from the server with an ACK. Essentially:
- It sets the ACK bit to acknowledge the server's SYN segment
- Sequence Number = ISN(client) + 1
- Acknowledgement Number = ISN(server) + 1
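The sequence/acknowledgement arithmetic across the three segments can be sketched as a toy model. The segments below are plain dicts for illustration, not a real TCP stack; the random ISNs mirror what real implementations do.

```python
import random

MOD = 1 << 32                                  # sequence numbers wrap at 2^32
client_isn = random.randrange(MOD)
server_isn = random.randrange(MOD)

# Segment 1: client -> server
syn     = {"flags": {"SYN"},        "seq": client_isn}
# Segment 2: server -> client (its own SYN, plus an ACK of the client's)
syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
           "ack": (syn["seq"] + 1) % MOD}
# Segment 3: client -> server (ACK of the server's SYN)
ack     = {"flags": {"ACK"},        "seq": (client_isn + 1) % MOD,
           "ack": (syn_ack["seq"] + 1) % MOD}

print(ack["ack"] == (server_isn + 1) % MOD)  # True
```

Note that a SYN consumes one sequence number even though it carries no data, which is why both acknowledgement numbers are ISN + 1.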
If the SYN segment is lost, it is retransmitted until an ACK for it is received.
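TCP implementations typically retransmit an unanswered SYN with exponentially increasing timeouts. The initial timeout of 1 second and five retries below are illustrative assumptions, not values fixed by every stack:

```python
def syn_retry_schedule(initial_rto=1.0, retries=5):
    """Doubling retransmission timeouts for an unanswered SYN
    (initial RTO and retry count are illustrative assumptions)."""
    delays, rto = [], initial_rto
    for _ in range(retries):
        delays.append(rto)
        rto *= 2                      # exponential backoff: double each attempt
    return delays

print(syn_retry_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

This is why connecting to an unreachable host takes so long to fail: the client works through the whole backoff schedule before giving up.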
There are some additional configuration settings that help in an efficient flow of data in a TCP connection. Some of these options can only be set once during the connection establishment process while others can be used at any point in time during the connection lifespan.
Let's take a look at some of the most commonly used TCP options ⚡
Maximum Segment Size (MSS)
It is the largest segment that a TCP is willing to receive from its peer and, consequently, the largest size its peer should ever use when sending.
The important thing to note here is that MSS counts only the application data, not the TCP & IP headers. The Maximum Transmission Unit (MTU), on the other hand, covers the whole packet, including the TCP & IP headers.
The MSS, and consequently the MTU, is configurable, but it should stay within the maximum size the Ethernet frame carrying those packets can handle. The MTU can be set greater than the frame's capacity, but then the packets need to go through fragmentation to get delivered.
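The MSS arithmetic is simple to sketch. The header sizes below assume optionless IP and TCP headers (20 bytes each):

```python
def mss_for_mtu(mtu, ip_header=20, tcp_header=20):
    """MSS counts only application data: subtract the (optionless)
    IP and TCP headers from the link's MTU."""
    return mtu - ip_header - tcp_header

print(mss_for_mtu(1500))  # 1460 -- the common Ethernet case
```

This is why you so often see an MSS of 1460 advertised: it is the standard 1500-byte Ethernet MTU minus 40 bytes of headers.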
Window Scaling

The window size tells the peer how much receive buffer it has allocated (or has left) for that particular connection, and it is advertised in every segment. The window scaling factor, however, can only be set during the connection establishment phase and cannot be changed during the connection lifetime.
The Window Size field in the TCP header is 16 bits, which caps it at 65,535 bytes (2^16 - 1). On high-latency networks, a 64 KB window means the sender can exhaust the window and then sit idle for a full Round Trip Time (RTT) waiting for acknowledgements.
The Window Scale TCP option carries a shift count (at most 14) that left-shifts the Window Size value, making it significantly larger, up to ~1 GB (65,535 bytes * 2^14). This is most useful on high-latency, high-bandwidth networks.
Let's understand with the help of an illustration here:
Here, the maximum amount of data the sender can send before receiving any acknowledgement is 64 KB. We can observe that the sender goes idle after sending the maximum possible bytes of data and waits for an acknowledgement before it can send more.
Now let's look at the packet transmission after window scaling is introduced:
With window scaling set, the sender is able to send twice the amount of data, which reduces the idle time and gives better utilization.
Likewise, a bigger window scaling factor further increases the effective window size: the bigger the window, the more data the sending TCP can send without waiting for acknowledgements.
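The scaling arithmetic itself is a one-liner; the shift count is capped at 14:

```python
def effective_window(window_field, shift_count):
    """The Window Scale option left-shifts the 16-bit window field;
    the shift count may not exceed 14."""
    assert 0 <= shift_count <= 14
    return window_field << shift_count

print(effective_window(65535, 14))  # 1073725440 bytes, ~1 GB
```

A side effect worth knowing: with a shift count of 14, the window can only change in 16 KB increments, since the low bits are lost to the shift.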
Selective Acknowledgements (SACK)
Packets sent over the network often get lost, resulting in sudden jumps in acknowledgement numbers and a non-continuous byte stream. This creates “holes” in the received data, and the sending TCP doesn't know which packets need retransmission.
With SACK supported at both ends (negotiated during connection establishment), a receiver is able to communicate the packets it received after the gap. Two fields help in figuring out the missing packets:
- Acknowledgement Number set to the offset just after the last in-order packet it received (i.e., the start of the gap).
- A SACK block in the TCP options containing the block of data it received after the gap.
The sending TCP takes the difference between the start of the SACK block (the first data after the gap) and the Acknowledgement Number (the end of the in-order data before the gap). This makes it easy for the sending TCP to recognize which block of data it needs to retransmit.
So for example, if a receiving TCP sends an (duplicate) acknowledgement of 1000 and the SACK block contains a range of 1100-1500, it is clear that the sending TCP needs to retransmit only the packets from 1000 to 1100.
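The sender's side of this computation can be sketched as follows. This is a simplified model that works on plain byte offsets and ignores sequence-number wraparound:

```python
def missing_ranges(ack, sack_blocks):
    """Given the cumulative ACK and a list of SACK blocks (start, end),
    return the byte ranges the sender still needs to retransmit."""
    holes, expected = [], ack
    for start, end in sorted(sack_blocks):
        if start > expected:
            holes.append((expected, start))   # a hole before this SACK block
        expected = max(expected, end)         # everything up to `end` is covered
    return holes

# The example from the text: ACK 1000, SACK block 1100-1500
print(missing_ranges(1000, [(1100, 1500)]))  # [(1000, 1100)]
```

With multiple SACK blocks the same logic yields every hole, so one duplicate ACK can describe several missing ranges at once.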
No-Operation (NOP)

This helps TCP pad the options to a multiple of 4 bytes when the actual options don't fit the size constraint.
End of Options List (EOL)

It indicates the end of the options list and shows that no further processing of the options list is required.
Why do we need a three-way handshake?
If we closely observe what happens in the handshake process, the two parties are essentially keeping track of an offset value (sequence number) which they use to send & receive data. Both ends essentially maintain a connection state.
Keeping track of an offset value allows both parties in the connection to detect issues with the packets being transmitted & received. It helps in spotting duplicate packets, correcting out-of-order packets, and retransmitting in case of packet loss.
These issues occur because of how the IP layer works: it has no notion of a "TCP connection". A router forwards each packet based on its own path computations, which means two packets belonging to the same TCP connection may take different routes to the same destination. This is why TCP has to handle those scenarios.
Here's an example of mild packet re-ordering 🔽
Correcting out-of-order packets
As packets corresponding to the same TCP connection often travel over different routes, they reach the receiving TCP out-of-order. Since TCP guarantees in-order delivery, it stores the out-of-order packets in its receiver buffer and waits for the missing packets to fill the "holes" in the byte stream.
Here, the receiving TCP received packet four (P4) before packet three (P3). As a result, it keeps P4 in its receive buffer, sends a (duplicate) acknowledgement for the last in-order packet received, and waits for P3 to come in. Once P3 arrives, it sends the acknowledgement corresponding to the last packet it has successfully received in order, which is P4.
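The receive-buffer behaviour described above can be sketched with packet numbers standing in for sequence numbers (a simplification; real TCP tracks byte offsets):

```python
def reassemble(packets):
    """Toy receive buffer: deliver packets in sequence order,
    holding out-of-order arrivals until the hole is filled."""
    buffer, delivered, next_seq = {}, [], 1
    for seq, data in packets:
        buffer[seq] = data
        while next_seq in buffer:              # drain contiguous packets
            delivered.append(buffer.pop(next_seq))
            next_seq += 1
    return delivered

# P4 arrives before P3; delivery still comes out in order
print(reassemble([(1, "P1"), (2, "P2"), (4, "P4"), (3, "P3")]))
```

When P4 arrives, nothing new can be delivered (the hole at P3 remains); once P3 fills the hole, both P3 and P4 drain out of the buffer in one go.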
Thus, Sequence Number plays a crucial role in keeping track of lost packets, out-of-order packets and even duplicate packets. This brings the need of a connection establishment and maintaining the connection state at both ends.
Why choose UDP over TCP?
There are several cases where UDP is preferred over TCP. Some factors that contribute to it are:
- A service can't afford the overhead of TCP handshakes or the handshake cost is fairly significant relative to the actual data being sent.
- Occasional packet loss is acceptable (depends on the use-case).
Some examples where UDP is preferred over TCP are - multiplayer games, weather data, video streaming, etc.
We've learned about the need for a TCP connection and how it helps, with various configuration options to suit a wide range of requirements. It's fascinating to see how much abstraction TCP provides: a developer working at the application level rarely has to think about it.
If you want to explore more, a good starting point would be Chris Greer's playlist on TCP.
If you wanna dive deep into TCP/IP, I'd highly recommend the TCP/IP Illustrated Vol. 1 book. It covers the topic in great depth.
Time to close this connection now 😉
For any feedback and queries, feel free to comment below or reach out on Twitter.