TL;DR

  • TCP Fast Open (TFO) allows clients to send data in the initial SYN packet, without waiting for the full handshake to complete. This removes an entire round trip, almost transparently to the application.
  • Cookies are included in the TCP options header.
  • It’s available in Linux 3.7+, and nginx and HAProxy already support it. Client support is lacking: only Chrome/Chromium on Linux, Chrome OS and Android 5.0 (Lollipop) support it, and only if enabled manually.
  • There are some potential issues with duplicate data being delivered to the server’s application, which wouldn’t occur with normal TCP. Although this is unlikely, some applications may not be compatible.
    • It’s perfect for static content (CDNs).
    • It suits every website that uses TLS (it’ll reduce the handshake time).

Overview

Standard TCP requires the client and server to establish a three way handshake (3WHS) before data can be delivered to the server’s listening application. This introduces latency of one round trip before the server’s application receives the data, and a total of two round trips by the time the server can respond. This can be significant for latency sensitive applications such as HTTP.

TCP Fast Open (TFO), defined by RFC 7413, still requires an initial 3WHS to be established before data can be sent for the first time. In this handshake’s first SYN packet, the client can request a TFO cookie, and a compatible server responds with a cookie in its SYN-ACK response.

Once the client receives the SYN-ACK response, it caches the cookie and sends the application data (such as an HTTP GET or, for TLS, a ClientHello) as normal. Two round trips were still required: one for the connection setup (initial SYN and SYN-ACK) and the other for the application request and response.

Now that the client has a TFO cookie, the SYN packet of a new connection to the same server also includes the application data. The server validates the cookie and responds to the client with a SYN-ACK, whilst also immediately passing the data to the server’s listening application for a response. This reduces the total round trips to one.

Support

Servers

  • Linux 3.7+ e.g. Red Hat Enterprise Linux (RHEL/CentOS) 7, Ubuntu 14.04 LTS
  • nginx 1.5.8
    • nginx provided RPMs do not have support, other packages may have support
  • HAProxy 1.5

Clients

  • Chrome/Chromium on Linux, Chrome OS or Android (not sure whether Chrome flags disable it and, if not, whether users need Linux 3.13 to have it enabled by default)
  • Linux 3.6+ e.g. Red Hat Enterprise Linux (RHEL/CentOS) 7, Fedora 18, Amazon Linux AMI 2014.03, Ubuntu 13.04
  • Chrome OS
  • Android Lollipop (5.0)

Websites

  • Google’s assets such as Google Search, YouTube, Blogger, DoubleClick etc

Detailed Process

Sending TFO Request

During the initial TCP connection, a client requests that the kernel create a TCP connection with the TFO options enabled. In Linux, a client normally uses the connect() and write() system calls to create and send on a TCP connection. But to use TFO, these calls need to be combined into a single call (sendto() or sendmsg() with the MSG_FASTOPEN flag), so application data can also be sent during the initial connection phase. Therefore, not only does the TCP stack the application is running on need to support TFO, but clients must also be written to support it.

API changes required to the application are more fully described in LWN’s article TCP Fast Open: expediting web services.
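To make the combined call concrete, here is a minimal client sketch in Python rather than C. The function name tfo_send and its fallback behaviour are my own choices for illustration; the only real API it relies on is sendto() with the Linux MSG_FASTOPEN flag. The kernel still decides whether the data actually travels in the SYN, so the returned flag only says which API path was taken.

```python
import socket

def tfo_send(addr, payload):
    """Connect to addr and send payload, using the TFO API when available.

    Returns (socket, used_fastopen_api). The flag does not prove the data
    rode in the SYN; that depends on the kernel having a cached cookie.
    """
    fastopen = getattr(socket, "MSG_FASTOPEN", None)  # only defined on Linux
    if fastopen is not None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # sendto() on an unconnected TCP socket replaces connect() + send()
            sent = sock.sendto(payload, fastopen, addr)
            sock.sendall(payload[sent:])  # flush anything the call did not take
            return sock, True
        except OSError:
            sock.close()  # TFO unavailable here, fall through to plain TCP

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(addr)
    sock.sendall(payload)
    return sock, False
```

The fallback matters because the client side can be disabled via sysctl, or the platform may not define the flag at all.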

The SYN packet generated by the client’s TCP stack will carry TCP option 34, which corresponds to the TCP Fast Open Cookie. Because no TFO connection has been established yet, the cookie is empty, indicating to the server that the client supports TFO and would like a cookie. There is also no data within this SYN packet.

A SYN packet’s TCP options are limited to 40 bytes, of which a Linux 3.18 kernel currently uses 20 (MSS 4 bytes, SACK Permitted 2 bytes, Timestamps 10 bytes, NOP 1 byte and Window Scale 3 bytes). Adding the TFO cookie increases this to 32. The TFO RFC states that if the SYN packet does not have enough space to fit the TFO option, TFO should be disabled for this connection. Note, it doesn’t state this for SYN-ACK replies, but one would assume this would also be true.
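The option budget above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the base option sizes are the ones listed above, the 8 byte cookie is an assumption that reproduces the 32 byte total, and I assume the option is padded to a 4 byte boundary. The experimental option kind carries an extra 2 byte identifier, which is why both encodings come to 12 bytes.

```python
MAX_TCP_OPTIONS = 40  # bytes of option space in a TCP header

# Option sizes in a Linux 3.18 SYN, as listed in the text
BASE_SYN_OPTIONS = {
    "MSS": 4,
    "SACK Permitted": 2,
    "Timestamps": 10,
    "NOP": 1,
    "Window Scale": 3,
}

def tfo_option_len(cookie_len=8, experimental=True):
    """Bytes used by the TFO option, padded to a 4 byte boundary (assumed)."""
    raw = 2 + cookie_len          # kind + length + cookie
    if experimental:
        raw += 2                  # 2 byte experiment identifier for kind 254
    return (raw + 3) // 4 * 4

def syn_option_budget(cookie_len=8, experimental=True):
    """Return (bytes used, whether the TFO option still fits in the SYN)."""
    used = sum(BASE_SYN_OPTIONS.values()) + tfo_option_len(cookie_len, experimental)
    return used, used <= MAX_TCP_OPTIONS
```

With the defaults this gives 32 bytes used, leaving 8 spare; a 16 byte cookie would consume the whole 40 bytes.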

Receiving a TFO Request

During transit, some middleware may not understand this TCP option and drop the packet instead. In that case, the client should retransmit the SYN packet without a TFO option, i.e. attempt a standard TCP connection instead. The client should also cache the negative response, so future connections to this destination do not need to wait for a timeout, and instead try a normal TCP connection the first time.

The server application, much like the client’s, must also be written to support TFO. In the server’s case, it must set the TCP_FASTOPEN socket option via the setsockopt() system call and provide a maximum queue length (discussed later in DoS & Amplification Attacks).

An example client and server, using system calls in Go, can be found at github.com/bradleyfalzon/tcp-fast-open
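For readers who prefer Python, the server side can be sketched as below. The helper name listen_with_tfo and the queue length of 16 are assumptions for illustration only. On Linux the option value is the maximum queue length; other platforms that define TCP_FASTOPEN may interpret the value differently.

```python
import socket

def listen_with_tfo(host="127.0.0.1", port=0, max_qlen=16):
    """Create a listening socket and try to enable TFO on it.

    Returns (socket, tfo_enabled). max_qlen is the Linux meaning of the
    option value: the maximum number of pending TFO requests.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(128)

    tfo_enabled = False
    opt = getattr(socket, "TCP_FASTOPEN", None)  # not defined everywhere
    if opt is not None:
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, max_qlen)
            tfo_enabled = True
        except OSError:
            pass  # kernel refused; the socket still works as plain TCP
    return sock, tfo_enabled
```

Even when the call succeeds, the server bit of net.ipv4.tcp_fastopen must also be set before the kernel will accept data in a SYN.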

When the server’s TCP stack receives the SYN packet, if it does not support TFO it will ignore the TFO option and establish a normal TCP connection. If the server supports TFO, and the listening socket has requested TFO support, it will generate a message authentication code (MAC) of the client’s IP address (Linux also includes the server’s IP address) using the server’s secret key.

If the received SYN packet contains the TFO option but no cookie, the generated cookie is sent to the client in the server’s SYN-ACK packet, using the same TFO option space, and a normal TCP connection is established without any round trip savings.

Receiving a TFO Request Response

When the client receives the SYN-ACK packet with the TFO cookie set, it caches this cookie, the server’s IP address, and the server’s advertised maximum segment size (MSS). This cookie can now be used by the client when it tries to connect to the same IP address with client TFO support enabled (note, the cookie is bound to the server’s IP and doesn’t include the server’s destination port number).
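The kernel keeps this cache itself, but the bookkeeping can be modelled in a few lines of Python. Everything here (the TfoCache class, its method names, the one hour retry period for destinations that dropped a TFO SYN) is a hypothetical sketch, not the kernel’s data structure. Note the key is the server IP alone, without the port.

```python
class TfoCache:
    """Toy model of a client's per-destination TFO cache."""

    def __init__(self):
        self._entries = {}  # keyed by server IP only, not by port

    def store_cookie(self, server_ip, cookie, mss):
        """Remember the cookie and MSS from a SYN-ACK."""
        self._entries[server_ip] = {"cookie": cookie, "mss": mss, "retry_at": 0.0}

    def mark_blackholed(self, server_ip, now, retry_after=3600.0):
        """Negative cache: a TFO SYN to this destination was dropped."""
        entry = self._entries.setdefault(
            server_ip, {"cookie": b"", "mss": 0, "retry_at": 0.0})
        entry["retry_at"] = now + retry_after

    def plan_syn(self, server_ip, now):
        """Decide what the next SYN to server_ip should carry."""
        entry = self._entries.get(server_ip)
        if entry is None:
            return "request-cookie", None
        if now < entry["retry_at"]:
            return "plain-tcp", None
        if entry["cookie"]:
            return "cookie+data", (entry["cookie"], entry["mss"])
        return "request-cookie", None
```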

Sending TFO+Data

When a client tries to connect to the same server again with TFO socket options set, the initial TCP SYN packet will be generated with the previously cached TFO cookie and application data up to the MSS. If the application data exceeds the MSS, the additional data must wait to be sent until after the 3WHS is established - negating the use of TFO.

The default MSS for IPv4 is only 536 bytes (IPv6 is 1220 bytes), whereas the typical MSS is closer to 1460 bytes, which is why it’s important to cache the MSS: it gives the best chance of fitting all the data in the initial request. So, keep request sizes within ~1460 bytes, for example by reducing HTTP Cookie sizes.
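A rough way to reason about this is to split a request at the MSS and see what is left over. The helper below is a simplified sketch: split_for_syn is a made-up name, and it ignores the bytes consumed by TCP options, so the real limit is somewhat lower.

```python
# Default MSS values quoted above, used when nothing is cached
DEFAULT_MSS = {4: 536, 6: 1220}

def split_for_syn(payload, cached_mss=None, ip_version=4):
    """Split payload into (data for the SYN, data that waits for the 3WHS)."""
    mss = cached_mss if cached_mss else DEFAULT_MSS[ip_version]
    return payload[:mss], payload[mss:]
```

A 600 byte request fits entirely with a cached MSS of 1460 but overflows by 64 bytes against the IPv4 default of 536, and a 3298 byte request leaves 1838 bytes waiting even with the cached value.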

Receiving TFO+Data

On the server’s side, the received SYN packet is checked for a TFO cookie. If one is present, the server generates its own cookie for this client and compares it to the cookie in the SYN packet. If the comparison is successful, the server replies with a SYN-ACK acknowledging both the SYN and the data, and passes the data to the listening application. If the comparison fails, the server replies with a SYN-ACK acknowledging only the SYN (not the data) and includes the generated TFO cookie for the client to use next time.
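The generate-and-compare step can be illustrated as follows. This is not the kernel’s algorithm: I use HMAC-SHA256 truncated to 8 bytes as a stand-in MAC, and the function names and the returned dictionary are hypothetical. What it does show is the property that matters, namely that the server stores nothing per client and simply recomputes the cookie from the addresses and its secret key.

```python
import hashlib
import hmac
import ipaddress

COOKIE_LEN = 8  # assumed cookie size in bytes

def make_cookie(key, client_ip, server_ip):
    """MAC over client and server IP (HMAC-SHA256 is a stand-in choice)."""
    msg = (ipaddress.ip_address(client_ip).packed
           + ipaddress.ip_address(server_ip).packed)
    return hmac.new(key, msg, hashlib.sha256).digest()[:COOKIE_LEN]

def handle_syn(key, client_ip, server_ip, cookie, data):
    """Decide how to answer a SYN that carries the TFO option."""
    expected = make_cookie(key, client_ip, server_ip)
    if cookie and hmac.compare_digest(cookie, expected):
        # Valid cookie: acknowledge SYN + data, hand data to the application
        return {"ack_data": True, "deliver": data, "new_cookie": None}
    # Missing or invalid cookie: acknowledge the SYN only, offer a cookie
    return {"ack_data": False, "deliver": b"", "new_cookie": expected}
```

Changing the key invalidates every outstanding cookie at once, which is why key handling gets its own section later.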

Receiving a TFO+Data Response

When the client receives the SYN-ACK packet, it checks whether the data was acknowledged. If it was, the client ACKs the SYN-ACK and waits for further responses (if applicable) from the server.

If the data was not acknowledged in the received SYN-ACK packet, the client still ACKs the server’s SYN-ACK, but also sends another packet with the application data (as it would a normal TCP connection). If the received SYN-ACK packet contains a TFO cookie the client will cache it for use next time.

This system provides the following properties:

  • Connections benefit from TFO only after an initial TCP connection is established
  • Client application and kernel support is required, as well as server application and server kernel support
  • Clients can handle most bad middleware by retransmitting without TFO
  • Clients that support TFO and connect to servers that do not are handled gracefully, as the server simply ignores the TFO cookie request

Potential Issues

  • Applications that receive the initial SYN data must be tolerant of duplication. In many cases the application is TLS, which is tolerant of duplicate data. Other applications that are not technically tolerant may simply accept the risk if the impact is low (e.g. the risk of a forum website double posting may be acceptable, but purchasing all items in the cart twice might not be).
  • If a client needs to send more than the cached MSS, TFO is unavailable. For example, my PayPal cookies are currently 3298 bytes, well above the 1460 MSS currently advertised.

Duplicate Data & Idempotency

Applications with TFO enabled sockets must be able to handle receiving duplicate data due to retransmitted SYN packets. This is only an issue for the data within the initial SYN packet. For subsequent packets sent after the initial TCP handshake (and for non TFO TCP connections), TCP detects duplicate data and drops the extra packets without delivering them to the application.

Duplicate data is possible by at least two methods outlined by the RFC, and potentially a third discussed later.

The first condition, where a server “forgets” it received the initial SYN packet (e.g. due to a server reboot), has been discussed and two workarounds proposed:

  1. Delay enabling TFO by a few minutes
  2. Regenerate a new server key upon reboot, either randomly or based on a boot ID.

Solution 1 is reasonable, with the obvious drawback of not having TFO enabled immediately and potentially(? - need to check) causing a client to clear its TFO cookie cache. A delayed start could be managed by userland tools, by simply disabling TFO by default and creating a service to enable TFO 5 minutes after server start (reboot).

Solution 2 is already implemented in Linux, as the server key is randomly generated after each reboot (versions before Linux 3.13 generated a new TFO key on boot; newer versions only generate one once a socket sets the relevant TFO socket option). However, the use of TFO in some load balanced topologies, such as Direct Server Return (DSR), requires the servers to share the same TFO key, thereby allowing a client to reuse the same cached cookie on any server in the farm. If the server key is randomly generated on each reboot, a single server’s reboot forces all servers to change to the new key, which needlessly increases the amount of key rotation and reduces the effectiveness of TFO - this could be significant on larger farms.

The second and following conditions have no obvious solutions provided by TFO itself, which means either TFO is incompatible with the application or, as for the third condition, a change must be made to the higher level architecture to support TFO.

A third condition which can cause duplicate data, that wasn’t obviously mentioned in the RFC, is a server farm where a retransmitted SYN packet arrives at a different server before the client receives acknowledgement (SYN-ACK) of the first packet. This scenario requires the SYN-ACK packet to be dropped or delayed, and requires the load balanced topology to deliver the second SYN packet to a different server (either because no persistence is configured or because the server had been removed from the load balanced farm). Just like a non TFO TCP connection, the server acknowledges the first SYN packet before it sends the application’s response; therefore, there’s no risk of slow application processing causing a retransmit.

A further note, one of the original developers of TFO states “today the client may send a non-idempotent request twice already with standard TCP.” - although it might be possible already, it’s not clear how much TFO increases this risk.

DoS & Amplification Attacks

There is a risk of amplification attacks, as described in more detail in the RFC. If an attacker can steal a valid cookie for the victim’s IP address, the attacker can generate a small request which may solicit a very large response sent directly to the victim.

The RFC further describes protections by requiring the server to automatically disable TFO (for that socket) in the event the TFO queue length exceeds an application defined max queue length. The max queue length defines the maximum number of outstanding unacknowledged SYN-ACK packets. Server applications are required to set this on listening sockets via the setsockopt system call. Exceeding the max queue length value will cause new connections to ignore TFO cookies and revert to a standard TCP handshake. The number of ignored TFO cookies can be monitored via the TCPFastOpenListenOverflow counter, but the event is not logged. See tcp_fastopen_queue_check in Linux 3.18.
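The queue check can be pictured with a small model. This is a deliberately simplified sketch and not a transcription of tcp_fastopen_queue_check; the class and method names are invented, and the real kernel code has additional handling that is not shown here.

```python
class TfoQueueModel:
    """Toy model of the per-socket TFO pending queue."""

    def __init__(self, max_qlen):
        self.max_qlen = max_qlen
        self.pending = 0          # TFO SYN-ACKs not yet acknowledged
        self.listen_overflow = 0  # mirrors TCPFastOpenListenOverflow

    def on_tfo_syn(self):
        """Return True if this SYN may use TFO, False to fall back to a 3WHS."""
        if self.pending >= self.max_qlen:
            self.listen_overflow += 1
            return False
        self.pending += 1
        return True

    def on_handshake_complete(self):
        """The client's ACK arrived, so the request leaves the queue."""
        self.pending = max(0, self.pending - 1)
```

Because the counter is shared by everyone connecting to the socket, a burst of spoofed SYNs pushes legitimate clients back to a normal handshake until the queue drains.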

Because the attacker’s requests are received by the application, standard DoS detection mechanisms can easily detect this, but in the case of TLS, the attacker can generate a small ClientHello request which will solicit a very large response from the server (containing the server’s certificate as well as intermediate certificates). Because the attack occurs at a lower level, within the TLS library, it’s less likely to be logged and may be unnoticed.

During a DoS attack the victim will receive SYN-ACK packets with TFO option bits set without seeing an earlier SYN request. A firewall or intrusion detection system (IDS) may be able to detect this type of attack. The victim and server would also see RST packets from the victim as the packets are rejected by the destination, providing additional clues of a possible TFO DoS attack.

An attacker that has a reliable means of accessing the victim’s network, and is able to obtain valid cookies from a series of servers, may be able to launch a larger or prolonged attack, as each server will need to individually detect that it’s involved in an attack and individually disable TFO. A victim’s network may therefore decide to remove the TFO option from all outgoing SYN packets to prevent TFO on the network completely.

The max queue length counter is the total number of outstanding SYN-ACK packets, i.e. it is per socket, not per client. Therefore when an attack is detected by the server, TFO is disabled for all clients (not just the victim). It may not be desirable for a single attacker to be able to disable TFO for all customers. Although this won’t cause a denial of service, it removes the benefits of TFO and is trivially exploited.

Middleware Issues

Some middleware, such as firewalls and NAT boxes, may cause issues with the new TCP option. Additionally, because Linux continues to set the TFO option kind to 254, which is the experimental kind, it may be more likely to be dropped.

It’s even been reported some middleware boxes, after detecting the TFO option in the initial SYN packet, drop subsequent SYN packets without the TFO option.

Also, if a device is behind a Carrier Grade NAT (CGN) with many public IP addresses constantly changing, a cookie may become invalidated often, reducing the effectiveness of TFO. High latency mobile devices which benefit the most from TFO are also most likely to be affected by changing public IP addresses due to CGNs. I currently have no data on this.

Minor Linux API Caveats

A strong benefit of TFO is the ability to reduce the requirement for long running, persistent TCP connections (such as HTTP servers with keep alives). Persistent connections allow a client to send more data, at a later time, without needing to wait for a 3WHS - but these connections are not free to maintain, costing both clients and servers, as further described in the TFO RFC. Although TFO can immediately reduce the requirement for persistent connections (for connections established with TFO), Linux does not currently have an API for the application to determine whether the connection was negotiated with TFO. Ideally, persistent connections would only be enabled for non TFO connections - but because the application has no API to detect a TFO connection, it cannot selectively disable persistent connections. Mobile clients may also wish to disable, or shorten, persistent connections and/or browser TCP preconnects when TFO connections are available.

Potentially the getsockopt system call could be capable of providing this information, as it provides other socket information such as whether the socket is listening, debugging etc.

TFO has been assigned TCP option 34 by IANA; however, Linux (3.18 at the time of writing) currently uses the old experimental TCP option 254. This leads to an interesting thought: once Linux supports the new TCP option number, will it continue to support the old experimental option number? Current Linux distributions, such as Red Hat Enterprise Linux 7 and its derivatives, will continue to set (in the case of a client) or check (in the case of a server) the older option number. Will the kernel developers simply stop checking the experimental number, legitimately breaking backwards compatibility and requiring a backport?

The Linux API does not provide a mechanism for graceful key rotation: once a new key is generated, cookies generated with the old key are immediately invalid. The benefit of supporting rotation is briefly discussed in the RFC in Server Cookie Handling.

When a connection exceeds the max_qlen (the maximum number of unacknowledged TFO requests, used to reduce DoS attacks), the kernel should log the event via pr_notice, printk or similar. Most people will probably never monitor the correct counters to detect this event, but will likely monitor kernel messages.

Enabling TFO in the Kernel

Linux supports configuring overall client and server support via /proc/sys/net/ipv4/tcp_fastopen (net.ipv4.tcp_fastopen via sysctl). The value is a bit mask: the first bit (value 1) enables client support (default on), the second bit (value 2) enables server support (default off), and the third bit (value 4) sets whether data in the SYN packet is permitted without the TFO cookie option. Therefore a value of 1 enables TFO only on outgoing connections (client only), a value of 2 allows TFO only on listening sockets (server only), and a value of 3 enables TFO for both client and server.
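Since the setting is a bit mask, a small decoder makes the combinations easier to read. The constant and key names below are my own labels for the three bits described above, not identifiers taken from the kernel.

```python
# Bit values of net.ipv4.tcp_fastopen as described above
CLIENT = 0x1               # TFO on outgoing connections
SERVER = 0x2               # TFO on listening sockets
DATA_WITHOUT_COOKIE = 0x4  # allow data in the SYN without a cookie

def decode_tcp_fastopen(value):
    """Translate the sysctl value into named booleans."""
    return {
        "client": bool(value & CLIENT),
        "server": bool(value & SERVER),
        "data_without_cookie": bool(value & DATA_WITHOUT_COOKIE),
    }
```

On a Linux host the current value could be decoded with decode_tcp_fastopen(int(open("/proc/sys/net/ipv4/tcp_fastopen").read())).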

Note, even though these options may be enabled, application level support must also be enabled.

To enable TFO persistently across reboots, you can use sysctl like the following:

echo 'net.ipv4.tcp_fastopen=3' > /etc/sysctl.d/50-tcp_fastopen.conf
sysctl -p /etc/sysctl.d/50-tcp_fastopen.conf

Linux (from version 3.13) generates a key when an application sets the relevant setsockopt syscall options for the first time. Until a key is set, the proc value is all zeros, so don’t be alarmed.

Because the key does not persist between reboots by default, production use of TFO should include generating a random key and saving it securely (with restrictive file permissions) via sysctl. This ensures clients can keep using their existing cookies after a reboot, rather than needing new ones.

To generate a new key and make persistent via sysctl:

RAND=$(openssl rand -hex 16)
NEWKEY=${RAND:0:8}-${RAND:8:8}-${RAND:16:8}-${RAND:24:8}
echo "net.ipv4.tcp_fastopen_key=$NEWKEY" > /etc/sysctl.d/50-tcp_fastopen_key.conf
chmod 600 /etc/sysctl.d/50-tcp_fastopen_key.conf; chown root /etc/sysctl.d/50-tcp_fastopen_key.conf
sysctl -p /etc/sysctl.d/50-tcp_fastopen_key.conf
unset RAND NEWKEY

Monitoring

To view connection statistics on clients, ip tcp_metrics is available in iproute2 from version v3.7.0 (2012-12-11). It can show the cached MSS as well as the TFO cookie used for a single IP or for all IPs (omit the 127.0.0.1 argument in that case).

$ ip tcp_metrics show 127.0.0.1
127.0.0.1 age 93935.839sec rtt 875us rttvar 500us cwnd 10 fo_mss 65495 fo_cookie cec297e8b2723c29

For both clients and servers the following counters are available in /proc/net/netstat, for an easy overview you can use the following (adjusting the column numbers if required):

grep '^TcpExt:' /proc/net/netstat | cut -d ' ' -f 87-92  | column -t
  • TCPFastOpenActive - number of successful outbound TFO connections.
  • TCPFastOpenActiveFail - number of SYN-ACK packets received that did not acknowledge data sent in the SYN packet and caused a retransmission without SYN data. Note the original SYN packet contained a cookie + data; this is not the number of connections to servers that didn’t support TFO.
  • TCPFastOpenPassive - number of successful inbound TFO connections.
  • TCPFastOpenPassiveFail - number of inbound SYN packets with an invalid TFO cookie.
  • TCPFastOpenCookieReqd - number of inbound SYN packets with the TFO option set but no cookie, i.e. requesting a cookie.
  • TCPFastOpenListenOverflow - number of inbound SYN packets that will have TFO disabled because the socket has exceeded the max queue length.

Additional counters that may be useful (see commit):

  • TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down retransmissions into SYN, fast-retransmits, timeout retransmits, etc.
  • TCPOrigDataSent: number of outgoing packets with original data (excluding retransmission but including data-in-SYN). This counter is different from TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is more useful to track the TCP retransmission rate.
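Column numbers in /proc/net/netstat shift between kernel versions, so looking counters up by name is less fragile than cut. The sketch below parses the file’s paired header/value lines; the function names are mine, and it takes the file contents as a string so it can be tried on any sample.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat style text into {section: {counter: value}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    stats = {}
    # The file alternates a line of names with a line of values
    for header, values in zip(lines[0::2], lines[1::2]):
        section, names = header.split(":", 1)
        value_section, numbers = values.split(":", 1)
        if section != value_section:
            continue  # not a matching pair, skip it
        stats[section] = dict(zip(names.split(), map(int, numbers.split())))
    return stats

def tfo_counters(text):
    """Return only the TCPFastOpen* counters from the TcpExt section."""
    tcp_ext = parse_netstat(text).get("TcpExt", {})
    return {k: v for k, v in tcp_ext.items() if k.startswith("TCPFastOpen")}
```

On a Linux host, tfo_counters(open("/proc/net/netstat").read()) should return the six counters listed above where the kernel provides them.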

Rotating Keys

Linux provides /proc/sys/net/ipv4/tcp_fastopen_key (net.ipv4.tcp_fastopen_key via sysctl), which can be used to display the current key, as well as to change it to a new key. The key is 16 bytes, expressed as a 32 character hex string, broken into four 8 character blocks separated by dashes.
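If you generate keys from a script rather than the shell, the same format can be produced as below. The helper names are hypothetical; the only thing taken from the text is the layout of 16 random bytes as four dash separated hex blocks.

```python
import re
import secrets

# Four blocks of 8 lowercase hex characters separated by dashes
KEY_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{8}){3}$")

def format_key(raw):
    """Format 16 raw bytes the way tcp_fastopen_key expects."""
    if len(raw) != 16:
        raise ValueError("TFO key must be exactly 16 bytes")
    hexed = raw.hex()
    return "-".join(hexed[i:i + 8] for i in range(0, 32, 8))

def new_key():
    """Generate a fresh random key string."""
    return format_key(secrets.token_bytes(16))
```

Writing the result into the sysctl file still needs root and the same restrictive permissions used for the shell version.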

Rotation of keys can be achieved in exactly the same way as keys are generated via sysctl.

In a multi server environment, you’ll want to randomly generate a key once, and set the same key on all servers.

Use Cases

Mobile Devices

TFO, being designed to reduce latency by removing an entire round trip, benefits mobile devices that often have a higher latency than other Internet connections provide.

Table data below, provided by the High Performance Browser Networking book, illustrates the typical latency for mobile networks.

Generation   Data Rate       Latency
2G           100–400 Kbps    300–1000 ms
3G           0.5–5 Mbps      100–500 ms
4G           1–50 Mbps       < 100 ms

As discussed in other sections, a theoretical benefit of TFO could be to reduce request and response time by half (if the response fits within TCP’s initial congestion window, initcwnd). A mobile device with a latency of 500ms would require two round trips using a standard, non-TFO, TCP connection. Assuming there is no server processing time, this connection would take 1000ms. With TFO, the first round trip is removed, so the connection time is now 500ms.
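The same arithmetic as a tiny function, for plugging in other latencies from the table. It is an idealised sketch: request_time_ms is a made-up helper that assumes a single request and a response that fits in one flight, and it ignores TLS, loss and DNS.

```python
def request_time_ms(rtt_ms, tfo, server_ms=0):
    """Time from opening the connection to receiving the response."""
    # Without TFO: one round trip for the 3WHS, one for request/response.
    # With TFO and a valid cookie: the request rides in the SYN.
    round_trips = 1 if tfo else 2
    return round_trips * rtt_ms + server_ms
```

For the 500ms example this gives 1000ms without TFO and 500ms with it. Any server processing time is added to both cases, so the saving stays at one round trip while the relative improvement shrinks.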

Static Site / CDN

Static content such as JavaScript, CSS, images, etc. has no idempotency requirements, so the possible, however unlikely, event of duplicate data is harmless, and it would greatly benefit from the reduced latency of TFO.

Websites that are primarily static sites, or micro sites, have no idempotency requirements and are therefore safe to use with TFO.

For this type of content, when the response is also less than 14,600 bytes, it may often fit within the server’s TCP initial congestion window (assuming a value of 10 segments). The entire request and response time could then be reduced by up to 50% (assuming the server has zero processing/fetching time).
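The 14,600 byte figure is simply 10 segments of 1460 bytes. To see how larger responses behave, here is an idealised estimate that assumes the congestion window doubles every round trip with no loss; the function name and that doubling model are my simplifications, not a description of any particular TCP stack.

```python
import math

def response_round_trips(response_bytes, mss=1460, initcwnd=10):
    """Estimate round trips needed to deliver a response of this size."""
    segments = math.ceil(response_bytes / mss)
    cwnd, sent, trips = initcwnd, 0, 0
    while sent < segments:
        sent += cwnd   # send a full window this round trip
        cwnd *= 2      # idealised slow start doubling
        trips += 1
    return trips
```

Anything up to 14,600 bytes needs a single flight, which is where saving the handshake round trip halves the total time. One byte more needs a second flight, and the relative benefit of TFO starts to shrink.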

Reducing TLS Latency

TLS introduces at least one additional round trip (for versions up to and including 1.2; TLS 1.3 addresses this). For new connections, two round trips are required.

When the TLS service supports session resumption, whether via session tickets or session IDs, only the first TLS round trip is required, which contains the ClientHello request and the server’s responses (which can be sent in multiple packets). New connections, or connections with expired/invalid tickets, will have one additional round trip.

With TLS resumption, only one round trip in total would be required to set up the connection (the same as a normal TCP connection without TFO and TLS). This works because the initial SYN packet will contain the ClientHello TLS data, as well as the session resumption ticket or ID. The application data is sent in later packets, after the handshake, where TCP drops retransmitted duplicates - providing idempotence protection.

DNS Servers using TCP

DNS primarily uses UDP for transactions. It originally defined the maximum packet size of 512 bytes. A client would request a record with UDP, and if the response required a larger packet than 512 bytes, the response was truncated and a bit set instructing the client to retry with TCP. This would require a total of 3 round trips.

This was recognised and Extension mechanisms for DNS (EDNS) was released in 1999, and has since been superseded by RFC6891.

EDNS permits UDP responses to exceed the 512 byte limit, allowing larger responses to be carried over UDP (fragmented at the IP layer if necessary). This caters for protocols such as DNSSEC, which may exceed the limit.

Because of EDNS, TCP usage by DNS resolvers is uncommon (0.61% reported by a medium ISP’s DNS resolver) for most transactions.

Regardless of when TCP may be used, TFO can assist in reducing the round trips required for a successful UDP downgrade to TCP and the subsequent 3WHS. But the real world benefits will be minimal.

Reducing Tor Latency

Additionally, Tor is a TCP protocol with TLS that can use TFO. However, I haven’t had a chance to verify if, during the initial Tor TLS negotiation, the client’s first request can fit within one packet (containing the client’s certificates). If it cannot, then the remaining data must be transmitted after the 3WHS, nullifying the benefits of TFO.

Multipath TCP

Multipath TCP (MPTCP) is an effort to support multiple TCP subflows within a single connection, taking advantage of multiple links for bandwidth and/or availability - all the while being transparent to the application.

A current Internet-Draft, draft-barre-mptcp-tfo, seeks to address possible issues and inefficiencies with using MPTCP and TFO concurrently. Among others, it provides suggestions for the likely exhausted TCP option space (maximum 40 bytes), as TFO and MPTCP require 12 bytes each, totalling 24 bytes, with recent kernels already using an additional 20 bytes.

Other TCP Protocols

TFO is applicable to any TCP connection that is sensitive to latency, not just HTTP connections. Other protocols such as file servers (SAMBA, CIFS, NFS etc.) and potentially more could benefit from the reduced round trips.

Currently many browsers, such as Chrome/Chromium and Firefox, establish a TCP connection (TCP preconnect) to a server preemptively - before the user requests a resource. The idea is to establish at least one TCP connection before a user requires it, so that when the connection is required an already established connection can be used, removing TCP’s 3WHS from the request.

There are a few minor issues with this. First, a TCP connection must be established and held open. Some load balancers and/or web servers close an open connection that has not made a request after a short timeout, and record this as an error. This causes additional logging and potentially wasted effort investigating the errors. See HAProxy issues and Firefox issue.

Another issue with this is that the TCP connection must be re-established constantly, whenever the browser thinks the user is about to connect to the site.

A separate mechanism could be employed using TFO, whereby the browser preemptively establishes a TCP connection with TFO options, allowing the TCP stack to obtain a cookie. Future connections would not require a preconnect if a TFO cookie was successfully obtained. Note however, the cookie may be invalid (the server’s key has changed) or the source IP of the client may change, therefore the browser may choose to initiate a preconnect after some period of time.

Note, cookie prefetching would require an API for the application to detect whether a cookie was successfully obtained, to know whether it can close the connection immediately and to not establish a preconnect in the future. No API is currently available in Linux.

History

  • Feb 17th 2012
    • TCP Fast Open initial draft released.
  • Sept 30th 2012
    • Linux 3.6 adds TFO client support. Commits.
  • Dec 10th 2012
    • Linux 3.7 adds TFO server support. Commits.
  • Jun 30th 2013
    • Linux 3.10 stops clearing cached TFO cookies when clearing other TCP metric caches. Commit
  • Nov 3rd 2013
    • Linux 3.12 encrypts server IP along with client IP when generating MAC. Commit.
  • Jan 19th 2014
    • Linux 3.13 enables TFO client support by default by changing the tcp_fastopen sysctl value from 0 to 1. Commit.
    • Linux 3.13 randomly generates TFO key only once a socket requests TFO, instead of each boot. Commit.
  • Aug 3rd 2014
    • Linux 3.16 adds IPv6 TFO support. Commit.
  • Dec 18th 2014