Exploring why decreasing the TLS record size can improve webpage load times



TL;DR: Reducing the TLS record size means the client needs less data before it can decrypt and verify the first record, so the browser can start loading other assets before the full document has arrived. It also improves TTFB, because the decryption library can hand decrypted data to the application sooner.

While working on our websites we use several ways to speed them up. Some people go to extremes and serve just a tiny HTML page, some try to keep the page under 14 kilobytes so it fits into TCP’s initial congestion window, convert every image to WebP, compress everything with gzip, and so on. Here I want to explore one more technique: decreasing the TLS record size so that the browser can read the HTML head and start loading other resources before the whole document has arrived.

This will not magically make your site 10x faster, but it is an interesting technique that some top websites use to load a bit faster, especially on slow and unreliable networks.

What is the TLS record layer

When a client requests a URL (asks for a document), the server prepares a response and sends it back. A server can return many different content types (CSS, JS, images, etc.), but for our case we are interested in HTML. So the server prepares the HTML, wraps it in an HTTP response, encrypts it and passes it down to the lower layers for further processing and delivery to the client.

Most of the time this HTML will be larger than the maximum block size the lower layers can transport, so it gets cut into pieces. The data is split into MSS-sized segments to avoid IP fragmentation and make transport more efficient (there are also layers 1-2, but we don’t need them here).

One aspect we usually gloss over is that there is another level of data splitting between HTTP and TCP: TLS records. A record is another envelope used to split the data, and it has a maximum plaintext size of 16 KB (16,384 bytes). 16 KB is also the default record size, so every site running default settings splits its content into 16 KB records.

Client and server communication. Data split

What does this give us? The client can only consume incoming data at the granularity of a TLS record. Even though the data arrives in roughly 1.5 KB chunks, and the TLS library may decrypt it incrementally, it cannot return any decrypted data to the application until it has read and verified the entire record. Why? Because of how records are structured and processed: each record starts with a small header and ends with an authentication tag (MAC) that protects it. Even if the library decrypts the record in smaller blocks internally, it cannot release the plaintext until it has verified that the data is correct and has not been corrupted or tampered with.

TLS record split into segments
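
To make that concrete, here is a toy sketch of the receiving side in Python. It is not real TLS, just the buffering shape: incoming TCP chunks are accumulated, and plaintext is only handed to the application once a complete record (header plus body plus tag) is present. Only the 5-byte header layout and the 16-byte tag size correspond to real TLS; the decryption step is a stand-in.

import struct

HEADER_LEN = 5   # content type (1 byte) + version (2) + length (2)
TAG_LEN = 16     # e.g. an AES-GCM authentication tag

class RecordReader:
    def __init__(self):
        self.buf = bytearray()

    def feed(self, tcp_chunk: bytes) -> list[bytes]:
        """Accept ~1.5 KB wire chunks; return plaintext only for complete records."""
        self.buf += tcp_chunk
        out = []
        while len(self.buf) >= HEADER_LEN:
            _, _, length = struct.unpack("!BHH", self.buf[:HEADER_LEN])
            if len(self.buf) < HEADER_LEN + length:
                break  # record incomplete: nothing can be released yet
            body = bytes(self.buf[HEADER_LEN:HEADER_LEN + length])
            del self.buf[:HEADER_LEN + length]
            out.append(self.decrypt_and_verify(body))
        return out

    def decrypt_and_verify(self, body: bytes) -> bytes:
        # Stand-in for real AEAD decryption: check the tag, then return the plaintext.
        return body[:-TAG_LEN]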

So this is what the TLS record layer does (more or less):

  • Takes chunks of application data (from HTTP, SMTP, etc.).
  • Splits them into records (max plaintext size = 16 KB = 16,384 bytes).
  • Adds a MAC or AEAD authentication tag.
  • Encrypts the record (using AES, ChaCha20, etc.).
  • Prepends a record header (5 bytes).
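
For illustration, here is a minimal sender-side sketch of that list in Python. It is not a real TLS implementation: the encryption and the tag are stand-ins, and only the 16 KB plaintext cap and the 5-byte header format match the real record layout.

import os
import struct

MAX_PLAINTEXT = 16 * 1024   # 16,384-byte cap on a record's plaintext
APPLICATION_DATA = 23       # record content type for application data
TLS_1_2 = 0x0303            # legacy version field used on the wire

def to_records(payload: bytes, record_size: int = MAX_PLAINTEXT):
    for i in range(0, len(payload), record_size):
        chunk = payload[i:i + record_size]
        ciphertext = chunk              # stand-in: a real stack encrypts here
        tag = os.urandom(16)            # stand-in for the AEAD authentication tag
        body = ciphertext + tag
        header = struct.pack("!BHH", APPLICATION_DATA, TLS_1_2, len(body))
        yield header + body

print([len(r) for r in to_records(b"x" * 40_000)])  # [16405, 16405, 7253]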

Lower network layers then split each record into even smaller chunks for transport. For a maximum-size record the numbers work out roughly like this: a typical MTU is 1,500 bytes, which leaves about 1,460 bytes of payload after subtracting the IP/TCP headers. A full TLS record is about 16,410 bytes, and 16,410 / 1,460 gives roughly 12 TCP segments to transport. That is already larger than a typical initial congestion window (commonly 10 segments), which means even more delay before the first record can be read.
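
The same arithmetic, spelled out (the MTU, header sizes and the 10-segment initial congestion window are typical assumed values, not measurements):

MSS = 1500 - 40                # typical 1,500-byte MTU minus ~40 bytes of IP/TCP headers
FULL_RECORD = 16384 + 5 + 21   # max plaintext + record header + auth tag, roughly
INITCWND = 10                  # common initial congestion window, in segments

segments = -(-FULL_RECORD // MSS)      # ceiling division
print(segments, segments > INITCWND)   # 12 True: one full record does not fit in the first flight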

Interpretation

You might say: so what? I can wait not just for the first record but for the whole document, and start working with it once everything is ready. That is probably true for regular apps. The trick is how modern browsers handle content loading for a webpage.

The first issue is that a modern webpage is rarely a single document. After downloading the initial HTML, the browser has to load CSS and JS files, and if some of those files are render-blocking (most sites have at least a few), the page will not start rendering until it has every file required to render it.

Modern browsers try to be smart and start loading those blocking resources as early as possible. To do that, they don’t wait for the whole HTML document before parsing it; they start parsing right away, and as soon as they read the HTML head and find required files, they add them to the download queue immediately. By pipelining requests like this, the initial render speeds up quite a bit on slower connections.

Wikipedia resources load in Chrome

Let’s look at how Chrome renders Wikipedia’s main page. It starts loading the HTML document, and about halfway through it starts loading other resources. Even though the whole document is 116 KB, it is gzipped down to 27.3 KB on the wire. Since a TLS record is 16 KB, it is safe to assume that somewhere in the middle of the download the browser got its first record, decrypted it and started parsing. While parsing the HTML it found other resources it needed and started downloading them immediately, without waiting for the whole page.

Google resources load in Chrome

The same test with Google shows similar behavior: it also starts downloading other content before the whole HTML page has loaded. Even though the downloaded Google page is 52 KB gzipped, slightly bigger than Wikipedia’s, the other resources start downloading much closer to the start of the HTML load. The reason is that Google uses smaller TLS record sizes, which lets the browser start parsing and discovering other resources sooner.

Examples

I initially stumbled upon this while playing with my small network debugging/exploring tool (it works on all platforms, but I have only published an Android version, at the request of my Android friends). It has a mode I use to explore at which stage a request stalls, or to quickly check whether it is the server that is taking a long time to compute the result. What took me by surprise is that different sites return differently sized chunks on read requests. At first I thought I had made a mistake, but on inspection I found that the decryption library simply holds the data until it has fully decrypted a record.
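
You can observe this with plain Python too. OpenSSL’s SSL_read, which the ssl module wraps, returns at most the contents of one record per call, so reading with a large buffer and logging the read sizes gives a rough approximation of the record sizes a site uses. The host below is just an example:

import socket
import ssl

host = "www.google.com"   # example host to probe
ctx = ssl.create_default_context()

with socket.create_connection((host, 443)) as raw:
    with ctx.wrap_socket(raw, server_hostname=host) as tls:
        request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        tls.sendall(request.encode())
        while True:
            chunk = tls.recv(64 * 1024)   # ask for more than one record can hold
            if not chunk:
                break
            print(f"read {len(chunk)} bytes")   # each read is at most one record's plaintext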

Here are some example sites. I was testing some VPN-related stuff at the time, so the overall speed might seem slow, but we are only looking at content sizes here.

Sites with the default 16 KB TLS record size example

Let’s first look at regular sites. The tool does not clearly show that the records are 16 KB. This is because of how Dart does networking: it uses two 8 KB buffers to pass decrypted data from C++ to Dart, which is why we see odd-looking 8191+8191+2 byte reads. It is an implementation detail and does not really matter here, since we are not after exact numbers; rough approximations are good enough. We can see that samsung.com, wikipedia.org and even TikTok use 16 KB records. There is also Hacker News, but considering its asceticism, this kind of optimization probably wouldn’t help much in its case.

Sites with smaller TLS record size example

Here is some output from the tool for sites with smaller TLS record sizes. Of course there are Google, Instagram and Facebook, since they presumably care about this kind of optimization and try to squeeze out every last bit. But my own site is there too, and I did not apply this optimization myself. It turns out my blog is hosted on GitHub Pages, and GitHub does this for all hosted pages.

Google seems to size its TLS records to fit exactly the full HTTP response headers. I cannot prove it, but in my tests every request to Google and YouTube had a first record that fit the HTTP headers exactly, no more and no less. Facebook seems to go with a limit around 1,500 bytes, but I have not done enough tests to look for any real pattern.

Conclusions

First of all, why is everybody using 16 KB records? Probably because everybody is running the default settings of popular web servers, and most people never bother to change the record size. If you want to try it yourself, put this into your nginx config (the 2 KB size is arbitrary here):

ssl_buffer_size 2k;

Another question is: what is the overhead of decreasing the record size? It depends on the cipher, but in general there are 5 bytes for the record header and roughly 21-70 bytes for the MAC or authentication tag. So the overhead for 16 KB records is around 0.1%-0.4%, and for 2 KB records around 1%-3% of the bytes on the wire.
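
A quick back-of-the-envelope check of those percentages, using the same rough 5-byte header and 21-70 byte MAC/tag figures:

def overhead_pct(record_size: int) -> tuple[float, float]:
    # Per-record expansion: 5-byte header plus roughly 21-70 bytes of MAC/tag.
    low, high = 5 + 21, 5 + 70
    return round(100 * low / record_size, 2), round(100 * high / record_size, 2)

print(overhead_pct(16 * 1024))   # (0.16, 0.46) percent for 16 KB records
print(overhead_pct(2 * 1024))    # (1.27, 3.66) percent for 2 KB records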

Is it worth it? Probably, if improving TTFB is valuable for your business and your customers. It won’t magically make your site 10x faster, but it will load a page with multiple resources sooner, and the impact can be significant for users on slow and unreliable connections.