The method I’m explaining in this article is not new, but it is interesting enough to deserve a live demonstration. I’m talking about a technique to track web users (or more precisely their web browsers) using Transport Layer Security (TLS) with HTTP Strict Transport Security (HSTS). Tracking users via regular cookies is very common; tracking via TLS and HSTS is somewhat rare, and users don’t have an easy-to-use way to remove the tracking information. Until a few years ago, this tracking method even survived the private browsing mode. So, how does it work?
TLS and HSTS
Transport Layer Security (TLS) is a protocol that enables us to transfer information over the internet via an encrypted communications channel. HTTPS uses TLS. HTTP Strict Transport Security is a policy framework, which gives the browser security information for future requests to a website. The server can signal that the browser should use HTTPS (and never HTTP) for requests to the domain for a certain amount of time. If a browser knows this, visiting the site via HTTP causes the browser to rewrite the request to use the HTTPS version.
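To make the mechanism concrete, here is a minimal Python sketch of both sides. The server side is the `Strict-Transport-Security` response header with a `max-age` directive, as defined by the HSTS specification. The browser side is modelled here, as a simplifying assumption, by a plain set of host names; the function names `hsts_header` and `upgrade_if_known` are made up for this illustration and are not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def hsts_header(max_age_seconds: int) -> tuple[str, str]:
    """Return the response header a server sends to enable HSTS."""
    return ("Strict-Transport-Security", f"max-age={max_age_seconds}")


def upgrade_if_known(url: str, hsts_hosts: set[str]) -> str:
    """Toy model of the browser: rewrite http:// to https:// for known hosts.

    hsts_hosts stands in for the browser's internal HSTS store
    (an assumption; real browsers also track expiry and subdomain flags).
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in hsts_hosts:
        return urlunsplit(parts._replace(scheme="https"))
    return url


if __name__ == "__main__":
    print(hsts_header(31536000))
    print(upgrade_if_known("http://example.com/page", {"example.com"}))
```

The sketch ignores expiry, explicit ports, and the `includeSubDomains` directive, all of which a real browser takes into account.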
The motivation for this technique is security. Many websites redirect HTTP requests to HTTPS. Attackers performing a man-in-the-middle attack could modify that initial redirect or serve a fake website directly, so that the user never reaches the secure HTTPS version of the site. With HSTS, if you have visited the HTTPS version before, your browser knows that the HTTP request should be an HTTPS request and alters the request before any man-in-the-middle can intercept the insecure connection.
The same concept, however, can also be used to track visitors of a website. The key is the browser remembering to rewrite HTTP requests to HTTPS. The tracking website asks the browser to access the HTTP version of a subdomain. If the request is made via HTTPS instead of HTTP, the server (or the script running in the browser) knows that this client visited the HTTPS version of the subdomain in the past. This is a single bit of information: HTTP request (bit not set) or HTTPS request (bit set).
Several subdomains are needed to track users, one per bit of the identifier. Assume that we have 32 subdomains. This means we can store 2^32, i.e., over 4 billion, different identifiers (more than the total number of public IPv4 addresses). When the site is accessed for the first time, a new identifier is generated and encoded as a binary string, i.e., a sequence of 1’s and 0’s. If, for example, the binary string (when working with five subdomains) is 00101, the browser is asked to access the HTTPS version of the third and fifth subdomains, because the third and fifth bits are set.
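The encoding step can be sketched in a few lines of Python. The subdomain naming scheme (`bit1`, `bit2`, ...) and the base domain `tracker.example` are assumptions made up for this illustration; the function only computes which HTTPS URLs the browser would be asked to load.

```python
def https_urls_for(identifier: int, n_bits: int,
                   base: str = "tracker.example") -> list[str]:
    """Return the HTTPS URLs to visit so that the set bits are stored.

    Bits are numbered from the left, starting at 1, so the string
    00101 selects the third and fifth subdomains.
    """
    if not 0 <= identifier < 2 ** n_bits:
        raise ValueError("identifier does not fit into n_bits")
    bits = format(identifier, f"0{n_bits}b")  # e.g. 5 -> "00101"
    return [
        f"https://bit{position}.{base}/"
        for position, bit in enumerate(bits, start=1)
        if bit == "1"
    ]


if __name__ == "__main__":
    # Identifier 5 is 00101 with five subdomains.
    for url in https_urls_for(5, 5):
        print(url)
```

With 32 subdomains the same function covers every identifier from 0 to 2^32 - 1, which is 4,294,967,296 values in total.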
By accessing the HTTPS versions of these selected subdomains, the server tells the browser that it should only use HTTPS for future requests to these selected subdomains. When the user revisits the website with the same browser, the browser remembers to rewrite the HTTP requests to subdomains for which the bit in the identifier string was set. Based on the pattern of HTTP or HTTPS requests to the subdomains, the server (or the script running in the browser) can decode the binary string to get the identifier again.
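Decoding on a return visit is the reverse operation. The following Python sketch assumes the server (or script) has already collected, in subdomain order, which scheme each probe request arrived with; how that observation is made is left out, and the function name `decode_identifier` is hypothetical.

```python
def decode_identifier(observed_schemes: list[str]) -> int:
    """Rebuild the identifier from the scheme seen for each subdomain.

    observed_schemes[0] belongs to the first subdomain, and so on.
    "https" means the browser upgraded the request (bit set),
    anything else is treated as bit not set.
    """
    if not observed_schemes:
        raise ValueError("need at least one observation")
    bits = "".join("1" if scheme == "https" else "0"
                   for scheme in observed_schemes)
    return int(bits, 2)


if __name__ == "__main__":
    # Third and fifth probes were upgraded: 00101.
    print(decode_identifier(["http", "http", "https", "http", "https"]))
```

Note that in this sketch a browser that has never visited the site produces all-HTTP probes and therefore decodes to 0, so an implementation along these lines would presumably avoid handing out 0 as an identifier.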
Deleting cookies will not help, because there are no cookies. The only thing that gets rid of this super cookie (if one wants to call it that) is deleting the (recent) browser history. Modern web browsers make sure that this super cookie does not propagate into or out of the private browsing mode, as it did a few years ago.
Sounds scary, right? Go check out the live demonstration.