It Is Time to Move Beyond End-to-End Encryption
Arvid Lunnemark — December 29, 2021
As any informed internet citizen of the 2020s will tell you, two things are true. The first is that privacy is important. The second is that the internet does not have a good track record of ensuring privacy.
The services we use every day are being hacked again and again, governments all over the world gather information about our contacts and habits, and employees of messaging giants read their partners' supposedly private messages.
As any informed internet citizen of the 2020s will also tell you, there is ostensibly a solution: end-to-end encryption. End-to-end encryption makes sure that a message can only be read by its intended recipient, even if the government spies on the network, even if the service you're using has malicious employees, and even if said service gets hacked. The messaging app Signal is perhaps the prime example of an end-to-end encrypted messaging service, and it cannot be overstated how great Signal is for privacy. No one can read your messages, provably. I love Signal, and if you aren't using it, you should.
Yet, regrettably, end-to-end encryption is not the ultimate solution.
Why not? Because it leaks metadata.
Metadata is everything in a communication except the actual message content itself. It is who you talk to, how often you talk to them and for how long, and who the people you talk to talk to. While end-to-end encryption hides the actual data, it does nothing to protect metadata.
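To make the distinction concrete, here is a minimal sketch of what a relay server could learn from end-to-end encrypted traffic. Everything in it is an assumption made for illustration: the `Envelope` format, the `seal` helper, and the names and timestamps are hypothetical and do not describe Signal or any real service. Random bytes stand in for a real ciphertext.

```python
import os
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """What a relay server sees for one message (hypothetical format)."""
    sender: str
    recipient: str
    sent_at: int       # Unix timestamp in seconds
    ciphertext: bytes  # opaque to the server


def seal(sender: str, recipient: str, sent_at: int, plaintext: str) -> Envelope:
    # Stand-in for real end-to-end encryption: random bytes of the same
    # length model a ciphertext the server cannot read.
    # This is NOT an encryption scheme.
    return Envelope(sender, recipient, sent_at, os.urandom(len(plaintext)))


def contact_counts(log: list[Envelope]) -> Counter:
    """Count messages per (sender, recipient) pair using only metadata."""
    return Counter((e.sender, e.recipient) for e in log)


log = [
    seal("alice", "hotline", 1_700_000_000, "hello"),
    seal("alice", "hotline", 1_700_000_060, "are you there?"),
    seal("bob", "alice", 1_700_000_120, "lunch?"),
]

counts = contact_counts(log)
print(counts[("alice", "hotline")])  # 2
```

The server never touches the plaintext, yet it can still tell that alice contacted the hotline twice within a minute. That is the kind of information end-to-end encryption alone leaves exposed.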
It may seem like it's not too much of a problem that metadata isn't protected. After all, what's a hacker going to do with the fact that you texted your ex 10 times in a row last night? Embarrass you, maybe. While that may or may not convince you that metadata is worth protecting, consider the woman who called the domestic violence hotline last week, the government employee who shared files with journalists this past month, or the Belarusian student who joined an opposition party group chat this morning. For all of them, leaking who they talked to (i.e., metadata) could be disastrous, even if the message contents remain protected.
Don't just take it from me. Academic research over the past decades has shown time and time again how much information can be extracted simply from knowing who talks to whom, and when. Metadata is important, and end-to-end encryption doesn't hide it at all.
It is therefore time to move beyond end-to-end encryption, and enter complete privacy: when you talk to a friend, no information at all should be revealed, be it content or metadata. No one except you and your friend should be able to know that you two are communicating. End-to-end encryption solves content privacy. To go beyond, we need metadata privacy.
How do we achieve metadata privacy? At a high level, there are two approaches. The first is to design our services such that they delete any records of who is talking to who as soon as technically possible. This is what Signal is doing. While this is a good start, it requires users to trust that the Signal servers are actually running the code that they tell us they are running. It also does not preclude a powerful network observer from analyzing timing events to figure out partial metadata, and it does not protect against a hacker who gains access to Signal's servers.
The second approach is to require cryptographically complete privacy: regardless of what the server is doing, regardless of any network observers, and regardless of hackers, it should be impossible to extract any metadata at all (under standard cryptographic assumptions, e.g. that factoring integers is hard). The second approach is to the first what end-to-end encryption is to in-transit-only encryption. With cryptographically complete privacy for metadata, as with end-to-end encryption for content, you need to trust no one but your own computer. It is obvious that we should prefer the second approach to the first.
Today, to my knowledge, no completely private service of any kind exists. There is Tor, but it does not have provable anonymity, and as a consequence, it is vulnerable to many different privacy attacks. The reason complete privacy doesn't exist yet is that it's a hard theoretical problem: researchers have studied it for years, inventing and trying out new cryptographic and algorithmic solutions. Complete privacy has long been at odds with scalability, and while it still is, recent research has brought complete privacy into the realm of feasibility.
We can do complete privacy now. We can protect metadata. And we should. It is time for us to move beyond end-to-end encryption.
My name is Arvid, and I'm one of the co-founders of Anysphere, a completely private communication platform. Apply to join Anysphere here.
If you're interested in working on or talking about a problem that is technically fascinating from both a cryptography perspective and a performance engineering perspective, as well as socially important for both the everyday person and the dissenter, email me: email@example.com.