We all know to be careful about the details we share online, but the information we seek can also be revealing. Search for driving directions, and our location becomes far easier to guess. Check for a password in a trove of compromised data, and we risk leaking it ourselves.
These situations fuel a key question in cryptography: How can you pull information from a public database without revealing anything about what you’ve accessed? It’s the equivalent of checking out a book from the library without the librarian knowing which one.
Concocting a strategy that solves this problem—known as private information retrieval—is “a very useful building block in a number of privacy-preserving applications,” said David Wu, a cryptographer at the University of Texas, Austin. Since the 1990s, researchers have chipped away at the question, improving strategies for privately accessing databases. One major goal, still impossible with large databases, is the equivalent of a private Google search, where you can sift through a heap of data anonymously without doing any heavy computational lifting.
It would be like having a librarian scour every shelf before returning with your book.
Now, three researchers have crafted a long-sought version of private information retrieval and extended it to build a more general privacy strategy. The work, which received a Best Paper Award in June at the annual Symposium on Theory of Computing, topples a major theoretical barrier on the way to a truly private search.
“[This is] something in cryptography that I guess we all wanted but didn’t quite believe that it exists,” said Vinod Vaikuntanathan, a cryptographer at the Massachusetts Institute of Technology who was not involved in the paper. “It is a landmark result.”
The problem of private database access took shape in the 1990s. At first, researchers assumed that the only solution was to scan the entire database during every search, which would be like having a librarian scour every shelf before returning with your book. After all, if the search skipped any section, the librarian would know that your book is not in that part of the library.
That approach works well enough at smaller scales, but as the database grows, the time required to scan it grows at least proportionally. As you read from bigger databases—and the internet is a pretty big one—the process becomes prohibitively inefficient.
In the early 2000s, researchers started to suspect they could dodge the full-scan barrier by “preprocessing” the database. Roughly, this would mean encoding the whole database as a special structure, so the server could answer a query by reading just a small portion of that structure. Careful enough preprocessing could, in theory, mean that a single server hosting information only goes through the process once, by itself, allowing all future users to grab information privately without any more effort.
For Daniel Wichs, a cryptographer at Northeastern University and a co-author of the new paper, that seemed too good to be true. Around 2011, he started trying to prove that this kind of scheme was impossible. “I was convinced that there’s no way that this could be done,” he said.
But in 2017, two groups of researchers published results that changed his mind. They built the first programs that could do this kind of private information retrieval, but they weren’t able to show that the programs were secure. (Cryptographers demonstrate a system’s security by showing that breaking it is as difficult as solving some hard problem. The researchers weren’t able to compare it to a canonical hard problem.)
So even with his hope renewed, Wichs assumed that any version of these programs that was secure was still a long way off. Instead, he and his co-authors—Wei-Kai Lin, now at the University of Virginia, and Ethan Mook, also at Northeastern—worked on problems they thought would be easier, which involved cases where multiple servers host the database.
In the methods they studied, the information in the database can be transformed into a mathematical expression, which the servers can evaluate to extract the information. The authors figured it might be possible to make that evaluation process more efficient. They toyed with an idea from 2011, when other researchers had found a way to quickly evaluate such an expression by preprocessing it, creating special, compact tables of values that allow you to skip the normal evaluation steps.
That method didn’t produce any improvements, and the group came close to giving up—until they wondered whether this tool might actually work in the coveted single-server case. Choose a polynomial carefully enough, they saw, and a single server could preprocess it based on the 2011 result—yielding the secure, efficient lookup scheme Wichs had pondered for years. Suddenly, they’d solved the harder problem after all.
At first, the authors didn’t believe it. “Let’s figure out what’s wrong with this,” Wichs remembered thinking. “We kept trying to figure out where it breaks down.”
But the solution held: They had really discovered a secure way to preprocess a single-server database so anyone could pull information in secret. “It’s really beyond everything we had hoped for,” said Yuval Ishai, a cryptographer at the Technion in Israel who was not involved in this work. It’s a result “we were not even brave enough to ask for,” he said.
Cryptographers have a long history of results that were initially impractical.
After building their secret lookup scheme, the authors turned to the real-world goal of a private internet search, which is more complicated than pulling bits of information from a database, Wichs said. The private lookup scheme on its own does allow for a version of private Google-like searching, but it’s extremely labor-intensive: You run Google’s algorithm yourself and secretly pull data from the internet when necessary. Wichs said a true search, where you send a request and sit back while the server collects the results, is really a target for a broader approach known as homomorphic encryption, which disguises data so that someone else can manipulate it without ever knowing anything about it.
Typical homomorphic encryption strategies would hit the same snag as private information retrieval, plodding through all the internet’s contents for every search. But using their private lookup method as scaffolding, the authors constructed a new scheme which runs computations that are more like the programs we use every day, pulling information covertly without sweeping the whole internet. That would provide an efficiency boost for internet searches and any programs that need quick access to data.
While homomorphic encryption is a useful extension of the private lookup scheme, Ishai said, he sees private information retrieval as the more fundamental problem. The authors’ solution is the “magical building block,” and their homomorphic encryption strategy is a natural follow-up.
For now, neither scheme is practically useful: Preprocessing currently helps at the extremes, when the database size balloons toward infinity. But actually deploying it means those savings can’t materialize, and the process would eat up too much time and storage space.
Luckily, Vaikuntanathan said, cryptographers have a long history of optimizing results that were initially impractical. If future work can streamline the approach, he believes private lookups from giant databases may be within reach. “We all thought we were kind of stuck there,” he said. “What Daniel’s result gives is hope.”
This article was originally published on the Quanta Abstractions blog.
Lead image: Allison Li for Quanta Magazine