Ask, and it shall be given you; seek, and ye shall find; knock, and it shall be opened unto you.” Verse 7:7 from the Gospel of Matthew is generally considered to be a comment on prayer, but it could just as well be about the power of search. Search has become one of the key technologies of the information age, powering industry behemoths and helping us with our daily chores. But that’s not where it ends. Scientists are starting to understand that search powers much of the natural world, too.
Saket Navlakha, of the Salk Institute for Biological Studies, works at the “interface of theoretical computer science, machine learning, and systems biology,” a field, he told me, that he and his colleagues are calling “algorithms in nature.” Evolution needs algorithms just as software engineers do, Navlakha says, because it “has also had to deal with building efficient, reliable, low-cost systems that help animals and organisms survive.” His hope is to find in nature “new ideas and new engineering principles” that can be exploited by human scientists and engineers.
In a study published on Friday, Navlakha and a couple colleagues, Sanjoy Dasgupta and Charles F. Stevens, did just that. They found that the fruit fly brain had some valuable lessons for anyone developing similarity search algorithms. Stevens had been studying fly neural circuits, specifically how they associate different behaviors, like approach or avoidance, with odors in the environment. “When he started telling me about it,” Navlakha says, “I realized that what the fly needs to do is do something like a similarity search. It turns out that in engineering, this is a very common problem faced by pretty much every tech company today,” and the fruit fly’s solution to it “improves upon what has been done before.”
Nautilus caught up with Navlakha to chat about the results.
How is the fruit fly brain inspiring better search algorithms?
One of the main differences between the fruit fly and the usual computer science approach is that computer scientists will typically take their data, either a video or image or song, and do what’s called a dimensionality reduction: They take their very high-dimensional object, and they’ll try to reduce it and then look for similarity in this lower dimension of space. It’s kind of like doing a principle components analysis, a popular technique to take some data and try to plot it, for example, in two dimensions, while still preserving the structure, so that you can visualize it better. What the fly actually does is, instead of reducing it, it expands the dimension into much larger than it was, and it creates a very sparse point in a high dimensional space.
The fly is introducing this new idea that we just simply haven’t thought of.
What do you mean, “the fruit fly expands the dimension”?
Let’s say you have 100 people, and you want to group them into some number of groups. Computer scientists will typically crowd the people into a small room. They’re going to be jammed in this very low-dimensional space. But now imagine you take those 100 people and you spread them out over a football field. It’s going to be very easy for you to identify the group structure, because there’s so much space where you can place these people. That’s what the fly is doing. It does this using something called a “random projection,” which is a way to change the dimension of your data while still preserving distances among the objects the data describes.
Why is the fruit fly solution more efficient?
The gain in the efficiency is where the random projection takes place. This is a little bit technical. Typically in computer science, the way you would compute the random projection is very expensive—it’s a gaussian random projection, if you want to know the technical term. The fly amazingly also uses a random projection, but it uses a much more efficient type of random projection: a sparse binary random projection. That’s where the real computational win comes. Even though the fly brain is using more neurons, it’s doing it at a much higher efficiency because of this random projection type, which means the fly has many more neurons to represent the objects in its database.
How do you know the fruit fly is doing all of this?
Over the last decade or so, people have gone into the fly brain and counted the number of neurons in different parts of the olfactory circuit, studied their firing properties, and traced out their synaptic connections. They gave us insight about what each part of the circuit is doing, which we then analyzed from an algorithmic perspective. If I have 2,000 neurons, then that means every object can be represented by some different combination of the 2,000 neurons. If I do it the computer science way, every object is going to be represented by just a combination of, let’s say, 10 or 20 neurons. Having more neurons is equivalent to having this larger football field space to store objects.
Why don’t we already have better search algorithms than a fruit fly?
There are tricks that the fly is using that we just have not thought of using, because they’re somewhat un-intuitive. Typically you would want to reduce the dimension. But the fly is introducing this new idea that we just simply haven’t thought of. Evolution has had a lot of time to figure out new strategies, or new ways of combining them. This is one of the goals of neuroscience: to understand how the brain does amazing things at such a low power, efficiently, that no computer can match today, and be able to translate those ideas into machine computation.
What have our ideas for better search algorithms been based on, other than nature?
The first way of solving the similarity search problem is just to go one by one and compare your query with each item in the database—linear search. For a long time, that was a good enough solution, because we only had a few thousands things in our databases, but now we have billions of videos and millions of products, and we can’t just do a one by one search. This led to a different class of algorithms called KD trees, which work in certain cases when you have large databases but very low dimensional objects. Then people said, “That’s not realistic either, we need to do this in very high dimensions,” and it’s been an iterative process studied over the last three decades, as the demands of the problem have changed, as the data has changed, and as we’ve gotten more data. It’s a problem that has evolved over the years, and by slow progress we’ve come to this solution.
How much better is the fruit fly’s search algorithm compared to the main approach?
On average, it probably does like 30 to 50 percent better. We took a few standard benchmarks that people use for this problem. We did two image datasets—images of different numbers, and the other was random images of natural scenes—and then one document similarity search. The amount that it improved was similar on all data sets.
Would I notice a difference in response time for an image search?
The response time would be similar, but the performance, or how relevant the things are that the algorithm suggests, will improve.
When will the fruit fly search be implemented in technology?
This could happen within months. We’re hoping to test it on larger datasets and against other algorithms and, if all goes well, it could be translated into technologies soon after.
Brian Gallagher is the editor of Facts So Romantic, the Nautilus blog. Follow him on Twitter @brianga11agher.
WATCH: The attraction of mathematical biology.