How to Build a Search Engine for Mathematics

The surprising power of Neil Sloane’s Encyclopedia of Integer Sequences.

By Siobhan Roberts

On the average summer Saturday, the mathematician Neil Sloane woke up to a crisis. “There are always crises,” he said, albeit crises of the teapot-tempest variety. One Saturday over breakfast, he faced an inbox message titled “edits from outer space.” Without authorization, a contributor in France had deleted an entry in Sloane’s Online Encyclopedia of Integer Sequences, which, like Wikipedia, is powered by volunteer contributors and editors.

The Day’s Work: Neil Sloane in his attic study, command central for the encyclopedia. He has taped to the wall an epigram from Kipling that reads “He had a theory that if a man did not stay by his work all day and most of the night he laid himself open to fever: so he ate and slept among his files.” Credit: Siobhan Roberts

But every day, tending his encyclopedia like a garden, weeding and pruning and planting, Sloane also delights in the more pleasant surprises. On that same Saturday morning, for instance, a nice new sequence arrived. This specimen was governed by a rule that, as Sloane explained with signature bouncy exuberance, “gives you a list of numbers, only 16 numbers, and the biggest is 999,999,000,000. Six nines and six zeroes. Which is pretty amazing! Out of the blue we end up with this number.”

And indeed, it was a blue-sky day that Saturday at Sloane’s house in Highland Park, New Jersey, with perfect poufs of clouds and cicadas singing as temperatures neared their seasonal crescendo, … 82, 85, 86, 90, 94, 95. Sloane lives in a library as much as a house (crossed with a curiosity cabinet of ephemera), bookshelves insulating every room, with ring theory and number theory climbing the staircase from the second floor to his attic study. The attic is command central for the encyclopedia, a curated database and search engine of more than 250,000 sequences that interconnect with the world in any number of ways.

Search the keyword “cloud,” for instance, and you get sequence A136281.

Among the references you find the comment: “These are thunderstorm graphs. Their connected components are a single cycle (clouds), a path (lightning bolts), or an isolated vertex (raindrops).”

Enter the keyword “sky” and you get sequence A074481 with the comment: “These primes form a pattern similar to an astronomical radiant (the point in the sky from which a meteor shower appears to originate).”

Enter “cicada” and you get A161664 with the definition “Safe periods for the emergence of cicada species on prime number cycles” and a link to a paper by the esteemed Cambridge mathematician Alan Baker, asking: “Are there Genuine Mathematical Explanations of Physical Phenomena?”1

Most users, though, mine the encyclopedia not by searching keywords but by searching sequences. They might have uncovered or invented a sequence in their own research, so they’re hunting for a numerical match. In this way, the encyclopedia serves as a kind of Google for mathematics, with each sequence serving as a fingerprint of a particular mathematical or scientific property.2
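The idea of matching by fingerprint can be sketched in a few lines of Python. This is only a toy, not how the encyclopedia itself is implemented: the index below holds nothing but terms quoted elsewhere in this article, and the names `TOY_INDEX`, `contains_run`, and `lookup` are invented for the illustration.

```python
from typing import Dict, List

# Toy index built only from terms quoted in this article. The labels are
# illustrative; this is not a dump of the real database.
TOY_INDEX: Dict[str, List[int]] = {
    "A250001 (arrangements of circles)": [1, 3, 14, 173],
    "A001034 (orders of simple groups)": [
        60, 168, 360, 504, 660, 1092, 2448, 2520, 3420, 4080,
    ],
    "rooted-tree sequence (Sloane's thesis)": [0, 1, 8, 78, 944, 13800, 237432],
}


def contains_run(terms: List[int], query: List[int]) -> bool:
    """Return True if `query` appears as a consecutive run inside `terms`."""
    m = len(query)
    if m == 0:
        return False
    return any(terms[i:i + m] == query for i in range(len(terms) - m + 1))


def lookup(query: List[int], index: Dict[str, List[int]] = TOY_INDEX) -> List[str]:
    """Return the labels of every indexed sequence that contains the query."""
    return [label for label, terms in index.items() if contains_run(terms, query)]


if __name__ == "__main__":
    # A researcher who computed 8, 78, 944 by hand finds the matching entry.
    print(lookup([8, 78, 944]))
```

A handful of consecutive terms is usually enough to single out an entry, which is what makes a sequence work as a fingerprint.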

The resulting reach and range of the encyclopedia sends one down a cascading index encompassing the natural sciences, physical sciences, earth and space, logic and math, applied sciences and technology, social sciences, business and finance, and beyond.

Some sequences are governed by a mathematical formula, but not all of them, and some have no mathematical underpinning at all. “To me,” said Sloane, “an integer sequence is just a string of numbers, whole numbers. There’s no particular need for a mathematical relationship between them. They could be, for instance, the birth dates of the U.S. presidents.”

Another new sequence that arrived recently came courtesy of Sloane’s brother. He was visiting the Picasso Museum in Barcelona and came across a sequence in the artist’s 1936 work, “Poème: Mathématiquement pure image illusoire du ronflement écoeurant” (translation: “Mathematically pure illusory image of sickening snoring”). And there’s a sequence pertaining to the Catholic popes and their numerical order versus the order in which they reigned; Pope Francis is the 266th pope, and being the first Francis, he fittingly doesn’t bother with his regnal Roman numeral “I.”

“Of course,” said Sloane, “there has to be some kind of philosophical relationship, some kind of connection that has to be meaningful. There has to be some unifying thread, but it doesn’t have to be mathematical.” Generally, anything that can be counted is fair game. Even philosophy itself. One such example, according to Charles Greathouse, an editor in chief and trustee of the encyclopedia’s foundation, and an analyst and programmer at Case Western Reserve University, “would be the debate between Hipparchus and Chrysippus about (according to Plutarch) the number of compound propositions, which the former gave as 103,049 or 310,954 and the latter as more than 1 million. It seems that Hipparchus was referring to sequences A001003 and A010683 and Chrysippus to A025225. All three are sequences about counting objects very much like what is described in the fragments which remain of those writers.”
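The first of those matches can be checked directly. The article does not name A001003; it is commonly known as the little Schröder numbers, and the sketch below assumes a standard three-term recurrence for them (the function name is made up for this example). Hipparchus’s 103,049 turns up as the tenth term.

```python
from typing import List


def little_schroeder(count: int) -> List[int]:
    """Return the first `count` little Schroeder numbers: 1, 1, 3, 11, 45, ..."""
    terms = [1, 1][:count]
    for n in range(2, count):
        # Assumed recurrence, with s(0) = s(1) = 1:
        #   (n + 1) * s(n) = 3 * (2n - 1) * s(n - 1) - (n - 2) * s(n - 2)
        numerator = 3 * (2 * n - 1) * terms[n - 1] - (n - 2) * terms[n - 2]
        terms.append(numerator // (n + 1))  # the division is exact
    return terms


if __name__ == "__main__":
    print(little_schroeder(10))
```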

It’s this ability to connect disciplines that gives the encyclopedia its power. “The encyclopedia of integer sequences inspires much more new research than any single mathematician,” said Rutgers University math professor Doron Zeilberger. This makes Sloane something of a celebrity; Zeilberger’s been known to call him “the world’s most influential mathematician.” Sloane didn’t prove the likes of Fermat’s Last Theorem, nor the Poincaré Conjecture. But as Zeilberger noted, “Proving big open problems is often a dead end, like climbing Mount Everest.” By contrast, a sequence, he said, is just the tip of the iceberg.

Start Counting: Sequence A250001 in the encyclopedia defines the number of ways to arrange any number of circles on the plane. There is one way to draw one circle, three ways to draw two circles, 14 ways to draw three circles, 173 ways to draw four circles (recently corrected from a previous count of 168), and a preliminary count predicts there are 16,968 ways to draw five circles. Credit: Jon Wild


Neil Sloane, who turned 76 this October, wears large retro-rectangular glasses—his perfect eyesight deteriorated in high school and he started wearing glasses as a graduate student, around the time he began collecting sequences.

He completed his doctoral dissertation at Cornell University in 1967, tackling a problem in artificial intelligence, about neural networks, then called “perceptrons.” He was trying to determine how many neurons are triggered over a neural network when a single neuron fires, and whether this activity continues on forever or whether it dies out. To model the neurons, with a childishly simple example, he produced a “rooted tree,” a mathematical graph with interconnected nodes representing neurons, and the root node representing the end of the activity. With this line of investigation he produced a sequence with seven terms: 0, 1, 8, 78, 944, 13,800, 237,432. For the fourth term, for example, he considered a neural network composed of four neurons. He calculated the average distance from all four nodes to the root, and obtained the number 78. In a network of five neurons, he got 944; with six neurons, 13,800; and with seven neurons, 237,432.

This sequence looked promising, though Sloane couldn’t figure out the pattern or formula that would give him the next and all further terms, and by extension the sequence’s rate of growth. He searched out the sequence at the library to see if it was published in a math book on combinatorics or the like, and found nothing. Along the way, however, he came upon other sequences of interest, and he stashed them away for further investigation. He eventually computed the formula using a tool from 1937, Pólya’s enumeration theorem.
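The article does not state the formula. As an assumption for illustration, the closed form below, (n − 1)! · Σ nᵏ/k! for k from 0 to n − 2, reproduces all seven terms quoted above; the source does not confirm that this is the form Sloane arrived at. The function name is invented for the sketch.

```python
from math import factorial


def rooted_tree_term(n: int) -> int:
    """n-th term (n >= 1) of the assumed closed form.

    Computes (n-1)! * sum_{k=0}^{n-2} n**k / k! using only integers:
    (n-1)! / k! is a whole number whenever k <= n - 1.
    """
    return sum(n ** k * (factorial(n - 1) // factorial(k)) for k in range(n - 1))


if __name__ == "__main__":
    # Prints the terms for networks of one to seven neurons.
    print([rooted_tree_term(n) for n in range(1, 8)])
```

With a formula like this in hand, the next term and the sequence’s rate of growth follow by plugging in larger values of n.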

But this roundabout process had been frustrating. The task should not have been so difficult. He should have been able to simply look up his sequence in a comprehensive reference guide for all extant integer sequences. Since no such thing existed, he decided to build it himself. “I started collecting sequences,” he said. “I went through all the books in the Cornell library … And articles and journals and any other source I could find.”

Perceptrons: Neil Sloane’s notebook from 1964 with the first sequence, inspired by his Ph.D. on neural networks, that started the database. Credit: Siobhan Roberts

Sloane kept his collection first on punched cards, then in a “handbook”: A Handbook of Integer Sequences, published in 1973, with the copyright held by Bell Telephone Laboratories, where he started working in 1968. In 1995 he launched an automated email lookup service called Superseeker, whereby the curious submitted sequence queries and the database replied with answers. In 1996 he opened up his repository for public browsing at oeis.org. With the lab’s blessing, Sloane put it up on the research division’s website. They were happy to host, since sequences brought traffic; if you collect it, they will come. When Sloane and his preoccupation were written up at Slashdot, it drove so much traffic that the site crashed. Said Sloane, “My management, on the pure research side, were quite proud of this.”

By the mid-1990s, the encyclopedia had also begun to prove its research value. One day Sloane was working away in his office at what was by that time AT&T Bell Labs when his colleague down the hall, Paul Wright, walked in and pitched the cell tower problem: What’s the best way to situate base station towers, maximizing the signal and minimizing power use, such that the towers are not too close together (which causes interference), while also respecting land constraints on where the towers can and cannot be located?

Sloane got to work on the pure mathematics side of this practical problem with his summer student, Mira Bernstein, now the executive director of the Canada/USA Mathcamp, and on the advisory board and faculty of Proof School. They computed the best arrangement for a small number of towers, and then much to their surprise found, via the nascent encyclopedia, a match with a sequence in the altogether different context of number theory, involving counting the maps on doughnut-shaped tori.

“We managed to help the telephony side of the business,” said Sloane, “and produce some nice mathematics and some interesting sequences, and proved that these two problems turned out to be equivalent.”3, 4

It’s in this sense that a sequence is a fingerprint—a lingua franca or a barcode or canonical form—that can unlock the identity of a little known mathematical or scientific object, or objects and their hitherto unknown interconnectedness.

“One thing that mathematicians would love is if there was a way to search for math. And that doesn’t exist,” said Nadia Heninger, an assistant professor of computer and information science at the University of Pennsylvania who did a summer internship at AT&T Labs with Sloane as her mentor. “If you discover some object, you may be thinking about it in a way that no one has ever thought about before,” she said, noting also that you’re likely using terminology of your own invention, making searching hard. “You can’t type a mathematical object into Google and you can’t really type an object into Wikipedia. But you can evaluate your object based on a sequence of numbers.” And if you plug the sequence into the OEIS, that comes pretty close to searching for math. “The OEIS is a way of translating your object into a canonical form,” she said.

Ultimately, it all comes back to counting things, and counting is a universally handy tool. Which in turn makes the encyclopedia handy, too. “Suppose you are working on a problem in one domain, say, electronics, and while solving a problem you encounter a sequence of integers,” said Manish Gupta, a coding theorist by training who runs a lab at the Dhirubhai Ambani Institute of Information and Communication Technology. “Now you can use the encyclopedia and search if this is well known. Many times it happens that this sequence may have appeared in a totally unrelated area with another problem. Since numbers are the computational output of nature, to me, these connections are quite natural.”

Gupta and his colleague, Nilay Chheda, cited the encyclopedia in their paper “RNA as Permutation,” which detailed a novel interpretation of RNA from an information-theoretic perspective. Without getting into the technical details, it’s easy to imagine how integer sequences might apply to genetics: A gene is a sequence of DNA, and a DNA sequence in turn defines an RNA sequence. The process of DNA sequencing determines the pattern or order of the AGCT nucleotide bases (adenine, guanine, cytosine, thymine). Researchers in this field, said Gupta, use “different mathematical objects such as graphs, groups, formal languages, and combinatorics. Each such representation gives rise to connections with numbers.” And the most famous connection, he said, is with the Catalan numbers, a generalized form of which of course has its entry in the encyclopedia, as sequence A004148.

The encyclopedia’s impact on scientific research, broadly speaking, can be measured by its citations in journals, which Sloane has so far tallied at more than 4,500, ranging through biology, botany, zoology, chemistry, thermodynamics, optics, quantum physics, astrophysics, geology, cybernetics, engineering, epidemiology, and anthropology. It is a numerical database of the human canon.

Note to Self: Neil Sloane is currently digitizing the encyclopedia’s half-century worth of archives, a process he expects to complete next summer, after three years’ work. Credit: Siobhan Roberts


Sloane retired from AT&T Labs in 2012. The encyclopedia went home to the server in his attic, and its archives to bookshelves in his bedroom. And by Sloane’s study door there is a telling epigram from Kipling, hand-written and taped to the wall: “He had a theory that if a man did not stay by his work all day and most of the night he laid himself open to fever: so he ate and slept among his files.”

It’s not so surprising then that Sloane has suffered from some sequence-induced insomnia. “This is one of the sequences that wakes me up at night,” he said, delivering the opening talk at a celebratory conference he threw for the encyclopedia’s 50th anniversary. “Like it did last night at 3 a.m.”—

2, 3, 4, 5, 7, 9, 8, 11, 13, 17, 19, 23, 15, 29, 14, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, …

The sequence had arrived as a birthday present in his inbox just a week prior to the party, submitted by Amarnath Murthy, an electronics engineer and hobby mathematician based in Mumbai, who has contributed more than 4,900 sequences to the encyclopedia. Lying awake in the dark that night, Sloane had tried to prove something that seemed obvious: certain numbers (6, 10, 12, for example) never occur in the sequence; the evidence was overwhelming but there was no proof. He succeeded in proving that 6 never appears—the smallest number and the simplest case—but it took hours, and in the morning he found a gap in the proof, leaving the question open.

In getting to know this sequence, Sloane nicknamed it “Strangers on a Train,” after the psychological thriller novel by Patricia Highsmith and the subsequent film by Alfred Hitchcock. “Murthy’s sequence is a string of numbers,” explained Sloane, “and the rule for constructing it is that the nth term has to be a stranger to the next n terms, and that means that it mustn’t have any common factor with the next n terms. And you always take the smallest number that’s available, and that you haven’t used already.”
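Sloane’s rule can be tried out in a short Python sketch. Because term k must be a stranger to terms k+1 through 2k, the sketch checks each new candidate against the earlier terms whose windows still cover the current position, then takes the smallest unused number that passes. Two assumptions are made here: candidates start at 2 (the printed terms begin there), and this greedy, backward-looking reading matches the intended definition. It does reproduce the 25 terms printed above. The function names are invented for the example.

```python
from math import gcd
from typing import List


def strangers(count: int) -> List[int]:
    """Generate `count` terms under the greedy, backward-looking reading."""
    terms: List[int] = []
    used = set()
    for n in range(1, count + 1):
        # Term k constrains positions k+1 .. 2k, so position n must be
        # coprime to every earlier term k with 2k >= n.
        window = [terms[k - 1] for k in range((n + 1) // 2, n)]
        candidate = 2  # assumption: the sequence starts at 2, not 1
        while candidate in used or any(gcd(candidate, w) > 1 for w in window):
            candidate += 1
        terms.append(candidate)
        used.add(candidate)
    return terms


def satisfies_rule(terms: List[int]) -> bool:
    """Check the forward-looking rule on whatever terms are available."""
    for k in range(1, len(terms) + 1):
        following = terms[k:2 * k]  # positions k+1 .. 2k as a zero-based slice
        if any(gcd(terms[k - 1], later) > 1 for later in following):
            return False
    return True


if __name__ == "__main__":
    print(strangers(25))
    print(satisfies_rule(strangers(25)))
```

The second function tests the rule as Sloane states it, looking ahead from each term, so the two views can be compared on the same output.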

Aside from his failed proof, “Strangers on a Train” was causing Sloane consternation for another reason as well. At the anniversary festivities—which featured pizza and cake and Douglas Hofstadter, a cognitive scientist at Indiana University and a longtime sequence aficionado—Sloane discussed with Heninger what a nuisance it was to compute the sequence based on that rule. With the current term depending on future terms, which of course haven’t been figured out yet, the calculation—if doing it by hand with pencil and paper, as they were—required messy and laborious trial and error. Heninger had been working on an alternative and easier strategy, together with mathematician and Rubik’s speedcuber and hacker Lucas Garron, which instead involved looking back and taking stock of the sequence’s immediate past. “Looking back is easier than looking ahead,” Sloane said. “But they are equivalent.” Though, he added later, “Looking forward is sort of nicer.”

And indeed, thinking futuristically, when Sloane began his collection all those years ago he noted in his handbook another practical application: Sequences, he said, “might also be useful to have around when the first signals arrive from Betelgeuse.” Specifically, he suggested sequence A001034 would be an auspicious beginning with our alien brethren: 60, 168, 360, 504, 660, 1,092, 2,448, 2,520, 3,420, 4,080 … It is a sequence about symmetry; the orders of the nontrivial simple symmetry groups, the fundamental elementary particles of symmetry. And at that it would establish our credentials. “This message would be a very concise way of saying, ‘We are intelligent beings, interested in mathematics (and by implication knowledge, the higher things in life, music ...) rather than war, power ...’ ” It would be a friendly and optimistic beginning.


Siobhan Roberts is a Toronto-based writer. Her latest book is Genius At Play: The Curious Mind of John Horton Conway.


References

1. Baker, A. Are there genuine mathematical explanations of physical phenomena? Mind 114, 223-238 (2005).

2. Billey, S.C. & Tenner, B.E. Fingerprint databases for theorems. Notices of the American Mathematical Society 60, 1034-1039 (2013).

3. Bernstein, M., Sloane, N.J.A., & Wright, P.E. On sublattices of the hexagonal lattice. Discrete Mathematics 170, 29-39 (1997).

4. Bernstein, M. & Sloane, N.J.A. Some lattices obtained from Riemann Surfaces. Contemporary Mathematics 201, 29-32 (1996).
