The Dangerous Evolution of the Coronavirus

These scientists have a new model for identifying variants before they kill.


Relief coursed through me last week when I learned my 2-year-old daughter’s daycare provider got her first dose of the Pfizer vaccine. It left her feeling more groggy than she expected, but that was a good thing, suggesting the vaccine was kicking her immune system into gear.

While I was grateful her vaccination reduced the risk of exposure in my social pod, I was reading about new genetic variants of SARS-CoV-2 from the United Kingdom, South Africa, and Brazil. At a recent press conference, Anthony Fauci, chief medical advisor to President Biden, said the current Pfizer and Moderna vaccines, when confronting the new mutations, were still “well above the line of not being effective.” That’s reassuring, of course, until you consider that vaccines, battling mutations, can fall below the line of effectiveness. That’s not the fault of vaccine design; it’s the result of nature’s design. As long as the virus continues to find new human hosts, it will continue to evolve mutations to escape our immune defenses.

“The more people infected, the more likely that we will see new variants,” Michel Nussenzweig, a Rockefeller University immunologist, told The New York Times. “If we give the virus a chance to do its worst, it will.”

Thankfully, scientists are on the case, figuring out what form the virus’ worst may take next. A team at the Massachusetts Institute of Technology that includes a biological engineer, a computational biologist, and a mathematician has come up with a novel way of predicting which SARS-CoV-2 mutations can escape the human immune system. Their research could help vaccine makers stay one step ahead of evolution and cut off deadly mutations at the pass. In a new paper, published in Science with the enticing title “Learning the language of viral evolution and escape,” the scientists elucidate a promising bridge between ideas about natural language, like grammar and meaning, and how viruses and bacteria, among other things, evolve. “We’re taking steps toward understanding the language of biology,” said Bonnie Berger, one of the paper’s authors.

Berger and coauthors Bryan Bryson and Brian Hie recently joined me on Zoom to explain their research. Their work stems from the thorny—or perhaps I should say “spikey”—ability of viruses to mutate in ways that allow them to evade our immune system. It is, after all, SARS-CoV-2’s spike protein, a multifunctional molecular machine, that helps the virus break into our cells. As a report this month in The Lancet noted, the variant that recently arose in the U.K., which is “rapidly becoming a global threat,” is characterized by “multiple mutations in the spike protein.” The Moderna and Pfizer vaccines work against SARS-CoV-2 by giving our cells instructions on how to make a harmless piece of the virus’ spike protein; that information sets up our immune system to effectively fight off the virus. When the spike protein changes, however, it can become harder for our immune system to recognize and, ultimately, thwart.

Berger heads the Computation and Biology group at MIT’s Computer Science and Artificial Intelligence Lab. She explained that mutations in the original Wuhan strain should be investigated for “semantic change,” which “could lead to escape.” A semantic change, or change in meaning, in a protein comes down to its amino-acid sequences, which follow a set of rules. A protein is a polypeptide chain of amino acids, letters in the 20-character protein “alphabet” that code for a particular protein structure and purpose. The sequences of these letters, joined to each other with peptide bonds, are transcribed and translated from DNA and RNA (or, in the case of SARS-CoV-2, just RNA). For the virus’ spike protein to link successfully to our cells, its sequence has to adhere to a syntax. “We think of that as a biological grammar,” said Hie, a doctoral candidate at the Computer Science and Artificial Intelligence Lab at MIT.

To escape the immune system’s search-and-destroy cells, Hie said, the virus must camouflage itself. In Science, the authors write, “We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence’s grammaticality but change its meaning.” This clever rewrite, Berger said, is evolution at work. “Evolution is a language,” she said. “Its language developed like English developed—over time. Protein sequences, or nucleic acid sequences, develop over time.”

The scientists adapted machine-learning algorithms, originally created to work with human languages like English, to identify the new “words” in the virus’ protein sequences. “The same principles used to train a language model on a sequence of English words can train a language model on a sequence of amino acids,” they write. Fundamentally, Hie said, the algorithms work like sentence-completion models, such as the autocorrect and autocomplete functions in the iPhone and Google’s Gmail.
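To make the autocomplete analogy concrete, here is a minimal sketch of a sentence-completion model whose “words” are amino acids. It is not the researchers’ model: a simple bigram counter is swapped in purely for illustration, and the function names and the toy sequences are assumptions, not real spike protein data.

```python
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20-letter protein alphabet


def train_bigram_model(sequences):
    """Count how often each residue follows each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def next_residue_probs(counts, prev, smoothing=1.0):
    """Return P(next residue | previous residue) with additive smoothing."""
    total = sum(counts[prev].values()) + smoothing * len(AMINO_ACIDS)
    return {aa: (counts[prev][aa] + smoothing) / total for aa in AMINO_ACIDS}


def autocomplete(counts, prev):
    """Suggest the most likely next residue, the way a phone suggests a word."""
    probs = next_residue_probs(counts, prev)
    return max(probs, key=probs.get)


# Made-up toy sequences for illustration only.
toy_sequences = ["MKVLA", "MKVLG", "MKILA"]
model = train_bigram_model(toy_sequences)
print(autocomplete(model, "M"))  # "K" follows "M" in every toy sequence
```

In this picture, a mutation that the model considers a likely continuation counts as “grammatical,” while one it finds very unlikely breaks the rules of the sequence.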

They trained their language model on 4,172 spike protein sequences, far fewer than were available for the proteins of other viruses. The researchers also created language models to predict viral escape from 44,851 sequences of a protein in the influenza A virus and 57,730 sequences of a protein in HIV. They found that the degree of a protein sequence’s grammatical correctness, according to their language models, was significantly correlated with viral fitness “across all strains and across studies that examined single or combinatorial mutations,” despite the fact that their language models “were not given any explicit fitness-related information.” This suggests that a mutation’s “grammaticality” actually captures information about how fit that mutation is for infecting people. It also, the researchers said, adds a dimension to scientists’ understanding of how a change in meaning can encode a perturbation of a protein’s purpose.

Bryson is a member of the Ragon Institute at Massachusetts General Hospital and heads his own lab at MIT’s Biological Engineering department. He put it this way: “We’ve blasted the doors open with CSCS in terms of an intellectual paradigm.” CSCS stands for constrained semantic change search. It’s the task of looking for mutations to the spike protein with both high grammaticality and high semantic change. There’s not really an analog for doing this usefully on some body of text in the English language. And that’s the point. “Before, people would just look at grammaticality of English or the meaning, the semantics,” Berger said. “But looking for escape—high grammaticality and high semantic change—you wouldn’t do that for the English language, which is why we came up with a new natural-language processing model. Natural-language processing drove the solving of a biological problem.”
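The search for mutations that score high on both counts can be sketched in a few lines. This is only an illustration under stated assumptions: the rank-sum scoring, the `beta` weight, the function names, and every score below are hypothetical, not outputs of the researchers’ model.

```python
def rank_ascending(values):
    """Map each value to its rank (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def cscs_rank(candidates, beta=1.0):
    """Order mutations so high-grammaticality, high-semantic-change ones come first.

    candidates: list of (name, grammaticality, semantic_change) tuples.
    beta: hypothetical weight on grammaticality relative to semantic change.
    """
    gram_ranks = rank_ascending([c[1] for c in candidates])
    sem_ranks = rank_ascending([c[2] for c in candidates])
    scored = [
        (sem_ranks[i] + beta * gram_ranks[i], candidates[i][0])
        for i in range(len(candidates))
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical scores for illustration only.
candidates = [
    ("mutant_A", 0.90, 0.10),  # grammatical, but the meaning barely changes
    ("mutant_B", 0.05, 0.95),  # big change in meaning, but ungrammatical
    ("mutant_C", 0.85, 0.90),  # grammatical and different: an escape candidate
    ("mutant_D", 0.40, 0.50),  # middling on both counts
    ("mutant_E", 0.02, 0.05),  # low on both counts
]
ranking = cscs_rank(candidates)
print(ranking[0])  # mutant_C ranks first
```

Combining ranks rather than raw scores keeps either criterion from dominating just because it happens to be measured on a larger scale. A mutant that excels on only one axis, like mutant_A or mutant_B here, lands below one that does well on both.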

In Bryson’s lab, scientists can now experiment with spike-protein mutants. “We’re saying, ‘OK, we can generate this,’” Bryson said. “In our model, it falls in the 98th percentile.” That means if you compare this mutant’s sequence to previously seen coronavirus sequences, then 98 percent of those previously seen coronavirus sequences have lower semantic change, as predicted by the model, than the mutant. “We can see how well it infects cells,” Bryson said. “We can ask, ‘How well does it bind the receptor ACE2?’—the cellular doorway the spike protein opens to cause COVID-19—‘How well does it survive in the presence of antibodies?’ We now have a little bit more understanding of how semantic changes relate to all of the biological features of infection.”
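The percentile reading works like this in code. The sketch assumes a percentile means the share of previously seen sequences with a strictly lower semantic-change score; the function name and the integer scores are made up for illustration.

```python
def percentile_of(score, reference_scores):
    """Percentage of reference scores strictly below `score`."""
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)


# Hypothetical semantic-change scores (arbitrary units) for 100 known sequences.
reference = list(range(100))  # 0, 1, ..., 99
mutant_score = 98

print(percentile_of(mutant_score, reference))  # 98.0
```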

Both the U.K. and the South Africa variants have, according to the researchers’ model, high escape potential. The latter scored higher for semantic change than the former. “Those may be prone to escape and may not be targeted by the vaccine, but that needs to be experimentally verified,” Berger said. “We’re just doing quantitative prediction.”

I asked the scientists how the development of our current vaccines might have been different if Moderna and Pfizer knew about this language of viral escape. Bryson said, “I think our model underscores the importance of using the full length of the spike as an immunogen, as opposed to prioritizing particular regions of the protein over others.” He said it is fortunate that a lot of the vaccine designs are focused on the full-length spike protein, which their model suggests is a good move. At the same time, Bryson warned, “our model does quantify certain parts of the protein as having high escape potential.”

However, with that knowledge, Bryson said, chemists could fine-tune vaccines to target the sequences most likely to escape. “We could say, OK, even as we get more and more sequences, how can we think about designing new vaccine strategies that focus the immune response on this particular region of the protein?” In the computer, he explained, he could generate new spike mutations and determine (to use the language metaphor) whether they change the meaning of the protein sequence and become dangerous. Based on that modeling, Bryson said, “We can do deep virology, immunology, antibody-binding—all these types of experiments that allow us to explore those pieces.”

The team is looking to take their model to uncharted territory beyond protein sequences. “This conceptual framework is not limited to immune selection and immune escape, but can generalize to different kinds of evolutionary pressure,” Hie said. “Proteins will mutate themselves to preserve their fitness and function. So you can use the same framework to understand drug resistance—to chemotherapy or to antibiotics.” Bryson is interested in understanding how their model can analyze bacterial genomes in a drug-pressure setting. “You could imagine how proteins in malaria also undergo a lot of mutation to avoid immunity,” he said. “There’s lots of different places where we could use this, and we’re going to try as many as we can.”

Brian Gallagher is an associate editor at Nautilus. Follow him on Twitter @bsgallagher.

Lead image: kora_sun / Shutterstock
