Remember when everything was “2.0?” Michelle Obama was Jackie Kennedy Onassis 2.0? Facebook was considered an example of Web 2.0? We all got what it meant and it hung in the air, or in “cyberspace,”—another faddish word—for half a decade until it was displaced from daily word usage by “3.0” and the “net.”
Words, like people, have their 15 minutes of fame. But there are words that deserve our rapt attention because they are immune to the changing winds of fashion. These words serve as the building blocks of language and provide a link between early human ancestors and the present day. Searching for those “ultra-conserved” words, as they’re known, has been the mission of many a linguist.
In The Origin of Language, published nearly 20 years ago, Stanford linguist Merritt Ruhlen proposed that humankind spoke a single language more than 30,000 years ago, before the thawing of the last Ice Age, some 15,000 years ago, dispersed human populations to different parts of the world.
This proto-language had very little in common with the grammar and syntax of today’s languages. It wouldn’t sound like anything spoken today. Ruhlen attempted to reconstruct it by identifying related words across the world’s estimated 6,000 languages and showing how they have served as the backbone of all languages.
The majority of Ruhlen’s colleagues in the field rejected his theory outright. The methodology did not follow the rules of traditional linguistics. They also objected to the suggestion that there was a geographically united community of human ancestors. The debate turned heated. One scholar, who preferred to remain anonymous, said that there were “punches thrown” at a conference.
But Ruhlen had his supporters, and the controversy remained at a stalemate for a decade and a half. Until the unlikely entrance of a scholar from another field.
For someone who crashed an insulated academic field and redefined it entirely, Mark Pagel comes off as quite unassuming. An evolutionary biologist at Reading University in the south of England, Pagel studies human evolution and paleontology through genetic sequencing.
As Pagel insisted during a recent interview, he waded into the proto-language controversy quite accidentally; he was simply turning to linguistics to gain a better understanding of how genes enter and exit the gene pool, and was curious whether words followed the same pattern of transmission and replacement.
At first glance, there might not seem to be an obvious connection between genes and words. DNA, the genetic replicator, is transmitted through generations, while new words enter a language all the time with seemingly unpredictable frequency. How much of your last conversation was about your smartphone, “baking some blondies,” or the great facility of some pop stars to “twerk?” It would not seem intuitive that our everyday spoken communication could be a window onto our evolutionary past.
Yet that is exactly what Pagel sought to show. He conjectured that the longevity of a word could be quantified similarly to genetic markers. If high levels of melanin are necessary to a population living in a hot climate, the gene for melanin will persist. Likewise, Pagel reasoned, if a word were indispensable, such as “hand,” it would persist through different languages with little variation from its original root.
“Languages are genes talking, getting things that they want,” he said in a 2011 TED Talk.
So Pagel decided to let the words speak for themselves.
He began with a statistical analysis of a contemporary corpus of four modern languages (Russian, Spanish, English, and Greek), each representing a branch of the Indo-European language family, which contains 87 languages. The content included spoken language, books on history, literature, news media, and music recordings. (It should be noted that Pagel relied on a stock archive used by traditional linguists. His methodology was novel, however.)
Instead of comparing words one by one and analyzing their root structures in the context of each language’s syntax, as Ruhlen had done, Pagel used statistical methods from computational phylogenetics to crunch the enormous amounts of data.
He used a dataset of 200 vocabulary meanings, known as the Swadesh fundamental vocabulary list, that covers the most basic concepts, such as numbers, pronouns, and important verbs and nouns. Across all 200 meanings, frequently used words had the fewest distinct cognate forms and were replaced more slowly than infrequently used words; they therefore, Pagel conjectured, had the slowest rate of change. A word like “hand” would not have changed much since the origin of the Proto-Indo-European language some 8,700 years ago, though it might have lost a vowel or swapped a consonant in its journey to Persian or Slavic.
By contrast, the words that were replaced frequently had multiple synonyms. Take a word for head covering like “hat.” In the 19th century, a popular synonym would have been “bonnet.” There is no etymological connection between “hat” and “bonnet,” and thus we say the word was replaced. Likewise, Pagel found that a concept like “dirty” had nine replacements in the same period. For a stretch of time people might have said “foul,” then decided “messy” best expressed their dissatisfaction with their surroundings.
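The contrast between stable and frequently replaced meanings can be made concrete with a small sketch. It counts how many unrelated (non-cognate) word forms express one meaning across a handful of languages. The toy word list, the hand-assigned cognate labels, and the function name `count_cognate_classes` are all assumptions made for illustration; none of this is Pagel's data or code.

```python
from collections import defaultdict

# Toy rows: (meaning, language, word form, cognate class label).
# The labels are hand-assigned for illustration only; they are not
# drawn from Pagel's dataset. Forms sharing a label within a meaning
# are treated as descending from the same ancestral word.
TOY_WORDS = [
    ("two", "English", "two", 1),
    ("two", "Spanish", "dos", 1),
    ("two", "Russian", "dva", 1),
    ("two", "Greek", "dyo", 1),
    ("dirty", "English", "dirty", 1),
    ("dirty", "Spanish", "sucio", 2),
    ("dirty", "Russian", "gryaznyj", 3),
    ("dirty", "Greek", "vromikos", 4),
]


def count_cognate_classes(rows):
    """Return {meaning: number of distinct cognate classes}."""
    classes = defaultdict(set)
    for meaning, _language, _form, cognate_class in rows:
        classes[meaning].add(cognate_class)
    return {meaning: len(labels) for meaning, labels in classes.items()}


if __name__ == "__main__":
    for meaning, n in count_cognate_classes(TOY_WORDS).items():
        print(f"{meaning}: {n} cognate class(es)")
```

In this toy sample the meaning "two" falls into a single cognate class, while "dirty" is expressed by four unrelated forms, which is the kind of difference the replacement-rate argument rests on.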
Pagel connected a word’s stability to its longevity in a paper published in Nature in 2007. Last April, he published an article in The Proceedings of the National Academy of Sciences (PNAS) that showed a direct relationship between a word’s frequency of use and its age. Pagel and his team presented a tentative law holding that words used more than once per 1,000 words in a modern language were “seven to ten times more likely to show deep ancestry.”
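To see what a threshold of "once per 1,000 words" means in practice, here is a minimal sketch that computes each word's rate per thousand tokens and lists the words above a cutoff. The function names, the `threshold` parameter, and the simple lowercasing step are assumptions for illustration; the published analysis worked from large reference corpora, not from code like this.

```python
from collections import Counter


def rate_per_thousand(tokens):
    """Return {word: occurrences per 1,000 tokens}, ignoring case."""
    if not tokens:
        return {}
    total = len(tokens)
    counts = Counter(token.lower() for token in tokens)
    return {word: 1000 * count / total for word, count in counts.items()}


def high_frequency_words(tokens, threshold=1.0):
    """Return the sorted words used more than `threshold` times per 1,000 tokens."""
    rates = rate_per_thousand(tokens)
    return sorted(word for word, rate in rates.items() if rate > threshold)
```

A meaningful result needs a corpus far larger than 1,000 tokens; in a short text nearly every word clears the cutoff simply because the sample is small.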
Some of the words had such a long life that they reached as far back as the origin of the Indo-European language almost nine millennia ago. Pagel began to wonder if some of the words predated the start of the Indo-European language itself.
Unlike Ruhlen, who sought to prove the existence of a proto-language by stripping the roots and comparing the composite words from the world’s language groups through sound and etymology, Pagel focused on identifying the word-meanings and started to trace them back.
“Our logic was that we actually predicted which words these would be, and then found their cognates [through statistical analysis],” Pagel explained.
The team took the 200 word meanings and constructed tables of their forms in seven different Eurasian proto-languages, which together span nearly 4,000 contemporary languages.
What he found pointed to an amazing confluence of biology and language.
Thirteen years after the publication of Ruhlen’s The Origin of Language, Pagel used the tools of computational algorithms to show that there are 23 words that can be traced 15,000 years back to the first mass human migration across Eurasia at the thawing of the last Ice Age.
The “ultraconserved” words provide insight into the first concepts that had to be communicated verbally by human ancestors:
Pronouns: You (both familiar and formal); I; We; This; That
Questions: What; Who
Verbs: To Give; To Hear; To Pull; To Spit; To Flow
Adjectives/adverbs: Not; Old; Black
Nouns: Man/Male; Mother; Hand; Fire; Bark; Ashes; Worm
It’s a coincidence that the number is the same as the number of chromosome pairs that carry genetic material in the human cell. But the analogy is telling: Pagel explains that finding a pure strain of language is much like finding a distinctive line of DNA, linking us to our ancestors through biology.
This is why word transmission is so important to human evolution. “The fact that words can mimic genes in that sense,” Pagel says, “is absolutely amazing.”
Simon Kirby, a professor of language evolution at the University of Edinburgh, takes it a step further, saying that language “can be thought of as an evolutionary system in its own right,” and that, just like genes, words act selfishly. When a word is particularly useful it inserts itself into conversation. Its utility makes it unlikely that it will mutate.
If utility of purpose is the organizing principle by which words are inherited, Pagel says he and his team were surprised by some of the 23 ultraconserved words. Words such as “ash,” “bark,” and “worm” were among the group.
“That was really amusing and enjoyable for us: worm,” says Pagel. “We started talking to anthropologists, who reminded us one of the things you have to realize is that Westerners don’t suffer from worms, but almost everybody else does.”
Evolutionary experts also noted it was probably a hot topic for hunter-gatherers sitting by the campfire concerned about their health issues. The same goes for ash and bark, other proto-words that emerged, which anthropologists believe were used not only for lighting fires, but also as medicine.
Pagel points out that the work is a best hypothesis, not a proof. Still, furthering our understanding of language migration, etymology, and communication with empirical methods from evolutionary biology holds great promise for the study and reconstruction of ancient languages and their cultures.
“I think everyone agrees that these languages should have relatives,” says Joseph Salmons, a professor of Germanic languages at the University of Wisconsin and a protolanguage skeptic. “There are genetic relationships between the world’s languages that we have not grasped yet. It’s not likely that these languages arose independently.”
The entry of computational and quantitative science into linguistics has opened new doors to the study of human origins and evolution.
And there are academic institutions that are springing up to study evolution through language. The Language Evolution and Computation center at Edinburgh University is starting to look at language as a new gateway to understanding biological change.
Studying how words are produced neurologically allows researchers to focus on the last common ancestor of humans and chimpanzees, estimated to have lived about 6 million years ago. Their shared sounds and behaviors could reveal the nexus between the species.
“We use the comparative method to try to draw conclusions about our common ancestor with chimpanzees,” explained Edinburgh’s Keelin Murray.
This could lead to the next step in evolutionary study, such as a fuller understanding of our common ancestry with other primates.
“It will even help us to draw conclusions about when we became uniquely human,” she said.
The famous words show that, just as DNA is the building block of biology, there are concepts essential to any system of verbal communication that will never go away. Not even on Web 4.0.