
How Talking Machines Got Their Voices

From the cockpit to consumer tech, synthetic voices shed light on gender biases


Imagine yourself in the cockpit of a fighter jet, practicing maneuvers over the desert of the American Southwest. Suddenly your altimeter reading is falling, and you must act quickly. The complex panel of instruments in front of you should be second nature to use, but in the moment of crisis, the panels blur together, and your muscle memory must take over. You begin to make adjustments to solve the problem while simultaneously considering the worst-case scenario. A voice interrupts you, firm but calm, in a soothing alto that reminds you of your mother: “Pull up … Pull up … Pull up,” it repeats, and you do what the voice commands, avoiding disaster.

In the 1970s, as McDonnell Douglas was developing the F-15 Eagle fighter jet, testing revealed that pilots’ reactions to warning lights were too slow, especially as the cockpit display increased in complexity. In addition, the development of “heads-up” display technology meant that pilots increasingly received information about their aircraft within their field of vision instead of having to look down at a panel of meters and lights. Engineers were concerned that a cacophony of warning bells and buzzers would just add confusion to the mix.


Testing by the U.S. Air Force had shown that a verbal warning system would be more effective—that a human voice breaking into the cockpit would convey a sense of urgency, as well as offer clear and unambiguous directions at the point of need. Systems using recorded warnings had already been installed in some aircraft in the 1960s, but voice synthesis promised to make voice warning systems lighter and more reliable.


Engineers purportedly chose a female voice for the warnings because they believed it would stand out to male fighter pilots. A young actress was recruited to record a series of words that were integrated into the warning system of the F-15. That actress, Kim Crow, recalls that after one of the test flights, the pilot was asked how everything worked; he said, “It was wonderful, except for that Bitching Betty.” The name stuck.


According to “Green’s Dictionary of Slang,” a “Betty,” meaning an attractive woman, came into use with reference to long-suffering Stone Age housewife Betty Rubble from the cartoon The Flintstones. In the days of recorded warning systems, the B-58 Hustler flight crews referred to that aircraft’s warning system as “Sexy Sally.” There were also systems that used male voices, the nickname for which was “Barking Bob.” Although “Bitching Betty” seems derogatory, some pilots have said that they use it as a term of endearment; the voice warnings can save their lives, after all.

Until the 1980s, consumer-grade synthesized voices were in a pitch range that most listeners associated with a male gender. These voices didn’t come close to approximating the prosody or timbre of human voices, but they could produce recognizable language and were often identified with the personal pronoun “he.” Early attempts at synthesizing female-sounding voices consisted of scaling the formants—the peak frequencies that define vowel sounds—of the “male” voice, but this did not succeed in “[turning the male voice] into a convincing female speaker,” as MIT research scientist Dennis Klatt noted.
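The formant-scaling approach Klatt described can be illustrated with a toy sketch: excite a cascade of two-pole resonators (the building block of Klatt-style formant synthesis) with a glottal pulse train, then multiply the formant frequencies by a constant factor. The formant values, bandwidths, and scaling factor below are illustrative assumptions, not Klatt's actual parameters — and, as he observed, this kind of scaling alone does not produce a convincing female voice.

```python
import math

def resonator(signal, f, bw, fs):
    """Two-pole digital resonator: emphasizes frequencies near f (Hz)
    with bandwidth bw (Hz) at sample rate fs (Hz)."""
    r = math.exp(-math.pi * bw / fs)          # pole radius from bandwidth
    c1 = 2 * r * math.cos(2 * math.pi * f / fs)
    c2 = -r * r
    a = 1 - c1 - c2                           # gain term for roughly unity low-frequency gain
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vowel(formants, fs=8000, f0=100, dur=0.2):
    """Run a glottal pulse train (fundamental f0) through a cascade of
    formant resonators to synthesize a crude vowel."""
    n = int(fs * dur)
    period = fs // f0
    sig = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f in formants:
        sig = resonator(sig, f, bw=80, fs=fs)
    return sig

# Rough formant targets for a male [a] vowel (assumed values, in Hz)
male_a = [730, 1090, 2440]
# Naive "female" voice: scale every formant up by a fixed factor
scaled_a = [f * 1.15 for f in male_a]

male = vowel(male_a)
female_ish = vowel(scaled_a)
```

Shifting the formant peaks this way changes the apparent vocal-tract size, but real voices differ in pitch contour, breathiness, and timing as well, which is why uniform scaling fell short.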

Meanwhile, recordings of female voices providing information and instructions in urban environments—public transportation and security announcements, vending and automatic checkout and teller machines—became increasingly common and were chosen to forge what one scholar called a “soft coercion.” These are voices that tell you where to go, what to do, and how to behave in order to move in an orderly way through the urban environment, and they are meant to maintain calm efficiency, not unlike Bitching Betty.

In November 1983, The New York Times published an editorial by sociologist Steven Leveen under the title “Technosexism.” Leveen noticed that there were “millions of mechanical objects” now speaking “through the new technology of speech synthesis,” including computers, clocks, elevators, automobiles, vending machines, and even bathroom scales. He was concerned that they were perpetuating cultural stereotypes by “associating females with low-level service jobs, while associating males with tasks that are broader in range and higher in status.”

Leveen had done a little bit of research before writing his editorial. He was aware that synthesizing a higher-pitched voice was actually more “expensive”—that it required more data be stored “on a microchip”—and that product developers were willing to absorb that cost because of market research. Most of the “market research” in Leveen’s examples amounted to assumptions about gender roles gathered through interviewing mostly professional men. A video game developer: “Have you ever been to a baseball game with a female announcer?” An executive from National Semiconductor: “the [supermarket scanner] systems use exclusively female voices because the male voice … sounded ‘just a little bit strange.’” Coca-Cola vending distributors (mostly male): “felt the male voice was not as pleasing.” And Chrysler, which incorporated a “male” voice into its 1983 cars because testers had stated that when a female voice told them their car’s oil pressure was low, it “hit [them] the wrong way.”



Leveen concluded that “it’s not a coincidence that males are usually the ones purchasing the systems, and that they find female voices more desirable,” although this preference was domain specific. His concern was that the gendered voice distribution between low-status and higher-status applications would “subtly influence our children’s beliefs about which activities and careers are open to them.” Leveen’s concerns about “technosexism” in the 1980s are often echoed in today’s critiques of female-sounding voice assistant applications like Siri, Alexa, and Cortana, all originally defaulted to female in the United States. While his argument didn’t gain much traction at the time, it foreshadowed ongoing debates about gender and technology.

But over the last several years, there has been a shift toward often younger, male-sounding synthesized voices for many applications, including domestic and customer service assistants: In 2015, the United Kingdom grocery chain Tesco changed the voice of all its self-checkout machines from female to male; IBM’s Watson modeled the vocal quality of the typical Jeopardy! winner—an educated white man in his mid-20s to 40s—and then became a Jeopardy! champion itself; Jibo, a social robot for the home, was supposed to be another member of the family, and developers chose a friendly and enthusiastic young adult male voice for it, modeled on Michael J. Fox’s performance of Marty McFly in the Back to the Future films; and Apple offers several voices for Siri, including male- and female-sounding voices with subtle characteristics of African-American Vernacular English, and no longer defaults to the original female voice unless the user chooses it. According to the Guinness Book of World Records, the most downloaded sat-nav voice before Google Maps became widely used for personal navigation was the animated oaf Homer Simpson, as voiced by Dan Castellaneta.

Despite this shift, giving a system a voice—whether a stereotypical “smart wife” or the dulcet tones of Morgan Freeman—reinforces the illusion that corporate informational interactions are personal, and personal interactions are purely informational. Put another way, changing the sound of Siri’s voice (something that is easy to do) doesn’t change the fact that Siri is the “voice” of a U.S.-based technology corporation that manifests a great deal of power by controlling the information collected and provided through Siri. Tech corporations prioritize using our biases for their benefit, while dismissing the reinforcement of stereotypes as a cultural problem rather than a technological one.

Of course, the cultural problem can be a technological problem. We learn to value the humanity of people that we perceive as different from ourselves through experience. As synthesized voices become common, replacing with networked technologies what might have previously been interactions with other people, we lose exposure to the vocal diversity and expressiveness of other human beings and risk losing some of our capacity to truly understand one another. The temptation to simulate human expressiveness through technology only deepens this disconnect, opening the door to manipulation and deceit rather than fostering meaningful connection.

This article was adapted with permission from an excerpt of Vox ex Machina: A Cultural History of Talking Machines, published by the MIT Press Reader.

Lead image: RoseRodionova / Shutterstock
