California mom Deborah Del Mastro recently got a call from a man who claimed her 37-year-old daughter, Sarah, had been kidnapped by a Mexican drug cartel. The man then played a short recording of Sarah’s panicked voice. Del Mastro wired the man $5,400. She only realized it was a scam when Sarah didn’t show up at the appointed pick-up spot. She called her daughter directly, and Sarah answered the phone. She was at work.
Voice-cloning scams like these are becoming more and more common. AI can duplicate a voice with just seconds of audio, which a scammer can easily collect from a short spam call.
But there’s another, more insidious way that AI voice-cloning technology could be used to manipulate you: A team of researchers recently found that voices similar to our own sound more persuasive. They published their results in the Journal of Marketing Research.
If a company wanted to personalize its marketing to individual consumers, creating an extra tug to buy, it could theoretically repurpose samples of a person’s voice in advertisements. So many of us already use voice-command and voice-search apps, and there aren’t any regulations preventing those companies from saving samples of your unique vocal fingerprint and repurposing them to goose sales.
I spoke with study co-author Kimberly Hyun, an assistant professor of marketing at the University of Cincinnati’s Carl H. Lindner College of Business, about why we might be easily persuaded by voices similar to our own, what gender has to do with it, and when a voice clone might begin to sound creepy rather than credible.
Why is it so easy to reproduce someone’s voice using AI?
The answer is simple. The technology is already here. To create the voices of Alexa or Siri, or of any AI we talk to right now, developers use the human voice. Over time, companies and researchers have collected thousands, even millions, of samples of human voices. Voice-identification technology has also improved over time, and we’re using it daily. In the study, we cite estimates that 20 percent of the population uses voice search. But that percentage has probably increased since we ran those numbers, because Gemini, ChatGPT, and all of these AI tools now offer voice activation.
Are these companies and apps storing recordings of individual consumer voices?
It’s possible. On the surface, I believe they’re using some voice data to better serve customers, because the voice gives away a lot of information: your intent, your personality, and your emotional state, depending on how you speak. But as of right now, we don’t really know what they’re doing with our voices.
You looked at not just how easily we’re persuaded by voices that are similar to our own but also how persuaded we are by vocal averages. Can you tell me why you looked at averages?
Our first study involved the show Shark Tank. We mapped and calculated the vocal similarities between the sharks and the entrepreneurs. In a second study, we turned to the crowdfunding platform Kickstarter. But we were unable to collect the voices of potential investors. We only had access to the voices of the entrepreneurs making pitches, so we calculated how close these were to the average human voice, with the idea that the average voice might sound more familiar than voices at the edges of the human range.
I’m wondering what the average human voice might sound like. Is there a celebrity or political figure out there who has an average voice?
Celebrities and political figures are a slightly different case. We’re saying that essentially what persuades people is familiarity: a voice that sounds most like mine activates the brain. We think, “Oh, familiar voice, it’s actually safer.” We trust recommendations more, or tend to comply more, when the voice sounds like our own. But voice similarity matters most when we don’t have any other external cues or ways of evaluating what we’re being told, which isn’t the case with a celebrity. With celebrities, we can often identify the voice immediately.
I thought most people don’t like the sound of their own voice. Do you have any hypotheses about why voices that are similar to our own are most persuasive?
True. That’s why we actually compute voice similarity in two different ways and test both: objective similarity and subjective similarity. Subjective similarity is how similar we perceive voices to be, and objective similarity is computed by machine learning. But it’s an interesting story. When we record our own voice and then listen to it, it’s a little cringey, a little embarrassing. Yet at least at the level of subtle cues, it seems to be influencing our decisions.
We have a couple of hypotheses: One is that it’s an evolutionary mechanism. People with similar voices might signal belonging to a safe in-group. Two is cognitive balance theory, a theory in social psychology developed in the 1940s and ’50s that explains why we sometimes shift our views to align with people we like.
What influence does gender have on how persuasive a voice is?
A lot of literature says male voices are more persuasive in a number of contexts. That’s why we controlled for gender in our studies. We wanted to show the effect of timbre similarity independent of gender. Even after controlling for gender and pitch—which is strongly correlated with gender—timbre similarity still matters. Still, gender might matter for certain product categories. Some products are considered more masculine versus feminine, so matching gender there could be more persuasive.
You mention in the paper the possibility that voices that are too similar to our own might stop feeling safe and would begin to feel creepy instead.
I think that goes back to your question about how we feel when we hear our own voices: It becomes cringey, even creepy or embarrassing. So far, across all of our experiments, the AI voices don’t sound exactly like the participant’s, or the consumer’s, voice, so we find that they’re more persuasive. But as the technology advances, if consumers learn that companies are intentionally trying to convince them with familiar voices, we might see that uncanny valley effect.
Lead image: ArtemisDiana / Adobe Stock