Our Aversion to A/B Testing on Humans Is Dangerous

Research suggests that people have an irrational aversion to A/B tests, which could limit the extent to which important institutions like hospitals, legislatures, and corporations base their decisions on objective evidence.

Facebook once teamed up with scientists at Cornell to conduct a now-infamous experiment on emotional contagion. Researchers randomly assigned 700,000 users to see on their News Feeds, for one week, a slight uptick in either positive or negative language or no change at all, to determine whether exposure to certain emotions could, in turn, cause a user to express certain emotions. The answer, as revealed in a 2014 paper, was yes: The emotions we see expressed online can change the emotions that we express, albeit slightly. Conversations about emotional contagion were quickly shelved, however, as the public disclosure of the study sparked an intense backlash against what many perceived to be an unjust and underhanded manipulation of people’s feelings. Facebook would later apologize for fiddling with users’ emotions and pledge to revise its internal review practices.

Yet there’s reason to think the backlash wasn’t entirely reasonable. The magnitude of the researchers’ manipulation was small; what users saw on their altered feeds wasn’t much different, if at all, from what they would normally see in a given week. What’s more, Facebook had universally implemented a radical change in the way people express themselves, with unknown effects on mental well-being, in merely launching their platform, said Michelle Meyer, an assistant professor of bioethics at Geisinger Health System. “But nobody was saying that they should be federally investigated, no one was suing them over this, no one was saying they were monsters or maybe had driven people to kill themselves. There just wasn’t that sort of outsized reaction,” she told me. “But as soon as you randomize people to see essentially the same thing that they were going to see anyway, given some reasonable period of time, people freaked out.”

Facebook’s emotional contagion study is just one of several examples of businesses trying to better understand their products and services via randomized experiments, or “A/B tests”—that is, experiments in which subjects are randomly assigned to receive one of two different treatments and then compared on some outcome measure. Like the emotional contagion study, many of these experiments caused public outcry for reasons that may seem intuitive and obvious, such as the lack of informed consent, or the idea that it’s wrong to knowingly treat people unequally. But a recent series of studies from Meyer and her colleagues casts doubt on these explanations. Their research suggests that people have an irrational aversion to A/B tests, which could limit the extent to which important institutions like hospitals, legislatures, and corporations base their decisions on objective evidence.
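For readers who want to see the mechanics, the definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `run_ab_test` and `toy_outcome` are hypothetical, the subjects are just the numbers 0 through 999, and the outcome is a made-up rule in which treatment B adds exactly one point. None of it comes from Facebook's study or from Meyer's research.

```python
import random
from statistics import mean


def run_ab_test(subjects, outcome, seed=0):
    """Randomly split subjects into two arms, then compare mean outcomes.

    `outcome(subject, arm)` is a caller-supplied function (hypothetical here)
    that returns the measured result for one subject under one treatment.
    """
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)              # chance alone decides who gets what
    half = len(shuffled) // 2
    arm_a, arm_b = shuffled[:half], shuffled[half:]
    mean_a = mean(outcome(s, "A") for s in arm_a)
    mean_b = mean(outcome(s, "B") for s in arm_b)
    return {
        "n_a": len(arm_a),
        "n_b": len(arm_b),
        "mean_a": mean_a,
        "mean_b": mean_b,
        "difference": mean_b - mean_a,
    }


def toy_outcome(subject, arm):
    # Made-up outcome for illustration only: a baseline that varies by
    # subject, plus exactly 1.0 if the subject received treatment B.
    baseline = subject % 5
    return baseline + (1.0 if arm == "B" else 0.0)


result = run_ab_test(range(1000), toy_outcome)
print(result)
```

Because chance alone decides who lands in which arm, the two groups should have similar baselines on average, so the gap between their mean outcomes is a fair estimate of the treatment's effect. In this toy setup the estimate should land close to the built-in effect of 1.0.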

Across 16 studies, Meyer and her colleagues randomly assigned 5,873 participants to read vignettes of a leader deciding how best to achieve a goal. The leader could land on one of three possible decisions: to universally implement one policy, to universally implement another policy, or to conduct an A/B test to compare the two policies before choosing one. Participants were randomly assigned to see one of the three possible decisions and rate its appropriateness. The vignettes spanned a wide range of topics, from the regulation of self-driving cars to poverty reduction to increasing enrollment in employee retirement plans. And, importantly, for each topic, the researchers were careful to select two untested policies that were roughly equal in appropriateness, so that neither of the two choices would be obviously better than the other. One vignette, for example, described a hospital director’s effort to reduce infection rates by reminding doctors of standard safety protocols. The director’s options were to A) print the protocols on the back of doctors’ badges, B) print the protocols on posters to be placed in rooms where the doctors worked, or A/B) randomly assign patients to be treated by a doctor who either wore the badge or worked in a room with a poster, and then compare rates of infection between the two groups.
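The design described above, in which each participant sees and rates only one of the three decisions, can be sketched as follows. This is a hypothetical illustration, not the researchers' actual procedure: the condition labels, the function `assign_conditions`, the sample of 300 participants, and the choice to balance the groups evenly are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical labels for the three decisions a vignette's leader could make.
CONDITIONS = ("policy_A", "policy_B", "AB_test")


def assign_conditions(participant_ids, conditions=CONDITIONS, seed=0):
    """Give each participant exactly one condition, in equal-sized groups.

    Shuffling first and then dealing conditions out in rotation keeps the
    assignment random while ensuring the groups differ in size by at most one.
    (Balanced groups are an assumption of this sketch.)
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


assignment = assign_conditions(range(300))
counts = Counter(assignment.values())
print(counts)
```

Since no participant ever sees more than one decision, a lower average appropriateness rating in the A/B group cannot be explained by people comparing the options side by side. It reflects how the A/B test is judged on its own.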

Meyer and her team found that in nearly every situation they tested, the decision to conduct an A/B test was deemed least appropriate by a considerable margin. In other words, people preferred those in power to universally implement untested policies at their discretion instead of testing them first. This so-called “A/B effect” held up even in participants with a STEM education or with high levels of scientific literacy.

When asked to justify their ratings, many participants who were shown the A/B tests expressed concern that the decision-makers in the vignettes never asked people for consent before conducting tests on them. Yet the issue of consent was raised by fewer than 1 percent of the participants who were assigned a vignette where the leader universally implemented an untested policy. This inconsistent standard for consent puzzled Meyer and her colleagues because, as they wrote in their article, “in all cases, people were subjected without their consent to one of the same two untested policies with unknown effects.” Why were the A/B tests singled out?

One possible cause of the A/B effect, according to the researchers, is the “proxy illusion of knowledge,” or the belief that other people know more than they actually do. Participants may have found it unsettling to see people in power admit a need to conduct tests—admit, in other words, that they don’t know enough. “It makes you feel a little safer, a little more comfortable in your day-to-day, to imagine the director of the hospital where you’re being treated is omniscient, or knows what’s going to work, or has looked at your case specifically and has already chosen what the best course of action is,” said Patrick Heck, a postdoctoral research fellow at Geisinger Health System and a co-author of the paper. “When in fact that’s quite rarely the case.”

It may also be that A/B tests are off-putting because of the cultural baggage associated with science and experimentation. “From the Nazis to Tuskegee to human vivisection to Frankenstein—most of which, by the way, were not actual scientific experiments,” said Meyer, “there’s a colloquial use of the word ‘experiment’ and also ‘random’ that has very negative connotations.”

Whatever the cause, there appears to be a persistent distaste for A/B tests across a wide swath of the population. Yet abjuring randomized experiments, Meyer said, can put more power in the hands of a few. “I want to live in a world where practices and policies and treatments are as evidence-based as possible, and not based on the intuitions of people who happen to become CEO of a company or happen to become head of a hospital,” Meyer said. “There are many attributes that lead people into those positions of power, but magically knowing in advance what does and doesn’t work is not one of them.”

Meyer, Heck, and the rest of their team are currently focusing their efforts on finding ways to make A/B tests more palatable. One potential strategy is to help people become more familiar and thus more comfortable with the process and purpose of randomized experiments. Also, given the negative connotations of words like “experiment” and “random,” it may help to frame A/B tests in different terms, like “trial.”

The role of facts and analysis in public life appears to be shrinking, so such efforts to broaden the appeal of randomized experiments have arguably never been more important. “Policy and government and healthcare and all these really high-touch systems and organizations that exist to improve people’s lives are fundamentally connected to science—to social science and especially to the basic sciences,” said Heck. “And of course the workhorse for the basic sciences is randomized evaluation.”

With that in mind, it may be wise for us all to shake off the heebie-jeebies and let ourselves become guinea pigs once in a while.

Scott Koenig is a doctoral student in neuroscience at CUNY, where he studies morality, emotion, and psychopathy. Follow him on Twitter @scotttkoenig.

