Why Did a Major Paper Ignore Evidence About Gender Stereotypes?

Let’s start with a quiz.

  1. Who was more likely to vote for Donald Trump in 2016, men or women?
  2. Who is more likely to commit a murder, men or women?
  3. Who receives higher grades in high school, boys or girls?
  4. Who is more likely to be labeled as having some sort of behavior problem in elementary school, boys or girls?

The answers are, respectively: men, men, girls, boys. Is it that surprising? If you got at least one right, without resorting to flipping a mental coin, you have just demonstrated to yourself that not all beliefs (stereotypes) about males and females are wrong. If you got three or four right, you should be convinced that your gender stereotypes are not inaccurate. You’re not alone: Lots of other people may—many actually do—hold fairly accurate gender stereotypes.

As a social psychologist, I study (among other things) the accuracy of gender stereotypes—the beliefs men and women, boys and girls, have about themselves as groups. Perhaps surprisingly, the accuracy of these beliefs is one of the largest and most replicable findings in social psychology, an exception to what’s sometimes referred to as “the replication crisis”—the suggestion that, until recently, much of social psychology has been falling far short of its scientific ideals.

Stereotype accuracy is the correspondence between people’s beliefs about groups and what those groups are actually like, as indicated by data one can usually find in Census or other government reports and, sometimes, in meta-analyses of sex differences. Nearly all of the stereotype accuracy correlations exceed .50 (which can be interpreted as people being right 75 percent of the time), and many are over .80 (right 90 percent of the time). For example, women are, on average, better at reading nonverbal cues than men are, and people are pretty good at recognizing that. By comparison, the average effect size in social psychology corresponds to a correlation of about .20.
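The "right X percent of the time" figures above are consistent with the binomial effect size display (BESD), which converts a correlation r into a success rate of 0.5 + r/2. The essay does not name the conversion it uses, so treating it as the BESD is an assumption, and the function name below is purely illustrative. Under that assumption, a minimal sketch of the arithmetic:

```python
def besd_success_rate(r: float) -> float:
    """Convert a correlation r into a BESD-style success rate, 0.5 + r/2.

    Assumption: the essay's "right N percent of the time" figures follow
    the binomial effect size display; the essay itself does not say so.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return 0.5 + r / 2.0


# The three correlations mentioned in the essay: the typical
# social psychology effect (.20) and the two accuracy thresholds.
for r in (0.20, 0.50, 0.80):
    print(f"r = {r:.2f} -> right about {besd_success_rate(r):.0%} of the time")
```

Read this way, a correlation of .20 would correspond to being right roughly 60 percent of the time, against 75 and 90 percent for correlations of .50 and .80.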

Because people share accurate gender stereotypes, I was mystified when the Annual Review of Psychology published a major review, “Gender Stereotypes,” in January declaring how inaccurate gender stereotypes are. The Annual Review of Psychology is widely considered to be one of the most influential and high-impact repositories of the highest quality reviews in the field. “Annual Review of Psychology articles offer expert, integrative reviews that go beyond top-of-the-head sound bites or clickbait, instead examining a topic’s nuances and pros and cons, the weight of the evidence, and gaps in our knowledge to date,” the editors state on the journal’s website.

In this particular case, however, the claim to publish only work that evaluates nuances, pros and cons, and the weight of the evidence fell short. The paper stirs questions about whether social psychology is living up to the standards of scientific conduct so clearly articulated in the editors’ statement. Scientific conclusions need to be based on a careful evaluation of the full set of evidence; authors should not have the option of simply making up conclusions that conflict with an abundance of evidence.

And yet that’s what happened in “Gender Stereotypes.” The paper’s author, Naomi Ellemers, a social psychologist at Utrecht University, stuck to studies of gender stereotypes with effects in the range of around .20, offering no explanation as to why the more powerful studies—16 in all, from 11 papers—demonstrating gender stereotype accuracy were excluded. Despite citing over 100 articles, the Annual Review of Psychology paper did not cite a single one of these papers. Nor did she cite any of the several reviews of the evidence on gender stereotype accuracy (listed at the end of this essay).

To be sure, gender stereotypes are not perfectly accurate and there is a large literature showing that they sometimes lead to biases. For example, one famous study found that faculty in STEM evaluated a male applicant for a lab manager position more positively than an identical female applicant; and another found that faculty in STEM were much more likely to hire a female applicant to a faculty position than an identical male applicant. Studies of bias warrant being taken seriously, even though bias effects tend to be quite small. Nonetheless, what justification can there be for ignoring empirical studies of gender stereotype accuracy, which is typically much stronger than bias?

One answer is that, in social psychology, this situation is not surprising. A whole panoply of flaws leads invalid and unjustified claims to become part of the social psychology canon: claims to “truth” that become widely accepted. The canon once claimed that fear of confirming cultural stereotypes powerfully undermined the achievement of African-Americans and women; it doesn’t (see also the Nautilus post “How Stereotypes Slow Athletes Down”). The canon once claimed that self-fulfilling prophecies (people’s expectations for others leading to social processes that cause those others to confirm those expectations) were powerful and pervasive; they aren’t. The canon once claimed that changing implicit biases could be a powerful way to fight discrimination. It isn’t.

What’s more, questionable interpretive practices in social psychology, and not just suboptimal methodology or statistics, often lead to unjustified conclusions in literature reviews, even if the underlying empirical research is valid and replicable. You can file this Annual Review of Psychology paper in the latter category. (A draft of this essay was sent to Dr. Ellemers and the editors of the Annual Review of Psychology inviting them to provide feedback regarding anything that might be inaccurate or misrepresented. I have received no feedback except for the corrected spelling of a name from one of the journal’s editors.)

The review starts off reasonably enough. “There are many differences between men and women,” Ellemers writes in the abstract. “To some extent, these are captured in the stereotypical images of these groups.” This would seem to acknowledge at least a moderate degree of accuracy. But then she continues: “Stereotypes about the way men and women think and behave are widely shared, suggesting a kernel of truth.” The “kernel of truth” phrasing has a long history in social psychology. From my 2012 book, Social Perception and Social Reality:

Variations on the idea that there might be some truth to stereotypes became known as the “earned reputation” theory and the “kernel of truth” hypothesis both of which emphasized that, although stereotypes were largely inaccurate exaggerations, they did contain “a kernel of truth”…I do not know whether those promoting this idea thought about it in the following manner, but it always brought to my mind an image of a single kernel of decent corn (the “kernel of truth”) in an otherwise entirely rotten cob (the rest of the stereotype exaggerating and distorting that truth). Still, one kernel is better than none.

The exaggeration hypothesis has long and deep roots within social psychology. It long was the only perspective that permitted researchers to acknowledge that people were not always completely out of touch with social reality, while simultaneously allowing researchers to position themselves well within the longstanding traditions emphasizing stereotype error and bias.

Ellemers continues, later in the paper: Gender stereotyping “typically leads people to overemphasize differences between groups and underestimate variations within groups.” This is manifestly disconfirmed by existing studies, which provide much more evidence that people underestimate gender differences than that they exaggerate them (that is, there is more current evidence that people think gender differences are smaller than they really are than that they are bigger than they really are). And then she concludes, “If there is a kernel of truth underlying gender stereotypes, it is a tiny kernel, and does not account for the far-reaching inferences we often make about essential differences between men and women.” If Ellemers had included the missing studies relevant to her review, it would have been very difficult to reach that conclusion.

There is something more serious at stake than whether Ellemers’ claims are right or wrong. Scientists cannot be in the business of ignoring evidence. This type of problem threatens the credibility of social psychology at least as much as unreplicable findings, faulty statistics, and suboptimal research methods. It also risks undermining public support for the social sciences more broadly: Why should the public continue to help fund social sciences if it cannot be reasonably assured that scientists’ conclusions will be responsive to their own data?

The problem goes beyond Ellemers. Assuming at least two reviewers (which would be minimal vetting for such an important outlet as the Annual Review of Psychology) and one editor, that is at least three other scientists complicit in ignoring the literature on gender stereotype accuracy. How is this possible? Some scientists may be motivated to support compelling narratives—social psychology has a long and checkered history that includes cherry-picking results, studies, and publications in order to advance them.

There is an alternative. Social psychology could live up to the sort of goals the Annual Review of Psychology has adopted: presenting nuanced perspectives, pros and cons, and actually evaluating, rather than ignoring, the evidence that bears on its conclusions.

Lee Jussim is a social psychologist at Rutgers University who has written on social perception, accuracy, self-fulfilling prophecies and stereotypes, and prejudice for more than 30 years.

Reviews of Stereotype Accuracy (Including Gender Stereotypes)

Jussim, L., Crawford, J. T., Anglin, S. M., Chambers, J., Stevens, S. T., & Cohen, F. (2016). Stereotype accuracy: One of the largest relationships and most replicable effects in all of social psychology. In T. Nelson (ed.), Handbook of prejudice, stereotyping, and discrimination (2nd ed.), pp. 31-63. Hillsdale, NJ: Erlbaum.

Jussim, L., Crawford, J. T., & Rubinstein, R. S. (2015). Stereotype (in)accuracy in perceptions of groups and individuals. Current Directions in Psychological Science, 24, 490-497.

Jussim, L., Cain, T., Crawford, J., Harber, K., & Cohen, F. (2009). The unbearable accuracy of stereotypes. In T. Nelson (ed.), Handbook of prejudice, stereotyping, and discrimination, pp. 199-227. Hillsdale, NJ: Erlbaum.

The 11 Articles Reporting 16 Studies Assessing Gender Stereotype Accuracy Not Included in Ellemers’ Annual Review of Psychology Chapter on Gender Stereotypes

Allen, B. P. (1995). Gender stereotypes are not accurate: A replication of Martin (1987) using diagnostic vs. self-report and behavioral criteria. Sex Roles, 32, 583-600. (Note: despite the title, the article found a correlation of .61 between sex stereotypes and criteria after removing a single outlier; see Jussim et al., 2016, referenced above.)

Beyer, S. (1999). The accuracy of academic gender stereotypes. Sex Roles, 40, 787-813.

Briton, N. J., & Hall, J. A. (1995). Beliefs about female and male nonverbal communication. Sex Roles, 32, 79-90.

Cejka, M. A., & Eagly, A. H. (1999). Gender-stereotypic images of occupations correspond to the sex segregation of employment. Personality and Social Psychology Bulletin, 25, 413-423.

Hall, J. A., & Carter, J. D. (1999). Gender-stereotype accuracy as an individual difference. Journal of Personality and Social Psychology, 77, 350-359.

Halpern, D. F., Straight, C. A., & Stephenson, C. L. (2011). Beliefs about cognitive gender differences: Accurate for direction, underestimated for size. Sex Roles, 64, 336-347.

Lockenhoff, C. E., Chan, W., McCrae, R. R., De Fruyt, F., Jussim, L., De Bolle, M., … & Pramila, V. S. (2014). Gender stereotypes of personality: Universal and accurate? Journal of Cross-Cultural Psychology, 45, 675-694.

Martin, C. L. (1987). A ratio measure of sex stereotyping. Journal of Personality and Social Psychology, 52, 489-499.

McCauley, C., & Thangavelu, K. (1991). Individual differences in sex stereotyping of occupations and personality traits. Social Psychology Quarterly, 54, 267-279.

McCauley, C., Thangavelu, K., & Rozin, P. (1988). Sex stereotyping of occupations in relation to television representations and census facts. Basic and Applied Social Psychology, 9, 197-212.

Swim, J. K. (1994). Perceived versus meta-analytic effect sizes: An assessment of the accuracy of gender stereotypes. Journal of Personality and Social Psychology, 66, 21-36.