Male Characters Dominate in Literature

Do male or female characters dominate literature? Few may have analysed the question until now, but a new analysis shows that male characters are four times more prevalent in literature than female characters.

Mayank Kejriwal, a research lead at USC’s Information Sciences Institute (ISI), accessed data through the Project Gutenberg corpus, which contains 3,000 English-language books. The books ranged in genre from adventure and science fiction to mystery and romance, and came in varied forms, including novels, short stories, and poetry.

Akarsh Nagaraj, M.S., a co-author of the study and a machine learning engineer at Meta, helped uncover the 4:1 male-to-female literary imbalance.

“Gender bias is very real, and when we see females four times less in literature, it has a subliminal impact on people consuming the culture. We quantitatively revealed an indirect way in which bias persists in culture,” he added.


The study outlines several methods for measuring female prevalence in literature. The researchers used Named Entity Recognition (NER), a prominent natural language processing (NLP) method, to extract gender-specific characters. Kejriwal said that one approach was to compare how many female pronouns appear in a book with how many male pronouns appear. Another technique is to quantify how many of a book’s main characters are female, which helped determine whether male characters were central to the story.
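The pronoun comparison can be illustrated with a minimal Python sketch. This is not the researchers' code: the pronoun lists, the function names, and the sample sentence are all assumptions made for illustration, and the study's actual lexicon and pipeline are not described in this article.

```python
import re
from collections import Counter

# Assumed pronoun lists for illustration; the study's actual lexicon is not given.
MALE_PRONOUNS = {"he", "him", "his", "himself"}
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}


def pronoun_counts(text):
    """Count male and female pronouns in a text, ignoring case."""
    counts = Counter(male=0, female=0)
    # Split the lowercased text into runs of letters and check each token.
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in MALE_PRONOUNS:
            counts["male"] += 1
        elif token in FEMALE_PRONOUNS:
            counts["female"] += 1
    return counts


def male_to_female_ratio(text):
    """Return male pronouns per female pronoun, or None if no female pronouns occur."""
    counts = pronoun_counts(text)
    if counts["female"] == 0:
        return None
    return counts["male"] / counts["female"]


# Hypothetical sample text, not taken from the corpus.
sample = "He said his plan was ready. She nodded, and he handed him the map."
print(pronoun_counts(sample))
print(male_to_female_ratio(sample))
```

A simple count like this says nothing about which characters are central to a story, which is why the article describes it as only one of several measures.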

The researchers also found that the discrepancy between male and female characters decreases under female authorship. “It clearly showed us that women in those times would represent themselves much more than a male writer would,” said Nagaraj.


Kejriwal acknowledged that AI tools for identifying plural words, such as “they,” which may be referring to a non-dichotomous individual, do not yet exist. Still, the study’s findings build the framework for approaching such social issues and building the technologies that can address these deficits.

The researchers hope that the study provides a blueprint for future work on quantifying the qualitative findings they discovered through the study’s methodologies. Without the inherent bias from human-designed surveys, the NLP technology also enabled them to find adjective associations with gender-specific characters, deepening their understanding of bias and its pervasiveness in society.

“Even with misattributions, the words associated with women were adjectives like ‘weak,’ ‘amiable,’ ‘pretty,’ and sometimes ‘stupid,’” said Nagaraj. “For male characters, the words describing them included ‘leadership,’ ‘power,’ ‘strength’ and ‘politics.’”

While the team did not ultimately quantify this facet of their study, this difference in qualitative descriptions between gender-specific characters provides future scope for more comprehensive qualitative investigation on word associations with gender. “Our study shows us that the real world is complex but there are benefits to all different groups in our society participating in the cultural discourse,” said Kejriwal.

“When we do that, there tends to be a more realistic view of society,” he said.

Kejriwal is hopeful that the study will serve to highlight the importance of interdisciplinary research — that is, using AI technology to highlight pressing social issues and inequalities that can be addressed. Stakeholders with specialized backgrounds, including computer scientists, can offer tools to process data and answer questions, and policymakers can use this data to enact change.

