Although gender is a complex, dynamic, cultural construction, most cultures’ binary treatment of gender (“male” and “female”) makes conceptualizing some computational assessments of gender and sexual difference relatively simple.

How do texts by male and female authors compare with one another? How have their differences and similarities changed over time and among locations? How do representations of female characters compare with those of male characters? Do those character representations differ with the gender of the author?

Answers to these questions help to illuminate new aspects of the historical construction of gender and literature’s role in gender’s performance. Some caution is required in comparisons of male and female authors and characters to account for the role of genre, which is itself often gendered (women write more romantic fiction, for example), but this entanglement of gender and genre may also be an important indicator of the kinds of markets available to men and women. For further discussion, see the problems pointed out by Mandell (2019) related to the binary categorization and instability of the notions of ‘writing like a woman’ or ‘writing like a man’.

Although gender-based comparisons may seem to risk reinforcing gender binaries and other gender stereotypes, analyses of gender’s textual effects may reveal historical inequities and areas of gender convergence, and expose the cultural mechanisms behind apparently “natural” differences. In fact, computational analyses have revealed problematic tendencies in literature concerning gender. Cheng (2020) points out that descriptions of the human body became more prominent in fiction over time and that these descriptions were increasingly bifurcated along a gender binary until very recently. Moreover, Underwood, Bamman and Lee (201) report the paradoxical finding that while character gender roles became more flexible over time, there was a sharp decline in the space allocated to women in literature, both as authors and as literary characters.



To start examining the notions of ‘writing like a woman’ or ‘writing like a man’, compare features such as topics, overrepresented words, or named locations in bodies of texts written by women with those in texts written by men, and observe changes and continuities over time. Compare the treatment of female and male characters, such as the descriptions that accrue to them, the actions they perform, or the fraction of textual space given over to them. This can be done, for instance, with Voyant Tools, using facets such as ‘Topics’ or ‘Collocates’.
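The comparison of overrepresented words can also be sketched in a few lines of Python. The example below is a minimal illustration, not the method of any study cited here: it ranks words by a smoothed log-ratio of their relative frequencies in two groups of texts. The function names, the smoothing value, and the two toy "corpora" are all assumptions made for the sake of the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def overrepresented(texts_a, texts_b, top_n=5, smoothing=1.0):
    """Rank words by how much more frequent they are in group A than in group B.

    The score is the log of the ratio of smoothed relative frequencies;
    positive scores mean the word is overrepresented in group A.
    """
    counts_a = Counter(tok for text in texts_a for tok in tokenize(text))
    counts_b = Counter(tok for text in texts_b for tok in tokenize(text))
    vocab = set(counts_a) | set(counts_b)
    # Add-one style smoothing so that words absent from one group do not divide by zero.
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        rate_a = (counts_a[word] + smoothing) / total_a
        rate_b = (counts_b[word] + smoothing) / total_b
        scores[word] = math.log(rate_a / rate_b)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Invented placeholder texts; replace them with your own two groups of documents.
texts_a = ["the garden the garden the letter"]
texts_b = ["the ship the ship the war"]

print(overrepresented(texts_a, texts_b, top_n=3))
```

With real corpora, very rare words tend to dominate such a ranking, so a minimum frequency threshold or a more robust keyness measure is usually advisable. Remember, too, that differences between the groups may reflect genre rather than author gender.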

Gender can also be analysed from the reader’s point of view. You could compare the reception history of male and female authors via reviews or the records of lending libraries. Including gender categories beyond male and female authors and characters might also provide insights into changes in the perception of gender: for example, gender-unspecified authors (anonymous or using first initials), women publishing under male pseudonyms, men writing female narrators, and (especially in contemporary literature) authors, narrators, and characters who identify as trans and genderqueer.


Building on the notion of character space presented in the chapter “The Narrator”, you can measure how much space in a book is allocated to male and female characters. BookNLP might not distinguish individual characters perfectly, but it yields accurate results for the overall distribution of female and male character space. All code from Underwood et al. (2019) is available on GitHub to help you start your analysis. Using the Brown or Gutenberg corpus, explore how the distribution of character space varies over time. If you add information about genre to the model, do you observe different patterns? Go further by exploring what kinds of words are used to describe female and male characters according to their respective character spaces.
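Once each character has a gender label and a count of the words attached to them, the share of character space per decade is a simple aggregation. The sketch below assumes a hypothetical record format of (publication year, character gender, number of words attached to the character). This is not BookNLP's actual output format, and the figures are invented placeholders.

```python
from collections import defaultdict

# Hypothetical records: (publication_year, character_gender, words_attached_to_character).
# The numbers are invented for illustration only.
characters = [
    (1850, "female", 1200),
    (1850, "male", 1800),
    (1855, "female", 900),
    (1855, "male", 1100),
    (1950, "female", 500),
    (1950, "male", 1500),
]

def female_share_by_decade(records):
    """Return {decade: fraction of character space given to female characters}."""
    totals = defaultdict(lambda: {"female": 0, "male": 0})
    for year, gender, words in records:
        # Only the two labels compared here are counted; characters with any other
        # label are skipped in this sketch and should be reported separately.
        if gender in totals[(year // 10) * 10]:
            totals[(year // 10) * 10][gender] += words
    shares = {}
    for decade, counts in sorted(totals.items()):
        space = counts["female"] + counts["male"]
        if space > 0:
            shares[decade] = counts["female"] / space
    return shares

shares = female_share_by_decade(characters)
for decade, share in shares.items():
    print(f"{decade}s: {share:.2f}")
```

Grouping by (decade, genre) instead of by decade alone is one straightforward way to check whether the pattern holds within individual genres.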

There are useful tools in both R and Python for analyzing gender in literature. In R, the 'gender' package predicts the gender associated with a name; in Python, the same is possible with, for instance, this pipeline. Note, though, that gender prediction tools aren't equally useful across varied contexts, because they are generally trained on names associated with particular regions and cultures and omit names that aren't recognized (see Keyes (2017) and Larson (2017)). Using metadata (author gender, publication year, genre), investigate the distribution of male and female authors over time, in general or in a specific genre. For an analysis in R, you can use the HathiTrust genre corpora, as done in this tutorial. The Brown corpus in Python’s NLTK package is already annotated by genre. Alternatively, use topic modeling to detect thematic differences in texts by male and female authors. Do the themes correlate with your findings about the genre distribution of male and female authors?
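Because name-based prediction leaves gaps, it helps to keep unrecognized and gender-unspecified names visible in the counts rather than silently dropping them. The sketch below is a rough illustration under stated assumptions. `NAME_TO_GENDER` is a hypothetical lookup table standing in for whatever a real tool returns, and the author names and years are invented placeholders.

```python
from collections import Counter

# Hypothetical lookup table; a real analysis would use a name-gender tool or dataset.
NAME_TO_GENDER = {"jane": "female", "mary": "female", "charles": "male", "henry": "male"}

def label_author(full_name):
    """Label an author as 'female', 'male', 'unspecified', or 'unknown'.

    'unspecified' covers anonymous authors and authors who give only initials;
    'unknown' covers first names that the lookup table does not recognize.
    """
    parts = full_name.replace(".", " ").split()
    if not parts or parts[0].lower() == "anonymous":
        return "unspecified"
    first = parts[0].lower()
    if len(first) == 1:  # a bare initial such as "A."
        return "unspecified"
    return NAME_TO_GENDER.get(first, "unknown")

def label_counts_by_decade(metadata):
    """Count author labels per decade from (author_name, publication_year) pairs."""
    counts = {}
    for author, year in metadata:
        decade = (year // 10) * 10
        counts.setdefault(decade, Counter())[label_author(author)] += 1
    return counts

# Invented placeholder metadata, not real bibliographic records.
metadata = [
    ("Jane Placeholder", 1813),
    ("Mary Placeholder", 1818),
    ("Anonymous", 1815),
    ("Charles Placeholder", 1859),
    ("A. B. Example", 1853),
    ("Xanthe Placeholder", 1857),
]

counts = label_counts_by_decade(metadata)
for decade in sorted(counts):
    print(decade, dict(counts[decade]))
```

A label derived from a name reflects only the name on the title page, so pseudonymous authors will be misclassified unless the metadata is corrected by hand. Reporting the share of 'unknown' and 'unspecified' labels alongside the results makes the coverage of the name list explicit.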


Scripts and sites

