Author

Introduction

In his classic article, “What is an author?”, Michel Foucault revisits older ideas of what would disqualify a text from counting as part of an author's body of work. At one point, Foucault draws on Saint Jerome's 4th-century criteria of consistency in quality, ideology, style and historical presence, and on how a departure from any of these would preclude a work from being ascribed to an author. He then goes on to say that the modern view is more concerned with transitions and modifications, and thus offers a more dynamic vision of the author.

Both visions of the author, as self-similar and as in flux, are relevant to a computational approach to authorship. There are still philological riddles to be solved, where a stylometric analysis of a work can help attribute authorship. For example, Katie Muth has done this in her study of Boeing newsletters: Thomas Pynchon is known to have written some of the articles, but not which ones. Muth was able to establish with relative certainty which were written by Pynchon, and also how they differed distinctly in their approach to communication. Developments within an authorship are also of interest: general readings may have observed breaks with previous topics and styles, but without a more rigorous account of what has changed.

Finally, there is the figure of the author, beyond the works themselves. From Pynchon's absence from public view to media-savvy novelists such as Tom Wolfe, studying the public images of authors, their fame and their involvement in non-literary genres requires gathering and processing information, a task that benefits from computational approaches.

Applications

Elementary

You can use Google Trends and Google Books Ngram Viewer to track interest in an author over time, or to compare several authors. Google Trends is a free tool that allows you to visualise and investigate trends in people’s search behavior over time. Google Books Ngram Viewer works in a similar way, but instead of reflecting searches made on Google, it provides a graph showing how often words (e.g. the name of an author) have occurred in a corpus of books (such as “English Fiction”, “Latin”, “British English”) over a selected period of time. Google has made an informative page with ideas and descriptions of how to use Google Books Ngram Viewer.

Another approach is to run an image search, for example on Google, to find publicly available images related to an author. Considering how the images are staged and in which contexts they appear can also provide useful information about the figure of the author, thus going beyond the works themselves. Moreover, the way authors are grouped together says something about how they are perceived. Have a look at the Literature-Map, which is based on Gnooks, Gnod's literature recommendation system. Its output shows a dynamic network of authors who are often read together.

Advanced

The computational analysis of writing style, i.e. stylometry, expands the scope of literary studies from ‘close reading’ to so-called ‘distant reading’. Instead of closely studying certain hand-picked works, stylometry allows for exploring large text collections and for finding stylistic patterns and relationships hidden from the human reader. Documents, literary texts in this case, can be reduced to numeric representations of their features, for example the words used in the document or syntactic features such as phrase length. These numeric representations can then be compared to each other based on their values and clustered together.
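As a rough illustration of turning texts into numbers and comparing them, the following minimal Python sketch represents each document as the relative frequencies of the most frequent words and measures how far apart two documents are with cosine distance. The toy texts, the vocabulary size of 10 and the choice of cosine distance are assumptions made for this example; real studies typically use much longer texts and often other distance measures.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text, vocabulary):
    """Represent a text as the relative frequency of each vocabulary word."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[word] / total for word in vocabulary]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    # Clamp at 0 to hide tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


# Toy documents, invented for illustration only.
documents = {
    "text_a": "The sea was calm and the sky was grey and the wind was low.",
    "text_b": "The sky was calm and the sea was low and the wind was grey.",
    "text_c": "Run! Never stop, never look back, never wait for anyone.",
}

# Use the most frequent words across all documents as the shared vocabulary.
all_tokens = [t for text in documents.values() for t in tokenize(text)]
vocabulary = [word for word, _ in Counter(all_tokens).most_common(10)]

vectors = {name: relative_frequencies(text, vocabulary)
           for name, text in documents.items()}

for first, second in [("text_a", "text_b"), ("text_a", "text_c")]:
    distance = cosine_distance(vectors[first], vectors[second])
    print(f"{first} vs {second}: {distance:.3f}")
```

Texts with a small distance between them could then be grouped together, which is the basic idea behind the clustering mentioned above.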

This offers exciting opportunities for authorship studies. By looking at aspects of linguistic style, such as word and sentence length, punctuation and word frequencies, it is possible to explore the “linguistic signature” of an author. Authorship attribution is important for both forensic and historical reasons: it helps to prevent plagiarism and to find evidence for the authorship of anonymous texts from the past. See, for example, how stylometrics contributed to showing that J. K. Rowling was behind the pseudonym Robert Galbraith (Juola, 2013), and how it has been used to investigate who is behind Elena Ferrante, the pen name of the bestselling Italian author (Tuzzi & Cortelazzo, 2018).
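The surface features mentioned here can be computed with a few lines of Python. The sketch below is only a starting point under simple assumptions: sentences are split naively on full stops, exclamation marks and question marks, words are runs of letters and apostrophes, and the function name `style_features` and the sample text are invented for this example.

```python
import re
import string


def style_features(text):
    """Compute a few simple surface features of a text's style."""
    # Naive sentence split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    punctuation = [ch for ch in text if ch in string.punctuation]

    n_words = len(words) or 1          # avoid division by zero
    n_sentences = len(sentences) or 1
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / n_sentences,
        "punctuation_per_word": len(punctuation) / n_words,
    }


sample = "It was late. Nobody spoke; nobody moved. Why would they?"
for name, value in style_features(sample).items():
    print(f"{name}: {value:.2f}")
```

On their own, three numbers say little about an author; the “signature” emerges when such features are computed over many long texts and compared across authors.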

To read more about stylometry, the Computational Stylistics Group offers good resources on their website. The group has developed Stylo, an R package for conducting stylometric studies. Run your own authorship analysis with it, following for example their instructions on authorship verification. For a Python approach, read the chapter “Stylometry and the Voice of Hildegard” in the online guide Humanities Data Analysis: Case Studies with Python (Karsdorp, Kestemont & Riddell, 2022), and follow their exercises.

Get inspired by the methods used in Stylometry-based Approach for Detecting Writing Style Changes in Literary Texts (Gómez-Adorno et al., 2018) to build your own machine learning model. Define a set of relevant stylistic features and train the model to classify, identify, or explore authors of your interest.
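To make the idea of “features plus a trained model” concrete, here is a deliberately small sketch in plain Python. It is not the method of Gómez-Adorno et al.: it uses a hand-picked list of ten function words as features and a nearest-centroid classifier with Manhattan distance, both chosen here only for simplicity. The author labels and training sentences are invented, and far too short for any real conclusion.

```python
import re
from collections import Counter, defaultdict

# A small, hand-picked list of function words (an assumption for this sketch).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "i", "it", "was", "not"]


def features(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[w] / total for w in FUNCTION_WORDS]


def train(labelled_texts):
    """Average the feature vectors of each author's texts into a centroid."""
    grouped = defaultdict(list)
    for author, text in labelled_texts:
        grouped[author].append(features(text))
    return {
        author: [sum(column) / len(vectors) for column in zip(*vectors)]
        for author, vectors in grouped.items()
    }


def predict(centroids, text):
    """Return the author whose centroid is closest (Manhattan distance)."""
    vector = features(text)

    def distance(author):
        return sum(abs(x - y) for x, y in zip(vector, centroids[author]))

    return min(centroids, key=distance)


# Invented toy training data with hypothetical author labels.
training = [
    ("author_x", "The house was old and the garden was wild and the gate was shut."),
    ("author_x", "The road was long and the night was cold and the inn was far."),
    ("author_y", "I did not want to go, and I did not want to stay in it."),
    ("author_y", "I had a plan to leave, not a wish to return to it."),
]

centroids = train(training)
print(predict(centroids, "The river was wide and the bridge was gone."))
print(predict(centroids, "I do not want to talk of it."))
```

For a real project you would replace the toy data with full texts, extend the feature set, and evaluate the model on texts it has not seen during training.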

Resources

Scripts and sites

Articles
