The discipline of history, which (fair or not) has not been heralded as a methodological frontrunner, has responded forcefully to digitization: as more and more archives are digitized, new opportunities for research open up.
Subdisciplines such as the history of concepts can now hardly be pursued in good faith without drawing on digitized archival material, which widens investigations of concepts into bodies of text far too extensive for any human to read in full in pursuit of a single, specific question.
Likewise, in literary studies, inquiries into the uses of terms provide an important background for understanding history and making claims about it. While such findings may be auxiliary to a larger thesis pursued with traditional methods, it is also possible to set up queries that test whether established categories such as periodization are as meaningful as literary history presents them. Periods work by suggesting difference, but researchers such as Ted Underwood (2019) have convincingly argued that there is often much more continuity in content and style than the logic of periodization can grasp.
Computational approaches can thus challenge widely held beliefs about literary history, while also confirming developments that could not have been seen without large amounts of data. Underwood has, for example, done this by showing that the vocabulary of English poetry and fiction shifts towards more archaic word origins in the years up to 1800, even as general word usage tends to become more modern. This goes hand in hand with the medievalism and interest in the past expressed by Romantic writers, and it adds a significant stylistic element to the narrative of a culture of historicism in those years.
Google Books Ngram Viewer allows its users to search for words, terms, and phrases and to discover how frequently they appear in the millions of books Google has digitized. In doing so, it also generates graphs that represent these developments. You can narrow down your search by specifying the time period as well as the corpus (e.g. “English Fiction”) relevant for your analysis. Used properly, Google Books Ngram Viewer can clarify when certain terms became relevant. An example could be searching for the term “posthuman” from the mid-1900s onwards and looking at its development over time, or comparing its graph with those of other comparable periods or terms. Ngrams can also be used to clarify when a term became less relevant (e.g. the term “truth”, which is not what it used to be), or to compare fluctuations of word pairs appearing together.
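If you want to go beyond pointing and clicking in the web interface, the frequencies behind the graphs can also be fetched programmatically. Below is a minimal sketch in Python using the requests library; it calls the undocumented JSON endpoint that the Ngram Viewer's own page uses, so the URL, parameters, and response shape are assumptions based on the page's current behaviour and may change without notice.

# Minimal sketch: fetch yearly relative frequencies from the Ngram Viewer.
# NOTE: this relies on the undocumented JSON endpoint behind the web
# interface, not an official API; parameters and response shape may change.
import requests

def ngram_frequencies(phrase, year_start=1900, year_end=2019, corpus="en-2019"):
    """Return a list of yearly relative frequencies for a phrase."""
    response = requests.get(
        "https://books.google.com/ngrams/json",
        params={
            "content": phrase,
            "year_start": year_start,
            "year_end": year_end,
            "corpus": corpus,   # e.g. "en-fiction-2019" for English Fiction (assumed id)
            "smoothing": 0,     # raw yearly values, no moving average
        },
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()
    # Each result is expected to carry an "ngram" label and a "timeseries"
    # of relative frequencies, one value per year in the requested range.
    return data[0]["timeseries"] if data else []

for year, freq in zip(range(1940, 2020), ngram_frequencies("posthuman", 1940, 2019)):
    print(year, freq)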
As demonstrated by Marc Egnal (2013), the Ngram Viewer can also be used to search for terms believed to be associated with distinctive characteristics of different periods of literary history, allowing us to confirm the existence of such periods as well as deepen our understanding of them.
The Ngram Viewer can also be used as a tool to identify anachronisms in fiction, i.e. cases of the past intruding on the present or, vice versa, the present on the past. While anachronisms can be a purposeful literary device used by an author to engage the reader, their unintentional inclusion may reveal an author's assumptions and cultural biases. Just as an etymological dictionary can be used to determine when a certain term was coined or entered common usage, the Ngram Viewer can provide information about the popular usage of words.
Using Ngrams to investigate the appearance of modern language or objects in literature set in the past can be a way of assessing historical accuracy. For example, the term “black market” was coined around the 1930s during the Great Depression and gained wider usage during WWII, making any occurrence of the term in literature set in, say, the eighteenth century an anachronism. However, it is important to keep in mind that the data behind the Ngram Viewer is limited: it reflects only a small fraction of all books ever published.
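This kind of check can also be scripted. The sketch below relies on the same undocumented endpoint and assumed response shape as the sketch above and simply reports the first year in which a phrase registers any frequency at all; given the corpus's limited coverage, a late first attestation is a flag worth investigating rather than proof of an anachronism.

# Sketch: report the first year a phrase is attested in the chosen corpus,
# using the same undocumented Ngram Viewer endpoint as above (an assumption).
import requests

def first_attested_year(phrase, year_start=1700, year_end=2019, corpus="en-2019"):
    response = requests.get(
        "https://books.google.com/ngrams/json",
        params={"content": phrase, "year_start": year_start,
                "year_end": year_end, "corpus": corpus, "smoothing": 0},
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()
    if not data:
        return None  # the phrase never appears in this corpus
    for year, freq in zip(range(year_start, year_end + 1), data[0]["timeseries"]):
        if freq > 0:
            return year
    return None

# A novel set in the eighteenth century whose characters speak of a
# "black market" uses a phrase the corpus only attests much later.
print(first_attested_year("black market"))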
According to Ted Underwood, computational approaches have the potential to change and expand the field of literary studies. He argues that literary studies has traditionally rested on the assumption that the broad contours of the field are already known, and that the discipline moves forward by criticising or questioning known boundaries; we do not expect to make entirely new discoveries in literary history. With computational tools, however, we might now detect hitherto undiscovered patterns or literary features: algorithms might point out shifts in literary language we were not expecting. What do you think? Reflect on the relation between literary history and computational methods through Underwood’s and Piper’s books mentioned in the resources.
One approach to studying a large corpus with computational methods is to build a topic model of it. Such a model assigns to every document, e.g. a novel, a certain number of “topics”: thematic collections of words that tend to occur together. When the model also contains metadata about the works, such as author and year of publication, it is possible to explore how topics change and vary over time.
Moreover, changes in topics can reflect changes and new trends in society. Read more about the data set and topic modeling in Ted Underwood’s blog.
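To make the mechanics concrete, here is a minimal topic-modelling sketch in Python on a toy corpus, using scikit-learn's LDA implementation. The documents, topic count, and library are illustrative assumptions only; Underwood's published models were built with different tooling on a far larger corpus.

# Minimal topic-modelling sketch with scikit-learn's LDA (illustration only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical mini-corpus: each string stands in for one document (novel).
documents = [
    "the forest was silent and the river ran cold under the moon",
    "the engine roared as the train crossed the iron bridge at dawn",
    "she walked through the garden naming every flower and tree",
    "the factory smoke drifted over the crowded streets of the city",
]

# Turn the documents into a document-term matrix of word counts.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

# Fit a model with two topics; real studies use dozens or hundreds.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # per-document topic proportions

# Print the most characteristic words of each topic.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top)}")

# With publication years attached as metadata, these per-document topic
# proportions could be averaged per year to trace how a topic rises or falls.
print(doc_topics.round(2))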
Underwood’s topic models are accessible on GitHub, where you can investigate a topic of your interest. For instance, search for nature-related topics and reflect on our relationship with nature: have the meanings and metaphors of nature changed, and how would you interpret the change? The extensive corpus of English-language literature from 1700 to 1922 used in Underwood’s projects is available here if you want to run your own investigation and analyse the development of English literature.
For more ideas, the Programming Historian offers over 80 tutorials to get you started with computational exploration of history.