Style

Introduction

Style is an elusive yet important concept in literary studies, and one for which multiple computational approaches have been developed over several decades. However, the language a human reader uses to characterize style is not easily reproduced by a computer, and one obvious challenge is simply defining style. Aesthetic value, the originality of an author, the conventions of a genre (or deviations from them), linguistic choices, or any formal properties of a text can all be understood as style (see Herrmann, van Dalen-Oskam & Schöch, 2015). Descriptions such as “understated,” “vivid,” or “ironic” attempt to convey highly complex information and experiences.

Therefore, Herrmann et al. (2015) propose a clarified definition to approach the topic: “Style is a property of texts constituted by an ensemble of formal features which can be observed quantitatively or qualitatively.” The quantitative analysis of style can involve computing frequencies, relations, and distributions of features, together with relevant statistics, to assess markers of style. Computers are well suited to answering questions of this kind, particularly for large corpora. Did Marcel Proust write long sentences? We assume so, but without a computational approach, it takes some effort to demonstrate this.
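A question such as the one about sentence length can be approached with very little code. The following is a minimal sketch in Python, not an established method: it assumes that a sentence ends at a period, exclamation mark, or question mark followed by whitespace, which will miscount abbreviations and dialogue. The function names `sentence_lengths` and `summarize` and the sample text are made up for illustration.

```python
import re
from statistics import mean, median


def sentence_lengths(text):
    """Return the number of word tokens in each sentence of `text`."""
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Count word-like tokens (letters, digits, apostrophes) per sentence.
    lengths = [len(re.findall(r"[\w']+", s)) for s in sentences]
    # Drop empty fragments so they do not distort the statistics.
    return [n for n in lengths if n > 0]


def summarize(text):
    """Summarize sentence lengths, or return None if no sentence is found."""
    lengths = sentence_lengths(text)
    if not lengths:
        return None
    return {
        "sentences": len(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "max": max(lengths),
    }


# Illustrative sample text only; replace it with the text under study.
sample = "For a long time I went to bed early. Sometimes I slept. Why?"
print(summarize(sample))
```

Running the same summary over texts by several authors gives a first, rough basis for comparison. For serious work, a dedicated sentence tokenizer would likely be more reliable than the regular expression used here.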

Measures of complexity, such as sentence length, are thus one area where a computational approach can be enlightening. The distribution of word categories, revealed by part-of-speech tagging, may show, for example, whether a text has a comparatively large number of verbs (which suggests action) or adjectives (which suggests description). All this information can refine our understanding of the style of a single text, or of a corpus that would be too large to study in detail.

Applications

Elementary

There are simple, free digital tools, available online or as downloads, that help the analyst discover which patterns an author uses to establish a distinct style. Online annotation tools such as Hemingway App can be used to get an overview of the complexity of an author’s writing style as well as the overall readability of a text.

Another possibility is a word cloud tool, which provides a visual overview of the most frequently used words in a single text. Voyant Tools even allows users to generate a word cloud for a whole body of texts, such as an author’s entire output, thereby giving an overview of that author’s vocabulary.

One may also use AntConc’s reference corpus function and download one of the word-frequency lists linked to on the program’s website. The word-frequency lists featured there consist of the words most commonly used in the English language. This makes it possible to determine how an author’s particular style diverges from common English.
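The idea of comparing a text against a general word-frequency list can be sketched in a few lines of Python. This is a simplified illustration and not how AntConc itself computes its results: it uses a plain ratio of relative frequencies, considers only words that appear in the reference list, and the reference values below are invented for the example. The names `relative_frequencies` and `overused_words` are hypothetical.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each lowercased word in `text` to its share of all word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def overused_words(text, reference_freqs, top=5):
    """Rank words by how much more often they occur in `text` than in the reference.

    Words missing from the reference list are skipped, because no ratio
    can be computed for them.
    """
    text_freqs = relative_frequencies(text)
    ratios = {
        word: freq / reference_freqs[word]
        for word, freq in text_freqs.items()
        if reference_freqs.get(word, 0) > 0
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:top]


# Invented relative frequencies, for illustration only. In practice these
# would be derived from a downloaded word-frequency list.
reference = {"the": 0.05, "sea": 0.0002, "was": 0.01, "old": 0.0005, "man": 0.001}
sample = "The old man was alone on the sea. The sea was calm."

for word, ratio in overused_words(sample, reference, top=3):
    print(f"{word}: {ratio:.1f} times the reference frequency")
```

Words with a high ratio are candidates for what makes the vocabulary of a text distinctive. On short texts the ratios are unstable, so the approach is more informative for longer works or whole corpora.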

Advanced

To further spark creative, critical engagement and discussion about new directions for computational stylistics, Sterman, Huang, Liu & Paulos (2020) collected a crowdsourced dataset of style and built a machine learning model that derives a high-dimensional style space from tacit human knowledge. They created two interfaces to explore the model’s style space: the Explorer mode allows for viewing and interpreting works in the style space, and the Editor mode makes it possible to input and edit text while viewing the developing visualisations. All data is available on GitHub, so you can retrain their model or come up with your own ideas to further explore style.

Alternatively, if you want to start identifying patterns of writing style, a good first step is part-of-speech (PoS) tagging. As mentioned in the introduction, it labels each word with a category (e.g. adjective, noun, verb) based on its grammatical definition and textual context. The NLTK package offers pre-labeled corpora and PoS taggers that you can use to tag a text of your choice. For example, pick an author who you think uses a lot of adjectives and test whether the numbers bear this out. Both the Brown and Gutenberg corpora in NLTK contain literary texts you can use for that purpose. The PoS-tagging tutorial listed under Resources is a good way of getting familiar with PoS tagging in Python.
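Once a text has been tagged, checking an intuition about adjectives comes down to counting tags. The following minimal Python sketch assumes the input is a list of (word, tag) pairs using the universal tagset labels (ADJ, NOUN, VERB, and "." for punctuation). The function name `tag_proportions` and the hand-tagged toy sentence are made up for illustration.

```python
from collections import Counter


def tag_proportions(tagged_tokens, ignore=(".",)):
    """Return the share of each PoS tag among the tokens, skipping ignored tags."""
    counts = Counter(tag for _, tag in tagged_tokens if tag not in ignore)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hand-tagged toy input, for illustration only.
tagged = [
    ("The", "DET"),
    ("old", "ADJ"),
    ("grey", "ADJ"),
    ("house", "NOUN"),
    ("stood", "VERB"),
    ("silent", "ADJ"),
    (".", "."),
]

shares = tag_proportions(tagged)
print(f"ADJ share: {shares['ADJ']:.2f}, VERB share: {shares['VERB']:.2f}")
```

With NLTK installed and its data packages downloaded, pairs of this shape can be obtained with `nltk.pos_tag(nltk.word_tokenize(text), tagset="universal")` or, for pre-tagged material, `nltk.corpus.brown.tagged_words(categories="fiction", tagset="universal")`. Comparing the adjective share of one author against such a baseline is only a rough indicator, since automatic taggers make errors and proportions vary with genre.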

Resources

Scripts and sites

  • Supplementary materials and the full “Style Similarity Dataset” for “Interacting with Literary Style through Computational Tools”.
  • A useful tutorial on how to build a part-of-speech (PoS) tagger.
  • Hemingway App, an online tool originally made to polish the sentence structure, complexity, and fluency of a text, which can be used to assess the style of a text.
  • An online handbook for text analysis in R.
  • NLTK, the Natural Language Toolkit for Python.
  • Voyant Tools, a web-based reading and analysis environment for digital texts.

Articles

  • Herrmann, J. B., van Dalen-Oskam, K., & Schöch, C. (2015). Revisiting style, a key concept in literary studies. Journal of Literary Theory, 9(1), 25-52. https://doi.org/10.1515/jlt-2015-0003
  • Rybicki, J., Hoover, D., & Kestemont, M. (2014). Collaborative authorship: Conrad, Ford and rolling delta. Literary and Linguistic Computing, 29(3), 422-431. https://doi.org/10.1093/llc/fqu016
  • Sterman, S., Huang, E., Liu, V., & Paulos, E. (2020, April). Interacting with Literary Style through Computational Tools. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-12). https://doi.org/10.1145/3313831.3376730 
