
Analyze Digital Text as Data

A guide to introduce and support researchers interested in distant reading approaches to digital texts.

Consider this a starting point to explore the methods of digital text analysis. I invite you to discover and explore methods and resources that may enrich your own learning, teaching, and approach to research.

What is Text Analysis?

Text analysis (sometimes referred to as text mining or text data mining) is a set of computational methods for discovering new, high-quality information in a text or a collection of texts (a corpus). In contrast to a close reading of a text, distant reading techniques such as topic modeling assess word frequency and usage within and across texts, which can provide insights that are not readily apparent from reading alone.

Frequent Types of Analyses

  • Word frequency (words that appear in a text, sorted by frequency/uniqueness)
  • Collocation (words that commonly appear near another word)
  • Concordance (contexts of a given word or set of words in a corpus)
  • N-grams (common two-word, three-word, and longer phrases)
  • Entity recognition (identifying names, places, time periods, etc.)
  • Dictionary tagging (locating a specific set of words in a corpus)
  • Topic modeling (a statistical model for finding abstract topics in a corpus)
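
To make a few of these analyses concrete, here is a minimal sketch of word frequency, n-grams, and concordance using only the Python standard library. The sample sentence, the function names, and the simple tokenization rule (lowercase letters and apostrophes only) are illustrative assumptions, not part of any particular tool; real projects typically rely on a dedicated text analysis library and more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration.
SAMPLE = "The whale surfaced. The crew watched the whale, and the whale watched the crew."


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency(tokens, top=5):
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, window=3):
    """Return each occurrence of `keyword` with up to `window` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines


tokens = tokenize(SAMPLE)
print(word_frequency(tokens, top=2))               # most frequent words
print(Counter(ngrams(tokens, 2)).most_common(1))   # most frequent two-word phrase
print(concordance(tokens, "surfaced", window=2))   # keyword in context
```

On this sample, the most frequent word is "the" (5 occurrences), the most frequent two-word phrase is "the whale" (3 occurrences), and the concordance shows "surfaced" with two words of context on each side. Common function words such as "the" usually dominate raw counts, which is why many analyses filter them out first.
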
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.