Digital Text Analysis
A guide to introduce researchers to distant reading approaches to digital texts and to support their work.
Consider this guide a starting point for exploring the methods of digital text analysis. I invite you to discover methods and resources that may enrich your own learning, teaching, and approach to research.
What is Text Analysis?
Text analysis (sometimes called text mining or text data mining) is a set of methods that use computers to help discover new, high-quality information in a text or a collection of texts (a corpus). Whereas close reading examines a single text in detail, distant reading methods such as topic modeling measure word frequency and usage within and across texts, and can reveal patterns that are not readily apparent to a close reader.
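As a rough illustration of measuring word usage "within and across texts", the Python sketch below counts how often each word occurs in one text and how many texts in a small corpus contain each word. The regular-expression tokenizer, the function names, and the two-sentence corpus are illustrative assumptions; real projects usually rely on a dedicated library such as NLTK or spaCy for tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Within a text: count how often each word appears."""
    return Counter(tokenize(text))


def document_frequencies(corpus):
    """Across texts: count how many texts in the corpus contain each word at least once."""
    df = Counter()
    for text in corpus:
        df.update(set(tokenize(text)))
    return df


# A tiny, made-up corpus used only for illustration.
corpus = [
    "The whale surfaced. The crew watched the whale.",
    "The crew rested in port.",
]

print(word_frequencies(corpus[0]).most_common(2))  # [('the', 3), ('whale', 2)]
print(document_frequencies(corpus)["crew"])        # 2
```

Comparing the two counts is what separates words that are merely common (such as "the") from words that characterize a particular text (such as "whale", which appears twice here but in only one of the two texts).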
Frequent Types of Analyses
Word frequency (the words in a text, ranked by how often they occur or how distinctive they are)
Collocation (words that commonly appear near another word)
Concordance (contexts of a given word or set of words in a corpus)
N-grams (common sequences of two, three, or more consecutive words)
Entity recognition (identifying names, places, time periods, etc.)
Dictionary tagging (locating a specific set of words in a corpus)
Topic modeling (using a statistical model to find abstract topics in a corpus)
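To make several of these analyses concrete, here is a minimal Python sketch of n-grams, a keyword-in-context concordance, collocates, and dictionary tagging on a single invented sentence pair. All function names, the window sizes, and the sample word list are hypothetical choices for illustration. Collocates are approximated by raw co-occurrence counts within a fixed window rather than an association measure such as pointwise mutual information, and entity recognition and topic modeling are left out because they normally depend on trained models or external libraries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (simple assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """N-grams: every run of n consecutive tokens, as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, width=2):
    """Concordance: each occurrence of the keyword with `width` words of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def collocates(tokens, keyword, window=2):
    """Collocation (raw counts): words found within `window` positions of the keyword."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


def dictionary_tag(tokens, dictionary):
    """Dictionary tagging: (position, word) for every token that is in the chosen word list."""
    return [(i, t) for i, t in enumerate(tokens) if t in dictionary]


tokens = tokenize("The old sea was calm. The old captain loved the sea.")

print(Counter(ngrams(tokens, 2)).most_common(1))  # [(('the', 'old'), 2)]
print(concordance(tokens, "sea"))                 # ['the old [sea] was calm', 'loved the [sea]']
print(collocates(tokens, "sea").most_common(1))   # [('the', 2)]
print(dictionary_tag(tokens, {"sea", "calm", "captain"}))
```

Note that the top collocate of "sea" in this example is "the"; in practice, very common function words (stop words) are usually filtered out, or a statistical association measure is used, before collocates are interpreted.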