Library Guides: Analyze Digital Text as Data: Home

Consider this guide a starting point for exploring the methods of digital text analysis. I invite you to discover methods and resources that may enrich your own learning, teaching, and approach to research.

What is Text Analysis?

Text analysis (sometimes referred to as text mining or text data mining) is a set of methods that use computers to facilitate the discovery of new, high-quality information in a text or a collection of texts (a corpus). In contrast to a close reading of a text, distant reading techniques such as topic modeling assess word frequency and usage within and across texts, and can surface insights that are not readily apparent to a human reader.

Frequent Types of Analyses

Word frequency (words that appear in a text, sorted by frequency/uniqueness)
Collocation (words that commonly appear near another word)
Concordance (contexts of a given word or set of words in a corpus)
N-grams (common phrases of two, three, or more consecutive words)
Entity recognition (identifying names, places, time periods, etc.)
Dictionary tagging (locating a specific set of words in a corpus)
Topic modeling (a statistical model for finding abstract topics in a corpus)
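To make a few of the analyses listed above more concrete, here is a minimal sketch in Python that uses only the standard library. It is meant as an illustration rather than a recommended workflow: the function names (tokenize, word_frequency, ngrams, concordance, collocates), the sample sentence, and the very simple tokenization rule are all assumptions made for this example. Entity recognition and topic modeling are left out because they generally rely on dedicated libraries and trained models.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters/apostrophes.
    This rule is an assumption for the sketch; real projects often
    need more careful tokenization."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency(tokens, top=5):
    """Word frequency: the most common words with their counts."""
    return Counter(tokens).most_common(top)


def ngrams(tokens, n=2):
    """N-grams: count every run of n consecutive words."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def concordance(tokens, keyword, width=2):
    """Concordance: show each occurrence of keyword with up to
    `width` words of context on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, keyword, window=2):
    """Collocation (simplified): count words that fall within
    `window` positions of each occurrence of keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Hypothetical sample text, invented for this example.
sample = ("The whale surfaced. The white whale dived, and the crew "
          "watched the white whale vanish.")
tokens = tokenize(sample)

print(word_frequency(tokens, top=3))
print(ngrams(tokens, n=2).most_common(2))
for line in concordance(tokens, "whale"):
    print(line)
print(collocates(tokens, "whale").most_common(3))
```

Raw co-occurrence counts like these tend to be dominated by very common words such as "the". In practice, collocation analysis usually applies a stop-word list or a statistical association measure, and dedicated text analysis tools handle these steps for larger corpora.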