Library Guides at Tulane University

Digital Text Analysis

A guide that introduces distant reading approaches to digital texts and supports researchers interested in using them.

What is a Corpus?

A corpus is, simply put, a text or set of texts under study (the plural is corpora). For linguists, a corpus is specifically a collection of written or spoken material on which a linguistic analysis is based.

You may draw your corpora from many different sources. These include Google Books, Project Gutenberg, text digitized from newspapers collected on microfilm or in other formats, Twitter, and library-subscribed databases of secondary literature, among others.
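To make the idea of "a set of texts" concrete, here is a minimal Python sketch of one way a corpus might be assembled once the texts have been saved as plain-text files. The folder layout, the function names load_corpus and word_counts, and the simple word pattern are all assumptions made for illustration; they are not part of this guide or of any particular tool.

```python
from collections import Counter
from pathlib import Path
import re


def load_corpus(folder):
    """Read every .txt file in `folder` into a dict of {filename: text}.

    Assumes the texts are already saved as UTF-8 plain-text files.
    """
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        corpus[path.name] = path.read_text(encoding="utf-8")
    return corpus


def word_counts(corpus):
    """Count lowercase word tokens across all texts in the corpus.

    The pattern below is a deliberately crude tokenizer (ASCII letters
    and apostrophes only); real projects usually need something richer.
    """
    counts = Counter()
    for text in corpus.values():
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts
```

With a hypothetical folder named my_texts, calling word_counts(load_corpus("my_texts")).most_common(10) would list the ten most frequent words across the whole corpus, a typical first step in a distant reading workflow.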

Corpora for Text Analysis

Open Access Collections

_____________________________________________________

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.