A corpus is, simply put, a single text or a set of texts under study (the plural is corpora). For linguists, a corpus is specifically a collection of written or spoken material upon which a linguistic analysis is based.
You can draw your corpora from many different places, such as Google Books, Project Gutenberg, newspaper text digitized from microfilm or other formats, Twitter, and library-subscribed databases of secondary literature.
_____________________________________________________