Before downloading the Early Modern 1080 corpus, read about the format of the text files here, and about our text processing workflow here. Corpora are generated from Text Creation Partnership (TCP) XML files. Our downloads do not contain texts from EEBO-TCP Phase II, which will not be in the public domain until five years after the completion of the TCP project for Phase II.
The corpus is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Early Modern 1080
A corpus of 1,080 digitized texts, built from EEBO-TCP Phase I and ECCO-TCP, that was used to generate a topic model for Serendip. The selected texts were originally published between 1530 and 1799. The corpus was built by randomly sampling 40 texts per decade (27 decades × 40 texts = 1,080), in an attempt to provide a less biased cross-section than a selection of well-known texts would.
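The per-decade sampling described above can be illustrated with a minimal Python sketch. This is not the script used to build the corpus. The function name `sample_per_decade`, the `(text_id, year)` record shape, the fixed seed, and the choice to raise an error when a decade has too few texts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_per_decade(records, per_decade=40, start=1530, end=1799, seed=0):
    """Randomly pick `per_decade` text ids from each decade.

    `records` is an iterable of (text_id, year) pairs. This shape is
    hypothetical; the real TCP metadata is organized differently.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group text ids by decade, e.g. 1534 -> 1530.
    by_decade = defaultdict(list)
    for text_id, year in records:
        if start <= year <= end:
            by_decade[year // 10 * 10].append(text_id)

    sample = {}
    for decade in sorted(by_decade):
        pool = by_decade[decade]
        if len(pool) < per_decade:
            # Assumption: we simply refuse to sample a sparse decade.
            raise ValueError(f"only {len(pool)} texts available for the {decade}s")
        sample[decade] = rng.sample(pool, per_decade)  # without replacement
    return sample


# Toy catalogue: 10 made-up texts per year, so 100 candidates per decade.
catalogue = [
    (f"text_{year}_{i}", year) for year in range(1530, 1800) for i in range(10)
]
sample = sample_per_decade(catalogue)
total = sum(len(ids) for ids in sample.values())
print(len(sample), "decades,", total, "texts")
```

Sampling within each decade, rather than across the whole date range at once, keeps decades with many surviving texts from crowding out decades with few.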
- Download the Early Modern 1080 SimpleText plain text files
  - zip contents: 1,080 unrestricted SimpleText plain text files; 1 metadata csv; README for Early Modern 1080 Metadata.pdf; README_SimpleText_files.txt
  - size: 72.7 MB zipped; 276 MB unzipped
- Download the Early Modern 1080 Metadata (via the Metadata Builder)
- Download the Early Modern 1080 Metadata README (PDF)
- Download the Early Modern 1080 Ubiqu+Ity Tokens files
  - zip contents: 1,080 Ubiqu+Ity Tokens csvs; TextViewer.html; README_Ubiquity_tokens_files.txt
  - size: 166 MB zipped; 0.97 GB unzipped
- Download the Early Modern 1080 1-Grams (csv, right-click save as)
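After downloading, one quick way to check an archive against the contents listed above is to count its files by extension. The following is a minimal sketch using Python's standard `zipfile` module. The function name `summarize_zip` and the archive file name in the usage comment are hypothetical, and the sketch assumes nothing about how the files inside the zip are named or arranged in folders.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_zip(zip_path):
    """Return a Counter mapping lowercase file extension -> number of files."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries; keep only real files.
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    return Counter(PurePosixPath(name).suffix.lower() for name in names)


# Usage (the file name below is hypothetical; use the name of your download):
# counts = summarize_zip("EarlyModern1080_SimpleText.zip")
# for extension, n in sorted(counts.items()):
#     print(extension or "(no extension)", n)
```

If the counts differ noticeably from the listed contents, the download may be incomplete.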
Credits: Metadata prepared by Mattie Burkert and Katie Lanning, under the supervision of Michael Witmore and Robin Valenza. XML files processed and curated by Deidre Stuffer for release as plain text files.