VEP TCP Collection

The following corpora are released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Before downloading the corpora, read about the format of the text files and about our text processing workflow. The corpora are generated from Text Creation Partnership (TCP) XML files.

Please note that our downloadable corpora do not contain texts from EEBO-TCP Phase II, which will not enter the public domain until five years after the TCP project completes Phase II. However, metadata for EEBO-TCP Phase II texts is available for download.

The VEP TCP Collection contains three corpora: EEBO-TCP Phase I, ECCO-TCP, and Evans-TCP. Each corpus has its own download section below. Collection metadata is available for download through the Metadata Builder.

VEP TCP Corpora

EEBO-TCP Phase I
A corpus of plain text files extracted from EEBO-TCP Phase I XML files, offered in both standardized spelling and original spelling versions.

  1. Download VEP Standardized Spelling EEBO-TCP Phase I SimpleText plain text files
    • zip contents: 25,368 SimpleText plain text files; README_SimpleText_files.txt; 1 metadata csv
    • size: 1.4 GB zipped; 4.02 GB unzipped
  2. Download VEP Original Spelling EEBO-TCP Phase I SimpleText plain text files
    • zip contents: 25,368 SimpleText plain text files; README_SimpleText_files.txt; 1 metadata csv
    • size: 1.41 GB zipped; 4.05 GB unzipped

ECCO-TCP
A corpus of plain text files extracted from ECCO-TCP XML files, offered in both standardized spelling and original spelling versions.

  1. Download VEP Standardized Spelling ECCO-TCP SimpleText plain text files
    • zip contents: 2,473 SimpleText plain text files; README_SimpleText_files.txt; 1 metadata csv
    • size: 149 MB zipped; 548 MB unzipped
  2. Download VEP Original Spelling ECCO-TCP SimpleText plain text files
    • zip contents: 2,473 SimpleText plain text files; README_SimpleText_files.txt; 1 metadata csv
    • size: 149 MB zipped; 414 MB unzipped

Evans-TCP
A corpus of plain text files extracted from Evans-TCP XML files, offered in both standardized spelling and original spelling versions.

  1. Download VEP Standardized Spelling Evans-TCP SimpleText plain text files
    • zip contents: 5,012 SimpleText plain text files; README_SimpleText_files.txt; 1 metadata csv
    • size: 198 MB zipped; 548 MB unzipped
  2. Download VEP Original Spelling Evans-TCP SimpleText plain text files
    • zip contents: 5,012 SimpleText plain text files; README_SimpleText_files.txt; 1 metadata csv
    • size: 198 MB zipped; 568 MB unzipped
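
After downloading any of the archives listed on this page, it can be useful to confirm that the zip contents match the listing before unpacking several gigabytes. The following is a minimal sketch in Python, not part of the VEP tooling. It assumes that the SimpleText files end in .txt and that the metadata file ends in .csv; the function name and the archive path in the usage note are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# README name as given in the "zip contents" listings on this page.
README_NAME = "README_SimpleText_files.txt"


def summarize_archive(zip_path):
    """Count text files, README copies, CSV files, and anything else in a zip."""
    counts = {"text": 0, "readme": 0, "csv": 0, "other": 0}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries, count files only
            # Zip member names always use "/" as the separator.
            name = PurePosixPath(info.filename).name
            if name == README_NAME:
                counts["readme"] += 1
            elif name.lower().endswith(".txt"):  # assumed SimpleText extension
                counts["text"] += 1
            elif name.lower().endswith(".csv"):  # assumed metadata extension
                counts["csv"] += 1
            else:
                counts["other"] += 1
    return counts
```

For example, calling summarize_archive("evans_standardized.zip") on the Standardized Spelling Evans-TCP download (the file name here is a placeholder) should, if the assumptions hold, report 5,012 text files, 1 README, and 1 csv, matching the listing above. A different count suggests an incomplete download or a different file layout than this sketch assumes.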