The University of Helsinki’s English E-thesis 1999-2016 has been updated to version 1.1. The following modifications have been made to the subcorpora containing master’s theses and doctoral theses:

  • The subcorpora have been parsed with the Turku Neural Parser Pipeline (TNPP).
  • Texts with fewer than 1000 words have been left out.
  • Texts with more than 1000 words are included only if they contain enough English words.
  • The subcorpus ethesis_en_phd_math has been renamed to ethesis_en_phd_sci.

The subcorpora containing the abstracts (ethesis_en_dissabs and ethesis_en_maabs) have not been changed.