Corpus of Contemporary American English - Kielipankki VRT version 2020
Corpus of Contemporary American English - Kielipankin VRT versio 2020

shortname: coca-vrt-2020
metadata: http://urn.fi/urn:nbn:fi:lb-2023092601
IPR holder: Mark Davies

Licensed under CLARIN RES end-user license +ND +OTHER v2.1
The complete license is available at http://urn.fi/urn:nbn:fi:lb-2017072503

A copy of the license is included in LICENSE.txt. Note, however, that
the license details may be subject to change. Before downloading the
resource, please refer to the latest version of the license (see the
link above).


CORPUS DESCRIPTION

This version of the Corpus of Contemporary American English (COCA),
released in March 2020, contains about 1 billion words in 485,000
texts from the years 1990-2019. The corpus is evenly divided into
eight genres: spoken, fiction, magazine, newspaper, academic, blogs,
web pages and TV/movie subtitles (~125 million words each). It is
related to many other corpora of English, formerly known as the
"BYU Corpora".

The data of this resource is available in the VRT format. For more
information on the VRT format in general, please see
https://www.kielipankki.fi/development/korp/corpus-input-format/

Please note that 10 out of every 200 words in the data have been
replaced with @ characters to comply with US fair use law; see
https://www.corpusdata.org/limitations.asp

The original corpus data is searchable at
https://www.english-corpora.org/coca/
and a Kielipankki version of the corpus data is searchable in Korp at
http://urn.fi/urn:nbn:fi:lb-2022111502

For further information, please contact kielipankki@csc.fi.
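
EXAMPLE: COUNTING MASKED TOKENS

The following Python sketch illustrates how the @ replacement described
above might be measured in a VRT file. It is not part of the resource.
It assumes the general VRT layout (one token per line, tab-separated
attributes with the word form first, structural markup on lines that
start with "<") and that each masked word appears as a token whose word
form is exactly "@". The function name count_masked_tokens and the
sample data are hypothetical.

```python
import io


def count_masked_tokens(lines, mask="@"):
    """Return (total_tokens, masked_tokens) for an iterable of VRT lines.

    Assumption: the first tab-separated field of a token line is the
    word form, and masked words are stored as the single character "@".
    """
    total = 0
    masked = 0
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and structural markup or comment lines.
        if not line or line.startswith("<"):
            continue
        word = line.split("\t", 1)[0]
        total += 1
        if word == mask:
            masked += 1
    return total, masked


# Hypothetical miniature sample, not taken from the corpus.
sample = io.StringIO(
    '<text id="1">\n'
    "<sentence>\n"
    "The\tthe\n"
    "@\t@\n"
    "sat\tsit\n"
    "</sentence>\n"
    "</text>\n"
)
print(count_masked_tokens(sample))  # prints (3, 1)
```

To apply the same function to a real file, pass an open file object,
for example open(path, encoding="utf-8"). Note that a genuine "@" token
in the original text would also be counted as masked.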