The Coronavirus Corpus (Mark Davies, english-corpora.org) - Kielipankki version 2021-05, source

shortname: coronavirus-ecorg-2021-05-src

metadata: http://urn.fi/urn:nbn:fi:lb-2022111701

IPR holder: Prof. Mark Davies, Professor of Linguistics (retired)

license: CLARIN RES
The complete license is available at http://urn.fi/urn:nbn:fi:lb-2022111703

A copy of the license is included in LICENSE.txt. The license details
may be subject to change, so before downloading the resource, please
refer to the latest version of the license at the above link.

The corpus is available in three versions (text per line, relational
database, token per line), each in its own archive file:

coronavirus-ecorg-2021-05-src-text.zip
- text files (text_*.txt): each text on its own line
- @@textID, space-separated tokens (words, punctuation marks)
- lexicon.txt, lexicon-21-04.txt, lexicon-21-05.txt
- sources.txt, sources-21-04.txt, sources-21-05.txt
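
As a rough illustration of the text-per-line format, the following
Python sketch splits one line into its text ID and tokens. It assumes
that each line starts with "@@" immediately followed by the text ID and
that the tokens follow, separated by spaces. The function name
parse_text_line and the sample line are hypothetical and not taken from
the corpus.

```python
def parse_text_line(line):
    """Split one line of a text_*.txt file into (text_id, tokens).

    Assumption: the line has the form "@@<textID> <token> <token> ...".
    """
    line = line.strip()
    if not line.startswith("@@"):
        raise ValueError("line does not start with @@textID")
    fields = line.split()
    text_id = fields[0][2:]  # drop the leading "@@"
    tokens = fields[1:]
    return text_id, tokens


# Made-up sample line, not real corpus data
sample = "@@12345 The virus spread quickly ."
text_id, tokens = parse_text_line(sample)
print(text_id, len(tokens))
```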

coronavirus-ecorg-2021-05-src-db.zip
- three database tables as text with tab-separated fields
- text files (db_*.txt): textID tokenID wordID
- lexicon.txt, lexicon-21-04.txt, lexicon-21-05.txt
- sources.txt, sources-21-04.txt, sources-21-05.txt
- https://www.corpusdata.org/database.asp
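
A minimal sketch of how the database tables could be joined with the
lexicon in Python. It assumes that neither file has a header line, that
the columns appear in the order listed in this README, and that the
db_*.txt rows are already in token order. The function names
load_lexicon and words_by_text and the sample rows are hypothetical.

```python
import io


def load_lexicon(lines):
    """Map wordID -> (word, lemma, PoS) from tab-separated lexicon rows."""
    lexicon = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip empty or incomplete rows
        word_id, word, lemma, pos = fields[:4]
        lexicon[word_id] = (word, lemma, pos)
    return lexicon


def words_by_text(db_lines, lexicon):
    """Map textID -> list of word forms, looked up via wordID.

    Assumption: each row is "textID<TAB>tokenID<TAB>wordID".
    Word IDs missing from the lexicon yield None.
    """
    texts = {}
    for line in db_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        text_id, _token_id, word_id = fields[:3]
        word = lexicon.get(word_id, (None,))[0]
        texts.setdefault(text_id, []).append(word)
    return texts


# Made-up rows standing in for lexicon.txt and a db_*.txt file
lexicon_sample = io.StringIO("1\tthe\tthe\tat\n2\tvirus\tvirus\tnn1\n")
db_sample = io.StringIO("100\t1\t1\n100\t2\t2\n")

lexicon = load_lexicon(lexicon_sample)
print(words_by_text(db_sample, lexicon))
```

With the real files, the io.StringIO objects would be replaced by files
opened with open(path, encoding=...); the correct encoding is not stated
here and should be checked against the data.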

coronavirus-ecorg-2021-05-src-wlp.zip
- text files: each token on its own line
- @@textID, Word, Lemma, PoS (tab-separated)
- lexicon.txt, lexicon-21-04.txt, lexicon-21-05.txt
- sources.txt, sources-21-04.txt, sources-21-05.txt
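
The token-per-line format could be read along the following lines. This
Python sketch rests on an assumption that should be verified against the
actual files: a row whose first field starts with "@@" opens a new text,
and every other row holds word, lemma and PoS separated by tabs. The
function name read_wlp and the sample rows are hypothetical.

```python
def read_wlp(lines):
    """Group (word, lemma, pos) triples by text ID.

    Assumption: a row starting with "@@<textID>" opens a new text and
    all following rows are "word<TAB>lemma<TAB>PoS" until the next one.
    """
    texts = {}
    current = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("@@"):
            current = fields[0][2:]  # drop the leading "@@"
            texts[current] = []
        elif current is not None and len(fields) >= 3:
            texts[current].append(tuple(fields[:3]))
    return texts


# Made-up rows, not real corpus data
sample = [
    "@@12345\n",
    "The\tthe\tat\n",
    "virus\tvirus\tnn1\n",
]
print(read_wlp(sample))
```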

Each archive contains the *same* lexicon and sources files:
- lexicon*.txt: wordID word lemma PoS (tab-separated)
- sources*.txt: textID ... (tab-separated)

The files are as provided by Mark Davies; more information on them can
be found at https://www.corpusdata.org/formats.asp


Please note that 10 out of every 200 words in the data have been
replaced with @ characters to comply with US fair use law; see
https://www.corpusdata.org/limitations.asp
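
When computing statistics, it may be useful to check how many tokens in
a text are placeholders. The following Python sketch assumes that each
replaced word appears as a single "@" token; the function name
masked_fraction and the sample token list are hypothetical.

```python
def masked_fraction(tokens):
    """Return the share of tokens that are the "@" placeholder."""
    if not tokens:
        return 0.0
    masked = sum(1 for tok in tokens if tok == "@")
    return masked / len(tokens)


# Made-up example: 200 tokens, 10 of them replaced
tokens = ["word"] * 190 + ["@"] * 10
print(masked_fraction(tokens))  # 0.05
```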

The original corpus data is searchable at https://www.english-corpora.org/corona/

For further information, please contact fin-clarin@helsinki.fi .