Corpus of Historical American English

The Corpus of Historical American English (COHA) contains about 385 million words and 115,000 texts from 1810–2009. Each decade has roughly the same balance of genres: fiction, popular magazines, newspapers and non-fiction books.

For the general terms and conditions of this and other BYU corpora, please see https://www.corpusdata.org/restrictions.asp

More information on the BYU corpora at Kielipankki

Latest versions/subcorpora:
Corpus of Historical American English – Kielipankki Korp version 2017H1
Metadata and license
Attribution instructions
Select the corpus in Korp
Corpus of Historical American English – Kielipankki download version 2017H1
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

Different versions/subcorpora of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2017061924

The Morpho-Syntactic Database of Mikael Agricola’s Works

The Morpho-Syntactic Database of Mikael Agricola’s Works contains the Finnish parts of Mikael Agricola’s works (Abckiria, Rukouskiria, Se Wsi testamenti, Käsikiria, Messu, Piina, Psaltari, Veisut, Profeetat). The database was created between 2004 and 2008: texts provided by the Institute for the Languages of Finland were coded and annotated by the Finnish Language Department of the University of Turku in the project ‘The Scientific Edition and the Morpho-Syntactic Database of Mikael Agricola’s Works’, which extended the model used in the Finnish Dialect Syntax Archive. The project was funded by the Academy of Finland and the Alfred Kordelin Foundation. Each word in the corpus is annotated with its headword, part of speech, morphological components and syntactic function. All grammatical units are coded according to their location in the works and in the books of the Bible.

Latest versions/subcorpora:
The Morpho-Syntactic Database of Mikael Agricola’s Works version 1.1, Korp
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for all versions in META-SHARE

Different versions/subcorpora of this language corpus are, or will be, published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021051204

The Finnish Gutenberg Corpus

The corpus contains Finnish books made available by the Gutenberg project. The texts have not been linguistically annotated.

A list of the works the Finnish Gutenberg Corpus contains: Gutenberg.pdf

Latest versions/subcorpora:
The Finnish Gutenberg Corpus
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021051203

New Year’s Speeches of the Presidents of the Republic of Finland

This corpus contains the New Year’s speeches given by the Presidents of the Republic of Finland from 1935 to 2007.

More information on the corpus: http://kaino.kotus.fi/korpus/teko/meta/presidentti/presidentti_coll_rdf.xml

Latest versions/subcorpora:
New Year’s Speeches of the Presidents of the Republic of Finland
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021051202

Finnish Folk Poetry

A 34-volume collection of Finnic oral poetry, including lyric poetry, short rhymes and incantations, collected and recorded from the 16th century to the 1930s and published mostly between 1908 and 1948, with a supplementary volume published in 1997. The corpus is multilingual, with texts in Finnish, Karelian, Olonets, Ludian, Votic, Izhorian, Latin and Swedish.

Latest versions/subcorpora:
Finnish Folk Poetry
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021051201

Fenno-Ugrica, Kielipankki version

Fenno-Ugrica is the National Library of Finland’s digital collection of Finno-Ugric publications. The Fenno-Ugrica collection includes monograph publications in Ingrian, Veps, Mari (Hill Mari and Meadow Mari) and Mordvinic (Erzya and Moksha) languages and newspapers in Mari and Mordvinic languages from the 1920s and the 1930s. All in all, the collection consists of more than 120 monographs and nearly 20,000 pages of newspapers.

The Fenno-Ugrica material was produced by the National Library of Finland in the Digitisation Project of Kindred Languages, part of the Language Programme of the Kone Foundation.

More information: http://fennougrica.kansalliskirjasto.fi/

The Kielipankki version of Fenno-Ugrica is available in Kielipankki – the Language Bank of Finland.

Latest versions/subcorpora:
Fenno-Ugrica, Kielipankki Version
Metadata and license
Attribution instructions
Select the corpus in Korp
Fenno-Ugrica Kielipankki Downloadable Version
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

The languages in the corpus and their three-letter ISO 639-3 codes are the following:

  • Eastern Mari: mhr
  • Erzya: myv
  • Ingrian: izh
  • Khanty: kca
  • Mansi: mns
  • Moksha: mdf
  • Nenets: yrk
  • Selkup: sel
  • Veps: vep
  • Western Mari: mrj

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021050706

Corpus of Old Literary Finnish

Written Finnish texts from 1543 to 1810, browsable and searchable on the web. The collection contains Bible translations and religious texts (e.g. all of Mikael Agricola’s Finnish works), legal texts, poems, and texts on agriculture, nature and health, among others. It was compiled for lexicographic use.

More information on the corpus: http://kaino.kotus.fi/korpus/vks/meta/vks_coll_rdf.xml

Latest versions/subcorpora:
Corpus of Old Literary Finnish
Metadata and license
Attribution instructions
Select the corpus in Korp
Virtual Old Literary Finnish (VVKS) – Kielipankki Korp version
Metadata and license
Attribution instructions
Resource will be available soon
Search for all versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021050705

Corpus of Early Modern Finnish

Written Finnish from the 19th century (mostly from 1810 to 1880), browsable and searchable on the web. The collection contains published literature, periodicals, newspapers and dictionaries, among others, with a focus on the earliest and most important publications and a wide thematic coverage. Texts originally written in Finnish were preferred over translations.

More information on the corpus: http://kaino.kotus.fi/korpus/1800/meta/1800_coll_rdf.xml

Latest versions/subcorpora:
Corpus of Early Modern Finnish, Kielipankki Version
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021050704

Classics of Finnish Literature, Kielipankki Version

This corpus contains works by established Finnish authors published from the 1880s to the 1940s. It includes prose fiction, plays, poetry and aphorisms, some originally written in Swedish.

More information on the corpus: https://kaino.kotus.fi/korpus/klassikot/meta/klassikot_coll_rdf.xml

Latest versions/subcorpora:
Classics of Finnish Literature, Kielipankki Version
Metadata and license
Attribution instructions
Select the corpus in Korp
Classics of Finnish Literature, download version
Metadata and license
Attribution instructions
Resource will be available soon
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021050703

Helsinki Corpus of English Texts

The Helsinki Corpus of English Texts is a structured, multi-genre diachronic corpus of periodically organized text samples from Old, Middle and Early Modern English. Each sample is preceded by a list of parameter codes giving information on the text and its author. The corpus is particularly useful for studying change in linguistic features over long periods (long diachrony). It can serve as a diagnostic corpus, giving general information on the occurrence of forms, structures and lexemes in different periods of English; this information can be supplemented with evidence from more specialized, focused historical corpora.

More information on the corpus: https://varieng.helsinki.fi/CoRD/corpora/HelsinkiCorpus/

Latest versions/subcorpora:
Helsinki Corpus TEI-XML Edition (2011), Korp
Metadata and license
Attribution instructions
Select the corpus in Korp
Helsinki Corpus of English Texts, VRT
Metadata and license
Attribution instructions
Resource will be available soon
Helsinki Corpus of English Texts, Early Modern English section
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021050302

Aleksis Kivi Corpus (SKS)

This corpus contains all the known letters, manuscripts and published works by the Finnish author Aleksis Kivi (1834–1872), collected by the Finnish Literature Society (Suomalaisen Kirjallisuuden Seura). Most of the texts were written in Finnish while some of the letters and manuscripts are in Swedish.

More information: http://www.edith.fi/kivikorpus/index.htm

Latest versions/subcorpora:
Aleksis Kivi Corpus (SKS)
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021050301

Classics Library of the National Library of Finland – Kielipankki version

This corpus comprises works in Finnish and Swedish that belong to the Classics Library of the National Library of Finland and are in the public domain.
The Finnish data set includes 686 works and the Swedish data set 282 works, together making up the whole data set of 968 works in Finnish and Swedish, gathered from Doria and processed by Niklas Alén in April 2017.

The Doria data set is an accumulating resource comprising works by established Finnish authors published from 1549 onwards. The Kielipankki version covers 1549–1944, with one exception in the Finnish sub-corpus: Maria Jotuni’s ‘Huojuva talo’, published in 1963.
The corpus includes classical literature such as prose, plays and poetry.

A list of all works in Finnish in the Kielipankki version sorted by the author
A list of all works in Swedish in the Kielipankki version sorted by the author

Latest versions/subcorpora:
The Finnish sub-corpus of the Classics Library of the National Library of Finland – Kielipankki version
Metadata and license
Attribution instructions
Resource will be available soon
The Finnish sub-corpus of the Classics Library of the National Library of Finland – Kielipankki download version
Metadata and license
Attribution instructions
Resource will be available soon
The Swedish sub-corpus of the Classics Library of the National Library of Finland – Kielipankki version
Metadata and license
Attribution instructions
Resource will be available soon
The Swedish sub-corpus of the Classics Library of the National Library of Finland – Kielipankki download version
Metadata and license
Attribution instructions
Resource will be available soon
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2018051701


Corpus Cyrillo-Methodianum Helsingiense: Corpus of Old Church Slavonic Texts

Latest versions:
Corpus Cyrillo-Methodianum Helsingiense: Corpus of Old Church Slavonic Texts, source
Metadata and license
How to cite this version
Download
Browse the content as a web site
Search for other versions of this resource 

Content of this resource

The Corpus Cyrillo-Methodianum Helsingiense (CCMH) is a corpus of Old Church Slavonic (OCS) texts. It was collected at the University of Helsinki from 1986 to 2017 in various research projects.

Until 2020, the resource was publicly available as a website maintained by the University of Helsinki; it is now hosted in Kielipankki – The Language Bank of Finland.

Further details on the content and license are provided in the metadata record of each corpus version.

The Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version

This corpus contains newspapers and magazines published in Finland from 1770 onwards, compiled by the National Library of Finland.

A list of the newspapers and magazines published in Finnish: https://www.kielipankki.fi/wp-content/uploads/klk-lehdet-fi.pdf

A list of the newspapers and magazines published in Swedish: https://www.kielipankki.fi/wp-content/uploads/klk-lehdet-sv.pdf

Latest versions/subcorpora:
The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version
Metadata and license
Attribution instructions
Select the corpus in Korp
The Swedish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version
Metadata and license
Attribution instructions
Select the corpus in Korp
The Newspaper and Periodical Corpus of the National Library of Finland, Swedish sub-corpus, 1771–1879, VRT
Metadata and license
Attribution instructions
Available upon request
The Newspaper and Periodical Corpus of the National Library of Finland, Swedish sub-corpus, 1880–1948, scrambled, VRT
Metadata and license
Attribution instructions
Available upon request
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-201405276

The Letters of Paul Sinebrychoff, Kielipankki Version

Persistent Identifier of this resource: http://urn.fi/urn:nbn:fi:lb-201407303

Paul and Fanny Sinebrychoff created an art collection unique in Finland. This corpus contains the correspondence of Paul Sinebrychoff and various experts and collectors from 1895 to 1909 concerning acquisitions for the collection.

With support from Sinebrychoff Oy Ab, the original handwritten letters in Swedish were transcribed and translated into Finnish. The translations were made by students from the Department of Scandinavian Languages and Literature (Nordica) of the University of Helsinki.

More information: http://kirjearkisto.siff.fi/default.aspx

Latest versions/subcorpora:
The Finnish Sub-corpus of the Letters of Paul Sinebrychoff, Kielipankki Version
Metadata and license
Attribution instructions
Select the corpus in Korp
The Swedish Sub-corpus of the Letters of Paul Sinebrychoff, Kielipankki Version
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.


Semfinlex Kielipankki version

The Semfinlex corpora published in the Language Bank of Finland are based on the open data resources made available by the Semantic Finlex project. The project is hosted by the Semantic Computing Research Group (SeCo) at Aalto University. More information and links to scientific publications can be found on the project website.

NB (2019-09-13): Discrepancies in the dependency parses of the Finnish data. The dependency parses and relations differ significantly from those in other corpora parsed earlier with the same parser. We are investigating the issue.

Latest versions:  
Finnish Parliament original statutes from 1734-2018 in Finnish, version 1.1, Korp version
Metadata and license
Attribution instructions
Select the corpus in Korp
Finnish Parliament original statutes from 1920-2018 in Swedish, Korp version; Ursprungliga författningar av Riksdagen på svenska från 1920-2018, Korp-versionen
Metadata and license
Attribution instructions
Select the corpus in Korp
Finnish Parliament original statutes from 1920-2018, Korp version (Finnish-Swedish parallel corpus)
Metadata and license
Attribution instructions
Select the corpus in Korp
Finnish Parliament original statutes from 1734-2018, downloadable version
Metadata and license
Attribution instructions
Download the corpus
Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in Finnish, Korp version
Metadata and license
Attribution instructions
Select the corpus in Korp
Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in Finnish, downloadable version
Metadata and license
Attribution instructions
Download the corpus
Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in Swedish, Korp version; Avgöranden av Högsta domstolen och Högsta förvaltningsdomstolen på svenska 1980-2018, Korp-versionen
Metadata and license
Attribution instructions
Select the corpus in Korp
Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in Swedish, downloadable version
Metadata and license
Attribution instructions
Download the corpus
Search for these versions in META-SHARE  

The resource was annotated, and the parallel corpus aligned, by Erik Axelson in the FIN-CLARIN initiative for the Language Bank of Finland. It is publicly available; no registration or login is required.

The Korp concordance tool offers various options for searching and for compiling statistics. In the extended search, users can restrict the search by statute type or to a particular time interval. See the Korp User Guide for more information.

More detailed information about the corpora is available in their metadata records.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021060401

ORACC

Latest versions:
Open Richly Annotated Cuneiform Corpus, Korp Version, May 2019
Description in META-SHARE and licence
Attribution instructions
Open the resource in the concordance service Korp
Open Richly Annotated Cuneiform Corpus, Downloadable Version, September 2017
Description in META-SHARE and licence
Attribution instructions for this version
Download the resource
Open Richly Annotated Cuneiform Corpus, Korp Version, September 2017
Description in META-SHARE and licence
Attribution instructions for this version
Oracc in Korp user guide

The Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. The Korp version of Oracc allows extensive searches on the texts and presents the results as a KWIC concordance list. Korp also provides statistics on, and comparisons of, the search results, and the query results can be downloaded.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2019111601

ScotsCorr – Helsinki Corpus of Scottish Correspondence (1540–1750)

Description

The corpus comprises circa 0.4 million words (0.5 million tokens) of early Scottish correspondence by male and female writers from the period 1540–1750. It consists of transcripts of original letter manuscripts that reproduce the text without any modernisation, normalisation or emendation. Language-external variables such as date, region, gender, addressee, hand and script type have been coded into the database. The writers originate from fifteen different regions of Scotland, which can be grouped into the areas North, North-East, Central, South-East and South-West. In addition, there are two categories of informants not defined by geographical origin: representatives of the court, and professional people such as members of the clergy. Female informants make up 21 per cent of the corpus.

Latest versions/subcorpora:
Helsinki Corpus of Scottish Correspondence (1540-1750)
Metadata and license
Attribution instructions
Select the corpus in Korp
Helsinki Corpus of Scottish Correspondence (1540-1750), VRT
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

Access

ScotsCorr is available in the Korp concordance service of Kielipankki (the Language Bank of Finland); direct link: http://urn.fi/urn:nbn:fi:lb-2016121607. Note that you will need to log in to Korp and have access rights to ScotsCorr. For more information, please see the section Accessing ScotsCorr of the ScotsCorr Korp Guide.

ScotsCorr data in VRT format will be available in the download service of Kielipankki, the Language Bank of Finland, at www.kielipankki.fi/download. Note that you will need to have access rights to ScotsCorr.

Documentation

The following documentation has been written by Anneli Meurman-Solin:

In addition, you may find it helpful to consult the on-line Dictionary of the Scots Language.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-202104191