Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level |
---|---|---|---|---|---|---|---|---|
These resource versions are not yet available in the Language Bank of Finland.
Shortname | Name and metadata | License | Formats | Support level | Contact Person | Resource group and help | Location | Other information |
---|---|---|---|---|---|---|---|---|
The Corpus of Contemporary American English (COCA) is a very large corpus of American English. The original, frequently updated version of COCA is provided by Mark Davies via the corpus interface at english-corpora.org. The Language Bank of Finland offers several "snapshot" versions of COCA under a restricted academic license that is available to users affiliated with a university in Finland.
For the description of an individual corpus version, please see the metadata record (click the link in the corpus title).
More information about all corpora from english-corpora.org that are available via the Language Bank
For the license text of an individual corpus, click the license image in the corpus list, or see the metadata record (click the link in the corpus title). Note that specific additional terms and conditions apply to this and other corpora from BYU; see https://www.corpusdata.org/restrictions.asp. The link is included in the official license.
This page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2017061921
This electronic language resource covers several languages spoken in Europe and was compiled during the international project Le Parole.
Latest versions/subcorpora | Links | Access |
---|---|---|
The Helsinki Korp Version of the Swedish Parole Corpus | Metadata and license; Attribution instructions | Select the corpus in Korp |
The Finnish Parole Corpus | Metadata and license; Attribution instructions | Available upon request via our IDA service |
Search for all versions in META-SHARE |
Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.
Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021042601
Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level |
---|---|---|---|---|---|---|---|---|
Helsinki Corpus of Swahili 2.0 is available for research purposes in Kielipankki – the Language Bank of Finland. The corpus contains about 25 million words of written text, and it is available in two formats. The annotated version contains morphological and syntactic annotation as well as glosses in English. The unannotated version contains plain text. The corpus text was randomly shuffled within each document. The sentence order is the same in both corpus versions.
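The shuffling described above can be illustrated with a minimal sketch. It is not the procedure actually used to build the corpus: the function names, the fixed seed and the toy data are all hypothetical. It only shows how one random permutation per document, applied to both versions, keeps the sentence order identical in the annotated and the plain-text version.

```python
import random


def sentence_orders(documents, seed=0):
    """Draw one random sentence permutation per document (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability only
    orders = []
    for sentences in documents:
        order = list(range(len(sentences)))
        rng.shuffle(order)
        orders.append(order)
    return orders


def apply_orders(documents, orders):
    """Reorder sentences inside each document; document boundaries stay intact."""
    return [[doc[i] for i in order] for doc, order in zip(documents, orders)]


# Toy data: two documents, each available as plain text and with annotation.
plain = [["s1", "s2", "s3"], ["t1", "t2"]]
annotated = [["s1/ANN", "s2/ANN", "s3/ANN"], ["t1/ANN", "t2/ANN"]]

# The same permutations are applied to both versions, so they stay aligned.
orders = sentence_orders(plain)
plain_shuffled = apply_orders(plain, orders)
annotated_shuffled = apply_orders(annotated, orders)
```

Because sentences never move between documents, each document keeps its own set of sentences; only the running order of the text within it is lost.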
For more information on the corpus please see: https://www.kielipankki.fi/corpora/hcs2/
Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2014032624
Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level |
---|---|---|---|---|---|---|---|---|
Wanca 2016 is a collection of web corpora in small Uralic languages. It comprises 29 sentence corpora, each in a different language. The corpora were collected from the Internet using the automated system developed in the Finno-Ugric Languages and the Internet project (SUKI), which was supported by the Kone Foundation through its Language Programme 2012-2016. The sentences were extracted from the pages found while harvesting with Heritrix, and the language of each sentence was identified with MultiLi using HeLI as the identification method. Each sentence has a link to the original page it was found on, but some of the links may stop working. In that case, we recommend searching for the page in the Internet Archive Wayback Machine https://archive.org/web/.
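For a source link that no longer works, the Wayback Machine lookup can be prepared with a short sketch like the one below. It assumes the commonly used address pattern `https://web.archive.org/web/<timestamp>/<original URL>`, where a partial timestamp such as a year leads to the capture closest to that date. The helper name, the default year 2016 (chosen only to match the collection name) and the example address are hypothetical; the sketch only builds the address and does not contact the archive.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"  # assumed address pattern


def wayback_url(original_url: str, timestamp: str = "2016") -> str:
    """Build a Wayback Machine address for the capture closest to `timestamp`.

    `timestamp` is 4 to 14 digits (YYYY up to YYYYMMDDhhmmss).
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits")
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


# Hypothetical dead link taken from a sentence's source reference.
print(wayback_url("http://example.org/some/page.html"))
```

Opening the resulting address in a browser shows the archived copy if the page was captured; if it was not, the archive reports that no capture exists.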
More information on Wanca: http://www.suki.ling.helsinki.fi/wanca
Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record.
The languages in Wanca 2016 are:
ISO 639-3 | Name of language |
---|---|
fit | Tornedalen Finnish (meänkieli) |
fkv | Kven (kvääni) |
izh | Ingrian (ižoran keel) |
kca | Khanty (ханты ясанг) |
koi | Komi-Permyak (перем коми кыв) |
kpv | Komi-Zyrian (Коми кыв) |
krl | Karelian (karjal) |
liv | Liv (līvõ kēļ) |
lud | Ludian (lüüdin kiel’) |
mdf | Moksha (мокшень) |
mhr | Eastern and Meadow Mari (марий йылме) |
mns | Mansi (мāньси лāтыӈ) |
mrj | Western or Hill Mari (Кырык мары) |
myv | Erzya (эрзянь) |
nio | Nganasan (ня”) |
olo | Livvi (Olonets / livvin karjal) |
sjd | Kildin Sami (Кӣллт са̄мь кӣлл) |
sjk | Kemi Sami (samääškiela) |
sju | Ume Sami (uumajanlappi) |
sma | Southern Sami (åarjel-saemien) |
sme | Northern Sami (davvisámi, davvisámegiella) |
smj | Lule Sami (julevsábme) |
smn | Inari Sami (anarâškielâ) |
sms | Skolt Sami (sää´mǩiõll) |
udm | Udmurt (удмурт кыл) |
vep | Veps (vepsän kel’) |
vot | Votic (vad̕d̕a ceeli) |
vro | Võro (võro kiil) |
yrk | Nenets (ненэцяʼ вада) |
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-202104141
Last modified on 2025-01-20