Corpus of Contemporary American English (COCA)

Currently available versions of this resource group

Shortname	Name and metadata	License	Location	Cite	Resource group and help	Apply	Publication year	Support level

Upcoming versions of this resource group

These resource versions are not yet available in the Language Bank of Finland.

Shortname	Name and metadata	License	Formats	Support level	Contact Person	Resource group and help	Location	Other information

Resource information

The Corpus of Contemporary American English (COCA) is a very large corpus of American English. The original, frequently updated version of COCA is provided by Mark Davies via the corpus interface at english-corpora.org. The Language Bank of Finland offers several "snapshot" versions of COCA under a restricted academic license that is available to users affiliated with a university in Finland.

For the description of an individual corpus version, please see the metadata record (click on the link at the corpus title).

More information about all corpora from english-corpora.org that are available via the Language Bank

License and access

For the license text of an individual corpus, click on the license image in the corpus list, or see the metadata record (click on the link at the corpus title). Note that specific additional terms and conditions apply to this and other corpora from BYU; see https://www.corpusdata.org/restrictions.asp. The link is included in the official license.

Korp versions

Some of the corpus versions are available for searching via the Korp concordancer tool (click on the link under 'Location').
Access to the Korp versions requires academic login via a university in Finland.

Downloadable versions

Access to the downloadable corpora mentioned above is restricted to researchers affiliated with member universities of the FIN-CLARIN consortium in Finland. Download access can usually be granted to graduate or postgraduate students if the applicant needs the corpora for an MA thesis or a PhD dissertation.
To obtain access to restricted corpora, please submit an application via Language Bank Rights (LBR): after logging in to the LBR service, search the catalogue for "Mark Davies' downloadable corpora at Kielipankki".
To access the download service, click on the link under 'Location', or see the metadata record for the link.

This page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2017061921

The Helsinki Korp Version of the Parole Corpus (parole)

Currently available versions of this resource

Shortname	Name and metadata	License	Location	Cite	Resource group and help	Apply	Publication year	Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname	Name and metadata	License	Formats	Support level	Contact Person	Resource group and help	Location	Other information

Resource information

This electronic language resource was compiled from material in several languages spoken in Europe during the international project Le Parole.

License and access

This resource requires you to apply for individual access rights (RES).
Click on the license image to see the resource-specific license text.
Some versions of this resource are available in the computing environment (see column 'Location').

This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2021042601

Helsinki Corpus of Swahili 2.0 (HCS 2.0)

Currently available versions of this resource

Shortname	Name and metadata	License	Location	Cite	Resource group and help	Apply	Publication year	Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname	Name and metadata	License	Formats	Support level	Contact Person	Resource group and help	Location	Other information

Resource information

Helsinki Corpus of Swahili 2.0 is available for research purposes in Kielipankki – the Language Bank of Finland. The corpus contains about 25 million words of written text and is available in two formats. The annotated version contains morphological and syntactic annotation as well as glosses in English. The unannotated version contains plain text. The text within each document was randomly shuffled, and the resulting sentence order is the same in both corpus versions.

For more information on the corpus please see: https://www.kielipankki.fi/corpora/hcs2/
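
As an illustration only of what document-internal shuffling with a shared sentence order means, the following minimal Python sketch applies one random permutation to both the plain and the annotated sentences of a single document. The function name, the seed parameter and the toy data are hypothetical; this page does not describe how the corpus was actually shuffled.

```python
import random


def shuffle_aligned(plain, annotated, seed):
    """Shuffle two parallel sentence lists of one document with the same permutation.

    Hypothetical helper: `plain` and `annotated` hold the same sentences in the
    same order, once as plain text and once with annotation.
    """
    if len(plain) != len(annotated):
        raise ValueError("both versions must contain the same number of sentences")
    # Draw one permutation of sentence positions for this document.
    order = list(range(len(plain)))
    random.Random(seed).shuffle(order)
    # Apply the identical permutation to both versions so they stay aligned.
    return [plain[i] for i in order], [annotated[i] for i in order]


if __name__ == "__main__":
    plain_doc = ["sentence 1", "sentence 2", "sentence 3"]
    annotated_doc = ["sentence 1 <annotated>", "sentence 2 <annotated>", "sentence 3 <annotated>"]
    for text, analysis in zip(*shuffle_aligned(plain_doc, annotated_doc, seed=42)):
        print(text, "|", analysis)
```

Because the order is shuffled only inside each document, document boundaries stay intact, and a sentence at a given position in one version corresponds to the sentence at the same position in the other.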

License and access

Some versions of this resource are available publicly (PUB), whereas others might require you to log in as an academic user (ACA).
Click on the license image to see the resource-specific license text.
Some versions of this resource are available in the computing environment (see column 'Location').

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2014032624

Wanca 2016

Currently available versions of this resource

Shortname	Name and metadata	License	Location	Cite	Resource group and help	Apply	Publication year	Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname	Name and metadata	License	Formats	Support level	Contact Person	Resource group and help	Location	Other information

Resource information

Wanca 2016 is a collection of web corpora in small Uralic languages. It consists of 29 sentence corpora, each in a different language. The corpora were collected from the Internet using the automated system developed in the Finno-Ugric Languages and the Internet project (SUKI), which was supported by the Kone Foundation through its Language Programme 2012-2016. The sentences were extracted from pages found while harvesting with Heritrix, and the language of each sentence was identified with MultiLi, using HeLI as the identification method. Each sentence has a link to the original page on which it was found, but some of these links may stop working. In that case, we recommend searching for the page in the Internet Archive Wayback Machine: https://archive.org/web/.

More information on Wanca: http://www.suki.ling.helsinki.fi/wanca
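
For users who need to check many broken source links, the lookup in the Wayback Machine could be automated. The following is a minimal Python sketch, assuming the Internet Archive's public availability endpoint at https://archive.org/wayback/available and its JSON reply with an "archived_snapshots" / "closest" entry. The function names are hypothetical and are not part of Wanca or the Language Bank services.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the lookup URL for one original page link.

    `timestamp` is optional (digits such as YYYYMMDD) and asks for the
    snapshot closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"


def closest_snapshot(response):
    """Return the archived URL from a decoded JSON reply, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url, timestamp=None):
    """Query the service over the network and return an archived URL or None."""
    with urlopen(build_query(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

A call such as find_archived_copy("http://example.org/page", "2016") returns the address of the closest archived snapshot if one exists, and None otherwise. When checking many links, it is advisable to pause between requests.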

License and access

All versions of this resource are available publicly (PUB).
Click on the license image to see the resource-specific license text.

Additional documentation

The languages in Wanca 2016 are:

ISO 639-3	Name of language
fit	Tornedalen Finnish (meänkieli)
fkv	Kven (kvääni)
izh	Ingrian (ižoran keel)
kca	Khanty (ханты ясанг)
koi	Komi-Permyak (перем коми кыв)
kpv	Komi-Zyrian (Коми кыв)
krl	Karelian (karjal)
liv	Liv (līvõ kēļ)
lud	Ludian (lüüdin kiel’)
mdf	Moksha (мокшень)
mhr	Eastern and Meadow Mari (марий йылме)
mns	Mansi (мāньси лāтыӈ)
mrj	Western or Hill Mari (Кырык мары)
myv	Erzya (эрзянь)
nio	Nganasan (ня”)
olo	Livvi (Olonets / livvin karjal)
sjd	Kildin Sami (Кӣллт са̄мь кӣлл)
sjk	Kemi Sami (samääškiela)
sju	Ume Sami (uumajanlappi)
sma	Southern Sami (åarjel-saemien)
sme	Northern Sami (davvisámi, davvisámegiella)
smj	Lule Sami (julevsábme)
smn	Inari Sami (anarâškielâ)
sms	Skolt Sami (sää´mǩiõll)
udm	Udmurt (удмурт кыл)
vep	Veps (vepsän kel’)
vot	Votic (vad̕d̕a ceeli)
vro	Võro (võro kiil)
yrk	Nenets (ненэцяʼ вада)

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-202104141

Last modified on 2025-11-26

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317
