Modern Finnish Word List

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The entries of the word list are simple XML elements that indicate the lemma and inflection type for basic words. Rare inflection types and other restrictions are marked with attributes. Compounds are usually listed as just the lemma. Examples of the 78 inflection types and 17 consonant gradation types are available on the web site.
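As a rough illustration of how such entries could be read, the following Python sketch parses a small hand-written sample. The element names used here (`st` for an entry, `s` for the lemma, `tn` for the inflection type number, `av` for the consonant gradation type) and the sample words are assumptions made for this example, not taken from the description above; the downloaded file should be checked for the actual names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The element names (st, s, t, tn, av) are
# assumptions for illustration and may differ from the real word list.
SAMPLE = """<kotus-sanalista>
<st><s>aakkonen</s><t><tn>38</tn></t></st>
<st><s>aallokko</s><t><tn>4</tn><av>A</av></t></st>
<st><s>aamukahvi</s></st>
</kotus-sanalista>"""


def parse_entries(xml_text):
    """Return one dict per entry with lemma, inflection type and gradation."""
    root = ET.fromstring(xml_text)
    entries = []
    for st in root.iter("st"):
        type_number = st.findtext("t/tn")  # None when the entry has no type
        entries.append({
            "lemma": st.findtext("s"),
            "inflection_type": int(type_number) if type_number else None,
            "gradation": st.findtext("t/av"),  # None when there is no gradation
        })
    return entries


entries = parse_entries(SAMPLE)
```

In this sketch an entry that lists only the lemma, as compounds usually do, simply yields `None` for both the inflection type and the gradation type.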

Latest versions/subcorpora:
Modern Finnish Word List
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Frequency List of Written Finnish Word Forms

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The resource contains a ranked frequency list of Finnish word forms as they appear in the Finnish Parole text corpus of 17 million written tokens. The list is available for download in three different sizes: all tokens, tokens that occur more than once, and tokens that occur more than twice. All three are encoded in ISO-8859-1 (Latin-1) with one entry per line. The five thousand most frequent forms are also available for browsing on the web site.
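Because the files are Latin-1 rather than UTF-8, they need to be decoded explicitly before use. The sketch below shows one way to do that in Python. The sample bytes are hypothetical bare word forms; the fields that a real entry contains are not described here, so each line is kept as an undivided string.

```python
def read_latin1_entries(raw: bytes):
    """Decode Latin-1 bytes and return the non-empty lines, one per entry."""
    text = raw.decode("iso-8859-1")
    # Split on "\n" only, so that no other character is treated as a line break.
    return [line.rstrip("\r") for line in text.split("\n") if line.strip()]


# Hypothetical sample: three word forms, two of them with the letter ä (0xE4).
sample = b"ja\nett\xe4\np\xe4iv\xe4\n"
entries = read_latin1_entries(sample)
utf8_bytes = "\n".join(entries).encode("utf-8")  # re-encode for UTF-8 tools
```

When reading from disk, passing `encoding="iso-8859-1"` to `open()` has the same effect as the explicit `decode` call.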

Latest versions/subcorpora:
Frequency List of Written Finnish Word Forms
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier:

Frequencies of Early Modern Finnish Words

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The list contains the word forms found in the Corpus of Early Modern Finnish of the Institute for the Languages of Finland, together with their frequencies.

Latest versions/subcorpora:
Frequencies of Early Modern Finnish Words
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier:

Frequencies of Old Literary Finnish Words

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The resource lists the words found in the Corpus of Old Literary Finnish together with their frequencies.

Latest versions/subcorpora:
Frequencies of Old Literary Finnish Words
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier:

The Helsinki Term Bank for the Arts and Sciences

The Helsinki Term Bank for the Arts and Sciences (HTB) is a multidisciplinary project which aims to gather a permanent terminological database for all fields of research in Finland. The project has created this Semantic MediaWiki platform, which offers a collaborative environment. This means that anyone can freely use it and also participate in the discussion about terms.

Latest versions/subcorpora:
The Helsinki Term Bank for the Arts and Sciences
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier:


Online Lexicon of Veps Language

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The resource contains the Online Lexicon of Veps Language from Lauri Kettunen’s (1885-1963) handwritten dictionary and notes. Kettunen travelled twice to Veps areas, in 1917-1918 and in 1934 with Lauri Posti and Paavo Siro.

The lexicon, which is based on the field notes, has been digitized.

Latest versions/subcorpora:
Online Lexicon of Veps Language
Metadata and license
Attribution instructions
Open the website
Open the resource in Sanat
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier:

Etymological Database of the Sami Languages

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The database is built on the ca. 25,000 North Saami entries of Nielsen’s Pohjoissaamen sanakirja [North Saami Dictionary]. The words were incorporated in the database in cooperation with the Finno-Ugric Department of the University of Helsinki. The database includes the variants, as well as the etymological and derivational references at the end of the word articles, given in Nielsen’s dictionary. Each word is also provided with English-language meanings based on the data in Nielsen’s dictionary; Finnish meanings based on Sammallahti’s Saamelais-suomalainen sanakirja [Saami-Finnish dictionary]; and German meanings based on Sammallahti and Nickel’s Saamisch-deutsches Wörterbuch.

More information on the Álgu project

Latest versions/subcorpora:
Etymological Database of the Sami Languages
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091417

Dictionary of Old Literary Finnish

The Dictionary of Old Literary Finnish presents, as exhaustively as possible, the meaning and usage of all the words found in the Finnish literary sources from 1543-1810.
This language resource was published on 21st November 2014 and is to be continuously updated.
The Dictionary of Old Literary Finnish is the product of a joint project between the Institute for the Languages of Finland and the Kone Foundation.

Latest versions/subcorpora:
Dictionary of Old Literary Finnish
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091416

Dictionary of Finnish Dialects

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland. The purpose of the Dictionary of Finnish Dialects is to present the vocabulary of the Finnish dialects based on a large amount of data collected.

The dictionary was planned to contain 20 volumes, and its first part was published in 1985. By 2010, eight volumes of approximately 1,000 pages each had been published. These volumes covered the words from the letter “a” to the word “kurvottaa”. In 2010, it was decided that the dictionary should from then on be published in electronic form as an online service that can be used free of charge. The first part of the online dictionary (kus-kyntsöttää) was published in early 2012, after which the dictionary is to be enlarged with both new and already printed alphabetical parts about once a year. On 22 October 2014 the online dictionary was enlarged to contain the words from “kala” to “käävätä”.

The dictionary presents the entire vocabulary of all the Finnish dialects. It does not make a distinction based on whether the word is dialectal or standard / literary. Such a distinction would be to a certain degree impossible, since the vocabulary of standard Finnish is mostly based on the vocabulary of different Finnish dialects.

Latest versions/subcorpora:
Dictionary of Finnish Dialects
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

Different versions of this language corpus are (or might be in the future) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091415

Name Component Lexicon

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland. All name components with some frequency in the Swedish place name bank in Finland are presented in the lexicon. There is an emphasis on material from the most common name elements, out of which the majority of the place names are built. In addition, suffixes and certain word endings are considered, as well as the most common types of loan names.

Latest versions/subcorpora:
Name Component Lexicon
Metadata and license
Attribution instructions
Open the website
Open the resource in Sanat
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091414

Names of Countries in Seven Languages

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland. The website contains the names of the independent states of the world and their geographically separate regions in 7 languages. The list also includes Western Sahara, Palestine, Hong Kong, Macau, Taiwan, Antarctica and the 21 republics of the Russian Federation. The official names of the countries, as well as commonly used unofficial names, are also included.

Each name is presented on a separate page in Finnish, Swedish, Northern Sami, English, French, German and Russian.

Latest versions/subcorpora:
Names of Countries in Seven Languages
Metadata and license
Attribution instructions
Open the website
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091413

Dictionary of Contemporary Finnish

The Dictionary of Contemporary Finnish is a dictionary of standard Finnish made by the Institute for the Languages of Finland. It is based on an extensive, constantly expanding word archive of contemporary Finnish. The dictionary provides information on the meanings, usage and nuances of style of contemporary Finnish words, as well as on their inflection and spelling. The information provided by the dictionary is based on the decisions of the Finnish Language Board. The dictionary contains over 100,000 lemmas.

More information

Latest versions/subcorpora:
Dictionary of Contemporary Finnish
Metadata and license
Attribution instructions
Open the web page
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091412

Finnish Wikipedia 2017

The Finnish Wikipedia 2017 source material corpus contains all Finnish articles from the online encyclopedia Wikipedia that were available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor.

Latest versions/subcorpora:
Finnish Wikipedia 2017, source
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091411



The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th Edition (TSK 49)

The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th edition (TSK 49) contains information on more than 500 concepts in term records and concept diagrams. The concepts have been given definitions and term recommendations in Finnish and Swedish. The relations between the concepts are illustrated with the help of concept diagrams. The vocabulary is fully bilingual: the foreword, instructions, concept descriptions and concept diagrams have all been translated into Swedish. The subjects covered in the vocabulary are the benefits provided by Kela (the Social Insurance Institution of Finland), e.g. sickness allowances and reimbursements for medical expenses under the Health Insurance Act, international medical care, occupational health care, disability benefits and interpreting services, rehabilitation organized and reimbursed by Kela, pensions paid by Kela, housing benefits, financial aid for students, conscript’s allowance, benefits for families with children and unemployment allowances.

More information (in Finnish)

Latest versions/subcorpora:
The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th Edition (TSK 49)
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091410

The Vocabulary of Safety and Health at Work (TSK 35)

The Vocabulary of Safety and Health at Work (TSK 35) contains 465 concepts with Finnish term recommendations, definitions and notes. The equivalents are given in Swedish, English, German and French. The definitions and notes have been translated into Swedish. The chapters of the vocabulary include occupational health, safety at work, work environment, risk management, administration of working life and organizing of safety and health at work as well as important registers, methods and cooperation organizations.

More information (in Finnish)

Latest versions/subcorpora:
The Vocabulary of Safety and Health at Work (TSK 35)
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091409

Frequency Lexicon of the Finnish Newspaper Language

The Frequency Lexicon of the Finnish Newspaper Language contains the 9,996 most common lemmas of Finnish newspaper language. The lexicon was compiled in 2004 from source material containing 43,999,826 words.

Latest versions/subcorpora:
Frequency Lexicon of the Finnish Newspaper Language
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091408

The N-grams of the Newspaper and Periodical Corpus of the National Library of Finland

The National Library of Finland has digitized a large proportion of Finland’s Finnish and Swedish newspapers, magazines, and periodicals published between 1820 and 2000 (Finnish) and between 1770 and 1940 (Swedish). This resource contains sets of unigrams, bigrams and trigrams extracted from a corpus that has been compiled from the digitized newspapers by the University of Helsinki.

The resource consists of plain UTF-8 encoded text files, each containing a list of n-grams ordered by frequency from highest to lowest. Each line in a file consists of two or more fields separated by a whitespace character. The first field indicates the absolute frequency of a unique n-gram, and the remaining fields contain the tokens (strings of non-whitespace characters) of the n-gram itself. Uppercase letters have been retained as such and have not been converted into lowercase letters. Punctuation characters are treated as separate tokens except when they are part of an abbreviation (”etc.”, ”mm.”) or when they separate a case ending or an enclitic from an abbreviation or a sign (”EU:ssa”, ”%:iin”), as per the typographic principles of standard Finnish. The n-grams have been computed across sentence boundaries for each decade (from the 1770s to the 1940s for Swedish and from the 1820s to the 2000s for Finnish) as well as for the entire corpus, with unigrams, bigrams and trigrams in separate files.
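The line format described above can be read with a few lines of Python. This is a minimal sketch: the function name and the sample lines are hypothetical, and skipping lines whose first field is not an integer is a choice made for this example, not a documented property of the files.

```python
from typing import Iterable, Iterator, Tuple


def parse_ngram_lines(lines: Iterable[str]) -> Iterator[Tuple[int, Tuple[str, ...]]]:
    """Yield (frequency, tokens) pairs from whitespace-separated n-gram lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # a valid line has a frequency and at least one token
        try:
            frequency = int(fields[0])
        except ValueError:
            continue  # assumption: skip lines that do not start with a count
        yield frequency, tuple(fields[1:])


# Hypothetical bigram lines; real frequencies and tokens will differ.
sample_lines = ["1200 EU:ssa on", "35 mm. ,", "", "7 Helsingin Sanomat"]
ngrams = list(parse_ngram_lines(sample_lines))
```

Since case is retained and punctuation marks are separate tokens, the tokens are returned exactly as they appear in the file; any lowercasing or filtering is left to the caller.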

Since the source material has been digitized by means of optical character recognition (OCR), the resource also contains erroneous word forms and non-word strings of characters. Furthermore, due to the large time span of the original corpus, the resource contains several lexical items and spelling variants that have since become obsolete in standard Finnish and standard Swedish.

The resource will be updated in the future as improvements are made to the source material.

The data is derived from The Newspaper and Periodical Corpus of the National Library of Finland.

Latest versions/subcorpora:
The Finnish N-grams 1820-2000 of the Newspaper and Periodical Corpus of the National Library of Finland
Metadata and license
Attribution instructions
Download the resource
The Swedish N-grams 1770-1940 of the Newspaper and Periodical Corpus of the National Library of Finland
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091407

Relative frequencies of part-of-speech n-grams in native and translated Finnish literary prose

The corpus contains data from Matias Tamminen’s MA thesis ”Then shall I know fully: Relative frequencies of part-of-speech n-grams in native and translated Finnish literary prose” (2018), University of Helsinki.

The source data are the corpus Classics of English and American Literature translated by Kersti Juva, English-Finnish parallel corpus and the corpus of Translated Finnish.

Latest versions/subcorpora:
Relative frequencies of part-of-speech n-grams in native and translated Finnish literary prose
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091406

Karelian Dictionary

The six volumes of the Karelian dictionary were published in 1968-2005 by the Institute for the Languages of Finland and the Finno-Ugrian Society.

The online dictionary is a project of the Institute for the Languages of Finland. It is updated as needs and resources allow.

More information on the dictionary: http://kaino.kotus.fi/kks

Website: https://kaino.kotus.fi/cgi-bin/kks/karjala.cgi

Downloadable in XML format: http://kaino.kotus.fi/kks/lataa/kksxml.zip

Latest versions/subcorpora:
Karelian Dictionary
Metadata and license
Attribution instructions
Download the resource
Headword List of the Karelian Dictionary
Metadata and license
Attribution instructions
Open web page
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091405

Finnish Verbal Colorative Constructions

The resource contains Finnish verbal colorative constructions from the database of the word notes used when creating the dictionaries Nykysuomen sanakirja and Kielitoimiston sanakirja (http://www.kielitoimistonsanakirja.fi/), from various literary works, from a query test made by Maria-Magdalena Jürvetson as well as from different Internet sources.

Latest versions/subcorpora:
Finnish Verbal Colorative Constructions
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091404

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317

More contact information