Finnish WordNet

The Finnish WordNet is a lexical database for Finnish. It is a part of the FIN-CLARIN infrastructure project.

FinnWordNet is licensed under the Creative Commons Attribution (CC-BY) 3.0 licence. As a derivative of the Princeton WordNet, FinnWordNet is also subject to the Princeton WordNet licence.

FinnWordNet contains words (nouns, verbs, adjectives and adverbs) grouped by meaning into synonym groups representing concepts. These synonym groups are linked to each other with relations such as hyponymy and antonymy, creating a semantic network.
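The structure described above can be pictured as a small graph. The following is a minimal sketch, assuming an in-memory representation: the `Synset` class, the `link` and `hypernym_chain` helpers, the identifiers and the toy entries are all hypothetical and are not taken from FinnWordNet's actual data files or interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept: a group of synonymous words plus links to other concepts."""

    synset_id: str  # hypothetical ID for this sketch, not a FinnWordNet ID
    pos: str  # part of speech: noun, verb, adjective or adverb
    lemmas: list[str]
    # relation name -> IDs of the related synsets
    relations: dict[str, list[str]] = field(default_factory=dict)


def link(network: dict[str, Synset], source: str, relation: str, target: str) -> None:
    """Add a directed, labelled edge between two synsets."""
    network[source].relations.setdefault(relation, []).append(target)


def hypernym_chain(network: dict[str, Synset], synset_id: str) -> list[Synset]:
    """Follow the first hypernym link upward until no broader concept is left."""
    chain: list[Synset] = []
    seen = {synset_id}
    current = network[synset_id]
    while current.relations.get("hypernym"):
        next_id = current.relations["hypernym"][0]
        if next_id in seen:  # guard against cycles in malformed data
            break
        seen.add(next_id)
        current = network[next_id]
        chain.append(current)
    return chain


def build_toy_network() -> dict[str, Synset]:
    """Toy data invented for illustration only."""
    synsets = [
        Synset("dog", "noun", ["koira", "hauva"]),
        Synset("animal", "noun", ["eläin"]),
        Synset("organism", "noun", ["eliö"]),
    ]
    network = {s.synset_id: s for s in synsets}
    for narrower, broader in [("dog", "animal"), ("animal", "organism")]:
        link(network, narrower, "hypernym", broader)
        link(network, broader, "hyponym", narrower)
    return network


if __name__ == "__main__":
    network = build_toy_network()
    for synset in hypernym_chain(network, "dog"):
        print(synset.synset_id, synset.lemmas)
```

Storing relations as labelled edges keyed by synset ID keeps the sketch close to the idea of a semantic network: other relation types, such as antonymy, would simply be further edge labels.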

FinnWordNet can be used in language technology research and applications. It can also be used interactively as an electronic thesaurus.

The first version of FinnWordNet was created by having professional translators translate the words of the original English (Princeton) WordNet (version 3.0) into Finnish.

Detailed information: http://www.kielipankki.fi/corpora/finnwordnet/

Latest versions/subcorpora:
The Downloadable Version of the Finnish WordNet
Metadata and license
Attribution instructions
Download the resource
The Sanat Version of the Finnish WordNet
Metadata and license
Attribution instructions
Open the corpus in Sanat
Search for these versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Sanat Dictionary Service. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Finnish FrameNet

The database of Finnish semantic frames is based on the original English-language FrameNet housed at the International Computer Science Institute in Berkeley, California. The Finnish FrameNet project started by collecting 90,592 examples of different frames from the original Berkeley FrameNet. The examples represented 866 different frames and the elements that evoke them.

The FinnFrameNet project is a part of the FIN-CLARIN consortium.

Latest versions/subcorpora:
Finnish FrameNet
Metadata and license
Attribution instructions
Open the corpus in Sanat
The Sanat Version of the Finnish FrameNet
Metadata and license
Attribution instructions
Open the corpus in Sanat
The Sanat Version of the Finnish TransFrameNet
Metadata and license
Attribution instructions
Open the corpus in Sanat
Search for these versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Sanat Dictionary Service. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091403

FinEst BERT

This corpus offers a multilingual Bidirectional Encoder Representations from Transformers (BERT) model trained from scratch, covering three languages: Finnish, Estonian, and English. It can be used for various NLP classification tasks in these three languages, supporting both monolingual and multilingual/cross-lingual (knowledge transfer) tasks. Whole-word masking was used during data preparation and training; the model was trained for 40 epochs with sequence length 128 and for another 4 epochs with sequence length 512. The FinEst BERT model published here is in PyTorch format.
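To illustrate the whole-word masking mentioned above, here is a minimal sketch, not the FinEst BERT training code. It assumes WordPiece-style tokens in which a `##` prefix marks the continuation of a word. The function names, the masking probability and the example segmentation are hypothetical, and the sketch always substitutes `[MASK]`, leaving out the random-replacement and keep-as-is cases used in standard BERT pretraining.

```python
import random


def group_whole_words(tokens):
    """Group token indices so that each group covers one whole word."""
    groups = []
    for index, token in enumerate(tokens):
        if token.startswith("##") and groups:
            groups[-1].append(index)  # continuation piece joins the previous word
        else:
            groups.append([index])  # a new word starts here
    return groups


def whole_word_mask(tokens, mask_prob=0.15, seed=0, mask_token="[MASK]"):
    """Mask randomly chosen words, always masking every piece of a chosen word."""
    groups = group_whole_words(tokens)
    if not groups:
        return []
    rng = random.Random(seed)  # seeded so that repeated runs give the same mask
    n_to_mask = max(1, round(len(groups) * mask_prob))
    masked = list(tokens)
    for group in rng.sample(groups, n_to_mask):
        for index in group:
            masked[index] = mask_token
    return masked


if __name__ == "__main__":
    # Hypothetical segmentation, chosen only to show multi-piece words.
    tokens = ["kieli", "##pankki", "tarjoaa", "sana", "##kirjoja"]
    print(whole_word_mask(tokens, mask_prob=0.34, seed=1))
```

The point of grouping first is that a word split into several pieces is either fully hidden or fully visible, so the model cannot recover a masked piece from its unmasked neighbours within the same word.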

Corpora used:
Finnish – STT articles, CoNLL 2017 shared task, Ylilauta downloadable version
Estonian – Ekspress Meedia articles, CoNLL 2017 shared task
English – English Wikipedia

Latest versions/subcorpora:
FinEst BERT
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091402
