FinnWordNet – The Finnish WordNet CC BY 3.0

Information in Finnish

PLEASE NOTE: FinnWordNet information moved to this location on 2019-09-16. The URLs of FinnWordNet demos and file downloads have changed, and the current URLs are not necessarily final. The demos and file downloads were not available between March and June 2018. Some demos still do not work, and the feedback form is unavailable. We apologize for the situation. (Updated 2019-09-16.)

General information

FinnWordNet – the Finnish WordNet is a lexical database for Finnish. It is a part of the FIN-CLARIN infrastructure project.

FinnWordNet is licensed under the Creative Commons Attribution (CC-BY) 3.0 licence. As a derivative of the Princeton WordNet, FinnWordNet is also subject to the Princeton WordNet licence.

FinnWordNet contains words (nouns, verbs, adjectives and adverbs) grouped by meaning into synonym groups representing concepts. These synonym groups are linked to each other with relations such as hyponymy and antonymy, creating a semantic network.
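The structure described above can be pictured as a small graph. The following is a minimal sketch only: the class name, the synset identifiers and the toy Finnish entries are invented for illustration and are not taken from the FinnWordNet data or its file formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Synset:
    """A synonym group that stands for one concept (hypothetical model)."""
    synset_id: str
    pos: str          # part of speech: "n", "v", "a" or "r"
    words: List[str]  # the synonyms in this group
    # relation name -> ids of the target synsets
    relations: Dict[str, List[str]] = field(default_factory=dict)


# Toy network; the ids and entries are illustrative, not real FinnWordNet data.
synsets = {
    "n1": Synset("n1", "n", ["eläin"]),
    "n2": Synset("n2", "n", ["koira", "hauva"], {"hypernym": ["n1"]}),
    "n3": Synset("n3", "n", ["kissa"], {"hypernym": ["n1"]}),
}


def synonyms(word: str) -> Set[str]:
    """Return the other words that share a synset with `word`."""
    found: Set[str] = set()
    for synset in synsets.values():
        if word in synset.words:
            found.update(w for w in synset.words if w != word)
    return found


def hyponyms(synset_id: str) -> List[str]:
    """Return ids of synsets that name `synset_id` as their hypernym."""
    return sorted(
        s.synset_id
        for s in synsets.values()
        if synset_id in s.relations.get("hypernym", [])
    )


print(synonyms("koira"))
print(hyponyms("n1"))
```

Looking up synonyms corresponds to thesaurus use, while following relations such as hypernymy corresponds to traversing the semantic network.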

FinnWordNet can be used in language technology research and applications. It can also be used interactively as an electronic thesaurus.

The first version of FinnWordNet was created by having the words of the original English (Princeton) WordNet (version 3.0) translated into Finnish by professional translators.

The most recent version of FinnWordNet is 2.0, released in October 2012. The persistent identifier of this version is urn:nbn:fi:lb-2014052714.

Even though FinnWordNet is not currently being actively developed, you can send feedback on it to fin-clarin (at) helsinki.fi.

Please note that the name of the resource is FinnWordNet (with a double n), not FinWordNet.

Search interfaces and demos

FinnWordNet data can be searched or viewed in a couple of different search interfaces or demos:

Download data

FinnWordNet data package

The FinnWordNet data can be downloaded from the download service of the Language Bank of Finland as a ZIP package that contains the data in several formats:

  • relations in a tab-separated-values (TSV) format: synonym sets, word senses, semantic and lexical relations, and translations;
  • Princeton WordNet database format;
  • Princeton WordNet lexicographer file format (source format for the data files); and
  • various additional lists (synsets, translations, relations) in a tab-separated-values format, complementing the relational data.

For more information, please see the README file.
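As a rough illustration of working with tab-separated data of this kind, the sketch below groups words by synset using Python's standard csv module. The three-column layout (synset id, part of speech, word), the identifiers and the sample rows are assumptions made for this example; the actual file names and column order are documented in the README file of the package.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows: synset id, part of speech, word.
# The real column layout is described in the package README.
SAMPLE_TSV = (
    "fi:n0001\tn\tkoira\n"
    "fi:n0001\tn\thauva\n"
    "fi:n0002\tn\tkissa\n"
)


def read_word_senses(stream):
    """Group words by synset id from a tab-separated stream."""
    by_synset = defaultdict(list)
    # QUOTE_NONE keeps quotation marks in the data as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip empty lines
        synset_id, _pos, word = row[:3]
        by_synset[synset_id].append(word)
    return dict(by_synset)


senses = read_word_senses(io.StringIO(SAMPLE_TSV))
print(senses)
```

For a file on disk, the same function could be called with a file object opened with `open(path, encoding="utf-8", newline="")`, where `path` is whichever TSV file the README identifies.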

In addition, the package contains the WordNet 3.0 Grind program modified to support FinnWordNet data. Since the downloadable package contains the compiled database, you probably do not need the modified Grind unless you modify the FinnWordNet data (lexicographer files). Compiling the program requires a Unix, Linux or similar environment and a C compiler; please see the associated README file for more information.

Please also note that searching the FinnWordNet data files with the wn search program requires a version patched by Debian.

HFST thesaurus and translation dictionary transducers based on FinnWordNet

Please note that the transducers are currently not available for download. We apologize for the situation. (2019-09-13)

The FinnWordNet (and Princeton WordNet) data is also used in the HFST finite-state transducers that work as Finnish or English thesauri or Finnish–English or English–Finnish translation dictionaries. The transducers recognize inflected forms of words, and the thesauri have variants generating synonyms in the same inflected form as the input word. More information about the transducers is available in the README file.

  • English thesauri
  • Finnish thesauri
  • Finnish–English and English–Finnish translation dictionaries

To use the transducers, you need either the full HFST library and tools (version 3.2.0 or later), the stand-alone HFST optimized lookup (version 1.3 or later) or the Java implementation of the optimized lookup (2011-05-23 or later).

FinnWordNet in WN-LMF and Lemon

The FinnWordNet data is also downloadable as a ZIP package in the WN-LMF (WordNet Lexical Markup Framework) and Lemon (The Lexicon Model for Ontologies) XML formats from the Open Multilingual Wordnet site.

Technical corrections to FinnWordNet by Frankie Robertson

Frankie Robertson has made some technical corrections and changes to the FinnWordNet data to make it work with the NLTK and extJWNL libraries. The corrected version is available on GitHub. The corrections have not yet been integrated into the official FinnWordNet data.

Publications

General description

Bilinguality and technical aspects

Extending FinnWordNet

Applications

Other research that uses FinnWordNet

Project information

The FinnWordNet development project ran in 2010–2012. The development of FinnWordNet was funded by the FIN-CLARIN and META-NORD projects. The META-NORD project received funding from the European Union’s ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme under grant agreement no. 270899.

The following people participated in the FinnWordNet project:

Advisors:
Krister Lindén (project leader) (2010–), Lauri Carlson (2010–2012), Ulla Vanhatalo (2010–2012)
Other members:
Hissu Hyvärinen (2010–2012), Juha Kuokkala (2012), Kristiina Muhonen (2010), Jyrki Niemi (2010–2012), Pinja Pennala (2012), Paula Pääkkö (2010–2011)

News

New locations for the FinnWordNet information page and download (2019-09-19)

The FinnWordNet information page has been moved to the Portal of the Language Bank of Finland, and the download location to the Download service of the Language Bank of Finland. The information page has also been updated. Updating the demos and transducers is still partly in progress.

Version 2.0 of FinnWordNet released (2012-10-05)

Version 2.0 of FinnWordNet data has been released with thousands of new word senses added and hundreds of existing ones corrected. The data is downloadable and in use in the Web search interface. The search interface no longer asks you to rate the synonymy of random words.

As of version 2.0, FinnWordNet has been extended beyond being a translation of Princeton WordNet by adding new synsets as hyponyms of existing synsets (without glosses and English translations). The new synsets correspond to senses of common Finnish compound words.

The primary data format is now a relational format. See the corresponding README file in the downloadable package for more information.

FinnWordNet 2.0 contains 120,449 synsets (2,790 more than version 1.1.2), 208,645 word senses (16,845 more), 140,515 unique words (9,251 more) and 244,742 translation relations (14,695 more). Some of the additions and corrections are based on the suggestions received from users of FinnWordNet. All feedback is welcome: fin-clarin (at) helsinki.fi.
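The corresponding totals for version 1.1.2 are not listed here, but they follow from the figures above by subtraction. A short sketch of that arithmetic (the variable names are arbitrary):

```python
# (count in version 2.0, increase over version 1.1.2), as stated above
stats = {
    "synsets": (120_449, 2_790),
    "word senses": (208_645, 16_845),
    "unique words": (140_515, 9_251),
    "translation relations": (244_742, 14_695),
}

# Version 1.1.2 count = version 2.0 count minus the stated increase.
previous = {name: total - added for name, (total, added) in stats.items()}

for name, count in previous.items():
    print(f"{name}: {count:,}")
```

This gives 117,659 synsets, 191,800 word senses, 131,264 unique words and 230,047 translation relations for version 1.1.2.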

For more information, please see the NEWS file in the downloadable package.

Older news

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4140599 / +358 29 4129317