
This resource is a portal of over 105,621 interlinked pages. For each of 105,621 cuneiform texts exported from Oracc, the pages list its most similar neighbours, ranked by the Double Mutual Rank (DOMUR) similarity measure.
Latest versions/subcorpora:
– ANEE Idiolect Network Portal
| Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level |
|---|---|---|---|---|---|---|---|---|
These resource versions are not yet available in the Language Bank of Finland.
| Shortname | Name and metadata | License | Formats | Support level | Contact Person | Resource group and help | Location | Other information |
|---|---|---|---|---|---|---|---|---|
This resource collection contains word embeddings trained with word2vec on various corpora.
The embedding files use the simple, easily parsed text format produced by word2vec. The first line of a file gives the vocabulary size and the vector dimension. Each subsequent line begins with a vocabulary item, followed by a space and then 128 floating-point numbers written out as text, each followed by a space.
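The format described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not an official loader: the function name `load_word2vec_text` is hypothetical, and it assumes the files are UTF-8 encoded and that vocabulary items contain no internal spaces. It reads the dimension from the header instead of hard-coding 128, and checks each row against it.

```python
from typing import Dict, List, Tuple


def load_word2vec_text(path: str) -> Tuple[int, Dict[str, List[float]]]:
    """Read a word2vec text-format file into {word: vector}; returns (dimension, vectors)."""
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        # Header line: "<vocabulary size> <dimension>".
        vocab_size, dim = (int(x) for x in f.readline().split())
        rows = 0
        for line_no, line in enumerate(f, start=2):
            # Remove the trailing space and newline, then split on single spaces.
            parts = line.rstrip().split(" ")
            if parts == [""]:
                continue  # skip blank lines, e.g. an empty last line
            word, values = parts[0], parts[1:]
            if len(values) != dim:
                raise ValueError(f"line {line_no}: expected {dim} values, got {len(values)}")
            vectors[word] = [float(v) for v in values]
            rows += 1
    if rows != vocab_size:
        raise ValueError(f"header announces {vocab_size} items, file has {rows}")
    return dim, vectors
```

Usage would look like `dim, vectors = load_word2vec_text("embeddings.txt")` (the file name is a placeholder), after which `dim` should be 128 for these files. Libraries such as gensim can also read this format via `KeyedVectors.load_word2vec_format(path, binary=False)`.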
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2022041401
Latest versions/subcorpora:
– FinnONTO – ONKI

Look for all versions of this resource in META-SHARE.
The ONKI service contains Finnish and international ontologies, vocabularies and thesauri needed for publishing content cost-efficiently on the Semantic Web. ONKI is published and maintained by the Semantic Computing Research Group (SeCo). It is part of the ongoing project to build a national Semantic Web infrastructure for Finland (FinnONTO).
The service offers ontologies in categories such as:
– General upper ontology
– Museum artifacts
– Music
– Design
– Health
– Photography
– Agriculture
– Government
– Literature
– Linguistics
– Literary research
– Cultural research
– Economics
– Seafaring
– Military
All the ontologies are being merged into a single ontology covering all of these categories, the Finnish Collaborative Holistic Ontology (KOKO).
Most of the ontologies are multilingual. In the General upper ontology, concept names are given in Finnish, Swedish and English, whereas the Linguistics ontology, for example, uses Finnish, Swedish, English, German and Estonian.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021093001
The Helsinki Term Bank for the Arts and Sciences (HTB) is a multidisciplinary project that aims to build a permanent terminological database for all fields of research in Finland. The project has created a Semantic MediaWiki platform that offers a collaborative environment: anyone can use it freely and take part in discussions about terms.
Detailed information on the content, user rights and licenses can be found in the metadata record.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021092002
FinnWordNet, the Finnish WordNet, is a lexical database for Finnish. It is part of the FIN-CLARIN infrastructure project.
FinnWordNet is licensed under the Creative Commons Attribution (CC-BY) 3.0 licence. As a derivative of the Princeton WordNet, FinnWordNet is also subject to the Princeton WordNet licence.
FinnWordNet contains words (nouns, verbs, adjectives and adverbs) grouped by meaning into synonym groups representing concepts. These synonym groups are linked to each other with relations such as hyponymy and antonymy, creating a semantic network.
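To make the structure described above concrete, here is a small in-memory sketch of synonym groups (synsets) connected by typed relations. It is illustrative only: the classes, the synset identifiers and the five toy entries are hypothetical and are not taken from FinnWordNet or its distribution format. Each link is stored together with its inverse, so hyponymy can be followed both up and down the network. As a simplification, antonymy is attached to whole synsets here, whereas Princeton WordNet records antonymy between individual words.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Each relation type paired with its inverse, so links are stored in both directions.
INVERSE: Dict[str, str] = {"hypernym": "hyponym", "hyponym": "hypernym", "antonym": "antonym"}


@dataclass
class Synset:
    """A synonym group: words sharing one meaning, plus typed links to other synsets."""
    synset_id: str
    pos: str  # part of speech: "n", "v", "a" or "r"
    lemmas: List[str]
    relations: Dict[str, Set[str]] = field(default_factory=dict)


class SemanticNetwork:
    def __init__(self) -> None:
        self.synsets: Dict[str, Synset] = {}

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def link(self, source: str, relation: str, target: str) -> None:
        """Record source -relation-> target and the inverse link target -> source."""
        self.synsets[source].relations.setdefault(relation, set()).add(target)
        self.synsets[target].relations.setdefault(INVERSE[relation], set()).add(source)

    def synsets_for(self, word: str) -> List[Synset]:
        """Return every synset (sense) that contains the given word."""
        return [s for s in self.synsets.values() if word in s.lemmas]

    def hypernym_chain(self, synset_id: str) -> List[str]:
        """Walk up hypernym links (first one alphabetically) until the top or a cycle."""
        chain: List[str] = []
        current = synset_id
        while current not in chain:
            chain.append(current)
            parents = sorted(self.synsets[current].relations.get("hypernym", set()))
            if not parents:
                break
            current = parents[0]
        return chain


# Hypothetical toy entries for illustration only, not actual FinnWordNet data.
net = SemanticNetwork()
for sid, pos, lemmas in [
    ("koira.n.01", "n", ["koira"]),
    ("nisäkäs.n.01", "n", ["nisäkäs"]),
    ("eläin.n.01", "n", ["eläin"]),
    ("kuuma.a.01", "a", ["kuuma"]),
    ("kylmä.a.01", "a", ["kylmä"]),
]:
    net.add(Synset(sid, pos, lemmas))
net.link("koira.n.01", "hypernym", "nisäkäs.n.01")
net.link("nisäkäs.n.01", "hypernym", "eläin.n.01")
net.link("kuuma.a.01", "antonym", "kylmä.a.01")

print(net.hypernym_chain("koira.n.01"))  # ['koira.n.01', 'nisäkäs.n.01', 'eläin.n.01']
```

Following the hypernym links from "koira" (dog) leads through "nisäkäs" (mammal) to "eläin" (animal), which is the kind of path through the semantic network that thesaurus-style lookups rely on.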
FinnWordNet can be used in language technology research and applications. It can also be used interactively as an electronic thesaurus.
The first version of FinnWordNet was created by having professional translators translate the words of the original English (Princeton) WordNet (version 3.0) into Finnish.
Detailed information: http://www.kielipankki.fi/corpora/finnwordnet/
This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2014052714
The database of Finnish semantic frames is based on the original English-language FrameNet housed at the International Computer Science Institute in Berkeley, California. The Finnish FrameNet project began by collecting 90,592 frame examples from the original Berkeley FrameNet. These examples represent 866 different frames and the elements that evoke them.
The FinnFrameNet project is part of the FIN-CLARIN consortium.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091403
FinEst BERT is a multilingual Bidirectional Encoder Representations from Transformers (BERT) model trained from scratch on three languages: Finnish, Estonian and English. It can be used for various NLP classification tasks in these three languages and supports both monolingual and multilingual/cross-lingual (knowledge transfer) tasks. Whole-word masking was used during data preparation and training; the model was trained for 40 epochs with sequence length 128 and a further 4 epochs with sequence length 512. The FinEst BERT model published here is in PyTorch format.
Corpora used:
Finnish – STT articles, CoNLL 2017 shared task, Ylilauta downloadable version
Estonian – Ekspress Meedia articles, CoNLL 2017 shared task
English – English Wikipedia
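The whole-word masking mentioned above means that when a word has been split into several subword pieces, all of its pieces are masked together rather than independently. The sketch below illustrates the idea only; it is not FinEst BERT's preprocessing code. It assumes WordPiece-style tokens in which a "##" prefix marks a continuation piece, assumes special tokens such as [CLS] and [SEP] have already been removed, and uses a 15% masking rate, the common BERT default, as an assumed value. It also omits BERT's usual scheme of sometimes keeping or randomly replacing a selected token instead of masking it. Function names are hypothetical.

```python
import random
from typing import List, Sequence


def group_word_pieces(tokens: Sequence[str]) -> List[List[int]]:
    """Group WordPiece positions into whole words; a "##" prefix marks a continuation piece."""
    words: List[List[int]] = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    return words


def whole_word_mask(tokens: Sequence[str], mask_rate: float = 0.15,
                    mask_token: str = "[MASK]", seed: int = 0) -> List[str]:
    """Mask roughly mask_rate of the words, replacing every piece of a chosen word."""
    words = group_word_pieces(tokens)
    if not words:
        return list(tokens)
    rng = random.Random(seed)
    # Always mask at least one word so short sequences still yield a training signal.
    n_to_mask = max(1, round(len(words) * mask_rate))
    masked = list(tokens)
    for word in rng.sample(words, n_to_mask):
        for i in word:
            masked[i] = mask_token
    return masked


# Hypothetical segmentation of "kissa istuu matolla" ("the cat sits on the mat").
tokens = ["kis", "##sa", "istuu", "mato", "##lla"]
print(whole_word_mask(tokens, mask_rate=0.4))
```

Whichever word is selected, its pieces are masked as a unit, so the model cannot recover "##sa" trivially from an unmasked "kis".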
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091402
Team 1 of the Centre of Excellence in Ancient Near Eastern Empires (ANEE) has created a lexical portal that functions as a graphic semantic dictionary. Through the portal, users can explore the semantic networks of one or more words of interest. By following the links, they can also trace attestations back to the datasets in Korp (Oracc, Achemenet, and BALT) and from there to the Open Richly Annotated Cuneiform Corpus (Oracc) and other resources.
Latest versions/subcorpora:
– Neo-Babylonian Lexical Networks
– Neo-Babylonian Lexical Networks – the dataset
– ANEE Lexical Networks v. 2.0
– ANEE Lexical Networks v. 2.0 – the dataset
Archived versions:
– ANEE lexical portal of Akkadian: fastText
– ANEE lexical portal of Akkadian: PMI
– ANEE lexical portal of Akkadian: dataset
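The archived "PMI" version is named after pointwise mutual information, a standard association measure between two words. For reference, the textbook definition is given below. How the portal defines co-occurrence and estimates these probabilities (for example window size, smoothing, or a positive-PMI cut-off) is not stated on this page, so this is only the general formula, not the portal's exact method.

$$
\operatorname{PMI}(w_1, w_2) = \log \frac{p(w_1, w_2)}{p(w_1)\,p(w_2)}
$$

A positive value means the two words co-occur more often than they would if they were independent; a value of zero means they co-occur exactly as often as independence predicts.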
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021082001
Last modified on 2025-05-22
