Mink

Mink, a browser-based tool available at kielipankki.fi/future/mink, lets users who log in via Haka upload their own text materials for processing. Mink supports the following file formats: plain text (UTF-8), XML (the analysis pipeline preserves the XML structure), Microsoft Word (.docx), OpenDocument (.odt), PDF, and CoNLL-U.
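Of the formats above, CoNLL-U is the easiest to produce from your own scripts. The following is a minimal sketch, assuming your text is already split into sentences and tokens; the function name `to_conllu` is hypothetical. It fills in only the ID and FORM columns and leaves the other eight as `_`, the CoNLL-U placeholder for an unspecified value. Whether Mink accepts such sparsely annotated input exactly as shown is an assumption worth testing on a small file first.

```python
def to_conllu(sentences: list[list[str]]) -> str:
    """Serialise pre-tokenised sentences as minimal CoNLL-U text."""
    blocks = []
    for sent_no, tokens in enumerate(sentences, start=1):
        lines = [f"# sent_id = {sent_no}"]
        for token_id, form in enumerate(tokens, start=1):
            # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC;
            # everything after FORM is left unspecified ("_").
            lines.append("\t".join([str(token_id), form] + ["_"] * 8))
        blocks.append("\n".join(lines))
    # Each sentence, including the last one, ends with a blank line.
    return "".join(block + "\n\n" for block in blocks)


if __name__ == "__main__":
    # Hypothetical input: one Finnish sentence, already tokenised.
    print(to_conllu([["Tämä", "on", "esimerkki", "."]]), end="")
```

Writing the result to a UTF-8 file with the `.conllu` extension gives a file in one of the formats listed above.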

You can perform advanced searches on your own text corpora in the Korp environment embedded in the Mink service. If needed, the texts can first be automatically parsed and annotated in Mink, which improves the search options in Korp. For now, Mink supports lemmatization (i.e., reducing words to their base forms) as well as morphological and dependency-syntactic analysis of Finnish, Swedish, and English text, and named entity recognition for English text. Besides using your corpus in Korp, you can also download the analyzed texts to your own computer.

With Mink, users can prepare, test, and explore their own Korp corpus. For now, only the users themselves can access the material they have transferred to Mink's Korp environment. Later, the plan is to make it possible to share data stored in Mink with, for example, members of the user's own research group. Separately, arrangements can be made to make a finished corpus available to other researchers through the Language Bank's public Korp service.

For now, detailed instructions for using Mink are available on the website of Språkbanken in Sweden. Note that the Mink environment developed by Språkbanken has been slightly adapted for users of the Language Bank of Finland, so not all features necessarily work the same way in both Mink services.

The Mink platform is still under development, and the Language Bank welcomes feedback on how it works; see the contact information.

Access Mink

Mink (Språkbanken Text)

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2026042421

Mink – analyzing your own materials and exporting them to Korp


The browser-based tool Mink is available at kielipankki.fi/future/mink; users logged in with Haka can upload their own text materials to it for processing. The data formats supported in Mink are plain text (UTF-8), XML (whose structures the analysis pipeline preserves), Microsoft Word (.docx), OpenDocument (.odt), PDF, and CoNLL-U.

Your own text materials can be searched with advanced queries in the Korp environment shown within the Mink service. If needed, the texts can first be automatically parsed and annotated in Mink, which improves the search options in Korp. For now, the Mink platform supports lemmatization (i.e., reducing words to their base forms) as well as morphological and dependency-syntactic analysis of Finnish, Swedish, and English text, and named entity recognition for English text. In addition to using them in Korp, you can also save the analysis results back to your own computer.

With Mink, users can thus prepare, test, and explore their own Korp corpus. For now, only the users themselves can access the material they have transferred to Mink's Korp environment. At a later stage, the aim is to make it possible to share material in Mink with, for example, members of one's own research group. Delivering a finished corpus to other researchers through the Language Bank's shared Korp service can also be arranged separately.

For now, more detailed instructions for using Mink can be found on the website of Språkbanken in Sweden. Note that the Mink environment developed at Språkbanken has been adapted to some extent for users of the Language Bank of Finland, so not all features necessarily work the same way in both Mink services.

The Mink environment is still being developed, and the Language Bank welcomes feedback on how Mink works; see the contact information.

Open Mink

Mink (Språkbanken Text)

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2026042422

Finland Swedish Online

Finland Swedish Online is a platform offering online courses for learners of Finland Swedish. The service is provided by the University of Helsinki and is based on Icelandic Online, provided by the University of Iceland. The courses are offered at different levels. They are learner-centered, with interactive visual and listening exercises organized around themes relevant to life in Finland, and are supported by glossaries, grammars, and dictionaries.

Access Finland Swedish Online

Try out the related service for Icelandic, Icelandic Online.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024112801

INCEpTION

INCEpTION is a certified open-source web annotation service that has been developed by the Faculty of Computer Science of Technische Universität Darmstadt and is available to all registered users of the CLARIN:EL Research Infrastructure.

INCEpTION offers a generic multi-user annotation environment that aims

to cover three essential aspects of text annotation in a single tool: corpus building, knowledge modelling, and annotation; and
to combine them with machine-learning-based assistive mechanisms (so-called recommenders) that improve annotation efficiency and quality.

The INCEpTION service is hosted by Kielipankki’s CLARIN partner CLARIN:EL in Greece. (See CLARIN:EL’s Privacy Policy.)

To start using the INCEpTION service, click “Use Service” > “Log in to access” > “CLARIN Service Provider Federation login” and select your home organization.

For more information see the INCEpTION User Documentation.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024081601

Nordic Tweet Stream (NTS) search and visualization interface


The NTS is a multilingual monitor corpus containing geolocated tweets and related metadata from the Nordic countries. Altogether, it contains nearly 74 million messages from hundreds of thousands of user accounts in Denmark, Finland, Iceland, Norway, and Sweden. The NTS data cover the period from January 2013 to May 2023 and were collected using the Twitter Academic API, which has since been closed.

The purpose of the NTS is to facilitate basic research in the social sciences and humanities (SSH). The NTS has an easy-to-use graphical interface that supports quick data access, so that researchers can focus on analyzing the data. The dataset enables various types of research. For example, it is possible to study public discussion and sentiment about events in recent history (e.g., the COVID-19 pandemic and the NATO membership process). The dataset is also a resource for sociolinguistic research and for researchers of multilingualism.

Visit the website.

More information about the NTS

If you use the NTS interface and report results obtained with it in your publications, please cite the following article, which is available online:
[1] Laitinen, Mikko, Jonas Lundberg, Magnus Levin & Rafael Martins. 2018. The Nordic Tweet Stream: A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data, Proc. of Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018, CEUR-WS.org, online CEUR-WS.org/Vol-2084/short10.pdf.

This page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024041502

Nordic Tweet Stream (NTS) search & visualization interface


The NTS is a multilingual monitor corpus of geolocated tweets and associated metadata from the Nordic region. Altogether, it contains nearly 74 million messages from hundreds of thousands of user accounts from Denmark, Finland, Iceland, Norway, and Sweden. The NTS data cover the period between January 2013 and May 2023 and were collected using the Twitter Academic API, which is now closed.

The purpose of the NTS is to facilitate fundamental research in the social sciences and humanities (SSH). The NTS comes with an easy-to-use graphical interface that supports quick data access, so that researchers can focus on data analysis. The dataset enables various types of research. For instance, it is possible to study public discourses and sentiment concerning events in recent history (e.g., the COVID-19 pandemic and the NATO membership process). The dataset is also a resource for sociolinguistic research and for scholars of multilingualism.

Please visit the website.

About NTS

If you use the NTS interface and report findings obtained with it in your publications, please cite the following paper, which is available online:
[1] Laitinen, Mikko, Jonas Lundberg, Magnus Levin & Rafael Martins. 2018. The Nordic Tweet Stream: A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data, Proc. of Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018, CEUR-WS.org, online CEUR-WS.org/Vol-2084/short10.pdf.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024041501

UDPipe

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained on annotated data in CoNLL-U format. Trained models are provided for nearly all UD treebanks. UDPipe is available as a binary for Linux, Windows, and OS X, as a library for C++, Python, Perl, Java, and C#, and as a web service. A third-party R package is also available on CRAN.
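UDPipe reads and writes CoNLL-U: one token per line with ten tab-separated columns, comment lines starting with `#`, and a blank line after each sentence. The reader below is a minimal sketch for post-processing UDPipe output in Python, not part of UDPipe itself; `read_conllu` is a hypothetical helper name. It skips multiword-token ranges (IDs such as `1-2`) and empty nodes (IDs such as `3.1`), keeping only syntactic words.

```python
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def read_conllu(text: str) -> list[list[dict[str, str]]]:
    """Group CoNLL-U token lines into sentences of field dictionaries."""
    sentences: list[list[dict[str, str]]] = []
    current: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        # Tolerate input that lacks the final blank line.
        sentences.append(current)
    return sentences
```

For example, `[t["lemma"] for t in sentence]` gives the lemmas of one parsed sentence, and `t["head"]` and `t["deprel"]` give its dependency structure.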

UDPipe is free software distributed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning.

Kielipankki version:
UDPipe Kielipankki version Metadata and license	Access to Puhti
Source version:
UDPipe Metadata and license	Access to GitHub

For more information on this tool, see the UDPipe User’s Manual.

The latest UDPipe version and an online tool are available via LINDAT: https://lindat.mff.cuni.cz/services/udpipe/

More information on the Kielipankki version:

Using UDPipe on CSC’s servers requires a CSC user account: https://research.csc.fi/accounts-and-projects

UDPipe is installed in CSC’s computing environment (load it with: module load udpipe) in the following configuration:
Software: UDPipe 1.2.0
Models: 2.3-181115
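As a sketch of how the installed command-line tool might be called from a Python script on Puhti, the helper below assembles a UDPipe 1.x command line (`udpipe [options] model_file [input_files]`) with the `--tokenize`, `--tag` and `--parse` options, which run the full pipeline on raw text and print CoNLL-U to standard output. The helper names and the model file path are hypothetical; check where the 2.3 models are installed on Puhti before use.

```python
import shlex
import subprocess


def udpipe_command(model_path: str, input_path: str, *,
                   tokenize: bool = True, tag: bool = True,
                   parse: bool = True) -> list[str]:
    """Build a UDPipe 1.x command line: udpipe [options] model_file [input_files]."""
    cmd = ["udpipe"]
    if tokenize:
        cmd.append("--tokenize")  # input is raw text rather than CoNLL-U
    if tag:
        cmd.append("--tag")       # POS tags, morphological features, lemmas
    if parse:
        cmd.append("--parse")     # dependency relations
    return cmd + [model_path, input_path]


def run_udpipe(model_path: str, input_path: str) -> str:
    """Run UDPipe and return its output (CoNLL-U by default)."""
    # Assumes `module load udpipe` has already put `udpipe` on PATH.
    result = subprocess.run(udpipe_command(model_path, input_path),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical model location; check where the 2.3 models live on Puhti.
    model = "/path/to/models/finnish-tdt-ud-2.3-181115.udpipe"
    print(shlex.join(udpipe_command(model, "input.txt")))
```

Keeping command construction separate from execution makes it easy to print and inspect the exact command, for example when copying it into a batch job script.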

UDPipe was compiled and installed from source without local modifications. Please refer to the user’s manual.

The tool was installed using Ansible scripts that can be found here: https://github.com/CSCfi/Kielipankki-palvelut/tree/Dec2018/commandline/roles/udpipe

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024021901

Finnish Dependency Parsing Pipeline

Kielipankki version:
Turku Dependency Parser Pipeline, Kielipankki version (TDPP-LBF) Metadata and license	Access to GitHub
TurkuNLP Finnish Dependency Parser:
Finnish dependency parser developed by TurkuNLP (TDPP) Metadata and license	Access to GitHub

The Turku Dependency Parser Pipeline, Kielipankki version (TDPP-LBF) is a version of the open source dependency parsing pipeline developed by the University of Turku NLP group for analyzing Finnish text, adapted by Kielipankki – the Language Bank of Finland.

For further information on the source version, please visit the project’s website.

In Kielipankki’s GitHub repository, you can find VRT tools adapted from the original pipeline (vrt-tdp-…):

vrt-tdp-alpha-fillup
vrt-tdp-alpha-lookup
vrt-tdp-alpha-marmot
vrt-tdp-alpha-parse
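The vrt-tdp tools operate on VRT (verticalized text), in which each token is a line of tab-separated positional attributes and structural elements such as `<text>` and `<sentence>` appear as XML-like tags on lines of their own. The generator below is a minimal sketch, not one of the Kielipankki tools, for reading sentences from such a file; it assumes sentences are marked with `<sentence>` tags and does not interpret the meaning of the token columns. The name `read_vrt_sentences` is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Matches a whole tag line such as <sentence id="1"> or </sentence>.
TAG_LINE = re.compile(r"<(/?)([^\s>/]+)([^>]*)>")
ATTRIBUTE = re.compile(r'([\w.-]+)="([^"]*)"')


def read_vrt_sentences(lines: Iterable[str], sentence_tag: str = "sentence"
                       ) -> Iterator[tuple[dict[str, str], list[list[str]]]]:
    """Yield (attributes, tokens) for each sentence element in VRT input."""
    attrs: dict[str, str] = {}
    tokens: list[list[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        tag = TAG_LINE.fullmatch(line)
        if tag:
            closing, name, attr_text = tag.groups()
            if name == sentence_tag and not closing:
                attrs, tokens = dict(ATTRIBUTE.findall(attr_text)), []
            elif name == sentence_tag:
                yield attrs, tokens
                attrs, tokens = {}, []
            continue  # other tags (<text>, <paragraph>, comments) are skipped
        # Token line: positional attributes separated by tabs.
        tokens.append(line.split("\t"))
```

Because it is a generator, it can be applied to an open file (`read_vrt_sentences(open("corpus.vrt", encoding="utf-8"))`, file name hypothetical) without loading a large corpus into memory.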

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024021503

Transkribus

Transkribus is a comprehensive platform for the digitisation, AI-powered text recognition, transcription and searching of historical documents.

Open the website

User instructions

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110305

TurkuNLP word embedding demo (word2vec)

A tool developed for analyzing the semantic similarity of words.

The demo is based on word embeddings induced using the word2vec method, trained on 4.5 billion words of Finnish from the Finnish Internet Parsebank project and over 2 billion words of Finnish from Suomi24. On the Parsebank project page, you can also download the vectors in binary form. The software behind the demo is open source and available on GitHub. The demo is maintained by the Turku NLP group.
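To explore the downloadable vectors offline, the sketch below reads a file in the original word2vec binary format (a header line with the vocabulary size and vector dimension, then each word followed by a space and its vector as 32-bit floats) and ranks words by cosine similarity, a standard measure for comparing word embeddings. It is a pure-Python illustration, not the demo’s own code; the helper names are hypothetical, and it assumes little-endian floats, which is what word2vec writes on common x86 machines.

```python
import math
import struct
from typing import BinaryIO, Optional


def load_word2vec_bin(f: BinaryIO, limit: Optional[int] = None
                      ) -> dict[str, tuple[float, ...]]:
    """Read vectors from a word2vec binary file opened in binary mode."""
    vocab_size, dim = (int(x) for x in f.readline().split())
    count = vocab_size if limit is None else min(limit, vocab_size)
    vectors: dict[str, tuple[float, ...]] = {}
    for _ in range(count):
        word = bytearray()
        while True:
            ch = f.read(1)
            if ch in (b" ", b""):
                break  # a space ends the word
            if ch != b"\n":  # skip the newline written after the previous vector
                word += ch
        # dim 32-bit floats, assumed little-endian ("<").
        values = struct.unpack(f"<{dim}f", f.read(4 * dim))
        vectors[word.decode("utf-8", errors="replace")] = values
    return vectors


def cosine(u, v) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def most_similar(vectors, word: str, topn: int = 5) -> list[tuple[str, float]]:
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

For the full-size vector files, passing `limit` (for example `limit=100_000`) keeps only the first entries, which keeps this brute-force search fast enough to try interactively.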

Demo:
TurkuNLP word embedding demo Metadata and license	Try out the demo
Tool:
word2vec Metadata and license	Tool project page

For word embeddings trained with word2vec that are available in Kielipankki – the Language Bank of Finland, please visit the word2vec resource group page.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110304

WebAnno

The Language Bank’s WebAnno instance was shut down on 15 August 2024.

Existing and new users are encouraged to start using the much newer INCEpTION service hosted by our CLARIN partner CLARIN:EL in Greece. (See CLARIN:EL’s Privacy Policy.)

To start using the INCEpTION service, click “Use Service” > “Log in to access” > “CLARIN Service Provider Federation login” and select your home organization.

For more information, see the INCEpTION User Documentation. If you have questions, contact us at kielipankki (at) csc.fi.

For reference, historical documentation about WebAnno is available on GitHub.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110303

Mylly

The Mylly service has been discontinued

Due to very low usage, the Mylly service has been shut down. If you still have data in Mylly, or if you wish to use the Mylly tool scripts on other services, read the instructions here.

Mylly was a versatile data analysis platform with interactive visualizations and workflows. It could be used to build workflows with a variety of tools, including morphosyntactic parsing, character set conversion, and speech recognition.

About Mylly

Mylly User Guide

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110302

Sparv Pipeline

Sparv, Språkbanken’s text analysis tool, is a toolkit provided by Språkbanken in Sweden for parsing and annotating text in multiple languages.

Latest version:
Sparv Metadata and license	Access

User manual

Latest Sparv release on GitHub

Sparv GUI

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110301

Turku Neural Parser Pipeline

The Turku Neural Parser Pipeline is a neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more than 50 languages.

The pipeline is installed in CSC’s computing environment as a Singularity container for Finnish, Swedish, and English.

Kielipankki version:
Turku Neural Parser Pipeline, Kielipankki version (TNPP-LBF) Metadata and license	Access to Puhti
TurkuNLP Finnish Neural Parser:
Turku Neural Parser Pipeline (TNPP) Metadata and license	Access to GitHub

Kielipankki – the Language Bank of Finland has adapted the parser for its VRT format (CWB-VRT):
Source for the Kielipankki version on GitHub
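The Kielipankki version adapts the parser to VRT instead of CoNLL-U. The function below is a rough sketch of the kind of conversion involved, not the actual Kielipankki code: it turns parsed CoNLL-U sentences into `<sentence>` elements with one token per line. The column order (`VRT_COLUMNS`) is a hypothetical choice, and escaping `&`, `<` and `>` as XML entities is an assumption about the target VRT conventions.

```python
from xml.sax.saxutils import escape

# Hypothetical VRT column order (indices into the ten CoNLL-U columns):
# FORM, LEMMA, UPOS, FEATS, ID, HEAD, DEPREL. The Kielipankki version may
# use a different set or order of positional attributes.
VRT_COLUMNS = (1, 2, 3, 5, 0, 6, 7)


def conllu_to_vrt(conllu: str) -> str:
    """Convert parsed CoNLL-U text into VRT <sentence> elements."""
    sentences: list[list[str]] = []
    current: list[str] = []
    for line in conllu.splitlines():
        if not line.strip():
            if current:  # a blank line closes the sentence
                sentences.append(current)
                current = []
        elif not line.startswith("#"):  # comment lines are dropped
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue  # skip multiword-token ranges and empty nodes
            # Escape &, < and > so token text cannot be mistaken for a tag.
            current.append("\t".join(escape(cols[i]) for i in VRT_COLUMNS))
    if current:
        sentences.append(current)
    out: list[str] = []
    for n, tokens in enumerate(sentences, start=1):
        out.append(f'<sentence id="{n}">')
        out.extend(tokens)
        out.append("</sentence>")
    return "".join(line + "\n" for line in out)
```

Keeping the column order in one constant makes it straightforward to match whatever positional attributes a target corpus actually declares.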

On Puhti, you can list all installed versions and languages with:
module use /appl/soft/ai/singularity/modulefiles/
module spider turku-neural-parser

For more information on this tool, see the following links:
Parser Demo
Turku-neural-parser-pipeline manual (note: TNPP is no longer maintained by TurkuNLP; see their note from May 2024)
TurkuNLP DockerHub

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021101102

Finnish Tagtools

This software package provides finnish-postag, a part-of-speech and morphology tagger for Finnish, and finnish-nertag, a named entity recogniser for Finnish.
This software is also installed in CSC’s computing environment (module load finnish-tagtools).

Both tools read running text from standard input and write tabular output (one token per line) to standard output. See their --help messages for details.
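Because both tools read standard input and write standard output, they are easy to call from a script. The sketch below pipes text to `finnish-postag` with Python’s `subprocess` module and splits the output into rows of tab-separated fields. It assumes the command is on `PATH` (for example after `module load finnish-tagtools`) and that blank lines separate sentences; it does not interpret the columns, whose actual layout is described in the tool’s --help message. The helper names `run_tagtool` and `split_rows` are hypothetical.

```python
import subprocess


def run_tagtool(text: str, command: str = "finnish-postag") -> str:
    """Pipe running text to a Tagtools command and return its standard output."""
    # Assumes `module load finnish-tagtools` has put the command on PATH.
    result = subprocess.run([command], input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout


def split_rows(output: str) -> list[list[list[str]]]:
    """Group one-token-per-line output into sentences of tab-separated fields.

    Assumes blank lines separate sentences; column meanings are left to the
    caller (see the tool's --help message for the actual layout).
    """
    sentences: list[list[list[str]]] = []
    current: list[list[str]] = []
    for line in output.splitlines():
        if line.strip():
            current.append(line.split("\t"))
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences
```

The same helper works for named entity recognition by passing `command="finnish-nertag"`, e.g. `split_rows(run_tagtool(text, "finnish-nertag"))`.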

An installer is provided in the form of a Makefile. More information can be found in the README file in the download folder. The source files are also available for download.

Latest version:
Finnish Tagtools 1.6 Metadata and license	Download the resource

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021101101

Last modified on 2026-02-18


Researcher of the Month: Milla Uusitupa

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317

More contact information