UDPipe

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained on annotated data in CoNLL-U format. Trained models are provided for nearly all UD (Universal Dependencies) treebanks. UDPipe is available as a binary for Linux/Windows/OS X, as a library for C++, Python, Perl, Java and C#, and as a web service. A third-party R package is also available on CRAN.
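UDPipe reads and writes CoNLL-U: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with #, and a blank line after each sentence. The following minimal Python sketch is not part of UDPipe; it only illustrates how such output can be read back into per-sentence lists of dictionaries. For simplicity it skips multiword-token ranges (IDs such as 1-2) and empty nodes (IDs such as 8.1), and the helper name read_conllu is hypothetical.

```python
# Minimal CoNLL-U reader sketch (illustrative, not part of UDPipe itself).
from typing import Dict, Iterable, List

FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> List[List[Dict[str, str]]]:
    """Group CoNLL-U token lines into sentences of field dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines such as "# text = ..." carry sentence metadata.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges and empty nodes in this sketch.
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:
        # Handle input that does not end with a blank line.
        sentences.append(current)
    return sentences
```

Each token then exposes its annotation by column name, e.g. `token["lemma"]` or `token["deprel"]`.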

UDPipe is free software distributed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create them may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning.

Copyright 2017 by the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.

Kielipankki version:  
UDPipe Kielipankki version
Metadata and license
Access to Puhti
Source version:  
UDPipe
Metadata and license
Access to GitHub
Look for all versions of this tool in META-SHARE  

For more information on this tool, see the UDPipe User’s manual.

More information on the Kielipankki version:

Using UDPipe on CSC’s servers requires a CSC user account: https://research.csc.fi/accounts-and-projects

UDPipe is installed in CSC’s computing environment (load it with: module load udpipe) in the following configuration:
Software: UDPipe 1.2.0
Models: 2.3-181115

UDPipe was compiled and installed from source without local modifications. Please refer to the user’s manual.
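Once the module is loaded, the udpipe binary can also be driven from a script. The Python sketch below builds the command line for a full tokenize, tag and parse run and pipes raw text through it with subprocess. The --tokenize, --tag and --parse options, given before the model file, follow the UDPipe 1.x user's manual; the model file name finnish-tdt-ud-2.3-181115.udpipe and the helper names udpipe_command and run_udpipe are hypothetical placeholders, so check where the 2.3-181115 models are actually stored on Puhti.

```python
import shlex
import subprocess
from typing import List

# Hypothetical model path: the real file name and directory on Puhti may differ.
DEFAULT_MODEL = "finnish-tdt-ud-2.3-181115.udpipe"


def udpipe_command(model: str = DEFAULT_MODEL, tokenize: bool = True,
                   tag: bool = True, parse: bool = True) -> List[str]:
    """Build an argv list for the udpipe binary (UDPipe 1.x style options)."""
    cmd = ["udpipe"]
    if tokenize:
        cmd.append("--tokenize")
    if tag:
        cmd.append("--tag")
    if parse:
        cmd.append("--parse")
    cmd.append(model)  # the model file comes after the options
    return cmd


def run_udpipe(text: str, model: str = DEFAULT_MODEL) -> str:
    """Pipe raw text to udpipe on stdin and return its CoNLL-U output."""
    result = subprocess.run(udpipe_command(model), input=text,
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout


if __name__ == "__main__":
    # Show the command that would be run after `module load udpipe`.
    print(shlex.join(udpipe_command()))
```

Building the argument list separately keeps the command easy to inspect or reuse in a batch job script before anything is executed.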

The tool was installed using Ansible scripts that can be found here: https://github.com/CSCfi/Kielipankki-palvelut/tree/Dec2018/commandline/roles/udpipe


This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024021901

Finnish Dependency Parsing Pipeline

Kielipankki version:  
Turku Dependency Parser Pipeline, Kielipankki version (TDPP-LBF)
Metadata and license
Access to GitHub
TurkuNLP Finnish Dependency Parser:  
Finnish dependency parser developed by TurkuNLP (TDPP)
Metadata and license
Access to GitHub
Look for all versions of this tool in META-SHARE  

The Turku Dependency Parser Pipeline, Kielipankki version (TDPP-LBF) is a version of the open source dependency parsing pipeline developed by the University of Turku NLP group for analyzing Finnish text, adapted by Kielipankki – the Language Bank of Finland.

For further information on the source version, please visit the project’s website.

In Kielipankki’s GitHub repository you can find VRT tools adapted from the original pipeline (vrt-tdp-…):

  • vrt-tdp-alpha-fillup
  • vrt-tdp-alpha-lookup
  • vrt-tdp-alpha-marmot
  • vrt-tdp-alpha-parse
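The vrt-tdp-… tools operate on VRT, the vertical text format used for Kielipankki's corpora: one token per line with tab-separated positional attributes, and structural elements such as <text> and <sentence> as XML-like tags on lines of their own. The Python sketch below assumes this simplified layout and yields the token rows of each <sentence> element; it is not taken from the vrt-tdp tools, ignores tokens outside sentences, and skips VRT comment lines (<!-- ... -->). The function name vrt_sentences is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, ...]


def vrt_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield the token rows of each <sentence> element in simplified VRT input."""
    sentence: List[Token] = []
    inside = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("<!--"):
            continue  # skip blank lines and VRT comment lines
        if line.startswith("<sentence"):
            inside, sentence = True, []
        elif line.startswith("</sentence>"):
            inside = False
            yield sentence
        elif line.startswith("<"):
            continue  # other structural tags such as <text> or <paragraph>
        elif inside:
            # Positional attributes (word form first) are tab-separated.
            sentence.append(tuple(line.split("\t")))
```

Because it is a generator, large VRT files can be processed one sentence at a time without loading the whole corpus into memory.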



This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024021503

Transkribus

Transkribus is a comprehensive platform for the digitisation, AI-powered text recognition, transcription and searching of historical documents.

Open the website

User instructions

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110305

TurkuNLP word embedding demo (word2vec)

A tool developed for analyzing the semantic similarity of words.

The demo is based on word embeddings induced using the word2vec method, trained on 4.5B words of Finnish from the Finnish Internet Parsebank project and over 2B words of Finnish from Suomi24. On the Parsebank project page you can also download the vectors in binary form. The software behind the demo is open-source, available on GitHub. The demo is maintained by the Turku NLP group.
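Assuming the downloadable vectors use the classic word2vec binary layout (a header line with vocabulary size and dimension, then each word followed by a space and that many little-endian 32-bit floats), they can be read and queried with the Python standard library alone. The sketch below ranks neighbours by cosine similarity, the usual measure for word2vec vectors; it is not the demo's own code, and the helper names are hypothetical. For the full Parsebank vectors, a dedicated library such as gensim (KeyedVectors.load_word2vec_format with binary=True) is considerably faster.

```python
import math
import struct
from typing import BinaryIO, Dict, List, Tuple

Vectors = Dict[str, List[float]]


def load_word2vec_bin(f: BinaryIO) -> Vectors:
    """Read the classic word2vec binary format from an open binary file."""
    vocab_size, dim = map(int, f.readline().split())
    vectors: Vectors = {}
    for _ in range(vocab_size):
        # The word ends at a single space; skip a newline left after the
        # previous vector.
        chars = bytearray()
        while True:
            ch = f.read(1)
            if ch == b" ":
                break
            if not ch:
                raise EOFError("unexpected end of file while reading a word")
            if ch != b"\n":
                chars.extend(ch)
        word = chars.decode("utf-8", errors="replace")
        vectors[word] = list(struct.unpack(f"<{dim}f", f.read(4 * dim)))
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors: Vectors, word: str, n: int = 5) -> List[Tuple[str, float]]:
    """Rank all other words by cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    import io
    # Toy 3-word, 2-dimensional file with made-up values (not real Parsebank vectors).
    toy = {"kissa": (1.0, 0.1), "koira": (0.9, 0.2), "auto": (0.0, 1.0)}
    buf = io.BytesIO()
    buf.write(f"{len(toy)} 2\n".encode())
    for w, vec in toy.items():
        buf.write(w.encode("utf-8") + b" " + struct.pack("<2f", *vec) + b"\n")
    buf.seek(0)
    print(most_similar(load_word2vec_bin(buf), "kissa", n=2))
```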

Demo:
TurkuNLP word embedding demo
Metadata and license
Try out the demo
Tool:
word2vec
Metadata and license
Tool project page
Search for all versions of this resource in META-SHARE

For word embeddings trained with word2vec and available in Kielipankki – the Language Bank of Finland, please visit the wordvec resource group page.


This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110304

WebAnno

WebAnno is a general-purpose web-based annotation tool for a wide range of linguistic annotations, including various layers of morphological, syntactic and semantic annotation. Additionally, custom annotation layers can be defined, allowing WebAnno to be used also for non-linguistic annotation tasks.

WebAnno is a multi-user tool supporting different roles such as annotator, curator and project manager. The progress and quality of annotation projects can be monitored and measured in terms of inter-annotator agreement. Multiple annotation projects can be conducted in parallel.
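To make "inter-annotator agreement" concrete: one widely used coefficient for two annotators is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed share of identical labels and p_e the agreement expected by chance from each annotator's label distribution. The Python sketch below illustrates the measure only; it is not WebAnno's implementation, and the function name cohens_kappa is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, with labels NOUN, VERB, NOUN, ADJ versus NOUN, VERB, ADJ, ADJ, p_o = 0.75 and p_e = 5/16, giving κ = 7/11 ≈ 0.64.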

More about WebAnno

The Language Bank of Finland’s instance of WebAnno

See the documentation

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110303

Mylly

Mylly is a versatile data analysis platform with interactive visualizations and workflows. It can be used to build workflows with a variety of tools, including morphosyntactic parsing, character set conversion and speech recognition.

Open the website

About Mylly

Mylly User Guide

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110302

Sparv Pipeline

Sparv, Språkbanken’s text analysis tool, is a multilingual toolkit provided by the Swedish Språkbanken for parsing and annotating text in various languages.

User manual

Latest Sparv release on GitHub

Sparv GUI

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110301

Turku Neural Parser Pipeline

The Turku Neural Parser Pipeline is a neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more than 50 languages.
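Assuming the pipeline's output is CoNLL-U, the HEAD column of each token holds the ID of its governor (0 for the sentence root) and DEPREL names the relation. As an illustration that is not part of the pipeline, the Python sketch below turns one parsed sentence into readable (head, relation, dependent) triples; the function name dependency_arcs is hypothetical, and multiword-token ranges and empty nodes are skipped.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str, str]  # (head form, relation, dependent form)


def dependency_arcs(conllu_sentence: str) -> List[Arc]:
    """Turn one CoNLL-U sentence into (head, deprel, dependent) triples."""
    forms: Dict[int, str] = {0: "ROOT"}
    rows: List[Tuple[int, int, str]] = []
    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tok_id, head = int(cols[0]), int(cols[6])
        forms[tok_id] = cols[1]
        rows.append((tok_id, head, cols[7]))
    # Resolve heads only after all forms are known: a head may follow its dependent.
    return [(forms[head], deprel, forms[tok_id]) for tok_id, head, deprel in rows]
```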

The pipeline is installed in CSC’s computing environment as a Singularity container for the languages Finnish, Swedish and English.

Kielipankki version:  
Turku Neural Parser Pipeline, Kielipankki version (TNPP-LBF)
Metadata and license
Access to Puhti
TurkuNLP Finnish Neural Parser:  
Turku Neural Parser Pipeline (TNPP)
Metadata and license
Access to GitHub
Look for all versions of this tool in META-SHARE  

More information about the Kielipankki version:
Source for the Kielipankki version on GitHub
CSC’s Singularity installation of the Turku Neural Parser

On Puhti you can see a list of all installed versions and languages using:
module use /appl/soft/ai/singularity/modulefiles/
module spider turku-neural-parser

For more information on this tool have a look at the following links:
Parser Demo
Turku-neural-parser-pipeline manual
TurkuNLP DockerHub


This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021101102

Finnish Tagtools

This software package provides finnish-postag, a part-of-speech and morphology tagger for Finnish, and finnish-nertag, a named entity recogniser for Finnish.
This software is also installed in CSC’s computing environment (module load finnish-tagtools).

Both tools take running text from standard input and produce tabular output (one token per line) to standard output. See the --help messages for more details.
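Because both tools read standard input and write one token per line, they are easy to call from a script. The Python sketch below assumes finnish-postag is on the PATH (e.g. after module load finnish-tagtools) and that output columns are tab-separated; it does not assume what each column means, which the tool's --help message describes. The function name tag_text is hypothetical, and the command can be swapped for finnish-nertag.

```python
import subprocess
from typing import List, Sequence

# Command name as given above; the column layout of its output is not assumed.
DEFAULT_CMD = ("finnish-postag",)


def tag_text(text: str, cmd: Sequence[str] = DEFAULT_CMD) -> List[List[str]]:
    """Send running text to a tagger on stdin and split its tabular output.

    Assumes one token per line with tab-separated columns; blank lines
    (if any) are dropped.
    """
    result = subprocess.run(list(cmd), input=text, capture_output=True,
                            encoding="utf-8", check=True)
    return [line.split("\t") for line in result.stdout.splitlines() if line.strip()]
```

Passing the command as a parameter also makes it possible to test the wrapper with a stand-in program when the tagger itself is not installed.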

An installer is provided in the form of a Makefile. More information can be found in the README file in the download folder.

Latest version:
Finnish Tagtools 1.6
Metadata and license
Download the resource
Look for all versions of this tool in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021101101


Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317

More contact information