UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. It is language-agnostic and can be trained on annotated data in CoNLL-U format. Trained models are provided for nearly all Universal Dependencies (UD) treebanks. UDPipe is available as a binary for Linux/Windows/OS X, as a library for C++, Python, Perl, Java and C#, and as a web service. A third-party R package is also available on CRAN.
UDPipe is free software distributed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and are distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning.
Copyright 2017 by the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.
Kielipankki version: UDPipe Kielipankki version | Metadata and license | Access to Puhti
Source version: UDPipe | Metadata and license | Access to GitHub
Look for all versions of this tool in META-SHARE
For more information on this tool, see the UDPipe User’s Manual.
More information on the Kielipankki version:
Using UDPipe on CSC’s servers requires a CSC user account: https://research.csc.fi/accounts-and-projects
UDPipe is installed in CSC’s computing environment (invoke with: module load udpipe) in the following configuration:
Software: UDPipe 1.2.0
Models: 2.3-181115
UDPipe was compiled and installed from source without local modifications. For usage instructions, please refer to the user’s manual.
The tool was installed using Ansible scripts that can be found here: https://github.com/CSCfi/Kielipankki-palvelut/tree/Dec2018/commandline/roles/udpipe
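Since UDPipe reads and writes CoNLL-U, its output can be post-processed with a few lines of code. The following is a minimal sketch, not part of UDPipe: the function name `read_conllu` and the sample sentence are illustrative assumptions, and the only thing taken from the format itself is the layout of ten tab-separated columns, `#` comment lines and a blank line after each sentence.

```python
from typing import Dict, Iterator, List

# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Comment lines (sent_id, text, ...) are skipped in this sketch.
            continue
        else:
            columns = line.split("\t")
            if len(columns) != len(CONLLU_FIELDS):
                raise ValueError(f"expected 10 columns, got {len(columns)}")
            sentence.append(dict(zip(CONLLU_FIELDS, columns)))
    if sentence:
        # Emit a final sentence that is not followed by a blank line.
        yield sentence


# Hand-written sample in CoNLL-U layout (illustrative, not real UDPipe output).
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

for sent in read_conllu(SAMPLE):
    for token in sent:
        print(token["form"], token["upos"], token["head"], token["deprel"])
```

The sketch keeps every column as a string, including multiword-token IDs such as `1-2`, so any conversion of `id` or `head` to integers is left to the caller.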
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024021901
Kielipankki version: Turku Dependency Parser Pipeline, Kielipankki version (TDPP-LBF) | Metadata and license | Access to GitHub
TurkuNLP Finnish Dependency Parser: Finnish dependency parser developed by TurkuNLP (TDPP) | Metadata and license | Access to GitHub
Look for all versions of this tool in META-SHARE
The Turku Dependency Parser Pipeline, Kielipankki version (TDPP-LBF) is a version of the open source dependency parsing pipeline developed by the University of Turku NLP group for analyzing Finnish text, adapted by Kielipankki – the Language Bank of Finland.
For further information on the source version please visit the project’s website.
On Kielipankki’s GitHub repository you can find VRT tools adapted from the original pipeline (vrt-tdp-…):
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2024021503
Some tools are available as Docker images. They can be used without installing any other dependencies (except for Docker). At this time the images are replacements for the command-line versions of these tools, meaning that they’re used via stdin and stdout, but they can also be run in an application server as a web service.
For now, the available tools are finnish-nertag, finnish-postag and finnish-tokenize.
The images are available on the Language Bank’s Docker Hub account and can be installed as follows:
sudo docker pull kielipankki/finnish-nertag:latest
(Or finnish-postag, etc.)
The resulting containers communicate via stdin and stdout, so you could test them like this:
$ sudo docker run --rm -i kielipankki/finnish-nertag <<< 'Pekingin olympialaiset 2008'
Pekingin <EnamexEvtXxx>
olympialaiset
2008 </EnamexEvtXxx>
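Because the containers only use stdin and stdout, they can also be driven from a script. The sketch below is one possible wrapper, not something shipped with the images: `docker_command` and `run_tool` are hypothetical helper names, and the `use_sudo` switch is an assumption that mirrors the `sudo docker` invocations on this page.

```python
import subprocess
from typing import List


def docker_command(image: str, *options: str, use_sudo: bool = False) -> List[str]:
    """Build the 'docker run --rm -i IMAGE [OPTIONS]' argument list."""
    command = ["docker", "run", "--rm", "-i", image, *options]
    return ["sudo", *command] if use_sudo else command


def run_tool(image: str, text: str, *options: str, use_sudo: bool = False) -> str:
    """Send text to the container's stdin and return what it writes to stdout."""
    result = subprocess.run(
        docker_command(image, *options, use_sudo=use_sudo),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Example call (needs Docker and the pulled image, so it is left commented out):
#   print(run_tool("kielipankki/finnish-nertag", "Pekingin olympialaiset 2008\n"))
```

Passing the command as a list avoids shell quoting problems when the input text contains quotes or other special characters; the text itself travels over stdin, just as in the here-string examples for the containers.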
They understand the same command-line options as the underlying tools:
$ sudo docker run --rm -i kielipankki/finnish-nertag --bio <<< 'Pekingin olympialaiset 2008'
Pekingin B-MISC
olympialaiset I-MISC
2008 I-MISC
$ sudo docker run --rm -i kielipankki/finnish-nertag --show-analyses <<< 'Pekingin olympialaiset 2008'
Pekingin peking [POS=NOUN][PROPER=PROPER][NUM=SG][CASE=GEN] [PROP=GEO] <EnamexEvtXxx>
olympialaiset olympialaiset [POS=NOUN][NUM=PL][CASE=NOM] _
2008 2008 [POS=NUMERAL][SUBCAT=CARD] _ </EnamexEvtXxx>
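The `--bio` output shown above lends itself to further processing, for example collecting the tokens of each entity into one string. This is a minimal sketch under stated assumptions: columns are separated by whitespace, the tag is the last column, and any tag that does not start with `B-` or `I-` counts as outside an entity. The function name `bio_to_entities` is illustrative and not part of finnish-nertag.

```python
from typing import List, Tuple


def bio_to_entities(lines: List[str]) -> List[Tuple[str, str]]:
    """Group 'token TAG' rows into (label, text) pairs following B-/I- prefixes."""
    entities = []   # (label, [tokens]) pairs in order of appearance
    current = None  # the entity being built, or None when outside an entity
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            # Blank line or token without a tag: close any open entity.
            current = None
            continue
        token, tag = parts[0], parts[-1]
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            # A B- tag always starts a new entity.
            current = (label, [token])
            entities.append(current)
        elif prefix == "I" and current is not None and current[0] == label:
            # An I- tag with the same label continues the open entity.
            current[1].append(token)
        else:
            # Any other tag ends the current entity.
            current = None
    return [(label, " ".join(tokens)) for label, tokens in entities]


sample = ["Pekingin\tB-MISC", "olympialaiset\tI-MISC", "2008\tI-MISC"]
print(bio_to_entities(sample))  # [('MISC', 'Pekingin olympialaiset 2008')]
```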
Sparv, Språkbanken’s text analysis tool, is a toolkit provided by the Swedish Språkbanken for parsing and annotating text in various languages.
Latest version: Sparv | Metadata and license | Access
Look for all versions of this tool in META-SHARE
Latest Sparv release on GitHub
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021110301
The Turku Neural Parser Pipeline is a neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more than 50 languages.
The pipeline is installed in CSC’s computing environment as a Singularity container for the languages Finnish, Swedish and English.
Kielipankki version: Turku Neural Parser Pipeline, Kielipankki version (TNPP-LBF) | Metadata and license | Access to Puhti
TurkuNLP Finnish Neural Parser: Turku Neural Parser Pipeline (TNPP) | Metadata and license | Access to GitHub
Look for all versions of this tool in META-SHARE
Kielipankki – the Language Bank of Finland has adapted the parser for its VRT format (CWB-VRT):
Source for the Kielipankki version on GitHub
On Puhti you can see a list of all installed versions and languages using:
module use /appl/soft/ai/singularity/modulefiles/
module spider turku-neural-parser
For more information on this tool have a look at the following links:
Parser Demo
Turku-neural-parser-pipeline manual (TNPP is no longer maintained by TurkuNLP; see the note from May 2024)
TurkuNLP DockerHub
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021101102
This software package provides finnish-postag, a part-of-speech and morphology tagger for Finnish, and finnish-nertag, a named entity recogniser for Finnish.
This software is also installed in CSC’s computing environment (module load finnish-tagtools).
Both tools take running text from standard input and write tabular output (one token per line) to standard output. See the --help messages for more details.
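The one-token-per-line output can be turned into structured data with a small script. The sketch below assumes that morphological analyses are printed as bracketed KEY=VALUE pairs, as in the finnish-nertag `--show-analyses` output shown in the Docker examples on this page; the exact column layout of each tool is not assumed, and `parse_features` is a hypothetical helper name.

```python
import re
from typing import Dict

# Matches one bracketed KEY=VALUE pair such as [POS=NOUN].
FEATURE = re.compile(r"\[([^\]=]+)=([^\]]*)\]")


def parse_features(row: str) -> Dict[str, str]:
    """Collect every [KEY=VALUE] pair found in one output row."""
    return dict(FEATURE.findall(row))


# Row copied from the --show-analyses example on this page.
row = ("Pekingin peking [POS=NOUN][PROPER=PROPER][NUM=SG][CASE=GEN] "
       "[PROP=GEO] <EnamexEvtXxx>")
print(parse_features(row))
```

Searching for the bracketed pairs anywhere in the row, rather than splitting on a fixed column separator, keeps the sketch independent of whether the columns are delimited by tabs or spaces.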
An installer is provided in the form of a Makefile. More information can be found in the README file in the download folder.
Latest version: Finnish Tagtools 1.6 | Metadata and license | Download the resource
Look for all versions of this tool in META-SHARE
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021101101