Trankit

Trankit is a lightweight, Transformer-based Python toolkit for multilingual Natural Language Processing (NLP).

Trankit can process both raw (untokenized) and pretokenized input, at either the sentence level or the document level.
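The four input forms can be sketched in Python as follows. This is a minimal illustration, not an official recipe: the helper names `load_pipeline` and `run_all_input_forms` and the example sentences are made up for this page, and the sketch assumes that the `english` model is available and that a pipeline object accepts either a string or a list of tokens together with an optional `is_sent` flag, as described in Trankit's documentation.

```python
def load_pipeline(lang="english"):
    """Create a Trankit pipeline (hypothetical helper).

    The import is done lazily so that the rest of this sketch can be
    read and tested on a machine where Trankit is not installed.
    """
    from trankit import Pipeline
    return Pipeline(lang)


def run_all_input_forms(pipeline):
    """Call a Trankit-style pipeline on the four supported input forms."""
    raw_doc = "Hello! This is Trankit."          # untokenized document
    raw_sent = "This is Trankit."                # untokenized sentence
    pretok_doc = [["Hello", "!"],                # pretokenized document:
                  ["This", "is", "Trankit", "."]]  # a list of token lists
    pretok_sent = ["This", "is", "Trankit", "."]  # pretokenized sentence

    return {
        "raw_document": pipeline(raw_doc),
        # is_sent=True tells the pipeline to treat the input as one sentence
        "raw_sentence": pipeline(raw_sent, is_sent=True),
        "pretokenized_document": pipeline(pretok_doc),
        "pretokenized_sentence": pipeline(pretok_sent, is_sent=True),
    }
```

After loading the module, calling `run_all_input_forms(load_pipeline())` would run the full pipeline on each form. Creating the pipeline may trigger a model download the first time it is used.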

Trankit is installed in CSC’s computing environment and can be loaded with the command "module load trankit".

The current version is Trankit v1.0.0.

For more details, please see Trankit’s Documentation.

Currently, Trankit supports the following tasks:

  • Sentence segmentation.
  • Tokenization.
  • Multi-word token expansion.
  • Part-of-speech tagging.
  • Morphological feature tagging.
  • Dependency parsing.
  • Named entity recognition.
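The annotations produced by these tasks can be collected into one row per word, as in the sketch below. It rests on assumptions that should be checked against Trankit's documentation: that the pipeline returns a dictionary with a `sentences` list, that each sentence has a `tokens` list, that multi-word tokens carry their syntactic words under `expanded`, and that the fields are named `upos`, `feats`, `head`, `deprel` and `ner`. The function `token_rows` and the `SAMPLE_DOC` data are hypothetical and hand-written for illustration; they are not real Trankit output.

```python
def token_rows(doc):
    """Flatten a Trankit-style document dict into one row per word."""
    rows = []
    for sent in doc.get("sentences", []):
        for tok in sent.get("tokens", []):
            # Assumption: the entity tag is stored on the surface token.
            ner = tok.get("ner")
            # Assumption: a multi-word token lists its words under
            # "expanded"; an ordinary token is treated as its own word.
            for word in tok.get("expanded", [tok]):
                rows.append({
                    "sentence": sent.get("id"),
                    "text": word.get("text"),
                    "upos": word.get("upos"),      # part-of-speech tag
                    "feats": word.get("feats"),    # morphological features
                    "head": word.get("head"),      # index of syntactic head
                    "deprel": word.get("deprel"),  # dependency relation
                    "ner": ner,                    # named entity tag
                })
    return rows


# Hand-written sample in the assumed output shape (not real model output).
SAMPLE_DOC = {
    "text": "Hello world",
    "sentences": [{
        "id": 1,
        "text": "Hello world",
        "tokens": [
            {"id": 1, "text": "Hello", "upos": "INTJ",
             "head": 0, "deprel": "root", "ner": "O"},
            {"id": 2, "text": "world", "upos": "NOUN",
             "feats": "Number=Sing", "head": 1,
             "deprel": "vocative", "ner": "O"},
        ],
    }],
}

for row in token_rows(SAMPLE_DOC):
    print(row["sentence"], row["text"], row["upos"], row["deprel"], row["ner"])
```

Missing fields are returned as `None` rather than raising an error, which keeps the helper usable when only some of the tasks have been run.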

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2026011402

Last modified on 2026-01-20
