finnish-nertag

Finnish-nertag is a named entity recogniser for Finnish. It implements a pipeline in which FiNER performs the NER-tagging stage. Users can install the tools on their systems or run them from the local directory without installing them.

FiNER is a rule-based named-entity recognition tool for Finnish, developed at the University of Helsinki for the FIN-CLARIN consortium. For tokenization and morphological analysis it uses tools based on the CRF-based tagger FinnPos, the Finnish morphology package OmorFi and the FinnTreeBank corpus. Proper names and other expressions in plain-text input are then recognized and categorized with a set of pattern-matching (pmatch) rules.

The pattern-matching rules are built and compiled using the Helsinki Finite-State Technology toolkit.

More information and technical documentation can be found here.

Finnish-nertag is offered in CSC’s computing environment. It is also available for download as part of the software package finnish-tagtools, whose current version number is 1.6.


This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2025021801

Installing and using dockerized tools (finnish-postag, finnish-nertag, …)

Some tools are available as Docker images. They can be used without installing any other dependencies (except for Docker). At this time the images are replacements for the command-line versions of these tools, meaning that they’re used via stdin and stdout, but they can also be run in an application server as a web service.

For now, the available tools are finnish-nertag, finnish-postag and finnish-tokenize.

Installation

The images are available on the Language Bank’s Docker Hub account and can be installed as follows:

sudo docker pull kielipankki/finnish-nertag:latest

(Or finnish-postag, etc.)
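Since the same pull command works for each of the dockerized tools listed above, all three images can be fetched in one go. This is a minimal sketch: the function name pull_tagtools_images is hypothetical, and the image names kielipankki/finnish-postag and kielipankki/finnish-tokenize are assumed to follow the same naming pattern as kielipankki/finnish-nertag.

```shell
# pull_tagtools_images: hypothetical helper, not provided by the Language Bank.
# Pulls the latest image of each dockerized tool listed on this page.
# ASSUMPTION: every image is named kielipankki/<tool>, like finnish-nertag.
pull_tagtools_images() {
  for tool in finnish-nertag finnish-postag finnish-tokenize; do
    # Stop at the first failed pull so the error is easy to spot.
    sudo docker pull "kielipankki/$tool:latest" || return 1
  done
}
```

Run it once with pull_tagtools_images; afterwards each image can be used exactly like the single-image examples on this page.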

Usage

The resulting containers communicate via stdin and stdout, so you can test them like this:

$ sudo docker run --rm -i kielipankki/finnish-nertag <<< 'Pekingin olympialaiset 2008'
Pekingin <EnamexEvtXxx>
olympialaiset
2008 </EnamexEvtXxx>

They understand the same command-line options as the underlying tools:

$ sudo docker run --rm -i kielipankki/finnish-nertag --bio <<< 'Pekingin olympialaiset 2008'
Pekingin B-MISC
olympialaiset I-MISC
2008 I-MISC

$ sudo docker run --rm -i kielipankki/finnish-nertag --show-analyses <<< 'Pekingin olympialaiset 2008'
Pekingin peking [POS=NOUN][PROPER=PROPER][NUM=SG][CASE=GEN] [PROP=GEO] <EnamexEvtXxx>
olympialaiset olympialaiset [POS=NOUN][NUM=PL][CASE=NOM] _
2008 2008 [POS=NUMERAL][SUBCAT=CARD] _ </EnamexEvtXxx>
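The --bio output shown above is easy to post-process with standard Unix tools. As a minimal sketch, assuming that each output line holds a token followed by its tag in the last whitespace-separated column, and that tokens outside entities carry an O tag or no tag at all, the hypothetical shell function bio_to_entities below collects B-/I- tagged tokens into one line per entity.

```shell
# bio_to_entities: hypothetical helper, not part of finnish-tagtools.
# Reads "token TAG" lines (as printed with --bio) on stdin and prints one
# line per entity: the entity type, a tab, and its tokens joined by spaces.
bio_to_entities() {
  awk '
    # Print the entity collected so far (if any) and clear the buffer.
    function flush() {
      if (text != "") print type "\t" text
      text = ""
      type = ""
    }
    # Blank lines and lines without a tag column end any open entity.
    NF < 2 { flush(); next }
    {
      tag = $NF
      if (tag ~ /^I-/ && text != "" && substr(tag, 3) == type) {
        text = text " " $1                 # continue the current entity
      } else if (tag ~ /^[BI]-/) {
        flush()                            # B- (or a stray I-) starts a new one
        type = substr(tag, 3)
        text = $1
      } else {
        flush()                            # O or any other tag
      }
    }
    END { flush() }
  '
}
```

Usage: sudo docker run --rm -i kielipankki/finnish-nertag --bio <<< 'Pekingin olympialaiset 2008' | bio_to_entities. Given the BIO output shown above, this should print MISC, a tab, and Pekingin olympialaiset 2008. A stray I- tag that does not continue an open entity of the same type is treated as the start of a new entity, which is a lenient design choice rather than something the tool specifies.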

Finnish Tagtools

This software package provides finnish-postag, a part-of-speech and morphology tagger for Finnish, and finnish-nertag, a named entity recogniser for Finnish.
This software is also installed in CSC’s computing environment (module load finnish-tagtools).

Both tools take running text from standard input and produce tabular output (one token per line) to standard output. See the --help messages for more details.
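Because both tools read standard input and write standard output, they can be applied to a whole directory of plain-text files with an ordinary shell loop. The sketch below assumes one document per .txt file; the function name tag_files and the .tsv output naming are illustrative choices, not conventions of the package.

```shell
# tag_files: hypothetical helper, not part of finnish-tagtools.
# Usage: tag_files DIR COMMAND [ARGS...]
# Feeds every DIR/*.txt file to COMMAND on standard input and writes the
# command's standard output to a .tsv file with the same base name.
tag_files() {
  dir=$1
  shift
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue            # no .txt files: the glob stays unexpanded
    "$@" < "$f" > "${f%.txt}.tsv"
  done
}
```

For example, after module load finnish-tagtools one could run tag_files corpus finnish-nertag --bio, or with the Docker image tag_files corpus sudo docker run --rm -i kielipankki/finnish-nertag. The -i flag is what makes Docker pass the redirected file to the container's standard input.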

An installer is provided in the form of a Makefile. More information can be found in the README file in the download folder.

Latest version:
Finnish Tagtools 1.6
Metadata and license
Download the resource

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021101101

Last modified on 2025-02-19

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317

More contact information