
D2.2.2: Transformer adaptation for specialised data

Project: FIN-CLARIAH
Grant agreement: Research Council of Finland no. 358720
Start date: 01-01-2024
Duration: 24 months

WP 2.2: Report on Transformer adaptation for specialised data
Date of reporting: 25-11-2025

Report authors: Erik Axelson, Jack Rueter (University of Helsinki)
Contributors: Jack Rueter (University of Helsinki), Sam Hardwick, Martin Matthiesen (CSC)
Deliverable location: N/A

Description

In this work package, we aim to provide an MCP server that links finite-state (FST) tools with large language models (LLMs), so that less technically oriented people can use both together.

MCP (Model Context Protocol) offers a new way to bring large language model (LLM) capabilities into the research and learning of low-resource languages: it bridges rule-based, finite-state linguistic tools and LLM-based chatbots. When HFST [1] analyzers and open-source dictionaries, designed and authored by individuals and teams at GiellaLT [2] and Apertium [3], are hosted on an MCP server through the UralicNLP [4] library, even users with no technical background, working from a laptop or a mobile phone, can access lemmatizers, morphological analyzers, and translation dictionaries for dozens of minority languages. This approach makes language technology more inclusive by putting advanced tools within reach of communities that have historically lacked computer-aided support.
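The bridge described above can be sketched in a few lines. MCP tool invocations travel as JSON-RPC 2.0 `tools/call` requests, and the server answers with a list of content items. The following is a minimal, standard-library-only illustration of that exchange, not the project's implementation: the tool name `lemmatize`, the function `handle_request`, and the one-entry lookup table are all hypothetical, and the table merely stands in for a real finite-state analyzer (which a deployed server might reach through UralicNLP).

```python
import json

# Toy stand-in for an HFST analyzer (assumption for illustration only);
# a real server would call a finite-state tool here, e.g. via UralicNLP.
TOY_LEMMAS = {("fin", "kissoilla"): ["kissa"]}


def lemmatize(word: str, lang: str) -> list[str]:
    """Return candidate lemmas for a word form (hypothetical tool)."""
    return TOY_LEMMAS.get((lang, word), [])


# Registry of the tools this sketch exposes, keyed by tool name.
TOOLS = {"lemmatize": lemmatize}


def _error(request_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}
    )


def handle_request(raw: str) -> str:
    """Answer one JSON-RPC 2.0 `tools/call` request with an MCP-style result."""
    request = json.loads(raw)
    request_id = request.get("id")
    if request.get("method") != "tools/call":
        return _error(request_id, -32601, "Method not found")
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return _error(request_id, -32602, f"Unknown tool: {params.get('name')}")
    lemmas = tool(**params.get("arguments", {}))
    result = {
        "content": [{"type": "text", "text": ", ".join(lemmas) or "no analysis"}],
        "isError": False,
    }
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```

A chatbot acting as MCP client would send a request with `"name": "lemmatize"` and `"arguments": {"word": "kissoilla", "lang": "fin"}`, and the text item in the response would carry the lemma `kissa`. The user never touches the analyzer directly, which is the point of the bridge.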

We have familiarized ourselves with running a local MCP server on a laptop and have run into memory issues. A free server with more memory at CSC would be an ideal solution for individual users, as the server, rather than the user's own device, would host the model. Some language communities may want to keep their language data private, which would require separate access arrangements for that material. The Language Bank of Finland is planning to install an MCP service to allow extensive testing.

[1] HFST – Helsinki Finite-State Technology
[2] GiellaLT – an infrastructure for rule-based language technology aimed at minority and indigenous languages
[3] Apertium – a free/open-source machine translation platform
[4] UralicNLP – an NLP library for Uralic languages

The FIN-CLARIAH project has received funding from the European Union – NextGenerationEU instrument and is funded by the Research Council of Finland under grant number 358720.

Last modified on 2025-11-27
