FIN-CLARIAH Deliverables

This page showcases the project deliverables (see template and instructions for reporting).

FIN-CLARIAH Funding period 2024-2025
FIN-CLARIAH Funding period 2022-2023 (Completed)

FIN-CLARIAH Funding period 2024-2025

Module 1: Natural Language Processing (NLP)

W1.1 Text processing and annotation environments

D1.1.1 Named-entity annotation 2024-09
D1.1.2 Ingesting new unstructured resources 2025-12

W1.2 Speech processing and annotation

D1.2.1 Data collection for minority languages 2024-09
D1.2.2 Transcription service for minority languages 2025-09

W1.3 Video processing and annotation

D1.3.1 Tools and guidelines for video processing 2025-06

Module 2: Language Research Infrastructure (LRI)

W2.1 Personal and Copyrighted Research Data

D2.1.1 Integrate environment for personal data 2024-09
D2.1.2 Framework for processing copyrighted data for verification of research 2025-09

W2.2 Training environments

D2.2.1 Transformer training for specialised data 2024-12
D2.2.2 Transformer adaptation for specialised data 2025-12

W2.3 Translation and Interpretation

D2.3.1 Remote access to text data repositories 2024-12
D2.3.2 Remote access to video data repositories 2025-12

W2.4 Terminology

D2.4.1 Term definition discovery procedures 2024-09
D2.4.2 Initializing terminology collections 2025-12

Module 3: Structuring Data

W3.1 Data Management

D3.1.1 Comprehensive data versioning 2024-09
D3.1.2 Workflow automation and version syncing 2025-09

W3.2 Data Ingestion

D3.2.1 Ingestion of structured data from Finna (NLF) 2025-03
D3.2.2 Ingestion of heritage and societal data from Sampo 2025-06
D3.2.3 Ingestion of multimodal societal data from the Web 2025-12

W3.3 Enrichment

D3.3.1 Automated metadata of archival data from NARC 2025-03
D3.3.2 Automated harmonisation and enrichment of metadata 2024-12
D3.3.3 Machine-learning-based enrichment of social media 2025-06
D3.3.4 Computer-vision-based enrichment of multimodal data 2025-09

Module 4: Analyzing Structured Data

W4.1 Analytical Support for computational SSH

D4.1.1 Analysis of video stream interactions with AI solutions 2024-09
D4.1.2 Analysis tools for multimodal born-digital social media 2024-12
D4.1.3 Access to social media interaction in digital networks 2025-06
D4.1.4 Analysis of multimodal properties of naturalistic speech 2025-12
D4.1.5 Analysis of regional language variation in social media 2025-03
D4.1.6 Analysis of multimodal cultural heritage 2025-12
D4.1.7 Enrich survey data with register data and unstructured text 2025-06

Module 5: Information Interaction (IIA)

W5.1 Evidence-Based Infrastructure Development

D5.1.1 Community engagement: multimodal societal data researchers 2024-09
D5.1.2 Community engagement: multimodal heritage researchers 2025-06
D5.1.3 Evidence-based infrastructure development 2024-12
D5.1.4 Educational resource development 2025-12

FIN-CLARIAH Funding period 2022-2023


Module 1: Natural Language Processing (NLP)

W1.1 Text processing and annotation environments

D1.1.1 Updating LBF resource selection 2022-09
D1.1.2 Ingesting new unstructured resources 2023-12

W1.2 Speech processing and annotation

D1.2.1 Forced-Alignment Service 2022-09
D1.2.2 Transcription Service for Finnish Interviews 2023-09

W1.3 Noise-tolerant NLP

D1.3.1 Corpora of non-standard language 2022-09
D1.3.2 System for detecting toxic language 2023-06
D1.3.3 Models for retrieving QA pairs from the web 2023-09
D1.3.4 QA pair corpora 2023-12

Module 2: Language Research Infrastructure

W2.1 Social Data Science

D2.1.1 Licensing agreements for personal data 2022-09
D2.1.2 Licensing agreements for special categories 2023-06

W2.2 Learners’ Assessment Environments

D2.2.1 Speech recognition for L2 2022-12
D2.2.2 Speech recognition for L2 update 2023-12

W2.3 Translation and Interpretation

D2.3.1 Licensing interpretation sessions 2022-12
D2.3.2 Aligning and retrieving 2023-12

W2.4 Terminology

D2.4.1 Term discovery procedures 2022-09
D2.4.2 Terminology application 2023-06
D2.4.3.1 Initializing terminology collections 2022-09
D2.4.3.2 Initializing terminology collections 2023-06
D2.4.3.3 Initializing terminology collections 2023-12

W2.5 Solutions for better use of language learner performances in research

D2.5.1 Test performances storage 2022-12
D2.5.2 Analysis and annotation tools for learner performances 2023-12

Module 3: Structuring Data

W3.1 Increasingly automated ingestion of material

D3.1.1 Initial NLF data 2022-09
D3.1.2 Ingestion framework 2022-12
D3.1.3 Versioning support 2023-06
D3.1.4 Incremental update process 2023-12

W3.2 AI solutions to better use of National Archives mass digitisation services

D3.2.1 Pipeline for transferring archival data 2022-12 2023-06
D3.2.2 Annotation & analysis tools for NARC data 2023-12

W3.3 AI solutions to better use of textual qualitative survey data

D3.3.1 Qualitative survey data concept network 2022-09
D3.3.2 R package for data concept network 2023-09 2023-12

W3.4 Developing analysis methods for real-time chats in gameplay streams

D3.4.1 Livestream data collector 2022-12

W3.5 Developing analysis methods for text network analysis of political texts

D3.5.1 Text network analysis of political texts 2022-12 2023-06
D3.5.2 Text network analysis of political texts 2023-09 2023-12

Module 4: Analyzing Structured Data

W4.1 Metadata harmonization and analysis

D4.1.1 Harmonized FNB 2022-09
D4.1.2 Harmonization code 2022-12
D4.1.3 Visualisation workflow 2023-06
D4.1.4 R/Python module 2023-12

W4.2 Linked Open Data Services

D4.2.1 LDF knowledge extraction tools 2022-12
D4.2.2 Parliament of Finland Ontology 2023-12

W4.3 Subsetting data

D4.3.1 Subsetting tool 2022-09
D4.3.2 Statistical overviews and bias detection 2023-06
D4.3.3 Representative Twitter dataset 2023-12

Module 5: Information Interaction

W5.1 Evidence-based RI development

D5.1.1 User experience questionnaire 2022-09
D5.1.2 Log data collection and analysis 2023-06
D5.1.3 Protocol for collecting workshop data 2023-12

W5.2 Education and dissemination

D5.2.1 Actor network 2022-12
D5.2.2 Educational material 2023-12
