
 

FIN-CLARIAH Summer Meeting 2025

Venue: University of Jyväskylä
Time: Friday 13.6.2025 11:00–17:00

(Please register by 2 June; the registration link is below.)

Roads to multimodality in research: streams, videos, and building RI for the future

With the recent news that FIN-CLARIAH has been granted lighthouse status, we are now positioned to lead the way in advancing essential infrastructure areas, from impact and functionality to service provision and collaborative use. This day in Jyväskylä will be dedicated to setting standards in the latest multimodal modes of research. With insights from processing audiovisual data, we aim to build connections and spark ideas for future development. In addition, thematic group sessions will convene on annotation, education provision, and code review for developers.

 

Date: Fri. 13 June 2025 11:00–17:00
Venue: University of Jyväskylä Seminaarimäki, Building P, Conference Room Lyhty (Seminaarinkatu 15, 40100 Jyväskylä)
Registration link: https://forms.office.com/e/qXd0aeafgN

Schedule

10:30 Welcome coffee

11:00 Welcome words (Raine Koskimaa, University of Jyväskylä / Krister Lindén, FIN-CLARIAH)

11:10 Keynote – To be confirmed

12:00 Lunch break. Location next to the venue (at own expense)

13:00 Insight talk: Tommi Jantunen & Juhana Salonen: “The multiple possibilities of signed language corpora” (Chair: Raine Koskimaa)

13:30 Group discussions

  1. Code-sharing session for developers – every participant walks the others through a part of their code/deployment/design, sharing what they’re happy with and what they’re having problems with; sharing best practices and receiving best practice advice (Facilitator: Eetu Mäkelä)
  2. What is Mink? – Presenting & testing the Mink platform. “Min Korp” (Mink) is a service to enable users to bring their own data to the Korp platform. Like Korp, Mink was developed by Språkbanken Text in Gothenburg. We will present the status of the adaptation to the Language Bank of Finland. (Facilitator: Martin Matthiesen)
  3. Education within FIN-CLARIAH – every participant shares what skills are required for using their part of the RI and what they teach locally to provide those skills to SSH researchers (mapping skill provision for the RI and identifying training gaps in relation to RI uptake) (Facilitator: Sanna Kumpulainen)

15:00 Panel discussion. Short presentations of topics by group facilitators

16:00 Mingling

(16:00–17:00 Steering group meeting)

17:00 “Waiting for the last train” Possibility to have a drink/dinner near the railway station (at own expense)

 

 


FIN-CLARIAH Roadshow event

Place: University of Vaasa (Wolffintie 34, Tervahovi building, room D124)
Time: 14.3.2025  12:15-15:30

 

FIN-CLARIAH Roadshow in Vaasa: Some take-outs (blog posting, DARIAH-FI)
Watch a recording of the event!

What is new in the Finnish digital research infrastructure?

FIN-CLARIAH (Common Language Resources and Technology Infrastructure) is a fundamental Finnish digital research infrastructure (RI) for Social Sciences and Humanities (SSH). FIN-CLARIAH connects ten universities and memory organisations across the country. The Finnish Research Infrastructure Committee has granted roadmap status to FIN-CLARIAH for 2025–2028. This roadmap guides the research community, policymakers, and funders in directing investment and supporting research, development, and innovation.
 
FIN-CLARIAH will participate in a roadshow event in Vaasa on 14.3., where some of the central resources and tools for social sciences and humanities (SSH) research will be presented, with special emphasis on acquiring, processing and depositing born-digital data. The programme includes informative presentations as well as hands-on activities. From the University of Vaasa, the Natureach project will join the event with a presentation of AR/VR data and its analysis.
 
This is an in-person event at the University of Vaasa (Location: Tervahovi building, room D124). To participate in the hands-on activities, bring your own laptop!

Registration for the FIN-CLARIAH Roadshow event has ended.
 
This roadshow is hosted by Prof. Merja Koskela
 

Programme

12.15 – 12.20

Merja Koskela: Introduction and welcome

12.20 – 12.30

Inés Matres: FIN-CLARIAH Overview of digital infrastructure for Social Sciences and Humanities (presentation slides)

12.30 – 12.45

Mietta Lennes: How to find, use and deposit your research data and tools via Kielipankki – The Language Bank of Finland (presentation, PDF / presentation, PPTX)

12.50 – 13.15

Harri Kettunen & Tiina Onikki-Rantajääskö: How Can I Contribute to the Reliability of Information? Collaborative Terminology Work at the Helsinki Term Bank for the Arts and Sciences

13.15 – 13.30

Break

13.30 – 14.00

Erik Henriksson: TurkuNLP tools to make sense of noisy web data (remote; presentation slides)

14.00 – 14.30

Raine Koskimaa, Ida Toivanen & Jari Lindroos: “How to use” – Twitch collector and analysis tools

14.30 – 15.00

Martta Ylilauri, Joni-Roy Piispanen & Rebekah Rousi: Natural impact – VR in NATUREACH /  Luonnon hyvinvointivaikutukset – VR NATUREACH-hankkeessa

15.00

Discussion

 

 


Join the course Corpus Linguistics and Statistical Methods (13.1.–28.2.2025)

The online course Corpus Linguistics and Statistical Methods is intended for students in languages or other fields who wish to learn the basics of using corpora.

The course is offered in Finnish and in English, and it is available for free to all university students in and outside Finland. It can also be taken as an Open University course for a fee. The course started on Monday 13 January, but it is possible to join the course area until 24 January.

Read more and register for the course!

NB: The same course will be organized again in period 1 in the autumn, starting on 1st September 2025 (follow the course page).

Language Data Space and ALT-EDIC in Finland

What is the European Language Data Space (LDS)?

The EU is in the process of creating an internal market for all types of data. The aim is to ensure that data can be shared from one stakeholder to another within the region, in accordance with the EU legislation. Data sharing requires interactive networks – data spaces – that can connect data providers and users, and offer a platform for them to communicate, make contracts and trade with each other.

All the upcoming European data spaces will be developed in line with the European Data Strategy. There are development plans for data spaces for approximately 15 different strategic fields. According to the vision, data spaces will allow for the commercialisation and more efficient re-use of data. This will benefit not only commercial stakeholders in the EU, but also EU citizens by providing them with better digital services, for example. In addition, researchers could gain access to new types of data and materials, which could boost basic research and increase opportunities for product development and innovation.

The European Language Data Space (LDS) is an ecosystem for the sharing and commercialisation of language data, such as text and speech data, and for the development of large language models and language-centric Artificial Intelligence. The Language Data Space is being developed and coordinated by the LDS Consortium, which was established in early 2023 with the support of the European Commission. The first phase of the LDS will last three years. During this period, the technical and legal framework for the operation of the common language data platform will be established in cooperation with the various stakeholders.

The work on the language data space will also be driven forward by ALT-EDIC, the language technology alliance of EU member states established in early 2024. In particular, ALT-EDIC aims to ensure the development of EU-based large language models.

The Language Data Space will be built partly on top of existing networks and language technology infrastructures. Sitra’s publication Snapshot of Finnish data spaces (2024), available in Finnish, gives a good summary of the current situation in Finland with regard to language technologies and the Language Data Space.

LDS Workshop in Finland and elsewhere in Europe

In spring 2024, the LDS Consortium launched a series of country-specific workshops to share information about the possibilities of the common Language Data Space, and to reach as many stakeholders in each member country as possible. The workshops are organised in collaboration with local institutions. In April 2024, Finland had the honour of being the first EU member state to host an LDS workshop. The event was organised locally by the University of Helsinki. More information on workshops in other EU countries and upcoming LDS events can be found on the Language Data Space website.

The Finnish LDS workshop provided an opportunity for organisations and companies in Finland to exchange ideas on the possibilities and challenges that a common platform and marketplace for language models and data could offer. The workshop featured two remote presenters: Philippe Gelin from the European Commission and Georg Rehm from DFKI in Germany. In the panel discussions, partners from the LAREINA project coordinated by the University of Helsinki shared their views on the importance of language data and on the challenges related to the availability and technical quality of data and to copyright constraints. Without access to electronic data of sufficient quality and scope, it is difficult to develop language models for speakers of small and medium-sized languages.

After the LDS workshop, Finland initiated the membership process to join ALT-EDIC as an observer member. After summer 2024, the full membership of Finland in ALT-EDIC was confirmed for the next three years. The administrative representative of Finland in ALT-EDIC is the Ministry of Transport and Communications, with whom the University of Helsinki aims to maintain an active dialogue.

LDS invites businesses and other stakeholders to join the user group

The Language Data Space invites European stakeholders to join the LDS User Group. The group includes commercial stakeholders from different sectors as well as representatives of public administrations and research. News from the remote meeting of the LDS User Group in November 2024 is available online. To join the LDS User Group, fill in the form on the LDS website. In particular, language data providers and users, as well as language model developers, are warmly welcome to join the group.

At the end of 2024, the Language Data Space is entering the pilot phase, in which the Language Bank of Finland is also actively involved. The aim is to test the pilot version of the LDS platform in Finland and to collect user feedback. The Language Bank of Finland is also planning to organise a workshop in spring 2025 on the Language Data Space, ALT-EDIC and copyright issues. We will announce this event on our website and through the LAREINA project.


All photos in the article: Jyrki Niemi / University of Helsinki




FIN-CLARIAH Meeting 22.11.2024

Place: University of Turku, School of Economics (Rehtoripellonkatu 3, Turku)
Time: 22.11.2024  11:00-16:00

Registration: closed

Workshop theme: “Digital cultural heritage”

The goal of this FIN-CLARIAH day is to reflect on the importance of data and metadata for research. We want to discuss questions such as: What is data vs. metadata? What are the standards in different fields? How is metadata extracted or processed, and how could it be? To set the stage for the group discussions, we will first hear the perspective of a research infrastructure project that integrates heterogeneous data for the study of human diversity.

Preliminary schedule

10:30 Welcome coffee
11:00 Welcome words (Krister Lindén & Veronika Laippala)
11:10 Keynote by Virpi Lummaa, Human Diversity Project, University of Turku
12:00 Lunch
13:00 Insight talks: opportunities and challenges of processing metadata (Kimmo Elo, University of Turku; Maria Kallio-Hirvonen, The Finnish National Archives)
13:30 Group discussions, challenges and solutions regarding the specifics of different types of research data. Some questions can start the conversation: what is data vs. metadata? What are the standards in your field? How is or how could metadata be processed? Discussions will be guided by moderators (tbc).
  1. Text data
  2. Audiovisual, speech and audio data
  3. Still images
15:00 Panel discussion (Moderator: Inés Matres)
16:00 Plenary ends
(16:00-17:00 Steering group meeting)

 

 

Workshop: Large Language Models and Speech-Centric AI


Wednesday 09.10.2024 at 8:30-14:00, Clarion Hotel Helsinki

Organizers:  

Department of Digital Humanities, University of Helsinki
LAREINA project
Kites ry

Location:  

Clarion Hotel Helsinki, Tyynenmerenkatu 2, Helsinki

Welcome to the Workshop!

The development of language-centric AI during the past few years has been remarkable. It poses challenges but also creates opportunities for organizations both in the private and the public sector. Many of us are curious about how to harness the power of AI in our own business.

Our workshop on Large Language Models and Speech-Centric AI will showcase various use cases and applications both in the public and private sector. Our objective is to introduce the current state of language-centric AI in Finland, and share information about the future of access to language data and modules. The demo presentations and industry talks will illustrate the potential use of language-centric AI.

This workshop is addressed to developers, integrators and users of language technologies and AI solutions in Finland. The workshop will be held in English and on-site only.

Registration

Participation is free of charge, but registration is required. We have 50 seats available. Registration has now ended and the event is fully booked.

 

Large Language Models and Speech-Centric AI

Programme for the Workshop on Wednesday 09.10.2024

 

08:30 – 09:00

Registration and Coffee

09:00 – 10:30

LLMs and Speech-Interfaces in Private and Public Sector

Krister Lindén, University of Helsinki
Tomi Paavola, Ministry of Transport and Communications
Jörg Tiedemann, University of Helsinki
Tommi Lehtonen, KAVI & Mikko Kurimo, Aalto University
Markus Koskela, CSC – IT Center for Science

10:30 – 11:30

Demo Presentations and Coffee

11:30 – 13:00

AI and Speech-Interfaces

Antti ’Jogi’ Poikola, Teknologiateollisuus ry
Iftikhar Ahmad, Tietoevry
Manu Setälä, Solita Oy
Iikka Hauhio, Kielikone Oy
Michael Stormbom, Lingsoft Language Services Oy
Peter Smit, Inscripta Oy

13:00 – 14:00

Lunch

 

Contact the organizers for further details:

lareina-office [ATT] helsinki.fi

Last updated: October 8, 2024

Posters presented at the FIN-CLARIAH Meeting 10.6.2024

Overview of CSC services used in various Work Packages

W1.1 Text processing and annotation environments

W1.2 Speech processing and annotation

W1.3 Video processing and annotation

W2.1 Personal and Copyrighted Research Data

W2.2 Training environments

W2.3 Translation and Interpretation

W2.4 Terminology

W3.1 Data Management

W3.2 Data Ingestion

  1. W3.2.1 Data ingestion through Finna
  2. Sampo Systems Infrastructure Data Services and Portals

W3.3 Enrichment

  1. W3.3.1 Enhancing the usability of archival data
  2. W3.3.2 Supporting the research use of large-scale cultural heritage metadata
  3. W3.3.3 a. Interaction in web content – A case study
  4. W3.3.3 b. Multimodal Analysis Tools for Understanding Livestreams
  5. W3.3.4 Tools for visual analysis

W4.1 Analytical support for computational SSH

  1. W4.1 Analytical support for computational SSH
  2. W4.1.2-3 Representative benchmark data of social media and digital tools for network analysis
  3. W4.1.6 Enrich survey data with register data and unstructured text

W5.1 Evidence-Based Infrastructure Development

W5.2 Log-based Data Analysis

  1. Unveiling the Past: User log-based Recommendation System for NLF Historical Newspapers


FIN-CLARIAH Meeting 10.6.2024 11-16

Place: Minerva Plaza, Siltavuorenpenger 5 A, University of Helsinki

 

The goal of the workshop is to let the whole infrastructure reflect on how SSH research will be affected by AI and how we as an infrastructure should prepare for this. In addition, we need to collect information on what preparations are already taking place in the different locations of the infrastructure.

Register by 24.5.2024

 

Poster session

 

Program


11.00-12.00 Keynote: Modern generative image modelling, Jaakko Lehtinen (Aalto University)

12.00-13.00 Lunch

13.00- Group discussions. Task: Brainstorm and develop an action plan for how to integrate transformer technology in the research infrastructure development (Tentative questions: AI for diverse data types? What is happening across locations? Making AI accessible and usable? Implications for SSH research, development, education?)

  1. Text data – Cultural heritage (facilitator Liisa Näpärä, NLF)
  2. Text data – Web / societal data (facilitator Veronika Laippala, TurkuNLP)
  3. Audiovisual, speech and audio data (facilitator Mikko Kurimo, Aalto)
  4. Still images (facilitator Ilkka Lähteenmäki, Oulu)

14.10-14.30 Coffee

14.30-15.00 Summaries from the groups

15.00 Poster session & Refreshments

16.00 Closing

 

 

European Language Data Space workshop

Unleashing the potential of data for businesses and citizens in the EU

Wednesday 10.04.2024 at 9:00-15:15, Clarion Hotel Helsinki

Organisers:
European Language Data Space
Department of Digital Humanities, University of Helsinki

Welcome to the European Language Data Space workshop!

The European Language Data Space (LDS) and the University of Helsinki are bringing together experts from Finnish industry, public administration and research to discuss the importance of language data for the development of language technologies and AI-based tools in Finland. The event takes place on 10.04.2024 at Clarion Hotel Helsinki.

Since the beginning of 2023, the European Commission has guided and supported a new way of sharing language data through the European Language Data Space (LDS). This new approach extends beyond language data and covers many fields and operating environments through their own data spaces. With the establishment of the Common European Data Spaces, communication and exchange between different forms of data description and availability are becoming a reality in all European countries.

Against this background, the European Language Data Space aims to build a trustworthy and effective data market for sharing language resources in the public and private sectors, in line with the EU Data Strategy.

The European Language Data Space (LDS) is organising a series of country-specific workshops to help local companies, research groups and public administrations adopt the new language data exchange space and join relevant local and European networks, while making use of the trustworthy infrastructures that already exist. As a European language data sharing platform, the LDS can help local stakeholders commercialise their language data in a multilingual Europe where language technologies and AI-based applications are becoming ever more important.

 

The LDS workshop in Finland

The Finnish workshop addresses the needs of domestic private- and public-sector stakeholders as providers, integrators and/or consumers of language data. The event shares these stakeholders' experiences and requirements and explores how to achieve the desired technological growth and improve competitiveness at both national and European levels. The workshop discusses how the LDS can help Finnish stakeholders and support their efforts to produce, commercialise or acquire language data to power language technologies and AI-based tools in Finland.

The workshop is aimed at data holders and providers, developers and integrators of language technologies, SMEs, and representatives, officials and partners of public administration. The workshop will be held in English.

Registration

Participation is free of charge, but advance registration is required. Registration closed on 03.04.2024. Please contact the organisers to check whether seats are still available: lareina-office [ATT] helsinki.fi

 

LDS workshop in Finland 10.4.2024, programme

09:00 – 09:45

Registration

09:55 – 10:05

Welcome and introduction
Krister Lindén, University of Helsinki

10:05 – 10:35

Welcome by the European Commission: The Digital Europe Programme and the Common European Language Data Space
Philippe Gelin, European Commission

10:35 – 11:05

The importance of language data for the development of LT solutions future steps
Aleksander Alafuzoff, Yle

11:05 – 11:30

Coffee break

11:30 – 11:40

Welcome by the Ministry of Finance
Olli-Pekka Rissanen, Ministry of Finance

11:40 – 12:30

Language Data and Language Technologies in Finland and for Finnish
– Panel session

Krister Lindén, University of Helsinki (Moderator)
Mikko Kurimo, Aalto University
Iftikhar Ahmad, Tietoevry
Peter Smit, Inscripta Oy
Riikka Lindroos-Järvitalo, KELA
Patrik Gayer, SiloAI
Kirsi Salmela, Kopiosto

12:30 – 13:00

European Language Data Space: developing a market for language data and services and benefitting from a joint European effort
Georg Rehm, LDS Consortium, German Research Center for Artificial Intelligence (DFKI)

13:00 – 13:50

Lunch

13:50 – 14:50

Language data production, management, and market development: overcoming obstacles – Panel session
Krister Lindén, University of Helsinki (Moderator)
Manu Setälä, Solita Oy
Kaarina Hyvönen, Kielikone Oy
Tiina Lindh-Knuutila, Lingsoft Language Services Oy
Tommi Lehtonen, KAVI
Ilkka Lavas, City Digital Group
Jörg Tiedemann, University of Helsinki

14:50 – 15:05

Conclusions
Krister Lindén, University of Helsinki

15:05 – 15:15

Coffee break and networking

15:15 – 16:15

Coffee break and networking continue at the Nordic Data Festival 2024 organised by Sitra (a parallel event at Clarion Hotel Helsinki)

 

 

Contact the local organisers:

Krister Lindén and Wilhelmina Dyster
University of Helsinki
lareina-office [ATT] helsinki.fi

Last updated: 05.04.2024

CSC Computing Environment free online courses 24.-25.4. (Part 1: Basics) and 15.-16.5. (Part 2: Next steps)

CSC Computing Environment, Part 1: Basics 24.-25.4.

This is what everyone should know about our computing environment when launching jobs!

Are you planning on using CSC’s high-performance computing (HPC) services (Puhti, Mahti, Allas…) in the near future? Have you been using these services already, but would like to make sure you are getting the most out of them? This intensive course is intended for you!

More info and registration at: https://ssl.eventilla.com/part1april24

CSC Computing Environment, Part 2: Next steps 15.-16.5.

How to handle large datasets, install your own software and scale up workflows efficiently in CSC’s computing environment

Are you using CSC’s high-performance computing (HPC) services (Puhti, Mahti, Allas…), but want to make sure you are getting the most out of them? Are you working with data in the most efficient way? Want to know the best tips and tricks of the trade when scaling up your workflows? This intensive course is intended for you!

More info and registration at: https://ssl.eventilla.com/part2may24

 

These 2+2 half-day sessions focus on using the CSC HPC environment via short lectures and hands-on tutorials. Please check the required course prerequisites.

Please note that the same course is also available as a free self-learning online course at https://ssl.eventilla.com/csccompenvselflearn.

European Language Data Space (LDS) workshop in Finland

Unleashing the potential of data – for EU businesses and citizens

 

Wednesday 10.04.2024 at 9:00-15:15, Clarion Hotel Helsinki

Organisation:
European Language Data Space
Department of Digital Humanities, University of Helsinki

Welcome to the European Language Data Space workshop in Finland!

The European Language Data Space and the University of Helsinki are bringing together experts from the Finnish Industry, Public Administration and Research to discuss the importance of language data for the development of Language Technologies and AI-based tools in Finland. The event is taking place on 10.04.2024 at Clarion Hotel Helsinki.

Since early 2023, the European Commission has been providing guidance and support towards a new dimension in language data sharing, implemented through the European Language Data Space (LDS). This new dimension goes beyond language data and addresses many areas and fields through their specific Data Spaces. With the establishment of the Common European Data Spaces, communication and exchange amongst different modalities of data description and availability are becoming a reality for all European countries.

In this context, the European Language Data Space aims at building a trustworthy and effective data market for the exchange of language resources in the public and – even more importantly – in the private sector, in line with the EU Data Strategy.

For that purpose, the European Language Data Space (LDS) is organising a series of Country Workshops to help local industries, research groups and public administrations integrate into this new language data exchange space and connect with relevant local and European networks, while benefiting from the trustworthy infrastructures already available. As a European language data sharing platform, the LDS can help local industry stakeholders to monetise their language data in a multilingual Europe where Language Technologies and AI-based applications play an increasingly important role.

 

The LDS workshop in Finland

The Finnish LDS workshop will address the needs of Finnish stakeholders from both the private and public sectors, whether they are providers, integrators or consumers of language data. Participants will share their experiences and requirements and explore how to achieve the technological growth needed to enhance their competitiveness at both national and European levels. The LDS will present and discuss how it can support Finnish stakeholders in their efforts to produce, monetise or obtain language data to power LT and AI-based tools in Finland.

The workshop is addressed to data owners and data providers, LT developers and integrators and SMEs, as well as to public administration executives, officers and partners. The workshop will be held in English.

Registration

Participation is free of charge, but registration is required. Registration closed on 03.04.2024. Please contact the organisers to check whether seats are still available: lareina-office [ATT] helsinki.fi

 

European Language Data Space (LDS) workshop in Finland on April 10th, 2024 Programme

09:00 – 09:45

Registration

09:55 – 10:05

Welcome and introduction
Krister Lindén, University of Helsinki

10:05 – 10:35

Welcome by the European Commission: The Digital Europe Programme and the Common European Language Data Space
Philippe Gelin, European Commission

10:35 – 11:05

The importance of language data for the development of LT solutions: future steps
Aleksander Alafuzoff, Yle

11:05 – 11:30

Coffee Break

11:30 – 11:40

Welcome by the Ministry of Finance
Olli-Pekka Rissanen, Ministry of Finance

11:40 – 12:30

Language Data and Language Technologies in Finland and for Finnish – Panel session

Krister Lindén, University of Helsinki (Moderator)
Mikko Kurimo, Aalto University
Iftikhar Ahmad, Tietoevry
Peter Smit, Inscripta Oy
Riikka Lindroos-Järvitalo, KELA
Patrik Gayer, SiloAI
Kirsi Salmela, Kopiosto

12:30 – 13:00

European Language Data Space: developing a market for language data and services and benefitting from a joint European effort
Georg Rehm, LDS Consortium, German Research Center for Artificial Intelligence (DFKI)

13:00 – 13:50

Lunch Break

13:50 – 14:50

Language data production, management, and market development: overcoming obstacles – Panel session
Krister Lindén, University of Helsinki (Moderator)
Manu Setälä, Solita Oy
Kaarina Hyvönen, Kielikone Oy
Tiina Lindh-Knuutila, Lingsoft Language Services Oy
Tommi Lehtonen, KAVI
Ilkka Lavas, City Digital Group
Jörg Tiedemann, University of Helsinki

14:50 – 15:05

Conclusions
Krister Lindén, University of Helsinki

15:05 – 15:15

Coffee Break and Networking

15:15 – 16:15

Coffee Break and Networking continue in Sitra’s Nordic Data Festival 2024 event (co-located in Clarion Hotel Helsinki)

 

 

Contact the local organizers for further details:

Krister Lindén and Wilhelmina Dyster
University of Helsinki
lareina-office [ATT] helsinki.fi

Last updated: April 5, 2024

ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora – Call for Papers

The 2024 ParlaCLARIN Workshop will be held in May in Torino (Italy) as part of LREC-COLING 2024, the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation.

The Call for Papers is now open and the paper submission deadline is 19 February 2024.
Read more: https://www.clarin.eu/ParlaCLARIN-IV

FIN-CLARIAH Meeting 1.12.2023 11-17

Place: Väinö Linna -sali, Linna Building, Kalevantie 5, Tampere

 

Preliminary Program

11.00-11.10 Welcoming Words by Sanna Kumpulainen, Associate Professor in Information Studies, Tampere University

11.10-12.00 Keynote I on Studying SSH Research Needs: Elina Late, Senior Research Fellow in Information Studies, Tampere University

12.00-13.00 Lunch

13.00-13.45 Keynote II on Language Models: Sampo Pyysalo, Associate Professor at the Department of Computing, University of Turku

>>Download the slides of the work package presentations<< (pdf)

13.45-14.30 Work Package Presentations I

13.45-14.00 WP1.3 Veronika Laippala: Noise-Tolerant NLP

14.00-14.20 WP1.1, 1.2, 2.1, 2.2 & 2.3 Mietta Lennes: Kielipankki – The Language Bank of Finland

14.20-14.25 WP2.4 Harri Kettunen: Helsinki Term Bank for the Arts and Sciences

14.25-14.30 WP2.5 Jenny Tarvainen: Automated Text Tools for Learner Language

14.30-15.00 Coffee

15.00-16.00 Work Package Presentations II

15.00-15.05 WP3.1 Martin Matthiesen: Pipeline from the National Library to CSC

15.05-15.10 WP3.2 Tanja Välisalo: Named Entity Recognition for NARC Data

15.10-15.15 WP4.3 Eetu Mäkelä: Evaluation and Subsetting

15.15-15.20 WP4.1 Julia Matveeva: Metadata Harmonization

15.20-15.25 WP4.4 Mikko Laitinen: Twitter

15.25-15.30 WP4.2 Eero Hyvönen: LOD

15.30-15.35 WP3.4 Raine Koskimaa: Game Streams

15.35-15.40 WP3.3 Maria Valaste: Qualitative Surveys

15.40-15.45 WP3.5 Kimmo Elo (Risto Turunen replacing): Text Networks

15.45-15.50 WP5 Sanna Kumpulainen: Evidence-based RI Development + Education & Resources

16.00-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)

 

 

Join the course Corpus Linguistics and Statistical Methods (4.9.-20.10.2023)

The online course Corpus Linguistics and Statistical Methods is intended for students in languages or other fields who wish to learn the basics of using corpora.

The course is offered in Finnish and in English and it is open to all university students in and outside Finland. The course has already started on Monday 4th September, but it will be possible to join the course area until 15th September.

Read more and register for the course!

NB: The same course will be organized again in period 3, starting on 15 Jan 2024 (see the course page).

The Donate Speech Corpus enabled researchers to analyze the typical voice pitch of more than 8000 speakers of Finnish

The 24th INTERSPEECH Conference was held on 20-24 August 2023 in Dublin, Ireland. At the conference, Mietta Lennes from the Language Bank of Finland presented a poster, based on the following conference article:

Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proceedings of INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822.

Take a look at the poster

 

This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2023081621

Mietta Lennes & Minnaleena Toivola:

Pitch distributions in a very large corpus of spontaneous Finnish speech

Poster and supplementary materials presented at Interspeech 2023, 20.-24.8.2023, Dublin, Ireland.

Last updated: 2023-08-25

This page contains a picture of the poster presented at the conference and some additional figures and details about the piece of research in question. For further information, please contact Mietta Lennes.

 


Poster

(Click to download the image as a pdf document)

The poster describes the main results of the conference article.


Additional figures

Pitch density of 60 speakers (red=female, blue=male; according to self-reported gender), after the second pass of the pitch detection process

Pitch density of 60 speakers (red=female, blue=male; according to self-reported gender), expressed relative to each speaker's most typical pitch (statistical mode), after the second pass of the pitch detection process

 


The pitch data used for this study

The pitch data calculated for this paper will be published as an online dataset. The link to the data will be added to this page.



How to cite this poster presentation page

Mietta Lennes & Minnaleena Toivola (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Poster and supplementary materials. Interspeech 2023, 20.-24.8.2023, Dublin, Ireland. Available: http://urn.fi/urn:nbn:fi:lb-2023081621.

Cite the original conference article:

Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proc. INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822

FIN-CLARIAH Summer Event 6.6.2023 11-17

Place: CSC, Life Science Centre, Keilaranta 14 C, Espoo

 

 

Preliminary Program

11.00-11.10 Welcoming Words by Katri Tegel, Development Manager, CSC

11.10-12.00 Keynote: Mikko Kurimo, Professor of Speech and Language Processing, Aalto University

12.00-13.00 Lunch

13.00-15.00 Thematic Groups

  1. DH Education:
    This group gathers together people who are interested in DH education: how can we disseminate our RI services to Finnish SSH communities through education, both in the short and long term?
  2. Documentation:
    This group develops best practices for documentation inside the project: what quality do we want to reach by the end of the year, and how does our documentation vary in different contexts (e.g., graphical user interfaces versus code repositories)?
  3. Speech Data in Research:
    This group discusses the needs of researchers using speech data: what is the state-of-the-art, and how is FIN-CLARIAH going to push the field further?
  4. Visual Sources in Research:
    This group discusses the needs of researchers using visual sources (videos, images, photos): what is the state-of-the-art, and how is FIN-CLARIAH going to push the field further?

15.00-15.30 Coffee

15.30-16.15 Sharing the Results from the Groups

16.15-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)

 

FIN-CLARIAH Workshop Day 18.11.2022 11-17 @ University of Jyväskylä

Workshop Program

11.00-12.00 Jari Ojala: Welcoming words + Pasi Tyrväinen: Keynote

12.00-13.00 Lunch

13.00-13.15 Anna Sendra Toset: Results from FIN-CLARIAH interviews

13.15-14.30 Teamwork in thematic groups:

  1. CSC integration – Slides (Martin Matthiesen)
  2. Data licensing – Slides (Mietta Lennes)
  3. The end-user perspective I (Eetu Mäkelä)
  4. The end-user perspective II (Mikko Laitinen)

14.30-15.00 Coffee 

15.00-16.00

  • Reports from thematic groups
  • Mikko Tolonen: Why metadata matters in FIN-CLARIAH? (Slides)
  • General discussion 

16.00-17.00 Socializing & refreshments (Executive board meeting)

 

Discover efficient workflows and plan your research data management in the Data Clinic course!

The open online course Data Clinic kicks off on 11th November 2022 and ends in late April 2023. During the winter and spring, you will learn to write a Data Management Plan and get practical advice and support for collecting, processing and managing your research data. Participants will work partly independently and partly in small groups of peers. You may attend the entire course remotely.

The course materials will be provided mainly in English. Students from all universities and all fields are welcome if space allows. The only prerequisite is that you are already starting a research project where you need to process and manage a data set that contains text documents or speech recordings, i.e., some language data.

Read more and join the course by 28.11.2022!

Last modified on 2022-11-07

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317
