Workshop: ”Accessing Data for Large Language-based Text and Speech Models”

Wednesday 05.11.2025 at 8:30-14:00, Helsinki

Organizers:

Department of Digital Humanities, University of Helsinki
LAREINA project

Location:

University of Helsinki, Main Building (Unioninkatu 34 entrance)

Welcome to the Workshop!

The development of language-centric AI during the past few years has been remarkable. It poses challenges but also creates opportunities for organizations in both the private and the public sector. Many of us are curious about how to harness the power of AI in our own business.

Our workshop on Accessing Data for Large Language-based Text and Speech Models will explore the potential benefits of copyrighted versus freely available data, as well as recent results in training speech models on massive data sets.

This workshop is aimed at developers, integrators and users of language technologies and AI solutions in Finland. The workshop will be held in English and on-site only.

Registration

Registration has been closed.
Participation is free of charge, but registration is required. We have 50 seats available. Lunch and coffee breaks are included in the workshop.

If you have any questions, please contact the organizers via email: lareina-office AT helsinki.fi


Programme for the Workshop on Wednesday 05.11.2025

08:30 – 09:00	Registration and Coffee
	University of Helsinki, Main Building (Unioninkatu 34, Senate Square entrance), 3rd floor

09:00 – 10:30	Session 1: ”Copyright & LLMs”, room: Karolina Eskelin (U3032), 3rd floor
	09:00–09:15	Welcome and Introduction
			Krister Lindén, University of Helsinki
	09:15–09:45	”The AI Act and its impact on LLMs” (remote)
			Paweł Kamocki, CLARIN Legal and Ethical Issues Committee
	09:45–10:15	”The Legacy of Mímir: LLMs and Copyright at the National Library of Norway” (remote)
			Javier de la Rosa, National Library of Norway
	10:15–10:30	Discussion

10:30 – 11:30	Coffee break with Standing tables / Demo presentations, room: Christina (U2085), 2nd floor

11:30 – 13:00	Session 2: ”Speech Technology in Society”, room: Karolina Eskelin (U3032), 3rd floor
	11:30–12:00	”Speech synthesis in Sámi and Karelian – Bringing minority voices into innovation projects”
			Tove Mylläri, Yle
	12:00–12:30	”AI-assisted customer call transcription”
			Henry Granholm, Kela
	12:30–13:00	”Unlocking the Potential of Radio and Television Archives for Automatic Speech Recognition”
			Yaroslav Getman, Aalto University

13:00 – 14:00	Lunch
	Restaurant Flora, 2nd floor

This workshop is organized by the LAREINA project and the University of Helsinki.

Contact the organizers for further details:

lareina-office AT helsinki.fi


Last modified on 2025-11-05