Registration for the ESSLLI 2026 summer school is open (early bird price available until 31 May)

Name: Conference: CMC and Social Media Corpora for the Humanities (CMC-Corpora)
Start: 2026-08-27T00:00:00+03:00
End: 2026-08-28T23:59:59+03:00
Location: Oulu

The European Summer School in Logic, Language and Information (ESSLLI) has since 1989 been one of the leading interdisciplinary summer schools, bringing together logic, linguistics, computer science, and artificial intelligence.

Each year, ESSLLI brings together around 400 participants from across Europe as well as from North and Latin America and Asia. The event has become an important meeting place for young researchers and students in logic, linguistics, and computer science, where current research is discussed and expertise is shared.

ESSLLI 2026
Date: 3–14 August 2026
Location: Prague, Czech Republic

ESSLLI 2026 offers:

Courses from introductory to advanced level
In-depth workshops in logic, language, computation, and artificial intelligence
Evening lectures by internationally renowned researchers
A highly interdisciplinary and international research environment

For students and researchers working with artificial intelligence, ESSLLI is especially useful thanks to its strong emphasis on reasoning, language technology, formal methods, computation, and cognition.

You can register at the lower early bird price until 31 May. After that, registration continues at the regular price until 18 July.

Learn more about the ESSLLI 2026 summer school via the links below (in English):

Registration for ESSLLI 2026 Summer School is open (early bird closes on 31 May)

The European Summer School in Logic, Language and Information (ESSLLI) has been one of the leading interdisciplinary summer schools since 1989, connecting logic, linguistics, computer science, and AI.

Every year, ESSLLI attracts around 400 participants from all parts of Europe, as well as from North and Latin America and Asia. ESSLLI has become the main meeting place for young researchers and students in logic, linguistics and computer science to discuss current research and to share knowledge.

ESSLLI 2026
Date: August 3–14, 2026
Location: Prague, Czech Republic

ESSLLI 2026 offers:

Courses ranging from foundational to advanced levels
Specialized workshops across logic, language, computation, and AI
Evening lectures by internationally recognized researchers
A highly interdisciplinary and international research environment

For students and researchers working in Artificial Intelligence, ESSLLI is especially valuable given the strong focus on reasoning, language technologies, formal methods, computation, and cognition.

If you are considering attending, now is the time to register before the early bird deadline.


Save the date:

FIN-CLARIAH Summer Meeting: “Networks & Communities”

11.-12.6.2026

Venue:

Room AU111, Yliopistokatu 2, Aurora-building
Joensuu Campus, University of Eastern Finland

See the event website at UEF

The FIN-CLARIAH Summer Meeting takes place in Joensuu on 11 and 12 June 2026. This year’s program brings together thematic plenary sessions and dedicated time to share ongoing work across the consortium. Our focus is on the relationships between digital materials (from texts to multimodal content) and their creators, and on the networks and communities that emerge around them.

We have two invited speakers. Ruth Ahnert is Professor of Literary History & Digital Humanities at Queen Mary University of London. She works at the crossroads of literary studies and computational linguistics, with an interest in networks. Read more here.

Tuija Saresma is Professor of Cultural Studies at the University of Eastern Finland. Her work focuses on topics such as language tensions and hate speech on social media. Read more here.

The program also includes presentations of tools and services developed within the FIN-CLARIAH network, and an impact roadshow showcasing research outputs of the infrastructure.

Registration:

Please register via this link by 29 May, 2026.

For any questions, please contact Mikko Laitinen and/or Paula Rautionaho (firstname.lastname at uef.fi).

Suggested train connections Helsinki–Joensuu–Helsinki

Thursday 11.6.2026: IC3 leaving 10:41 from Helsinki, arriving at 14:51 in Joensuu
Friday 12.6.2026: IC10 leaving 15:10 from Joensuu, arriving at 19:45 in Helsinki.

Preliminary program

Thursday 11 June, 2026:

16.00-16.10 Opening and welcome (Mikko Laitinen)

16.10-17.10 Prof. Ruth Ahnert (Queen Mary University): Letter networks and the problem of aliases

17.15-18.15 FIN-CLARIAH Board meeting

19-21 Get together (self-funded)

Friday 12 June, 2026:

9.00-11.00 Selected infrastructure tool and service presentations

- Henna Poikkimäki (Semantic Computing Research Group): Sampo Systems: From Cultural Heritage Data to Networks
- Tuomas Lundberg (TurkuNLP): tba
- Ville Vaara (COMHIS): tba
- Masoud Fatemi & Mehrdad Salimi (UEF): tba
- Language Bank of Finland/CSC: Mink – a service for processing and searching your own text corpora

11.00-12.00 Prof. Tuija Saresma (UEF): Making sense of hate speech: Researching a complex phenomenon from lived experience to large datasets

12.00-13.00 Lunch (self-funded in Aura)

13.00-14.15 Research output harvest (short presentations of published research using the infrastructure: please contact the organizers if you’d like to present your work!)

14.15-14.30 Farewell coffee


Invitation: Join our workshop on April 22, 2026, in Helsinki

Welcome to the Digital Language Sovereignty – Euskadi–Finland AI & Language workshop, which brings together researchers and experts from Finland and the Basque Country (Euskadi) to discuss current developments and future opportunities in artificial intelligence and language technologies.

The event is free of charge and will take place at the University of Helsinki, City Centre Campus, on 22 April 2026 from 9:00 to 16:00.

During the day, we will address topics including:

digital language sovereignty
low-resource languages
language models, datasets, and research infrastructures
industry perspectives and knowledge transfer

The workshop is aimed at researchers and doctoral students, language specialists, representatives of public administration, and professionals working with AI and language models.

Register via the event page: https://euskorpora.eus/en/evento/workshop-digital-language-sovereignty-euskadi-finland-ai-language-workshop/

Event invitation: Register for the workshop on 22 April 2026

Welcome to the workshop Digital Language Sovereignty – Euskadi–Finland AI & Language, which brings together researchers and experts from Finland and the Basque Country (Euskadi) to discuss current developments and future opportunities in artificial intelligence and language technology. The event is free of charge and takes place at the University of Helsinki City Centre Campus on 22 April 2026 from 9:00 to 16:00.

During the day, we will discuss topics including:

digital language sovereignty
low-resource languages
language models, datasets, and research infrastructures
private-sector perspectives and practical applications

The event is aimed at researchers and doctoral students, language specialists, representatives of public administration, and experts working with artificial intelligence and language models.

Read more here: https://www.kielipankki.fi/tapahtumat/digital-language-sovereignty-euskadi-finland-ai-language-workshop/

Register on the event page: https://euskorpora.eus/en/evento/workshop-digital-language-sovereignty-euskadi-finland-ai-language-workshop/


FIN-CLARIAH Roadshow: From Data to Intelligence

25.3.2026 12:45-15:30

Aalto University, TUAS Atrium (Maarintie 8, 02150 Espoo)

Zoom link, for remote participants: https://aalto.zoom.us/j/65249607753

This Roadshow event at Aalto University showcases scientific terminology work that supports researchers as the new language recommendations are implemented, and introduces the latest related infrastructures and research. We also wish to raise awareness of the resources, services and collaboration networks offered via FIN-CLARIAH. During the event, you will learn about The Helsinki Term Bank for the Arts and Sciences and see how you can use and contribute to it. The invited talks present Finnish ontology resources that can be used to engineer consistent terminologies for different research purposes and applications, and local researchers present work illustrating the relevance of concepts, terminologies and ontologies for both human sciences and engineering. More widely, we wish to foster discussion on concept-aware AI.

Registration:
This event is free but requires registration. Please register no later than 20.3.2026 to ensure refreshments or to follow the event remotely.
Register here!

Programme

12:45 Welcome coffee

Introduction to the infrastructure and to the field of Terminology

13:00 FIN-CLARIAH Introduction with Eetu Mäkelä, Professor of Digital Humanities, University of Helsinki (pdf)

13:10 Finding, using and sharing research data via Kielipankki – the Language Bank of Finland with Mietta Lennes, RI Specialist, University of Helsinki (pdf)

13:20 Introduction to “Terminology work at Aalto University” with Minna Söderqvist, Head of Doctoral Education Services at Aalto University and Antti Susiluoto, Planning Officer, Doctoral education services, Aalto University (Moderator Harri Kettunen) (pdf)

13:30 “How Can I Contribute to the Reliability of Information? Collaborative Terminology Work at the Helsinki Term Bank for the Arts and Sciences” with Harri Kettunen & Nathaly Pinto Torres (pdf)

13:50 Q&A

Tools and application scenarios for terminology engineering and concept-aware AI

14:15 DARIAH-FI Introduction with Inés Matres (pdf)

14:30 National Ontology and Linked Open Data Infrastructure and Data Services (Osma Suominen, National Library of Finland) (pdf)

14:45 Case Study: Analysis of genre evolution in Finnish fiction literature using BookSampo’s annotated genres (Annastiina Ahola, Aalto University) (pdf)

15:00 Case Study: Utilizing annotated keywords for prosopographical data analysis in Sampo systems (Petri Leskinen, University of Helsinki, Aalto University, and Geneva Graduate Institute) (pdf)

Final discussion

What is FIN-CLARIAH?

FIN-CLARIAH is the Finnish digital research infrastructure (RI) for Social Sciences and Humanities (SSH). It connects ten universities and memory organisations, in collaboration with CSC, to facilitate the use, refinement, preservation and sharing of research data for SSH researchers in Finland. FIN-CLARIAH has two parts: FIN-CLARIN integrates language resources and tools for processing language data through Kielipankki – the Language Bank of Finland, and DARIAH-FI joins a network of researchers with methodological expertise from SSH to data science, developing versatile analysis tools and supporting big data approaches.


Kielipankki Workshop: Supporting the Research Community in the Age of AI

Location:
University of Helsinki
Main building (Senate Square side), Unioninkatu 34, 00170 Helsinki
Room U3039 (3rd floor)

Date:
Friday 06.02.2026 09:30–11:30

Supporting the Research Community in the Age of AI

The aim of this workshop is to gather ideas and discuss how our Research Infrastructure can better support the Research Community. We will hear presentations and updates from a few ongoing projects, as well as learn about new CSC services for researchers in the digital humanities. After the presentations, there will be time for discussion. The event is open to everyone, and we would especially like to invite FIN-CLARIAH collaborators and researchers who use Kielipankki and CSC resources to join this workshop.

The workshop can also be followed remotely via Zoom.

Schedule

09:30 Coffee

10:00 – 11:30 Presentations & Discussion (stream)

Jörg Tiedemann (University of Helsinki): OpenEuroLLM (presentation)
Mikko Tolonen (University of Helsinki): Computational History: CASCADE and MECANO Marie Curie Training Networks (presentation)
Katri Tegel (CSC): Data and HPC: Upcoming solutions at CSC (presentation)
Harri Kettunen (University of Helsinki): News from the Helsinki Term Bank for the Arts and Sciences (presentation)

11:30 The workshop ends


FIN-CLARIAH Milestone Meeting 28.11.2025

Venue:
University of Helsinki
Metsätalo (Unioninkatu 40, 00170 Helsinki)
Room 6 (3rd floor/B-wing)

Time:
Friday 28.11.2025 10:30–17:00

Registration link

Annotating Social Data

The FIN-CLARIAH Milestone Meeting (M4), titled Annotating Social Data, brings together researchers, infrastructure developers, and social scientists to explore current practices and future needs in annotating data sets in social sciences for further analysis. Hosted by the Centre for Social Data Science (CSDS), Faculty of Social Science, University of Helsinki, this milestone event delves into the methodological, technical, and ethical dimensions of annotating sensitive and small-scale textual data such as interviews, policy documents or social media posts, extending up to very large data sets typical in social media studies.

The day opens with a keynote by Salla-Maaria Laaksonen on collecting, annotating, and analyzing social media data, followed by insight talks from Hisayo Katsui on the sensitivity of disability data and Krista Lagus on theory-based annotation using large language models (LLMs). Afternoon sessions offer practical perspectives on secure environments for handling sensitive data, hands-on demonstrations with the CSC Secure Desktop environment, and discussions on ethical agreements and algorithmic transparency.

The event supports the broader goals of FIN-CLARIAH in developing national infrastructure for digital humanities and social sciences, and welcomes researchers and students from across disciplines interested in qualitative data, narrative analysis, secure practices for sensitive data, and the responsible use of AI tools in the annotation pipeline.

Schedule

10:30 Welcome coffee

11:00 Welcome words

Timo Kaartinen
Krister Lindén

11:10 Keynote

Salla-Maaria Laaksonen, University of Helsinki:
Dream infrastructures for a social scientist: experiences and hopes built on ten years of interdisciplinary computational hermeneutics

12:00 Lunch

13:00 Insight talks

Hisayo Katsui, University of Helsinki: Sensitivity of disability data
Krista Lagus, University of Helsinki: Social Theory-based annotation of Text Data using LLMs

13:30 Afternoon parallel sessions

Room 6: CSC AITTA environment for working with LLMs. Training large language models to annotate data (Facilitator: Martin Matthiesen, CSC)
Room 11: CSC Sensitive Data (SD) environment and how to use it – hands on session (BYO) (Facilitators: Francesca Morello, Kimmo Mattila, CSC)
Room 26 (B wing, 5th floor): Agreements for the reuse of social media and interview data (Facilitator: Mietta Lennes, The Language Bank of Finland) – Slides (PDF)

16:00 Plenary ends / Mingling

(Steering group meeting in parallel)


Workshop: ”Accessing Data for Large Language-based Text and Speech Models”

[Programme]

Wednesday 05.11.2025 at 8:30-14:00, Helsinki

Organizers:

Department of Digital Humanities, University of Helsinki
LAREINA project

Location:

University of Helsinki, Main Building (Unioninkatu 34 entrance)

Welcome to the Workshop!

The development of language-centric AI during the past few years has been remarkable. It poses challenges but also creates opportunities for organizations both in the private and the public sector. Many of us are curious about how to harness the power of AI in our own business.

Our workshop on Accessing Data for Large Language-based Text and Speech Models will explore the potential benefit of copyrighted data vs. freely available data as well as recent results in training speech models using massive data sets.

This workshop is addressed to developers, integrators and users of language technologies and AI solutions in Finland. The workshop will be held in English and on-site only.

Registration

Registration has been closed.
Participation is free of charge, but registration is required. We have 50 seats available. Lunch and coffee breaks are included in the workshop.

If you have any questions, please contact the organizers via email: lareina-office AT helsinki.fi


Programme for the Workshop on Wednesday 05.11.2025

08:30 – 09:00	Registration and Coffee, University of Helsinki, Main Building (Unioninkatu 34, Senate Square entrance), 3rd floor
09:00 – 10:30	Session 1: ”Copyright & LLMs”, room: Karolina Eskelin (U3032), 3rd floor
	09:00–09:15 Welcome and Introduction (Krister Lindén, University of Helsinki)
	09:15–09:45 ”The AI Act and its impact on LLMs” (Paweł Kamocki, CLARIN Legal and Ethical Issues Committee; remote)
	09:45–10:15 ”The Legacy of Mímir: LLMs and Copyright at the National Library of Norway” (Javier de la Rosa, National Library of Norway; remote)
	10:15–10:30 Discussion
10:30 – 11:30	Coffee break with Standing tables / Demo presentations, room: Christina (U2085), 2nd floor
11:30 – 13:00	Session 2: ”Speech Technology in Society”, room: Karolina Eskelin (U3032), 3rd floor
	11:30–12:00 ”Speech synthesis in Sámi and Karelian – Bringing minority voices into innovation projects” (Tove Mylläri, Yle)
	12:00–12:30 ”AI-assisted customer call transcription” (Henry Granholm, Kela)
	12:30–13:00 ”Unlocking the Potential of Radio and Television Archives for Automatic Speech Recognition” (Yaroslav Getman, Aalto University)
13:00 – 14:00	Lunch, Restaurant Flora, 2nd floor

This workshop is organized by the LAREINA project and the University of Helsinki.

Contact the organizers for further details:

lareina-office [AT] helsinki.fi

Materials (internal)


FIN-CLARIAH Summer Meeting 2025

Venue: University of Jyväskylä
Time: Friday 13.6.2025 11:00–17:00

(Scroll down for registration, please register by June 2nd)

Roads to multimodality in research: streams, videos, and building RI for the future

With the recent news that FIN-CLARIAH has been granted lighthouse status, we are now positioned to lead the way in advancing essential infrastructure areas, from impact and functionality to service provision and collaborative use. This day in Jyväskylä will be dedicated to setting standards for the latest multimodal modes of research. With insights from processing audiovisual data, we aim to build connections and spark ideas for future development. In addition, we will have thematic group sessions on annotation, education provision, and code review for developers.

Date: Fri. 13 June 2025 11:00–17:00
Venue: University of Jyväskylä Seminaarimäki, Building P, Conference Room Lyhty (Seminaarinkatu 15, 40100 Jyväskylä)
Registration is closed – please contact organizers if needed.

Schedule

10:30 Welcome coffee

11:00 Welcome words (Raine Koskimaa, University of Jyväskylä / Krister Lindén, FIN-CLARIAH)

11:10 Keynote (online): Erkut Erdem (Hacettepe University): Computer Vision and Machine Learning – Incorporating Context into Visual Processing (to be confirmed) (Zoom link)

12:00 Lunch break. Location next to the venue (at own expense)

13:00 Insight talk: Tommi Jantunen & Juhana Salonen: “The multiple possibilities of signed language corpora” (Chair: Raine Koskimaa) (Zoom link)

13:30 Group discussions

Code-sharing session for developers – every participant walks the others through a part of their code/deployment/design, sharing what they’re happy with and what they’re having problems with; sharing best practices and receiving best practice advice (Facilitator: Eetu Mäkelä)
What is Mink? – Presenting & testing the Mink platform. “Min Korp” (Mink) is a service to enable users to bring their own data to the Korp platform. Like Korp, Mink was developed by Språkbanken Text in Gothenburg. We will present the status of the adaptation to the Language Bank of Finland. (Facilitator: Martin Matthiesen)
Education within FIN-CLARIAH – every participant shares what skills are required for using their part of the RI and what they teach locally to provide those skills to SSH researchers (i.e. mapping skill provision for the RI and identifying the training gaps in relation to RI uptake) (Facilitator: Sanna Kumpulainen)

15:00 Panel discussion. Short presentations of topics by group facilitators

16:00 Mingling

(16:00–17:00 Steering group meeting)

17:00 “Waiting for the last train” Possibility to have a drink/dinner near the railway station (at own expense)


FIN-CLARIAH Visit to CSC Data Center in Kajaani

Venue: CSC Data Center, Kajaani, Finland
Time: 25.-26.9.2025

CSC – IT Center for Science is inviting FIN-CLARIAH participants to visit the Data Center in Kajaani. Join the excursion, meet colleagues and see the culturally and historically interesting surroundings and the premises of LUMI and other supercomputers with your own eyes! We are also going to have FIN-CLARIAH project meetings on the train. Note that you can follow the presentations on Friday on Zoom; see details below.

Registration is closed.

Hotel suggestions

If you want to come by train, read these instructions on how to book it.

Schedule:

Thursday 25.9.2025

Unfortunately no online participation on Thursday.

8.19-14.35	Train IC63 from Helsinki to Kajaani (optional)
	Topic for the meeting on the train: What could be an ”AI killer application / use case” for FIN-CLARIAH in the future?
	Inspirational YouTube video: How AI helped to determine protein structures (22 min, problem statement in the first minute)
	Collaborative notes and more detailed agenda
14:35 – 15:30	Transfer to Tehdaskatu 15 Kajaani (30 min walk, weather permitting)
15:30 – 17:30	Visit to the CSC Data Center, Tehdaskatu 15 Kajaani
	Identity checks (you will need a passport, ID card or driver’s license) and approx. 300 m walk from the Portti building to the LUMI Visitor Center
	Presentation: LUMI and AI Factory (Katja Mankinen, Senior data scientist, CSC)
	Presentation: Aitta (Juho Keränen, Aitta product owner, CSC)
	Guided tour through the Data Center
~18:00	Check-in hotel
19:00 –	Dinner at Ravintola Terva (own cost) https://www.ravintolaterva.fi/a-la-carte/

Friday 26.9.2025

Meeting at Ravintola Wanha Kerho Kauppakatu 40, 87100 Kajaani

If you can’t make it to Kajaani, you can still follow presentations online in Zoom:

https://cscfi.zoom.us/j/61282104806

Your feedback is important for developing CSC services.

9:00 – 10:00	Presentation of ideas from the discussion on the train on Thursday
10:00 – 10:30	Roihu – AI and data management (Sebastian von Alftan, Development manager, CSC)
10:30 – 10:40	Coffee break
10:40 – 11:00	Discussion in small groups: What methods can we use to estimate whether there is online data in the RI to answer a specific research question?
11:00 – 12:00	LAIFS DaaS and discussion of needs from digital humanities (Heidi Laine, Senior specialist, CSC)
12:00 – 13:00	Lunch
13:00 – 14:00	HPC potential use cases in DH (Introduction by Eetu Mäkelä, UHEL; Johanna Lilja, NLF) + discussion by all participants

End of common event

14.35-21.45	Train IC70 from Kajaani to Helsinki (optional)
	”How can we benefit from an HPC inference engine in the FIN-CLARIAH research infrastructure?”


FIN-CLARIAH Roadshow event

Place: University of Vaasa (Wolffintie 34, Tervahovi building, room D124)
Time: 14.3.2025 12:15-15:30

FIN-CLARIAH Roadshow in Vaasa: Some take-outs (blog posting, DARIAH-FI)
Watch a recording of the event!

What is new in the Finnish digital research infrastructure?

FIN-CLARIAH (Common Language Resources and Technology Infrastructure) is a fundamental Finnish digital research infrastructure (RI) for Social Sciences and Humanities (SSH). FIN-CLARIAH connects ten universities and memory organisations across the country. The Finnish Research Infrastructure Committee has granted roadmap status to FIN-CLARIAH for 2025–2028. This roadmap guides the research community, policymakers, and funders in directing investment and supporting research, development, and innovation.

FIN-CLARIAH will participate in a roadshow event in Vaasa 14.3., where some of the central resources and tools for social science and humanities (SSH) research will be presented, with special emphasis on acquiring, processing and depositing born-digital data. There will be informative presentations, but also hands-on activities in the program. From the University of Vaasa, the Natureach project will join the event with a presentation of AR/VR data and its analysis.

This is an in-person event at the University of Vaasa (Location: Tervahovi building, room D124). To participate in the hands-on activities, bring your own laptop!

Registration for the FIN-CLARIAH Roadshow event has ended.

This roadshow is hosted by Prof. Merja Koskela

Programme

12.15 – 12.20	Merja Koskela: Introduction and welcome
12.20 – 12.30	Inés Matres: FIN-CLARIAH Overview of digital infrastructure for Social Sciences and Humanities (presentation slides)
12.30 – 12.45	Mietta Lennes: How to find, use and deposit your research data and tools via Kielipankki – The Language Bank of Finland (presentation, PDF / presentation, PPTX)
12.50 – 13.15	Harri Kettunen & Tiina Onikki-Rantajääskö: How Can I Contribute to the Reliability of Information? Collaborative Terminology Work at the Helsinki Term Bank for the Arts and Sciences
13.15 – 13.30	Break
13.30 – 14.00	Erik Henriksson: TurkuNLP tools to make sense of noisy web data (remote) Presentation slides
14.00 – 14.30	Raine Koskimaa, Ida Toivanen & Jari Lindroos: “How to use” – Twitch collector and analysis tools
14.30 – 15.00	Martta Ylilauri, Joni-Roy Piispanen & Rebekah Rousi: Natural impact – VR in NATUREACH / Luonnon hyvinvointivaikutukset – VR NATUREACH-hankkeessa
15.00	Discussion


Join the online course Corpus Linguistics and Statistical Methods (13 January–28 February 2025)

The online course Corpus Linguistics and Statistical Methods is aimed at students of languages and other subjects who want to learn the basics of using corpora.

The course is offered in both Finnish and English and is free of charge for all university students in Finland and abroad. It can also be completed for a fee through the Open University. The course started on Monday 13 January 2025, but you can still join the course area until 24 January if there is room.

Read more and sign up!

Note: The same course will next be held in the first teaching period of the autumn term, starting on 1 September 2025 (follow the course page).

Language Data Space and ALT-EDIC in Finland

What is the European Language Data Space (LDS)?

The EU is in the process of creating an internal market for all types of data. The aim is to ensure that data can be shared from one stakeholder to another within the region, in accordance with the EU legislation. Data sharing requires interactive networks – data spaces – that can connect data providers and users, and offer a platform for them to communicate, make contracts and trade with each other.

All the upcoming European data spaces will be developed in line with the European Data Strategy. There are development plans for data spaces for approximately 15 different strategic fields. According to the vision, data spaces will allow for the commercialisation and more efficient re-use of data. This will benefit not only commercial stakeholders in the EU, but also EU citizens by providing them with better digital services, for example. In addition, researchers could gain access to new types of data and materials, which could boost basic research and increase opportunities for product development and innovation.

The European Language Data Space (shortened: LDS) is an ecosystem for the sharing and commercialisation of language data, such as text and speech data, and for the development of large language models and language-centric Artificial Intelligence. The Language Data Space is being developed and coordinated by the LDS Consortium, which was established in early 2023 with the support of the European Commission. The first phase of the LDS will last three years and during this period, the technical and legal framework for the operation of the common language data platform will be established in cooperation with the various stakeholders.

The work on the language data space will also be driven forward by ALT-EDIC, the language technology alliance of EU member states established in early 2024. In particular, ALT-EDIC aims to ensure the development of EU-based large language models.

The Language Data Space will be built partly on top of existing networks and language technology infrastructures. Sitra’s publication Snapshot of Finnish data spaces (2024) summarises well the current situation in Finland with regard to language technologies and the Language Data Space (in Finnish).

LDS Workshop in Finland and elsewhere in Europe

In spring 2024, the LDS Consortium launched a series of country-specific workshops to share information about the possibilities of the common Language Data Space, and to reach as many stakeholders in each member country as possible. The workshops are organised in collaboration with local institutions. In April 2024, Finland had the honour of being the first EU member state to host an LDS workshop. The event was organised locally by the University of Helsinki. More information on workshops in other EU countries and upcoming LDS events can be found on the Language Data Space website.

The Finnish LDS workshop provided an opportunity for organisations and companies in Finland to exchange ideas on the possibilities and challenges that a common platform and marketplace for language models and data could offer. As remote presenters, the workshop featured Philippe Gelin from the European Commission, and Georg Rehm from DFKI in Germany. In the panel discussions (see photos below), partners from the LAREINA project coordinated by the University of Helsinki shared their views on the importance of language data and on the challenges regarding the availability and technical quality of data or regarding copyright constraints. Without access to electronic data of sufficient quality and scope, it is difficult to develop language models for speakers of small and medium-sized languages.

After the LDS workshop, Finland initiated the membership process to join ALT-EDIC as an observer member. After summer 2024, the full membership of Finland in ALT-EDIC was confirmed for the next three years. The administrative representative of Finland in ALT-EDIC is the Ministry of Transport and Communications, with whom the University of Helsinki aims to maintain an active dialogue.

LDS invites businesses and other stakeholders to join the user group

Language Data Space invites European stakeholders to join the LDS User Group. The group includes commercial stakeholders from different sectors as well as representatives from both public administrations and research. The news from the remote meeting of the LDS User Group in November 2024 can be found here. Joining the LDS User Group is done via a form that can be found on the LDS website. In particular, language data providers and utilisers, as well as language model developers, are warmly welcome to join the group.

At the end of 2024, the Language Data Space is entering the pilot phase, in which the Language Bank of Finland is also actively involved. The aim is to test the pilot version of the LDS platform in Finland and to collect user feedback. The Language Bank of Finland is also planning to organise a workshop in spring 2025 on the Language Data Space, ALT-EDIC and copyright issues. We will announce this event on our website and through the LAREINA project.

All photos in the article: Jyrki Niemi / University of Helsinki

Language Data Space (LDS) and ALT-EDIC in Finland

What is the European Language Data Space (LDS)?

The EU is currently creating a single market for all kinds of data, with the aim of ensuring that data can move between actors in the region in compliance with EU legislation. This requires interactive networks, known as data spaces, that can connect data providers and users and offer them a platform for communicating with each other, drawing up agreements and trading.

All the European data spaces currently in preparation are being developed in line with the European data strategy. Data spaces are already being set up for around 15 different sectors. The intention is that they will make it possible to commercialise data and to reuse it more efficiently. In addition to commercial actors in the EU, citizens would also benefit, for example through better digital services. Researchers could also gain access to new kinds of data, which would support basic research and improve the opportunities for product development and innovation.

The European Language Data Space (abbreviated LDS) is an ecosystem for sharing and commercialising language data, such as text and speech data, and for developing large language models and language-centric AI. The Language Data Space is developed and coordinated by the LDS consortium, which was established with the support of the European Commission in early 2023. The first phase of the LDS lasts three years, during which the aim is to create the technical and legal framework for the operation of a common language data platform in cooperation with various stakeholders.

ALT-EDIC, an alliance for language technologies that was established in early 2024 and whose members are EU member states, also takes part in promoting the Language Data Space. The aim of ALT-EDIC is, in particular, to ensure the development of large language models originating in the EU.

The Language Data Space is being built partly on top of existing networks and language technology infrastructures. The Sitra publication Suomalaisten data-avaruuksien tilannekuva (2024), an overview of Finnish data spaces, gives a good summary of the situation in Finland with regard to language technologies and the Language Data Space.

The LDS workshop in Finland and elsewhere in Europe

In spring 2024, the LDS consortium launched a series of country-specific workshops to spread information about the opportunities offered by a common language data space and to reach stakeholders in each member state as widely as possible. The workshops are organised in cooperation with local actors. In April 2024, Finland had the honour of being the first EU member state to host an LDS workshop. The University of Helsinki acted as the local organiser. Information on the workshops held in other EU countries and on upcoming LDS events is available on the Language Data Space website.

The LDS workshop gave organisations and companies operating in Finland an opportunity to exchange ideas on the possibilities and challenges that a common distribution platform and marketplace for language models and language data could offer. Philippe Gelin from the European Commission and Georg Rehm from DFKI, Germany, took part as remote speakers. The panel discussions (see photos below) featured partners of the LAREINA project, which is coordinated by the University of Helsinki. These partners have experience of the importance of language data and know the challenges related to the availability and technical quality of data and to the restrictions imposed by copyright. Without access to electronic data of sufficient quality and scope, it is difficult to develop language models for speakers of small and medium-sized languages.

In the wake of the LDS workshop, Finland started the membership process and joined ALT-EDIC as an observer member. After summer 2024, Finland's full membership was also confirmed for the next three years. Finland's administrative representative in ALT-EDIC matters is the Ministry of Transport and Communications, with which the University of Helsinki aims to stay in active contact.

LDS invites companies and other stakeholders to join the user group

The Language Data Space invites European actors to join the LDS User Group. The group includes commercial actors from various sectors as well as representatives of public administration and research. News from the remote meeting of the LDS User Group held in November 2024 can be read in an online news item. The LDS website also has a form for joining the user group. Language data providers and users, as well as language model developers, are particularly welcome to join.

At the end of 2024, the Language Data Space is moving into a pilot phase, in which the Language Bank of Finland (Kielipankki) is also actively involved. The aim is to test the pilot version of the LDS platform in Finland and to collect user feedback on it. The Language Bank of Finland is also planning to organise a workshop in spring 2025 on the LDS, ALT-EDIC and copyright issues. We will announce the event later on our website and through the LAREINA project.

Photos in the article: Jyrki Niemi / University of Helsinki

FIN-CLARIAH Meeting 22.11.2024

Place: University of Turku, School of Economics (Rehtoripellonkatu 3, Turku)
Time: 22.11.2024 11:00-16:00

Registration: closed

Workshop theme: “Digital cultural heritage”

The goal of this FIN-CLARIAH day is to reflect on the importance of data and metadata for research. We want to discuss the following questions: What is data vs. metadata? What are the standards in different fields? How is metadata extracted or processed, and how could it be? First, to set the stage for the group discussions, we will hear the perspective of a research infrastructure project that integrates heterogeneous data for the study of human diversity.

Preliminary schedule

10:30 Welcome coffee

11:00 Welcome words (Krister Lindén & Veronika Laippala)

11:10 Keynote by Virpi Lummaa, Human Diversity Project, University of Turku

12:00 Lunch

13:00 Insight talks: opportunities and challenges of processing metadata (Kimmo Elo, University of Turku; Maria Kallio-Hirvonen, The Finnish National Archives)

13:30 Group discussions on challenges and solutions regarding the specifics of different types of research data. Questions to start the conversation: What is data vs. metadata? What are the standards in your field? How is metadata processed, and how could it be? Discussions will be guided by moderators (tbc).

Text data
Audiovisual, speech and audio data
Still images

15:00 Panel discussion (Moderator: Inés Matres)

16:00 Plenary ends

(16:00-17:00 Steering group meeting)

Workshop: Large Language Models and Speech-Centric AI

Wednesday 09.10.2024 at 8:30-14:00, Clarion Hotel Helsinki

Organizers:

Department of Digital Humanities, University of Helsinki
LAREINA project
Kites ry

Location:

Clarion Hotel Helsinki, Tyynenmerenkatu 2, Helsinki

Welcome to the Workshop!

Our workshop on Large Language Models and Speech-Centric AI will showcase various use cases and applications in both the public and private sectors. Our objective is to introduce the current state of language-centric AI in Finland and to share information about the future of access to language data and modules. The demo presentations and industry talks will illustrate the potential uses of language-centric AI.

This workshop is addressed to developers, integrators and users of language technologies and AI solutions in Finland. The workshop will be held in English and on-site only.

Registration

Participation is free of charge, but registration is required. We have 50 seats available. Registration has now ended and the event is fully booked.

Large Language Models and Speech-Centric AI

Programme for the Workshop on Wednesday 09.10.2024

08:30 – 09:00	Registration and Coffee
09:00 – 10:30	LLMs and Speech-Interfaces in Private and Public Sector
	Krister Lindén, University of Helsinki
	Tomi Paavola, Ministry of Transport and Communications
	Jörg Tiedemann, University of Helsinki
	Tommi Lehtonen, KAVI & Mikko Kurimo, Aalto University
	Markus Koskela, CSC – IT Center for Science
10:30 – 11:30	Demo Presentations and Coffee
11:30 – 13:00	AI and Speech-Interfaces
	Antti ’Jogi’ Poikola, Teknologiateollisuus ry
	Iftikhar Ahmad, Tietoevry
	Manu Setälä, Solita Oy
	Iikka Hauhio, Kielikone Oy
	Michael Stormbom, Lingsoft Language Services Oy
	Peter Smit, Inscripta Oy
13:00 – 14:00	Lunch

Contact the organizers for further details:

lareina-office [ATT] helsinki.fi

Last updated: October 8, 2024

Posters presented at the FIN-CLARIAH Meeting 10.6.2024

FIN-CLARIAH Meeting 10.6.2024 11-16

Place: Minerva Plaza, Siltavuorenpenger 5 A, University of Helsinki

The goal of the workshop is to let the whole infrastructure reflect on how SSH research will be affected by AI and how we as an infrastructure should prepare for this. In addition, we need to collect information on what preparations are already taking place in the different locations of the infrastructure.

Register by 24.5.2024

Poster session

See all posters!
Poster template (.pptx)

Program

11.00-12.00 Keynote: Modern generative image modelling, Jaakko Lehtinen (Aalto University)

12.00-13.00 Lunch

13.00- Group discussions. Task: Brainstorm and develop an action plan for how to integrate transformer technology in the research infrastructure development (Tentative questions: AI for diverse data types? What is happening across locations? Making AI accessible and usable? Implications for SSH research, development, education?)

Text data – Cultural heritage (facilitator Liisa Näpärä, NLF)
Text data – Web / societal data (facilitator Veronika Laippala, TurkuNLP)
Audiovisual, speech and audio data (facilitator Mikko Kurimo, Aalto)
Still images (facilitator Ilkka Lähteenmäki, Oulu)

14.10-14.30 Coffee

14.30-15.00 Summaries from the groups

15.00 Poster session & Refreshments

16.00 Closing

European Language Data Space workshop

Unlock the potential of data for businesses and citizens in the EU

Wednesday 10.04.2024 at 9:00-15:15, Clarion Hotel Helsinki

Organizers:
European Language Data Space
Department of Digital Humanities, University of Helsinki

Welcome to the European Language Data Space workshop!

The European Language Data Space (LDS) and the University of Helsinki are bringing together experts from Finnish industry, public administration and research to discuss the importance of language data for the development of language technologies and AI-based tools in Finland. The event will be held on 10.04.2024 at Clarion Hotel Helsinki.

Since the beginning of 2023, the European Commission has steered and supported a new way of sharing language data through the European Language Data Space (LDS). This new approach extends beyond language data and covers many sectors and operating environments through their own data spaces. With the establishment of the Common European Data Spaces, communication and mediation between the different forms of data description and data availability is becoming a reality in all European countries.

Against this background, the European Language Data Space aims to build a reliable and efficient data market for sharing language resources in the public and private sectors, in line with the EU data strategy.

The European Language Data Space (LDS) is organising a series of country-specific workshops to help local companies, research groups and public administrations adopt the new language data exchange space and join relevant local and European networks. At the same time, they can make use of existing trusted infrastructures. As a European platform for sharing language data, the LDS can help local actors commercialise their language data in a multilingual Europe where language technologies and AI-based applications are becoming ever more important.

The Finnish LDS workshop

The Finnish workshop addresses the needs of domestic private and public sector stakeholders as providers, integrators and/or consumers of language data. The event shares the experiences and requirements of these parties and explores how the desired technological growth could be achieved and competitiveness improved at both the national and the European level. The workshop discusses how the LDS can help Finnish actors and support their efforts to produce, commercialise or acquire language data to power language technologies and AI-based tools in Finland.

The workshop is aimed at data holders and providers, developers and integrators of language technologies, SMEs, and representatives of public administration, authorities and partners. The workshop will be held in English.