Organizers:
Department of Digital Humanities, University of Helsinki
LAREINA project
Kites ry
Location:
Clarion Hotel Helsinki, Tyynenmerenkatu 2, Helsinki
The development of language-centric AI during the past few years has been remarkable. It poses challenges but also creates opportunities for organizations in both the private and the public sector. Many of us are curious about how to harness the power of AI in our own business.
Our workshop on Large Language Models and Speech-Centric AI will showcase various use cases and applications in both the public and the private sector. Our objective is to introduce the current state of language-centric AI in Finland, and share information about the future of access to language data and modules. The demo presentations and industry talks will illustrate the potential use of language-centric AI.
This workshop is addressed to developers, integrators and users of language technologies and AI solutions in Finland. The workshop will be held in English and on-site only.
Participation is free of charge, but registration is required. We have 50 seats available. Registration has now ended and the event is fully booked. We have a waiting list for any seats that become available.
08:30 – 09:00 Registration and Coffee
09:00 – 10:30 LLMs and Speech-Interfaces in Private and Public Sector
10:30 – 11:30 Demo Presentations and Coffee
11:30 – 13:00 AI and Speech-Interfaces
13:00 – 14:00 Lunch
lareina-office [ATT] helsinki.fi
Last updated: October 8, 2024
The goal of the workshop is to let the whole infrastructure reflect on how SSH research will be affected by AI and how we as an infrastructure should prepare for this. In addition, we need to collect information on what preparations are already taking place in the different locations of the infrastructure.
11.00-12.00 Keynote: Modern generative image modelling, Jaakko Lehtinen (Aalto University)
12.00-13.00 Lunch
13.00- Group discussions. Task: Brainstorm and develop an action plan for how to integrate transformer technology in the research infrastructure development (Tentative questions: AI for diverse data types? What is happening across locations? Making AI accessible and usable? Implications for SSH research, development, education?)
14.10-14.30 Coffee
14.30-15.00 Summaries from the groups
15.00 Poster session & Refreshments
16.00 Closing
Organisers:
European Language Data Space
Department of Digital Humanities, University of Helsinki
The European Language Data Space (LDS) and the University of Helsinki are bringing together experts from Finnish industry, public administration and research to discuss the importance of language data for the development of language technologies and AI-based tools in Finland. The event will be held on 10.04.2024 at Clarion Hotel Helsinki.
Since the beginning of 2023, the European Commission has guided and supported a new way of sharing language data through the European Language Data Space (LDS). This new approach extends beyond language data and covers many fields and operating environments through their own data spaces. With the establishment of the Common European Data Spaces, communication and exchange between different forms of data description and availability is becoming a reality in all European countries.
Against this background, the European Language Data Space aims to build a trustworthy and effective data market for sharing language resources in the public and private sectors, in line with the EU Data Strategy.
The European Language Data Space (LDS) is organising a series of country workshops to help local companies, research groups and public administrations adopt the new language data exchange space and join relevant local and European networks. At the same time, they can benefit from trustworthy infrastructures that already exist. As a European platform for sharing language data, the LDS can help local stakeholders monetise their language data in a multilingual Europe where the importance of language technologies and AI-based applications is continually growing.
The Finnish workshop addresses the needs of domestic private and public sector stakeholders as providers, integrators and/or consumers of language data. The event shares the experiences and requirements of these parties and explores how to achieve the desired technological growth and improve competitiveness at both national and European levels. The workshop discusses how the LDS can help Finnish stakeholders and support their efforts to produce, monetise or obtain language data to power language technologies and AI-based tools in Finland.
The workshop is aimed at data holders and providers, developers and integrators of language technologies, SMEs, and representatives of public administration, authorities and partners. The workshop will be held in English.
Participation is free of charge, but advance registration is required. Registration closed on 03.04.2024. Please contact the organisers to check whether seats are still available: lareina-office [ATT] helsinki.fi
09:00 – 09:45 Registration
09:55 – 10:05 Welcome and introduction
10:05 – 10:35 Welcome by the European Commission: The Digital Europe Programme and the Common European Language Data Space
10:35 – 11:05 The importance of language data for the development of LT solutions future steps
11:05 – 11:30 Coffee break
11:30 – 11:40 Welcome by the Ministry of Finance
11:40 – 12:30 Language Data and Language Technologies in Finland and for Finnish
12:30 – 13:00 European Language Data Space: developing a market for language data and services and benefitting from a joint European effort
13:00 – 13:50 Lunch
13:50 – 14:50 Language data production, management, and market development: overcoming obstacles – Panel session
14:50 – 15:05 Conclusions
15:05 – 15:15 Coffee break and networking
15:15 – 16:15 Coffee break and networking continue at the Nordic Data Festival 2024 event organised by Sitra (a parallel event at Clarion Hotel Helsinki)
Krister Lindén and Wilhelmina Dyster
University of Helsinki
lareina-office [ATT] helsinki.fi
Last updated: 05.04.2024
This is what everyone should know about our computing environment when launching jobs!
Are you planning on using CSC’s high-performance computing (HPC) services (Puhti, Mahti, Allas…) in the near future? Have you been using these services already, but would like to make sure you are getting the most out of them? This intensive course is intended for you!
More info and registration at: https://ssl.eventilla.com/part1april24
How to handle large datasets, install your own software and scale up workflows efficiently in CSC’s computing environment
Are you using CSC’s high-performance computing (HPC) services (Puhti, Mahti, Allas…), but want to make sure you are getting the most out of them? Are you working with data in the most efficient way? Want to know the best tips and tricks of the trade when scaling up your workflows? This intensive course is intended for you!
More info and registration at: https://ssl.eventilla.com/part2may24
These 2+2 half-day sessions focus on using the CSC HPC environment via short lectures and hands-on tutorials. Please check the required course prerequisites.
Please note that the same course is also available as a free self-learning online course at https://ssl.eventilla.com/csccompenvselflearn.
Organisation:
European Language Data Space
Department of Digital Humanities, University of Helsinki
The European Language Data Space and the University of Helsinki are bringing together experts from the Finnish Industry, Public Administration and Research to discuss the importance of language data for the development of Language Technologies and AI-based tools in Finland. The event is taking place on 10.04.2024 at Clarion Hotel Helsinki.
Since early 2023, the European Commission has been providing guidance and support towards a new dimension in language data sharing that is implemented through the European Language Data Space (LDS). This new dimension goes beyond language data and addresses many areas and fields through their specific Data Spaces. With the establishment of the Common European Data Spaces, communication and exchange among different modalities of data description and availability is becoming a reality for all European countries.
In this context, the European Language Data Space aims to build a trustworthy and effective data market for the exchange of language resources in the public and, even more importantly, in the private sector, in line with the EU Data Strategy.
For that purpose, the European Language Data Space (LDS) is organising a series of Country Workshops to support local industries, research groups and public administrations in joining this new language data exchange space and connecting with relevant local and European networks, while benefiting from the trustworthy infrastructures already available. As a European language data sharing platform, the LDS can help local industry stakeholders to monetise their language data in a multilingual Europe where Language Technologies and AI-based applications play an increasingly important role.
The Finnish LDS workshop will address the needs of Finnish stakeholders from both the private and public sectors, whether they are providers, integrators and/or consumers of language data. Participants will share their experiences and requirements and explore how to achieve the desired technological growth and enhance their competitiveness at both national and European levels. The LDS will present and discuss how it can help Finnish stakeholders and support their efforts to produce, monetise or obtain language data to power LT and AI-based tools in Finland.
The workshop is addressed to data owners and data providers, LT developers and integrators and SMEs, as well as to public administration executives, officers and partners. The workshop will be held in English.
Participation is free of charge, but registration is required. Registration closed on 03.04.2024. Please contact the organisers to check whether seats are still available: lareina-office [ATT] helsinki.fi
09:00 – 09:45 Registration
09:55 – 10:05 Welcome and introduction
10:05 – 10:35 Welcome by the European Commission: The Digital Europe Programme and the Common European Language Data Space
10:35 – 11:05 The importance of language data for the development of LT solutions future steps
11:05 – 11:30 Coffee Break
11:30 – 11:40 Welcome by the Ministry of Finance
11:40 – 12:30 Language Data and Language Technologies in Finland and for Finnish
12:30 – 13:00 European Language Data Space: developing a market for language data and services and benefitting from a joint European effort
13:00 – 13:50 Lunch Break
13:50 – 14:50 Language data production, management, and market development: overcoming obstacles – Panel session
14:50 – 15:05 Conclusions
15:05 – 15:15 Coffee Break and Networking
15:15 – 16:15 Coffee Break and Networking continue in Sitra’s Nordic Data Festival 2024 event (co-located in Clarion Hotel Helsinki)
Krister Lindén and Wilhelmina Dyster
University of Helsinki
lareina-office [ATT] helsinki.fi
Last updated: April 5, 2024
The 2024 ParlaCLARIN Workshop will be held in May in Torino (Italy), as part of the LREC-COLING 2024 – The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation.
The Call for Papers is now open and the paper submission deadline is 19 February 2024.
Read more: https://www.clarin.eu/ParlaCLARIN-IV
11.00-11.10 Welcoming Words by Sanna Kumpulainen, Associate Professor in Information Studies, Tampere University
11.10-12.00 Keynote I on Studying SSH Research Needs: Elina Late, Senior Research Fellow in Information Studies, Tampere University
12.00-13.00 Lunch
13.00-13.45 Keynote II on Language Models: Sampo Pyysalo, Associate Professor at the Department of Computing, University of Turku
Download the slides of the work package presentations (pdf)
13.45-14.30 Work Package Presentations I
13.45-14.00 WP1.3 Veronika Laippala: Noise-Tolerant NLP
14.00-14.20 WP1.1, 1.2, 2.1, 2.2 & 2.3 Mietta Lennes: Kielipankki – The Language Bank of Finland
14.20-14.25 WP2.4 Harri Kettunen: Helsinki Term Bank for the Arts and Sciences
14.25-14.30 WP2.5 Jenny Tarvainen: Automated Text Tools for Learner Language
14.30-15.00 Coffee
15.00-16.00 Work Package Presentations II
15.00-15.05 WP3.1 Martin Matthiesen: Pipeline from the National Library to CSC
15.05-15.10 WP3.2 Tanja Välisalo: Named Entity Recognition for NARC Data
15.10-15.15 WP4.3 Eetu Mäkelä: Evaluation and Subsetting
15.15-15.20 WP4.1 Julia Matveeva: Metadata Harmonization
15.20-15.25 WP4.4 Mikko Laitinen: Twitter
15.25-15.30 WP4.2 Eero Hyvönen: LOD
15.30-15.35 WP3.4 Raine Koskimaa: Game Streams
15.35-15.40 WP3.3 Maria Valaste: Qualitative Surveys
15.40-15.45 WP3.5 Kimmo Elo (Risto Turunen replacing): Text Networks
15.45-15.50 WP5 Sanna Kumpulainen: Evidence-based RI Development + Education & Resources
16.00-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)
The online course Corpus Linguistics and Statistical Methods is intended for students in languages or other fields who wish to learn the basics of using corpora.
The course is offered in Finnish and in English, and it is open to all university students in and outside Finland. The course started on Monday 4th September, but it is possible to join the course area until 15th September.
Read more and register for the course!
NB: The same course will be organized again in period 3, starting on 15 Jan 2024 (see the course page).
The 24th INTERSPEECH Conference was held on 20-24 August 2023 in Dublin, Ireland. At the conference, Mietta Lennes from the Language Bank of Finland presented a poster, based on the following conference article:
Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proceedings of INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822.
This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2023081621
Poster and supplementary materials presented at Interspeech 2023, 20.-24.8.2023, Dublin, Ireland.
Last updated: 2023-08-25
This page contains a picture of the poster presented at the conference and some additional figures and details about the piece of research in question. For further information, please contact Mietta Lennes.
The pitch data calculated for this paper will be published as an online dataset. The link to the data will be added on this page.
Mietta Lennes & Minnaleena Toivola (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Poster and supplementary materials. Interspeech 2023, 20.-24.8.2023, Dublin, Ireland. Available: http://urn.fi/urn:nbn:fi:lb-2023081621.
Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proc. INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822
11.00-11.10 Welcoming Words by Katri Tegel, Development Manager, CSC
11.10-12.00 Keynote: Mikko Kurimo, Professor of Speech and Language Processing, Aalto University
12.00-13.00 Lunch
13.00-15.00 Thematic Groups
15.00-15.30 Coffee
15.30-16.15 Sharing the Results from the Groups
16.15-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)
11.00-12.00 Jari Ojala: Welcoming words + Pasi Tyrväinen: Keynote
12.00-13.00 Lunch
13.00-13.15 Anna Sendra Toset: Results from FIN-CLARIAH interviews
13.15-14.30 Teamwork in thematic groups:
14.30-15.00 Coffee
15.00-16.00
16.00-17.00 Socializing & refreshments (Executive board meeting)
The open online course Data Clinic kicks off on 11th November 2022 and ends in late April 2023. During the winter and spring, you will learn to write a Data Management Plan and get practical advice and support for collecting, processing and managing your research data. Participants will work partly independently and partly in small groups of peers. You may attend the entire course remotely.
The course materials will be provided mainly in English. Students from all universities and all fields are welcome if space allows. The only prerequisite is that you are already starting a research project where you need to process and manage a data set that contains text documents or speech recordings, i.e., some language data.
Read more and join the course by 28.11.2022!
In this online course, you will get to grips with specialized tools for transcribing and studying speech samples. You will also learn about collecting and managing a speech corpus of your own. During the course, you will actively use the Praat program and become familiar with ELAN, too.
The course is open to students in all universities and you can take it either in Finnish or in English. The number of participants may be restricted if required. The course will be taught by Mietta Lennes and Juraj Šimko at the University of Helsinki.
The course has already begun, but you may still enrol and join in until 11th November.
Further information and link to the course on Moodle
The kick-off get-together of the FIN-CLARIAH infrastructure project is held on the premises of the National Library on 3.6.2022. You can see the posters online on the event page.
The online course Corpus Linguistics and Statistical Methods (Korpuslingvistiikka ja tilastolliset menetelmät, 5 credits) will be offered again during 17.1.–6.3.2022. This course can be taken either in Finnish or in English.
The total number of participants will be restricted, but it will be possible to participate in the course from outside the University of Helsinki and even from outside Finland. If you are a student from outside the University of Helsinki, please find further details and the link for joining the Moodle area on the course home page (see below). Students from the University of Helsinki should first register via Sisu.
Registration for the course is open until 28.1.2022 (unless the maximum number of participants is exceeded before then).
Find more courses and training by Kielipankki
The Donate Speech campaign, in which the Language Bank of Finland has been involved, was awarded the PRIX EUROPA: Best European Digital Audio Project of the Year 2021 (see https://www.prixeuropa.eu/news/2021/10/15winners-y4emh). The award ceremony took place in Potsdam, Germany on 15th October, 2021.
Earlier this year, Donate Speech also won the national Grand One award for Best Mobile Service of the Year, including a distinction for Best Use of Data.
Donate Speech is a joint project of Yle – the Finnish Broadcasting Company, Vake Oy (now Ilmastorahasto), Solita, Aalto University and the University of Helsinki.
If you speak and understand Finnish, you can donate your speech here!
Last modified on 2021-11-27