This is what everyone should know about our computing environment when launching jobs!
Are you planning on using CSC’s high-performance computing (HPC) services (Puhti, Mahti, Allas…) in the near future? Have you been using these services already, but would like to make sure you are getting the most out of them? This intensive course is intended for you!
More info and registration at: https://ssl.eventilla.com/part1april24
How to handle large datasets, install your own software and scale up workflows efficiently in CSC’s computing environment
Are you using CSC’s high-performance computing (HPC) services (Puhti, Mahti, Allas…), but want to make sure you are getting the most out of them? Are you working with data in the most efficient way? Want to know the best tips and tricks of the trade when scaling up your workflows? This intensive course is intended for you!
More info and registration at: https://ssl.eventilla.com/part2may24
These 2+2 half-day sessions focus on using the CSC HPC environment via short lectures and hands-on tutorials. Please check the required course prerequisites.
Please note that the same course is also available as a free self-learning online course at https://ssl.eventilla.com/csccompenvselflearn.
Organisation:
European Language Data Space
Department of Digital Humanities, University of Helsinki
The European Language Data Space and the University of Helsinki are bringing together experts from Finnish industry, public administration and research to discuss the importance of language data for the development of language technologies and AI-based tools in Finland. The event takes place on 10.04.2024 at Clarion Hotel Helsinki.
Since early 2023, the European Commission has been providing guidance and support for a new dimension in language data sharing, carried out through the European Language Data Space (LDS). This new dimension goes beyond language data and addresses many areas and fields through their specific Data Spaces. With the establishment of the Common European Data Spaces, communication and exchange among different modalities of data description and availability are becoming a reality for all European countries.
In this context, the European Language Data Space aims to build a trustworthy and effective data market for the exchange of language resources in the public and, even more importantly, in the private sector, in line with the EU Data Strategy.
For that purpose, the European Language Data Space (LDS) is organising a series of Country Workshops to help local industries, research groups and public administrations join this new language data exchange space and connect with relevant local and European networks, while benefiting from the trustworthy infrastructures already available. As a European language data sharing platform, the LDS can help local industry stakeholders monetise their language data in a multilingual Europe where language technologies and AI-based applications play an increasingly important role.
The Finnish LDS workshop will address the needs of Finnish stakeholders from both the private and public sectors, whether they are providers, integrators and/or consumers of language data. Participants will share their experiences and requirements and explore how to achieve the desired technological growth and enhance their competitiveness at both national and European levels. The LDS will present and discuss how it can help Finnish stakeholders and support their efforts to produce, monetise or obtain language data to power LT and AI-based tools in Finland.
The workshop is addressed to data owners and data providers, LT developers and integrators and SMEs, as well as to public administration executives, officers and partners. The workshop will be held in English.
Participation is free of charge, but registration is required. Please register here: https://ec.europa.eu/eusurvey/runner/LDS_WS1-FI
09:00 – 09:45 | Registration |
09:55 – 10:05 | Welcome and introduction |
10:05 – 10:35 | Welcome by the European Commission: The Digital Europe Programme and the Common European Language Data Space |
10:35 – 11:05 | The importance of language data for the development of LT solutions future steps |
11:05 – 11:30 | Coffee Break |
11:30 – 11:40 | Welcome by the Ministry of Finance |
11:40 – 12:30 | Language Data and Language Technologies in Finland and for Finnish |
12:30 – 13:00 | European Language Data Space: developing a market for language data and services and benefitting from a joint European effort |
13:00 – 13:50 | Lunch Break |
13:50 – 14:50 | Language data production, management, and market development: overcoming obstacles (Panel session) |
14:50 – 15:05 | Conclusions |
15:05 – 15:15 | Coffee Break and Networking |
15:15 – 16:15 | Coffee Break and Networking continue in Sitra’s Nordic Data Festival 2024 event (co-located in Clarion Hotel Helsinki) |
Krister Lindén and Wilhelmina Dyster
University of Helsinki
lareina-office [ATT] helsinki.fi
Last updated: March 28, 2024
11.00-11.10 Welcoming Words by Sanna Kumpulainen, Associate Professor in Information Studies, Tampere University
11.10-12.00 Keynote I on Studying SSH Research Needs: Elina Late, Senior Research Fellow in Information Studies, Tampere University
12.00-13.00 Lunch
13.00-13.45 Keynote II on Language Models: Sampo Pyysalo, Associate Professor at the Department of Computing, University of Turku
Download the slides of the work package presentations (pdf)
13.45-14.30 Work Package Presentations I
13.45-14.00 WP1.3 Veronika Laippala: Noise-Tolerant NLP
14.00-14.20 WP1.1, 1.2, 2.1, 2.2 & 2.3 Mietta Lennes: Kielipankki – The Language Bank of Finland
14.20-14.25 WP2.4 Harri Kettunen: Helsinki Term Bank for the Arts and Sciences
14.25-14.30 WP2.5 Jenny Tarvainen: Automated Text Tools for Learner Language
14.30-15.00 Coffee
15.00-16.00 Work Package Presentations II
15.00-15.05 WP3.1 Martin Matthiesen: Pipeline from the National Library to CSC
15.05-15.10 WP3.2 Tanja Välisalo: Named Entity Recognition for NARC Data
15.10-15.15 WP4.3 Eetu Mäkelä: Evaluation and Subsetting
15.15-15.20 WP4.1 Julia Matveeva: Metadata Harmonization
15.20-15.25 WP4.4 Mikko Laitinen: Twitter
15.25-15.30 WP4.2 Eero Hyvönen: LOD
15.30-15.35 WP3.4 Raine Koskimaa: Game Streams
15.35-15.40 WP3.3 Maria Valaste: Qualitative Surveys
15.40-15.45 WP3.5 Kimmo Elo (Risto Turunen replacing): Text Networks
15.45-15.50 WP5 Sanna Kumpulainen: Evidence-based RI Development + Education & Resources
16.00-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)
The online course Corpus Linguistics and Statistical Methods is intended for students in languages or other fields who wish to learn the basics of using corpora.
The course is offered in Finnish and in English, and it is open to all university students in and outside Finland. The course started on Monday 4th September, but it is possible to join the course area until 15th September.
Read more and register for the course!
NB: The same course will be organized again in period 3, starting on 15 Jan 2024 (see the course page).
The 24th INTERSPEECH Conference was held on 20-24 August 2023 in Dublin, Ireland. At the conference, Mietta Lennes from the Language Bank of Finland presented a poster, based on the following conference article:
Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proceedings of INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822.
This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2023081621
Poster and supplementary materials presented at Interspeech 2023, 20.-24.8.2023, Dublin, Ireland.
Last updated: 2023-08-25
This page contains a picture of the poster presented at the conference and some additional figures and details about the piece of research in question. For further information, please contact Mietta Lennes.
The pitch data calculated for this paper will be published as an online dataset. The link to the data will be added on this page.
Mietta Lennes & Minnaleena Toivola (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Poster and supplementary materials. Interspeech 2023, 20.-24.8.2023, Dublin, Ireland. Available: http://urn.fi/urn:nbn:fi:lb-2023081621.
11.00-11.10 Welcoming Words by Katri Tegel, Development Manager, CSC
11.10-12.00 Keynote: Mikko Kurimo, Professor of Speech and Language Processing, Aalto University
12.00-13.00 Lunch
13.00-15.00 Thematic Groups
15.00-15.30 Coffee
15.30-16.15 Sharing the Results from the Groups
16.15-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)
11.00-12.00 Jari Ojala: Welcoming words + Pasi Tyrväinen: Keynote
12.00-13.00 Lunch
13.00-13.15 Anna Sendra Toset: Results from FIN-CLARIAH interviews
13.15-14.30 Teamwork in thematic groups:
14.30-15.00 Coffee
15.00-16.00
16.00-17.00 Socializing & refreshments (Executive board meeting)
The open online course Data Clinic kicks off on 11th November 2022 and ends in late April 2023. During the winter and spring, you will learn to write a Data Management Plan and get practical advice and support for collecting, processing and managing your research data. Participants will work partly independently and partly in small groups of peers. You may attend the entire course remotely.
The course materials will be provided mainly in English. Students from all universities and all fields are welcome if space allows. The only prerequisite is that you are already starting a research project where you need to process and manage a data set that contains text documents or speech recordings, i.e., some language data.
Read more and join the course by 28.11.2022!
In this online course, you will get to grips with special tools that are available for transcribing and studying speech samples. You will also learn about collecting and managing a speech corpus of your own. During the course, you will actively use the Praat program and become familiar with ELAN, too.
The course is open to students in all universities and you can take it either in Finnish or in English. The number of participants may be restricted if required. The course will be taught by Mietta Lennes and Juraj Šimko at the University of Helsinki.
The course has already begun, but you may still enrol and join in until 11th November.
Further information and link to the course on Moodle
The kick-off get-together of the FIN-CLARIAH infrastructure project is held on the premises of the National Library on 3.6.2022. You can see the posters online on the event page.
The online course Corpus Linguistics and Statistical Methods (Korpuslingvistiikka ja tilastolliset menetelmät, 5 credits) will be offered again during 17.1.–6.3.2022. This course can be taken either in Finnish or in English.
The total number of participants will be restricted, but it will be possible to participate in the course from outside the University of Helsinki and even from outside Finland. If you are a student from outside the University of Helsinki, please find further details and the link for joining the Moodle area on the course home page (see below). Students from the University of Helsinki should first register via Sisu.
Registration for the course is open until 28.1.2022 (unless the maximum number of participants is exceeded before then).
Find more courses and training by Kielipankki
The Donate Speech campaign, in which the Language Bank of Finland has been involved, was awarded the PRIX EUROPA: Best European Digital Audio Project of the Year 2021 (see https://www.prixeuropa.eu/news/2021/10/15winners-y4emh). The award ceremony took place in Potsdam, Germany on 15th October 2021.
Earlier this year, Donate Speech also won the national Grand One award for Best Mobile Service of the Year, including a distinction for Best Use of Data.
Donate Speech is a joint project of Yle (the Finnish Broadcasting Company), Vake Oy (currently Ilmastorahasto), Solita, Aalto University and the University of Helsinki.
If you speak and understand Finnish, you can donate your speech here!
On 29th October 2021, the Language Bank of Finland and the Donate Speech campaign (Lahjoita puhetta) received an award from the University of Helsinki in recognition of exceptional work in promoting the accessibility and reusability of research data. In addition to the Language Bank, the award was given to Research Coordinator Kati Lassila-Perini.
At the award ceremony, Research Director Krister Lindén gave a presentation that is now available on YouTube with English subtitles. Read more about the award on the website of the University of Helsinki.
In this online course, you will get to grips with special tools that are available for transcribing and studying speech samples. You will also learn about collecting and managing a speech corpus of your own. During the course, you will actively use the Praat program and become familiar with ELAN, too.
The course is open to students in all universities and you can take it either in Finnish or in English. The number of participants may be restricted if required. The course will be taught by Mietta Lennes and Juraj Šimko at the University of Helsinki.
Join the course by 12th November!
Further information and link to the course on Moodle
The online course Natural Language Processing for Linguists will be taught by Tuomo Hiippala at the University of Helsinki during 15.3.–10.5.2021.
The course is also open to students from universities outside Helsinki, if space allows. Registration is open until 16th March.
Note also that all the course materials will be available online and you can use them even if you cannot make it to the course this time!
The next Kielipankki Live event will be held on Monday 14th December starting at 13:00 via Zoom. The event will be in English, but questions are welcome in Finnish as well! The main themes are speech corpora and personal data practices. Join us for interviews and presentations by special guests and for good discussions! Please register by 11th December if possible.
Program and registration details
The European Language Grid (ELG) aims to provide a digital marketplace where European companies, organizations and citizens can both offer and efficiently use language technologies, data sets and services. The ELG workshop presents an overview of the ELG platform and the ELG pilot projects. Come and see what ELG has to offer you!
The workshop is a free online event, but registration is required. Please register via the ELRC website by 10th December. NB: If you wish to participate in the ELG tutorial session that may be arranged after the workshop, please indicate this in the additional information field of the registration form. Thanks!
Note that the third ELRC Workshop in Finland will also take place online, in the same virtual room, on the same day at 9.30-12.40. You are welcome to participate in both events!
14:00 | Welcome and introduction |
14:05 | ELG Overview (Katrin Marheinecke) |
14:30 | ELG online demo (Nils Feldhus) |
14:50 | Presentations of Finnish Pilot Projects funded in ELG: PARA4DLM (University of Turku), LSDISCO (Lingsoft), OPUS-MT (University of Helsinki) |
15:20 | Expectations/requirements of Finnish Language Technology providers (Marko Turpeinen, 1001Lakes) |
15:40 | Summary and discussion |
16:00 | End of workshop |
16:15 | Tutorial: How to integrate a service into ELG. This tutorial may be organized according to requests from the participants. Please indicate your interest in the registration form. |
Last updated: December 7, 2020