Posters presented at the FIN-CLARIAH Meeting 10.6.2024

To view or to download the PDF version, click on the image.

Overview of CSC services used in various Work Packages

Image of the poster Overview of CSC services used in various Work Packages

W1.1 Text processing and annotation environments

Image of the poster W1.1 Text processing and annotation environments

W1.2 Speech processing and annotation

Image of the poster W1.2 Speech processing and annotation

W1.3 Video processing and annotation

Image of the poster W1.3 Video processing and annotation

W2.1 Personal and Copyrighted Research Data

Image of the poster W2.1 Personal and Copyrighted Research Data

W2.2 Training environments

Image of the poster W2.2 Training environments

W2.3 Translation and Interpretation

Image of the poster W2.3 Translation and Interpretation

W2.4 Terminology

Image of the poster W2.4 Terminology

W3.1 Data Management

Image of the poster W3.1 Data Management

W3.2 Data Ingestion

Image of the poster W3.2.1 Data ingestion through Finna

Image of the poster Sampo Systems Infrastructure Data Services and Portals

W3.3 Enrichment

Image of the poster W3.3.1 Enhancing the usability of archival data

Image of the poster W3.3.2 Supporting the research use of large-scale cultural heritage metadata

Image of the poster W3.3.3 a. Interaction in web content – A case study

Image of the poster W3.3.3 b. Multimodal Analysis Tools for Understanding Livestreams

Image of the poster W3.3.4 Tools for visual analysis

W4.1 Analytical support for computational SSH

Image of the poster W4.1 Analytical support for computational SSH

Image of the poster W4.1.2-3 Representative benchmark data of social media and digital tools for network analysis

Image of the poster W4.1.6 Enrich survey data with register data and unstructured text

W5.1 Evidence-Based Infrastructure Development

Image of the poster W5.1 Evidence-Based Infrastructure Development

W5.2 Log-based Data Analysis

Image of the poster Log-based Data Analysis. Unveiling the Past: User log-based Recommendation System for NLF Historical Newspapers

FIN-CLARIAH Meeting 10.6.2024 11-16

Place: Minerva Plaza, Siltavuorenpenger 5 A, University of Helsinki

 

The goal of the workshop is to let the whole infrastructure reflect on how SSH research will be affected by AI and how we as an infrastructure should prepare for this. In addition, we need to collect information on what preparations are already taking place in the different locations of the infrastructure.

Register by 24.5.2024

 

Poster session

 

Program

11.00-12.00 Keynote: Modern generative image modelling, Jaakko Lehtinen (Aalto University)

12.00-13.00 Lunch

13.00- Group discussions. Task: Brainstorm and develop an action plan for how to integrate transformer technology in the research infrastructure development (Tentative questions: AI for diverse data types? What is happening across locations? Making AI accessible and usable? Implications for SSH research, development, education?)

  1. Text data – Cultural heritage (facilitator Liisa Näpärä, NLF)
  2. Text data – Web / societal data (facilitator Veronika Laippala, TurkuNLP)
  3. Audiovisual, speech and audio data (facilitator Mikko Kurimo, Aalto)
  4. Still images (facilitator Ilkka Lähteenmäki, Oulu)

14.10-14.30 Coffee

14.30-15.00 Summaries from the groups

15.00 Poster session & Refreshments

16.00 Closing

 

 

European Language Data Space workshop

Unleashing the potential of data for businesses and citizens in the EU

Wednesday 10.04.2024 at 9:00-15:15, Clarion Hotel Helsinki

Organisers:
European Language Data Space
Department of Digital Humanities, University of Helsinki

Welcome to the European Language Data Space workshop!

The European Language Data Space (LDS) and the University of Helsinki are bringing together experts from Finnish industry, public administration and research to discuss the importance of language data for the development of language technologies and AI-based tools in Finland. The event takes place on 10.04.2024 at Clarion Hotel Helsinki.

Since the beginning of 2023, the European Commission has guided and supported a new way of sharing language data through the European Language Data Space (LDS). This new approach extends beyond language data and covers many fields and operating environments through their own data spaces. With the establishment of the Common European Data Spaces, communication and exchange between different forms of data description and availability is becoming a reality in all European countries.

Against this background, the European Language Data Space aims to build a trustworthy and effective data market for sharing language resources in the public and private sectors, in line with the EU Data Strategy.

The European Language Data Space (LDS) is organising a series of country workshops intended to help local companies, research groups and public administrations adopt the new language data exchange space and join relevant local and European networks. At the same time, they can benefit from the trustworthy infrastructures that already exist. As a European platform for sharing language data, the LDS can help local stakeholders monetise their language data in a multilingual Europe where language technologies and AI-based applications play an increasingly important role.

The LDS workshop in Finland

The Finnish workshop addresses the needs of domestic private and public sector stakeholders as providers, integrators and/or consumers of language data. The event shares their experiences and requirements and explores how to achieve the desired technological growth and improve competitiveness at both the national and European levels. The workshop discusses how the LDS can help Finnish stakeholders and support their efforts to produce, monetise or obtain language data to power language technologies and AI-based tools in Finland.

The workshop is aimed at data holders and providers, language technology developers and integrators, SMEs, and public administration representatives, authorities and partners. The workshop is held in English.

Registration

Participation is free of charge, but advance registration is required. Registration closed on 03.04.2024. Contact the organisers to check whether seats are still available: lareina-office [ATT] helsinki.fi

 

The LDS workshop in Finland 10.4.2024, programme

09:00 – 09:45

Registration

09:55 – 10:05

Welcome and introduction
Krister Lindén, University of Helsinki

10:05 – 10:35

Welcome by the European Commission: The Digital Europe Programme and the Common European Language Data Space
Philippe Gelin, European Commission

10:35 – 11:05

The importance of language data for the development of LT solutions future steps
Aleksander Alafuzoff, Yle

11:05 – 11:30

Coffee break

11:30 – 11:40

Welcome by the Ministry of Finance
Olli-Pekka Rissanen, Ministry of Finance

11:40 – 12:30

Language Data and Language Technologies in Finland and for Finnish
– Panel session

Krister Lindén, University of Helsinki (Moderator)
Mikko Kurimo, Aalto University
Iftikhar Ahmad, Tietoevry
Peter Smit, Inscripta Oy
Riikka Lindroos-Järvitalo, KELA
Patrik Gayer, SiloAI
Kirsi Salmela, Kopiosto

12:30 – 13:00

European Language Data Space: developing a market for language data and services and benefitting from a joint European effort
Georg Rehm, LDS Consortium, German Research Center for Artificial Intelligence (DFKI)

13:00 – 13:50

Lunch

13:50 – 14:50

Language data production, management, and market development: overcoming obstacles – Panel session
Krister Lindén, University of Helsinki (Moderator)
Manu Setälä, Solita Oy
Kaarina Hyvönen, Kielikone Oy
Tiina Lindh-Knuutila, Lingsoft Language Services Oy
Tommi Lehtonen, KAVI
Ilkka Lavas, City Digital Group
Jörg Tiedemann, University of Helsinki

14:50 – 15:05

Conclusions
Krister Lindén, University of Helsinki

15:05 – 15:15

Coffee break and networking

15:15 – 16:15

Coffee break and networking continue at the Nordic Data Festival 2024 event organised by Sitra (a parallel event at Clarion Hotel Helsinki)

 

 

Contact the local organisers:

Krister Lindén and Wilhelmina Dyster
University of Helsinki
lareina-office [ATT] helsinki.fi

Last updated: 05.04.2024

CSC Computing Environment free online courses 24.-25.4. (Part 1: Basics) and 15.-16.5. (Part 2: Next steps)

CSC Computing Environment, Part 1: Basics 24.-25.4.

This is what everyone should know about our computing environment when launching jobs!

Are you planning on using CSC’s high-performance computing (HPC) services (Puhti, Mahti, Allas…) in the near future? Have you been using these services already, but would like to make sure you are getting the most out of them? This intensive course is intended for you!

More info and registration at: https://ssl.eventilla.com/part1april24

CSC Computing Environment, Part 2: Next steps 15.-16.5.

How to handle large datasets, install your own software and scale up workflows efficiently in CSC’s computing environment

Are you using CSC’s high-performance computing (HPC) services (Puhti, Mahti, Allas…), but want to make sure you are getting the most out of them? Are you working with data in the most efficient way? Want to know the best tips and tricks of the trade when scaling up your workflows? This intensive course is intended for you!

More info and registration at: https://ssl.eventilla.com/part2may24

 

These 2+2 half-day sessions focus on using the CSC HPC environment via short lectures and hands-on tutorials. Please check the required course prerequisites.

Please note that the same course is also available as a free self-learning online course at https://ssl.eventilla.com/csccompenvselflearn.

European Language Data Space (LDS) workshop in Finland

Unleashing the potential of data – for EU businesses and citizens

 

Wednesday 10.04.2024 at 9:00-15:15, Clarion Hotel Helsinki

Organisation:
European Language Data Space
Department of Digital Humanities, University of Helsinki

Welcome to the European Language Data Space workshop in Finland!

The European Language Data Space and the University of Helsinki are bringing together experts from the Finnish Industry, Public Administration and Research to discuss the importance of language data for the development of Language Technologies and AI-based tools in Finland. The event is taking place on 10.04.2024 at Clarion Hotel Helsinki.

Since early 2023, the European Commission has been providing guidance and support towards a new dimension in language data sharing, carried out through the European Language Data Space (LDS). This new dimension goes beyond language data and addresses many areas and fields through their specific Data Spaces. With the establishment of the Common European Data Spaces, communication and exchange among different modalities of data description and availability is becoming a reality for all European countries.

In this context, the European Language Data Space aims at building a trustworthy and effective data market for the exchange of language resources in the public and – even more importantly – in the private sector, in line with the EU Data Strategy.

For that purpose, the European Language Data Space (LDS) is organising a series of Country Workshops to help local industries, research groups and public administrations join this new language data exchange space and connect with relevant local and European networks, while benefiting from the trustworthy infrastructures already available. As a European language data sharing platform, the LDS can help local industry stakeholders to monetise their language data in a multilingual Europe where Language Technologies and AI-based applications play an increasingly important role.

 

The LDS workshop in Finland

The Finnish LDS workshop will address the needs of the Finnish stakeholders from both private and public sectors, be it providers, integrators and/or consumers of language data, while sharing their experiences and requirements and exploring how to meet the desired technological growth to enhance their competitiveness at both national and European levels. The LDS will present and discuss how it can help the Finnish stakeholders and support their efforts to produce/monetise/obtain language data to power LT and AI-based tools in Finland.

The workshop is addressed to data owners and data providers, LT developers and integrators and SMEs, as well as to public administration executives, officers and partners. The workshop will be held in English.

Registration

Participation is free of charge, but registration is required. Registration closed on 03.04.2024. Please contact the organisers to check whether seats are still available: lareina-office [ATT] helsinki.fi

 

European Language Data Space (LDS) workshop in Finland on April 10th, 2024: Programme

09:00 – 09:45

Registration

09:55 – 10:05

Welcome and introduction
Krister Lindén, University of Helsinki

10:05 – 10:35

Welcome by the European Commission: The Digital Europe Programme and the Common European Language Data Space
Philippe Gelin, European Commission

10:35 – 11:05

The importance of language data for the development of LT solutions future steps
Aleksander Alafuzoff, Yle

11:05 – 11:30

Coffee Break

11:30 – 11:40

Welcome by the Ministry of Finance
Olli-Pekka Rissanen, Ministry of Finance

11:40 – 12:30

Language Data and Language Technologies in Finland and for Finnish
– Panel session

Krister Lindén, University of Helsinki (Moderator)
Mikko Kurimo, Aalto University
Iftikhar Ahmad, Tietoevry
Peter Smit, Inscripta Oy
Riikka Lindroos-Järvitalo, KELA
Patrik Gayer, SiloAI
Kirsi Salmela, Kopiosto

12:30 – 13:00

European Language Data Space: developing a market for language data and services and benefitting from a joint European effort
Georg Rehm, LDS Consortium, German Research Center for Artificial Intelligence (DFKI)

13:00 – 13:50

Lunch Break

13:50 – 14:50

Language data production, management, and market development: overcoming obstacles – Panel session
Krister Lindén, University of Helsinki (Moderator)
Manu Setälä, Solita Oy
Kaarina Hyvönen, Kielikone Oy
Tiina Lindh-Knuutila, Lingsoft Language Services Oy
Tommi Lehtonen, KAVI
Ilkka Lavas, City Digital Group
Jörg Tiedemann, University of Helsinki

14:50 – 15:05

Conclusions
Krister Lindén, University of Helsinki

15:05 – 15:15

Coffee Break and Networking

15:15 – 16:15

Coffee Break and Networking continue in Sitra’s Nordic Data Festival 2024 event (co-located in Clarion Hotel Helsinki)

 

 

Contact the local organizers for further details:

Krister Lindén and Wilhelmina Dyster
University of Helsinki
lareina-office [ATT] helsinki.fi

Last updated: April 5, 2024

ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora – Call for Papers

The 2024 ParlaCLARIN Workshop will be held in May in Torino (Italy), as part of the LREC-COLING 2024 – The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation.

The Call for Papers is now open and the paper submission deadline is 19 February 2024.
Read more: https://www.clarin.eu/ParlaCLARIN-IV

FIN-CLARIAH Meeting 1.12.2023 11-17

Place: Väinö Linna -sali, Linna Building, Kalevantie 5, Tampere

 

Preliminary Program

11.00-11.10 Welcoming Words by Sanna Kumpulainen, Associate Professor in Information Studies, Tampere University

11.10-12.00 Keynote I on Studying SSH Research Needs: Elina Late, Senior Research Fellow in Information Studies, Tampere University

12.00-13.00 Lunch

13.00-13.45 Keynote II on Language Models: Sampo Pyysalo, Associate Professor at the Department of Computing, University of Turku

>>Download the slides of the work package presentations<< (pdf)

13.45-14.30 Work Package Presentations I

13.45-14.00 WP1.3 Veronika Laippala: Noise-Tolerant NLP

14.00-14.20 WP1.1, 1.2, 2.1, 2.2 & 2.3 Mietta Lennes: Kielipankki – The Language Bank of Finland

14.20-14.25 WP2.4 Harri Kettunen: Helsinki Term Bank for the Arts and Sciences

14.25-14.30 WP2.5 Jenny Tarvainen: Automated Text Tools for Learner Language

14.30-15.00 Coffee

15.00-16.00 Work Package Presentations II

15.00-15.05 WP3.1 Martin Matthiesen: Pipeline from the National Library to CSC

15.05-15.10 WP3.2 Tanja Välisalo: Named Entity Recognition for NARC Data

15.10-15.15 WP4.3 Eetu Mäkelä: Evaluation and Subsetting

15.15-15.20 WP4.1 Julia Matveeva: Metadata Harmonization

15.20-15.25 WP4.4 Mikko Laitinen: Twitter

15.25-15.30 WP4.2 Eero Hyvönen: LOD

15.30-15.35 WP3.4 Raine Koskimaa: Game Streams

15.35-15.40 WP3.3 Maria Valaste: Qualitative Surveys

15.40-15.45 WP3.5 Kimmo Elo (Risto Turunen replacing): Text Networks

15.45-15.50 WP5 Sanna Kumpulainen: Evidence-based RI Development + Education & Resources

16.00-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)

 

 

Join the course Corpus Linguistics and Statistical Methods (4.9.-20.10.2023)

The online course Corpus Linguistics and Statistical Methods is intended for students in languages or other fields who wish to learn the basics of using corpora.

The course is offered in Finnish and in English, and it is open to all university students in and outside Finland. The course started on Monday 4th September, but it is possible to join the course area until 15th September.

Read more and register for the course!

NB: The same course will be organized again in period 3, starting on 15 Jan 2024 (see the course page).

The Donate Speech Corpus enabled researchers to analyze the typical voice pitch of more than 8000 speakers of Finnish

The 24th INTERSPEECH Conference was held on 20-24 August 2023 in Dublin, Ireland. At the conference, Mietta Lennes from the Language Bank of Finland presented a poster, based on the following conference article:

Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proceedings of INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822.

Take a look at the poster

 

This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2023081621

Mietta Lennes & Minnaleena Toivola:

Pitch distributions in a very large corpus of spontaneous Finnish speech

Poster and supplementary materials presented at Interspeech 2023, 20.-24.8.2023, Dublin, Ireland.

Last updated: 2023-08-25

This page contains a picture of the poster presented at the conference and some additional figures and details about the piece of research in question. For further information, please contact Mietta Lennes.

 


Poster

The poster describes the main results of the conference article.


Additional figures

Pitch density of 60 speakers (red=female, blue=male; according to self-reported gender), after the second pass of the pitch detection process

Pitch density of 60 speakers (red=female, blue=male; according to self-reported gender), referred to the speaker-specific most typical pitch (statistical mode), after the second pass of the pitch detection process
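
The second figure refers each speaker's pitch values to that speaker's most typical pitch. As a rough illustration only, the Python sketch below shows one way such a normalisation could be computed. It is not the procedure used for the poster: the histogram-based mode estimate, the semitone scale, the 0.5-semitone bin width, the 100 Hz reference and the function names are all assumptions made for this example.

```python
import math
from collections import Counter


def typical_pitch_hz(f0_values_hz, bin_width_st=0.5, ref_hz=100.0):
    """Estimate the most typical pitch (mode) of one speaker.

    Assumption for this sketch: the mode is taken as the centre of the
    fullest histogram bin on a semitone scale relative to ref_hz.
    """
    if not f0_values_hz:
        raise ValueError("need at least one pitch sample")
    # Count samples per semitone bin.
    bins = Counter(
        math.floor(12.0 * math.log2(f0 / ref_hz) / bin_width_st)
        for f0 in f0_values_hz
    )
    highest_count = max(bins.values())
    # Ties are resolved in favour of the lowest bin to keep the result deterministic.
    top_bin = min(b for b, count in bins.items() if count == highest_count)
    centre_st = (top_bin + 0.5) * bin_width_st
    # Convert the bin centre back from semitones to Hz.
    return ref_hz * 2.0 ** (centre_st / 12.0)


def relative_to_typical_pitch(f0_values_hz, typical_hz):
    """Express each pitch sample in semitones relative to the typical pitch."""
    return [12.0 * math.log2(f0 / typical_hz) for f0 in f0_values_hz]


# Hypothetical pitch samples (Hz) for a single speaker.
samples_hz = [100.0, 101.0, 102.0, 150.0, 200.0]
mode_hz = typical_pitch_hz(samples_hz)
offsets_st = relative_to_typical_pitch(samples_hz, mode_hz)
```

After this step, a value of 0 corresponds to the speaker's own most typical pitch, so distributions from speakers with different voice ranges can be overlaid on a common axis.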

 


The pitch data used for this study

The pitch data calculated for this paper will be published as an online dataset. The link to the data will be added on this page.


How to cite this poster presentation page

Mietta Lennes & Minnaleena Toivola (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Poster and supplementary materials. Interspeech 2023, 20.-24.8.2023, Dublin, Ireland. Available: http://urn.fi/urn:nbn:fi:lb-2023081621.

Cite the original conference article:

Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proc. INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822

FIN-CLARIAH Summer Event 6.6.2023 11-17

Place: CSC, Life Science Centre, Keilaranta 14 C, Espoo

 

 

Preliminary Program

11.00-11.10 Welcoming Words by Katri Tegel, Development Manager, CSC

11.10-12.00 Keynote: Mikko Kurimo, Professor of Speech and Language Processing, Aalto University

12.00-13.00 Lunch

13.00-15.00 Thematic Groups

  1. DH Education:
    This group gathers together people who are interested in DH education: how can we disseminate our RI services to Finnish SSH communities through education, both in the short and long term?
  2. Documentation:
    This group develops best practices for documentation inside the project: what quality do we want to reach by the end of the year, and how does our documentation vary in different contexts (e.g., graphical user interfaces versus code repositories)?
  3. Speech Data in Research:
    This group discusses the needs of researchers using speech data: what is the state-of-the-art, and how is FIN-CLARIAH going to push the field further?
  4. Visual Sources in Research:
    This group discusses the needs of researchers using visual sources (videos, images, photos): what is the state-of-the-art, and how is FIN-CLARIAH going to push the field further?

15.00-15.30 Coffee

15.30-16.15 Sharing the Results from the Groups

16.15-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)

 

FIN-CLARIAH Workshop Day 18.11.2022 11-17 @ University of Jyväskylä

Workshop Program

11.00-12.00 Jari Ojala: Welcoming words + Pasi Tyrväinen: Keynote

12.00-13.00 Lunch

13.00-13.15 Anna Sendra Toset: Results from FIN-CLARIAH interviews

13.15-14.30 Teamwork in thematic groups:

  1. CSC integration – Slides (Martin Matthiesen)
  2. Data licensing – Slides (Mietta Lennes)
  3. The end-user perspective I (Eetu Mäkelä)
  4. The end-user perspective II (Mikko Laitinen)

14.30-15.00 Coffee 

15.00-16.00

  • Reports from thematic groups
  • Mikko Tolonen: Why metadata matters in FIN-CLARIAH? (Slides)
  • General discussion 

16.00-17.00 Socializing & refreshments (Executive board meeting)

 

Discover efficient workflows and plan your research data management in the Data Clinic course!

The open online course Data Clinic kicks off on 11th November 2022 and ends in late April 2023. During the winter and spring, you learn to write a Data Management Plan and get practical advice and support for collecting, processing and managing your research data. The participants will be working partly independently and partly in small groups of peers. You may attend the entire course remotely.

The course materials will be provided mainly in English. Students from all universities and all fields are welcome if space allows. The only prerequisite is that you are already starting a research project where you need to process and manage a data set that contains text documents or speech recordings, i.e., some language data.

Read more and join the course by 28.11.2022!

Open online course Introduction to Speech Analysis 31.10.–12.12.2022

In this online course, you get to grips with special tools that are available for transcribing and studying speech samples. You also learn about collecting and managing a speech corpus of your own. During the course, you will actively use the Praat program and get familiar with ELAN, too.

The course is open to students in all universities and you can take it either in Finnish or in English. The number of participants may be restricted if required. The course will be taught by Mietta Lennes and Juraj Šimko at the University of Helsinki.

The course has already begun, but you may still enrol and join in until 11th November.

Further information and link to the course on Moodle

FIN-CLARIAH Kick-off 3.6.2022

Posters presented at the kick-off event

To view or download the PDF version, click on the image.

W1.1 Text processing and annotation environments

Image of the poster W1.1 Text processing and annotation environments

W1.2 Speech processing and annotation

Image of the poster W1.2 Speech processing and annotation

W1.3 Noise-tolerant NLP

Image of the poster W1.3 Noise-tolerant NLP

W2.1 Social Data Science

Image of the poster W2.1 Social Data Science

W2.2 Learners’ Assessment Environments

Image of the poster W2.2 Learners' Assessment Environments

W2.3 Translation and Interpretation

Image of the poster W2.3 Translation and Interpretation

W2.4 Terminology

Image of the poster W2.4 Terminology

W2.5 Solutions for better use of language learner performances in research

Image of the poster W2.5 Solutions for better use of language learner performances in research

W3.1 Increasingly automated ingestion of material

Image of the poster W3.1 Increasingly automated ingestion of material

W3.2 AI solutions to better use of National Archives mass digitisation services

Image of the poster W3.2 AI solutions to better use of National Archives mass digitisation services

W3.3 Qualitative survey data

Image of the poster W3.3 Qualitative survey data

W3.4 Analysis tools for real-time chats in gameplay streams

Image of the poster W3.4 Analysis tools for real-time chats in gameplay streams

W3.5 Text network analysis of political texts

Image of the poster W3.5 Text network analysis of political texts

W4.1 Metadata harmonization and analysis

Image of the poster W4.1 Metadata harmonization and analysis

W4.2 Linked Open Data Services

Image of the poster W4.2 Linked Open Data Services

W4.3 Subsetting and evaluating data

Image of the poster W4.3 Subsetting and evaluating data

W4.4 Social media noise (and how to tackle it?)

Image of the poster W4.4 Social media noise (and how to tackle it?)

W5.1 & 5.2 Information interaction

Image of the poster W5.1 & 5.2 Information interaction

Posters about FIN-CLARIAH

The kick-off get-together of the FIN-CLARIAH infrastructure project is held on the premises of the National Library on 3.6.2022. You can see the posters online on the event page.

Register now for the online course Corpus Linguistics and Statistical Methods

The online course Corpus Linguistics and Statistical Methods (Korpuslingvistiikka ja tilastolliset menetelmät, 5 credits) will be offered again during 17.1.–6.3.2022. This course can be taken either in Finnish or in English.

The total number of participants will be restricted, but it will be possible to participate in the course from outside the University of Helsinki and even from outside Finland. If you are a student from outside the University of Helsinki, please find further details and the link for joining the Moodle area on the course home page (see below). Students from the University of Helsinki should first register via Sisu.

Registration for the course is open until 28.1.2022 (unless the maximum number of participants is exceeded before then).

Home page of the course

 

Find more courses and training by Kielipankki

Donate Speech awarded with Prix Europa: Best European Digital Audio Project of the Year 2021

The Donate Speech campaign, in which the Language Bank of Finland has been involved, won PRIX EUROPA: Best European Digital Audio Project of the Year 2021 (see https://www.prixeuropa.eu/news/2021/10/15winners-y4emh). The award ceremony took place in Potsdam, Germany, on 15th October 2021.

Earlier this year, Donate Speech also won the national Grand One award for Best Mobile Service of the Year, including a distinction for Best Use of Data.

Donate Speech is a joint project of Yle – the Finnish Broadcasting Company, Vake Oy (currently Ilmastorahasto), Solita, Aalto University and the University of Helsinki.

 

If you speak and understand Finnish, you can donate your speech here!

University of Helsinki Open Science Award 2021 was granted to the Language Bank of Finland and the Donate Speech campaign

On 29th October 2021, the University of Helsinki presented its Open Science Award to the Language Bank of Finland and the Donate Speech campaign (Lahjoita puhetta) in recognition of exceptional work in promoting the accessibility and reusability of research data. In addition to the Language Bank, the award was given to Research Coordinator Kati Lassila-Perini.

At the award ceremony, Research Director Krister Lindén gave a presentation that is now available on YouTube with English subtitles. Read more about the award on the website of the University of Helsinki.

Open online course Introduction to Speech Analysis 1.11.-17.12.2021

In this online course, you get to grips with special tools that are available for transcribing and studying speech samples. You also learn about collecting and managing a speech corpus of your own. During the course, you will actively use the Praat program and get familiar with ELAN, too.

The course is open to students in all universities and you can take it either in Finnish or in English. The number of participants may be restricted if required. The course will be taught by Mietta Lennes and Juraj Šimko at the University of Helsinki.

Join the course by 12th November!

Further information and link to the course on Moodle

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317

More contact information