Join the course Corpus Linguistics and Statistical Methods (4.9.-20.10.2023)

The online course Corpus Linguistics and Statistical Methods is intended for students in languages or other fields who wish to learn the basics of using corpora.

The course is offered in Finnish and in English and it is open to all university students in and outside Finland. The course has already started on Monday 4th September, but it will be possible to join the course area until 15th September.

Read more and register for the course!

NB: The same course will be organized again in period 3, starting on 15 Jan 2024 (see the course page).

The Donate Speech Corpus enabled researchers to analyze the typical voice pitch of more than 8000 speakers of Finnish

The 24th INTERSPEECH Conference was held on 20-24 August 2023 in Dublin, Ireland. At the conference, Mietta Lennes from the Language Bank of Finland presented a poster, based on the following conference article:

Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proceedings of INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822.

Take a look at the poster

 

This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2023081621

Mietta Lennes & Minnaleena Toivola:

Pitch distributions in a very large corpus of spontaneous Finnish speech

Poster and supplementary materials presented at Interspeech 2023, 20.-24.8.2023, Dublin, Ireland.

Last updated: 2023-08-25

This page contains a picture of the poster presented at the conference and some additional figures and details about the piece of research in question. For further information, please contact Mietta Lennes.

 


Poster

(Click to download the image as a pdf document)

The poster describes the main results of the conference article.


Additional figures

Pitch density of 60 speakers (red = female, blue = male, according to self-reported gender), after the second pass of the pitch detection process
Pitch density of 60 speakers (red = female, blue = male, according to self-reported gender), expressed relative to each speaker's most typical pitch (statistical mode), after the second pass of the pitch detection process
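
The second figure expresses each speaker's pitch values relative to that speaker's most typical pitch (the statistical mode). As a rough illustration only, and not the authors' actual Praat/R workflow, the following Python sketch estimates a modal pitch from a simple histogram and re-expresses pitch values in semitones relative to it. The function names, the 5 Hz bin width and the choice of a semitone scale are assumptions made for this example.

```python
import math
from collections import Counter


def modal_pitch_hz(pitch_hz, bin_width_hz=5.0):
    """Estimate the most typical pitch as the centre of the fullest histogram bin.

    bin_width_hz is a hypothetical choice for this sketch, not a value from the study.
    Unvoiced frames (values <= 0) are ignored.
    """
    bins = Counter(math.floor(f / bin_width_hz) for f in pitch_hz if f > 0)
    if not bins:
        raise ValueError("no voiced pitch values")
    # On ties, prefer the lower-frequency bin so the result is deterministic.
    top_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return (top_bin + 0.5) * bin_width_hz


def semitones_re_mode(pitch_hz, mode_hz):
    """Express voiced pitch values in semitones relative to the speaker's modal pitch."""
    return [12.0 * math.log2(f / mode_hz) for f in pitch_hz if f > 0]


# Made-up pitch values (Hz) for one speaker; 0.0 marks an unvoiced frame.
speaker_pitch = [100.0, 101.0, 102.0, 103.0, 0.0, 150.0, 200.0]
mode = modal_pitch_hz(speaker_pitch)
relative = semitones_re_mode(speaker_pitch, mode)
```

Referring each speaker's values to their own mode puts the peak of every distribution at zero, which makes the shapes of the distributions comparable across speakers with very different pitch levels.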

 


The pitch data used for this study

The pitch data calculated for this paper will be published as an online dataset. The link to the data will be added on this page.



How to cite this poster presentation page

Mietta Lennes & Minnaleena Toivola (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Poster and supplementary materials. Interspeech 2023, 20.-24.8.2023, Dublin, Ireland. Available: http://urn.fi/urn:nbn:fi:lb-2023081621.

Cite the original conference article:

Lennes, M., Toivola, M. (2023). Pitch distributions in a very large corpus of spontaneous Finnish speech. Proc. INTERSPEECH 2023, 4778-4782, doi: 10.21437/Interspeech.2023-1822


FIN-CLARIAH Summer Event 6.6.2023 11-17

Place: CSC, Life Science Centre, Keilaranta 14 C, Espoo

 

 

Preliminary Program

11.00-11.10 Welcoming Words by Katri Tegel, Development Manager, CSC

11.10-12.00 Keynote: Mikko Kurimo, Professor of Speech and Language Processing, Aalto University

12.00-13.00 Lunch

13.00-15.00 Thematic Groups

  1. DH Education:
    This group gathers together people who are interested in DH education: how can we disseminate our RI services to Finnish SSH communities through education, both in the short and long term?
  2. Documentation:
    This group develops best practices for documentation inside the project: what quality do we want to reach by the end of the year, and how does our documentation vary in different contexts (e.g., graphical user interfaces versus code repositories)?
  3. Speech Data in Research:
    This group discusses the needs of researchers using speech data: what is the state-of-the-art, and how is FIN-CLARIAH going to push the field further?
  4. Visual Sources in Research:
    This group discusses the needs of researchers using visual sources (videos, images, photos): what is the state-of-the-art, and how is FIN-CLARIAH going to push the field further?

15.00-15.30 Coffee

15.30-16.15 Sharing the Results from the Groups

16.15-17.00 Free Chilling & Refreshments / Parallel session: Executive Board Meeting (with Zoom option)

 

FIN-CLARIAH Workshop Day 18.11.2022 11-17 @ University of Jyväskylä


Workshop Program

11.00-12.00 Jari Ojala: Welcoming words + Pasi Tyrväinen: Keynote

12.00-13.00 Lunch

13.00-13.15 Anna Sendra Toset: Results from FIN-CLARIAH interviews

13.15-14.30 Teamwork in thematic groups:

  1. CSC integration – Slides (Martin Matthiesen)
  2. Data licensing – Slides (Mietta Lennes)
  3. The end-user perspective I (Eetu Mäkelä)
  4. The end-user perspective II (Mikko Laitinen)

14.30-15.00 Coffee 

15.00-16.00

  • Reports from thematic groups
  • Mikko Tolonen: Why metadata matters in FIN-CLARIAH? (Slides)
  • General discussion 

16.00-17.00 Socializing & refreshments (Executive board meeting)

 

Discover efficient workflows and plan your research data management in the Data Clinic course!

The open online course Data Clinic kicks off on 11th November 2022 and ends in late April 2023. During the winter and spring, you will learn to write a Data Management Plan and get practical advice and support for collecting, processing and managing your research data. Participants will work partly independently and partly in small groups of peers. You may attend the entire course remotely.

The course materials will be provided mainly in English. Students from all universities and all fields are welcome if space allows. The only prerequisite is that you are already starting a research project where you need to process and manage a data set that contains text documents or speech recordings, i.e., some language data.

Read more and join the course by 28.11.2022!

Open online course Introduction to Speech Analysis 31.10.–12.12.2022

In this online course, you will get to grips with the special tools that are available for transcribing and studying speech samples. You will also learn about collecting and managing a speech corpus of your own. During the course, you will actively use the Praat program and become familiar with ELAN, too.

The course is open to students in all universities and you can take it either in Finnish or in English. The number of participants may be restricted if required. The course will be taught by Mietta Lennes and Juraj Šimko at the University of Helsinki.

The course has already begun, but you may still enrol and join in until 11th November.

Further information and link to the course on Moodle

FIN-CLARIAH Kick-off 3.6.2022


Posters presented at the kick-off event

  • W1.1 Text processing and annotation environments
  • W1.2 Speech processing and annotation
  • W1.3 Noise-tolerant NLP
  • W2.1 Social Data Science
  • W2.2 Learners’ Assessment Environments
  • W2.3 Translation and Interpretation
  • W2.4 Terminology
  • W2.5 Solutions for better use of language learner performances in research
  • W3.1 Increasingly automated ingestion of material
  • W3.2 AI solutions to better use of National Archives mass digitisation services
  • W3.3 Qualitative survey data
  • W3.4 Analysis tools for real-time chats in gameplay streams
  • W3.5 Text network analysis of political texts
  • W4.1 Metadata harmonization and analysis
  • W4.2 Linked Open Data Services
  • W4.3 Subsetting and evaluating data
  • W4.4 Social media noise (and how to tackle it?)
  • W5.1 & 5.2 Information interaction

Posters about FIN-CLARIAH

The kick-off get-together of the FIN-CLARIAH infrastructure project is held on the premises of the National Library on 3.6.2022. You can see the posters online on the event page.

Register now for the online course Corpus Linguistics and Statistical Methods

The online course Corpus Linguistics and Statistical Methods (Korpuslingvistiikka ja tilastolliset menetelmät, 5 credits) will be offered again during 17.1.–6.3.2022. This course can be taken either in Finnish or in English.

The total number of participants will be restricted, but it will be possible to participate in the course from outside the University of Helsinki and even from outside Finland. If you are a student from outside the University of Helsinki, please find further details and the link for joining the Moodle area on the course home page (see below). Students from the University of Helsinki should first register via Sisu.

Registration for the course is open until 28.1.2022 (unless the maximum number of participants is exceeded before then).

Home page of the course

 

Find more courses and training by Kielipankki

Donate Speech awarded Prix Europa: Best European Digital Audio Project of the Year 2021

The Donate Speech campaign, in which the Language Bank of Finland has been involved, was awarded the PRIX EUROPA: Best European Digital Audio Project of the Year 2021 (see https://www.prixeuropa.eu/news/2021/10/15winners-y4emh). The award ceremony took place in Potsdam, Germany on 15th October 2021.

Earlier this year, Donate Speech also won the national Grand One award for Best Mobile Service of the Year, including a distinction for Best Use of Data.

Donate Speech is a joint project of Yle – the Finnish Broadcasting Company, Vake Oy (now Ilmastorahasto), Solita, Aalto University and the University of Helsinki.

 

If you speak and understand Finnish, you can donate your speech here!

University of Helsinki Open Science Award 2021 was granted to the Language Bank of Finland and the Donate Speech campaign

On 29th October 2021, the Language Bank of Finland and the Donate Speech campaign (Lahjoita puhetta) were awarded by the University of Helsinki in recognition of exceptional work in promoting the accessibility and reusability of research data. In addition to the Language Bank, the award was given to Research Coordinator Kati Lassila-Perini.

At the award ceremony, Research Director Krister Lindén gave a presentation that is now available on YouTube with English subtitles. Read more about the award on the website of the University of Helsinki.

Open online course Introduction to Speech Analysis 1.11.-17.12.2021

In this online course, you will get to grips with the special tools that are available for transcribing and studying speech samples. You will also learn about collecting and managing a speech corpus of your own. During the course, you will actively use the Praat program and become familiar with ELAN, too.

The course is open to students in all universities and you can take it either in Finnish or in English. The number of participants may be restricted if required. The course will be taught by Mietta Lennes and Juraj Šimko at the University of Helsinki.

Join the course by 12th November!

Further information and link to the course on Moodle

Join the online course Natural Language Processing for Linguists

The online course Natural Language Processing for Linguists will be taught by Tuomo Hiippala at the University of Helsinki during 15.3.–10.5.2021.

The course is also open to students from universities outside Helsinki, if space allows. Registration is open until 16th March.

Note also that all the course materials will be available online and you can use them even if you cannot make it to the course this time!

Read more and register

 

Welcome to the next Kielipankki Live 14.12.2020 at 13:00

The next Kielipankki Live event will be held on Monday 14th December starting at 13:00 via Zoom. The event will be in English, but questions are welcome in Finnish as well! The main themes are speech corpora and personal data practices. Join us for the interviews and presentations of special guests and for good discussions! Please register by 11th December if possible.

Program and registration details


European Language Grid (ELG): Introduction and overview

4th Regional ELG Workshop: Finland

15th December 2020, 14.00-16.30
Online event

 

The European Language Grid (ELG) aims to provide a digital marketplace where European companies, organizations and citizens can both offer and efficiently use language technologies, data sets and services. The ELG workshop presents an overview of the ELG platform and the ELG pilot projects. Welcome to see what ELG has to offer for you!

Registration

The workshop is a free online event, but registration is required. Please register via the ELRC website by 10th December. NB: In case you wish to participate in the ELG tutorial session that may be arranged after the workshop, please indicate this in the field for additional information on the registration form. Thanks!

Note that the third ELRC Workshop in Finland will also take place online, in the same virtual room, on the same day at 9.30-12.40. Welcome to participate in both events!

Program (provisional)

14:00 Welcome and introduction
14:05 ELG Overview (Katrin Marheinecke)
14:30 ELG online demo (Nils Feldhus)
14:50 Presentations of Finnish Pilot Projects funded in ELG: PARA4DLM (University of Turku), LSDISCO (Lingsoft), OPUS-MT (University of Helsinki)
15:20 Expectations/requirements of Finnish Language Technology providers (Marko Turpeinen, 1001Lakes)
15:40 Summary and discussion
16:00 End of workshop
16:15 Tutorial: How to integrate a service into ELG. This tutorial may be organized according to requests from the participants. Please indicate your interest in the registration form.

Last updated: December 7, 2020

The online course Data Clinic 2020-21 will begin soon

This online course can support you with practical issues in managing the research data you need for your MA thesis or PhD project. You can join the course from any university, given that you fulfil the criteria. There is plenty of room left at the moment. Note, however, that the number of participants is restricted and students in the LingDig MA programme at the University of Helsinki have priority.

Read more and register…

 

See all online courses and training

 

Workshop day to be organized together with ELRC and ELG on 15th December 2020

FIN-CLARIN is planning an online event together with ELRC (European Language Resource Coordination) and ELG (European Language Grid), to be organized on 15th December 2020.

Mark your calendars! Further information will be updated on the event page.


3rd European Language Resource Coordination (ELRC) workshop in Finland

15.12.2020 at 9:30-12:40

Organizers:
The European Language Resource Coordination (ELRC) consortium
Department of Digital Humanities, University of Helsinki

Welcome to the third ELRC workshop in Finland!

Language Technology is shaping our multilingual future. It has already been transforming the way we interact with our devices and with each other, the way we shop, work and travel. More and more it reshapes our interaction with service providers, either public or private. Programs that automatically correct spelling errors and aid sophisticated writing, digital assistants that transform our voices to text messages on mobile phones, bots that answer our calls to the bank or to our social security organisation, systems that automatically translate from a foreign language, and much more, are already empowering our everyday lives, our businesses and our administrations. But can we fully use our own language in our digital interactions? Is our language adequately supported and ready to keep pace with the technological advancements of the AI era?

The third Finnish European Language Resource Coordination (ELRC) workshop will address these questions and it will seek to engage participants in a fruitful discussion on the status and prospects of Language Technology for Finnish. Developers, integrators and users of Language Technology, both from the private and public sector will share experiences, requirements and ways for transforming digital interaction in our multilingual Europe with Language Technologies. Finally, we will discuss how language data, i.e. texts and speech, can fuel development in Artificial Intelligence.

This workshop continues the series of previous ELRC workshops that were organized in Finland on 19.2.2016 and 24.10.2018.

Now in collaboration with European Language Grid (ELG)

This ELRC workshop is organized in collaboration with the European Language Grid (ELG). The 4th Regional ELG Workshop will take place in the afternoon, starting at 14:00. For details, see the ELG workshop page. Welcome to register and attend both events!

Registration

The ELRC workshop is a free event, but registration is required. You can use the same form to register for both the ELRC workshop (morning sessions) and the ELG workshop (afternoon sessions).

Please register via the ELRC website by 10th December. Welcome!

Program (provisional)

09:30 – 09:40

Welcome and introduction (video, pdf)
Krister Lindén, University of Helsinki / FIN-CLARIN

09:40 – 10:00

The potential of Language Technology and AI – where we are, where we should be heading (video, pdf)
Jörg Tiedemann, University of Helsinki

10:00 – 10:30

Language Technologies for the Languages of Finland – Panel session (video, pdf)
Filip Ginter, University of Turku (Moderator)
Sebastian Andersson, Lingsoft
Jörg Tiedemann, University of Helsinki
Sampo Pyysalo, University of Turku
Pasi Tapanainen, Etuma
Kaarina Hyvönen, Kielikone

10:30 – 10:45

Coffee Break

10:45 – 11:15 

The CEF AT Platform (video, pdf)
Vilmantas Liubinas, European Commission

11:15 – 11:45

Language technologies by/for the public sector – Panel session (video, pdf)
Jouko Salonen, Finnish Immigration Service (Moderator)
Osma Suominen, National Library of Finland
Ville Viitasaari, Kela
Kaisamari Kuhmonen, Prime Minister’s Office

11:45 – 12:15

Language data creation, management and sharing: existing practices and challenges – Panel session (video)
Aleksi Rossi, YLE (Moderator)
Krister Lindén, University of Helsinki / FIN-CLARIN
Mikko Kurimo, Aalto University
Tommi Kurki, University of Turku

12:15 – 12:30

The EU Council Presidency Translator – Finnish presidency success story and what’s beyond (video, pdf)
Pekka Myllylä, Managing Director at Tilde Eesti OÜ

12:30 – 12:40

Conclusions (video, pdf)
Krister Lindén, University of Helsinki / FIN-CLARIN

12:40 – 14:00

Break

14:00 – 16:30

European Language Grid (ELG): Introduction and overview.
4th Regional European Language Grid (ELG) Workshop in Finland

The ELG workshop is organized in collaboration with the European Language Grid (ELG) and it will take place in the same online meeting room as the ELRC workshop. Please note that the ELG workshop will be held in English only. Welcome to register and participate in both events!

The detailed program for the ELG workshop is updated at https://www.kielipankki.fi/elg-workshop-2020/.


Contact the local organizers for further details:

Mietta Lennes and Tommi Jauhiainen
University of Helsinki / FIN-CLARIN
fin-clarin [ATT] helsinki.fi

Last updated: December 8, 2020

Join the online course Introduction to Speech Analysis!

The open online course Introduction to Speech Analysis (5 ECTS) has just started. The course is now offered for the first time in both Finnish and English. Within the group size limits, you can join from any university until 6th November 2020. See the course home page for instructions on how to enrol in the course area on Moodle.

During the course, you will learn to transcribe and annotate speech and to understand some of the most important acoustic displays and measurement methods that can be used in speech research. The main tool of the course is the Praat analysis program, but we will also take a look at ELAN. The course can be relevant for students in phonetics, linguistics and languages, but also in other fields where audio recordings of speech are used for research.

All the courses offered by FIN-CLARIN can be found on the Training page.
