Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Heikki Rasilo tells us about his use of the Aalto University DSP Course Conversation Corpus for his research related to speech production.
I am Heikki Rasilo, a postdoctoral researcher in the Artificial Intelligence Lab at Vrije Universiteit Brussel (VUB), Belgium. I received my PhD as a joint degree between VUB and Aalto University in 2017. After working in the private sector for a couple of years, I received a research grant from the Ulla Tuominen Foundation, through the Finnish Foundations’ Post Doc Pool (Säätiöiden post doc -pooli), to continue my research.
Since the beginning of my PhD studies, my main research focus has been on physical speech production and its learning mechanisms. How do human children learn to articulate and imitate the speech of their parents while using their own vocal tracts, which differ greatly in size and shape? The acoustic properties of adult and infant speech differ as well, and it is difficult to compare them directly. Nevertheless, children learn to articulate their mother tongue, and I am interested in whether the articulatory learning process can also affect the way in which we recognize and comprehend speech. Perhaps one of the reasons why we understand speech better than machines do is that we know the physical mechanism through which speech is produced.
I am currently investigating whether the acoustic representations of speech that are formed in learning speech articulation could also be utilized in automatic speech recognition. The amount of recorded speech data that is required in order to train the world’s best speech recognizers is vast, and human children are not likely to encounter a similar amount of speech during their speech acquisition process. Therefore, it must be possible to learn to understand speech with smaller amounts of data, and physical articulation may play a role in the learning process.
In a study that was published last year, I trained a neural network to simultaneously recognize both phonemes and physical articulation from speech. The hypothesis was that the articulatory learning would shape the representations the network learned, and that these new representations could also be helpful when recognizing phonemes. For the experiment, I needed some recorded speech as well as articulatory information related to it. In the Language Bank of Finland, I found the Aalto University DSP Course Conversation Corpus, which contained a sufficient amount of Finnish speech material including phonemic transcriptions. From the transcriptions, I was able to generate coarse synthetic articulatory data by using a Finnish speech synthesizer. The results of the experiment were promising: the articulatory learning did shape the speech representations in ways that can enhance phoneme recognition.
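The idea of learning phonemes and articulation jointly can be illustrated with a minimal sketch. This is not the network from the study: the class `MultiTaskNet`, the function `joint_loss`, the layer sizes and the `artic_weight` parameter are all hypothetical and chosen only for illustration. The sketch shows one shared hidden layer feeding two output heads, so that both the phoneme classification error and the articulatory regression error depend on the same internal representation.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MultiTaskNet:
    """Hypothetical two-headed network: one shared layer, two outputs."""

    def __init__(self, n_in, n_hidden, n_phonemes, n_artic, seed=0):
        rng = random.Random(seed)
        self.w_shared = make_matrix(n_hidden, n_in, rng)         # shared representation
        self.w_phoneme = make_matrix(n_phonemes, n_hidden, rng)  # phoneme head
        self.w_artic = make_matrix(n_artic, n_hidden, rng)       # articulatory head

    def forward(self, features):
        # Both heads read the same hidden vector, so training on either
        # task would change the representation used by the other.
        hidden = [math.tanh(h) for h in matvec(self.w_shared, features)]
        phoneme_probs = softmax(matvec(self.w_phoneme, hidden))
        artic_pred = matvec(self.w_artic, hidden)
        return hidden, phoneme_probs, artic_pred


def joint_loss(phoneme_probs, phoneme_target, artic_pred, artic_target,
               artic_weight=0.5):
    """Cross-entropy for the phoneme plus weighted MSE for articulation.

    artic_weight is an assumed hyperparameter, not a value from the study.
    """
    cross_entropy = -math.log(phoneme_probs[phoneme_target])
    mse = sum((p - t) ** 2 for p, t in zip(artic_pred, artic_target)) / len(artic_target)
    return cross_entropy + artic_weight * mse


# Example: one acoustic frame with 13 made-up feature values.
net = MultiTaskNet(n_in=13, n_hidden=8, n_phonemes=5, n_artic=3)
_, probs, artic = net.forward([0.1] * 13)
loss = joint_loss(probs, 2, artic, [0.0, 0.5, -0.5])
```

Setting `artic_weight` to zero reduces the objective to plain phoneme classification, which gives a natural baseline for checking whether the articulatory task changes the learned representation.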
In my previous research, I have also used the CAREGIVER Corpus (available via ELRA), which consists of simple sentences and their orthographic transcriptions. Together with Academy Research Fellow Okko Räsänen, I used the corpus to investigate algorithms for learning word-meaning mappings, word segmentation and acoustic patterns related to words.
Rasilo, H. (2020). Phonemic learning based on articulatory-acoustic speech representations. In S. Denison, M. Mack, Y. Xu, & B.C. Armstrong (Eds.), Proceedings of the 42nd Annual Conference of the Cognitive Science Society (pp. 2203–2209). Cognitive Science Society. Available at: https://cogsci.mindmodeling.org/2020/papers/0512/index.html
Rasilo, H. & Räsänen, O. (2017). An online model for vowel imitation learning. Speech Communication, 86, 1–23. Available at: https://doi.org/10.1016/j.specom.2016.10.010
Räsänen, O. & Rasilo, H. (2015). A joint model of word segmentation and meaning acquisition through cross-situational learning. Psychological Review, 122(4), 792–829. Available at: https://psycnet.apa.org/doi/10.1037/a0039702
Rasilo, H. & Räsänen, O. (2015). Weakly-supervised word learning is improved by an active online algorithm. In Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech 2015), Dresden, Germany (pp. 1561–1565). Available at: https://www.isca-speech.org/archive/interspeech_2015/i15_1561.html
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Humanities of the University of Helsinki.