Researcher of the Month: Krista Ojutkangas

Krista Ojutkangas
Photo: Oona Rouvinen

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Krista Ojutkangas tells us about their corpus-based research on the Finnish language. Their work combines qualitative and quantitative observations, making both methodological perspectives relevant.

Who are you?

I am Krista Ojutkangas, Adjunct Professor and University Lecturer in Finnish language at the University of Turku. I conduct research in the project “Finnish relations: Changes in Finnish relational predicates from the 16th century to the present (FiRe)”, led by Tuomas Huumo and funded by the Research Council of Finland.

What is your research topic?

I am interested in grammar, especially the relationship between linguistic structures and semantics. In particular, I have studied spatial semantics, that is, how different locations are expressed in language, for example through local cases and postpositions. I am also interested in the diachronic development of these elements, which I have recently explored together with Minna Jaakola. Furthermore, I am interested in phenomena in Old Literary Finnish, such as word paratagmas or parallelism (cf. Finnish word combinations hyvä ja lysti ’good and funny’, juurtua ja itää ’root and sprout’, kuulla ja ymmärtää ’hear and understand’), which I have studied together with Kirsi-Maria Nummila. My latest research topic is transitivity, which I have investigated together with Ilmari Ivaska. This project began as an innocent plan for a single article, but preparations are already underway for seasons three and four.

How is your research related to Kielipankki – the Language Bank of Finland?

I have always conducted corpus-based research in various ways, and I am also interested in the methods used in such research. I co-authored a chapter on qualitative corpus-based research together with Milla Luodonpää-Manni for a book on research methods in linguistics. In that chapter, I provide a very concrete account of my own research approach through illustrative case examples. I usually describe my research as qualitative, but in practice, qualitative and quantitative observations intertwine, and it is not meaningful to draw a strict line between methodological perspectives. In the study I conducted with Ilmari Ivaska, quantitative methods took center stage thanks to Ilmari’s expertise. Even in this collaboration, I always end up close-reading the data, and I’m persistent when it comes to tapping out a detailed analysis.

Of all the resources available through the Language Bank of Finland, the one closest to me is the Finnish Dialect Corpus of the Syntax Archive. As a bonus, it offers a glimpse into agrarian Finland of the last century. Most of its material is dialect interviews conducted in the 1960s, and almost all of the interviewees were born in the 19th century. Spoken data is also represented by the ArkiSyn Database of Finnish Conversational Discourse, but most of my research has focused on written language from different periods. I’ve made use of newspaper, journal, and news material through the Finnish Text Collection, the Newspaper and Periodical Corpus of the National Library of Finland (also in Swedish), and Yle Finnish News Archive.

The National Library’s collection of newspapers and periodicals reaches back to the 19th century, and I have gone deeper into history with the corpora of Old Literary Finnish and Early Modern Finnish. In our quantitative research on transitivity, we have also used the Suomi24 Sentences Corpus. However, given the range of topics and opinions it contains, it’s not a corpus I would necessarily choose for manual, close-reading analysis.

Browsing the resources in the Language Bank feels like stepping into a candy shop for a linguist like me. So many of the corpora are truly tempting and evoke curiosity and new research ideas. Even though I’m not a text researcher by definition, my methods make me appreciate having access to broader context in the data. A close reading of individual occurrences often benefits from seeing what comes before and after them in the text.

Publications

Ivaska, I., & Ojutkangas, K. (2025). Suomen transitiiviset verbit ja verbien transitiivisuus: kvantitatiivinen tutkimus. Virittäjä 129(1), 4–30. https://doi.org/10.23982/vir.146123

Jaakola, M. & Ojutkangas, K. (2023). Readymade grammar: Why are Finnish postpositions an open class? In M. Jaakola and T. Onikki-Rantajääskö (eds.), The Finnish Case System: Cognitive Linguistic Perspectives, 325–354. Helsinki: Suomalaisen Kirjallisuuden Seura. https://doi.org/10.21435/sflin.23

Luodonpää-Manni, M. & Ojutkangas, K. (2020). Laadullinen aineistopohjainen kielentutkimus. In M. Luodonpää-Manni, M. Hamunen, R. Konstenius, M. Miestamo, U. Nikanne and K. Sinnemäki (eds.) Kielentutkimuksen menetelmiä I–IV, 412–441. Helsinki: Suomalaisen Kirjallisuuden Seura. https://doi.org/10.21435/skst.1457

Nummila, K.-M., & Ojutkangas, K. (2013). Pyytämättä ja yllätyksenä. Paratagmakonstruktiot 1500–1800‐luvun kirjasuomessa. Sananjalka 55, 73–99. https://doi.org/10.30673/sja.86722

Ojutkangas, K. (2017). Suomen mukana ja mukaan seuralaisuussuhteen ilmaisijoina: kiintopisteen ilmaisukeinot, konstruktiot ja osallistujien symmetriaero. Virittäjä 121(2), 176–212. https://doi.org/10.23982/vir.58707

Ojutkangas, K. (2023). Dynamic local cases in use. Expressing directional events in Finnish. In M. Jaakola and T. Onikki-Rantajääskö (eds.), The Finnish Case System: Cognitive Linguistic Perspectives, 299–324. Helsinki: Suomalaisen Kirjallisuuden Seura. https://doi.org/10.21435/sflin.23

The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps the researchers of Social Sciences and Humanities to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides the language materials and tools for the research community.

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.

Researcher of the Month: Dejan Porjazovski

Dejan Porjazovski
Photo: Taru Tanhuanpää

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Dejan Porjazovski tells us about his research on systems for spoken language understanding (SLU).

Who are you?

I am Dejan Porjazovski. I came to Finland back in 2018 to pursue a Master’s degree in Machine Learning, Data Science, and Artificial Intelligence at Aalto University. My interest in language technologies led me to join the Automatic Speech Recognition Group at Aalto, first as a summer intern and later as a Master’s thesis worker and a PhD student. I defended my PhD thesis in May 2025.

After graduation, I joined Aivot Labs as a Machine Learning Engineer, where I work with speech-to-text, text-to-speech, and large language models (LLMs) to build Finnish conversational agents for use in the medical domain.

What is your research topic?

My doctoral thesis focused on spoken language understanding for low-resource languages. Spoken language understanding is an umbrella term that covers different speech and language technologies that allow computers to comprehend human speech.

In the thesis, I explored different speech embedding methods, how the amount of data affects their performance, and whether they have language-agnostic capabilities (which is very important for low-resource languages).

Furthermore, I compared two paradigms for building spoken language understanding systems: cascading and end-to-end (E2E). E2E models require a large amount of data to learn the task. Cascading systems, while more data-efficient, come with higher complexity. I therefore compared E2E and cascading systems on various SLU tasks, such as named entity recognition and topic identification, focusing predominantly on Finnish but also covering other languages.
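The structural difference between the two paradigms can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not taken from the thesis: the names `toy_asr`, `toy_text_ner`, `cascade_ner` and `toy_e2e_ner`, the clip identifier, and the dictionary lookups that stand in for trained models. It only shows that a cascade chains a speech recogniser and a text-based tagger, while an E2E model maps speech directly to labels.

```python
from typing import Callable, Dict, List, Tuple

Audio = str                # stand-in for a waveform (assumption: a clip id instead of samples)
Entity = Tuple[str, str]   # (surface form, entity type)

# Hypothetical lookup tables standing in for trained models.
TOY_TRANSCRIPTS: Dict[Audio, str] = {"clip-1": "eduskunta kokoontuu helsingissä"}
TOY_GAZETTEER: Dict[str, str] = {"eduskunta": "ORGANIZATION", "helsingissä": "LOCATION"}


def toy_asr(audio: Audio) -> str:
    """Cascade component 1: speech -> text."""
    return TOY_TRANSCRIPTS.get(audio, "")


def toy_text_ner(text: str) -> List[Entity]:
    """Cascade component 2: text -> entities, here via a simple gazetteer lookup."""
    return [(name, label) for name, label in TOY_GAZETTEER.items() if name in text]


def cascade_ner(
    audio: Audio,
    asr: Callable[[Audio], str],
    ner: Callable[[str], List[Entity]],
) -> List[Entity]:
    """Cascade: two separately built components; ASR errors propagate into NER."""
    return ner(asr(audio))


def toy_e2e_ner(audio: Audio) -> List[Entity]:
    """E2E: a single model maps speech to entities without an exposed transcript."""
    toy_outputs = {"clip-1": [("eduskunta", "ORGANIZATION"), ("helsingissä", "LOCATION")]}
    return toy_outputs.get(audio, [])


cascade_result = cascade_ner("clip-1", toy_asr, toy_text_ner)
e2e_result = toy_e2e_ner("clip-1")
```

In the cascade, each component can be trained on its own data (transcribed speech for ASR, annotated text for NER), which is one way to read the data-efficiency point above; the price is two systems to build and maintain.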

The last area of my research relates to the out-of-distribution generalisation of E2E spoken language understanding models. As hands-free interaction devices become more popular, it is important for these systems to perform reliably when presented with data not seen during training.

How is your research related to Kielipankki – the Language Bank of Finland?

As part of the research, I used the Aalto Finnish Parliament ASR Corpus 2008-2020 to develop cascading and end-to-end named entity recognition models for spoken Finnish.

I was also involved in the collection of the Donate Speech datasets (puhelahjat). The corpora include over 3000 hours of speech, annotated with various metadata, such as age, gender, and topic. I used the corpora to develop a topic identification system for spontaneous Finnish speech, as well as models for extracting metadata from speech. During this period, I was part of the LAREINA project.

Selected publications

Porjazovski, D., Grósz, T., & Kurimo, M. (2024). From raw speech to fixed representations: A comprehensive evaluation of speech embedding techniques. IEEE/ACM Transactions on Audio, Speech, and Language Processing. DOI: 10.1109/TASLP.2024.3426301

Porjazovski, D., Grósz, T., & Kurimo, M. (2023, September). Topic identification for spontaneous speech: Enriching audio features with embedded linguistic information. In 2023 31st European Signal Processing Conference (EUSIPCO) (pp. 396-400). IEEE. DOI: 10.23919/EUSIPCO58844.2023.10289822

Moisio, A., Porjazovski, D., Rouhe, A., Getman, Y., Virkkunen, A., AlGhezi, R., … & Kurimo, M. (2023). Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks. Language Resources and Evaluation, 57(3), 1295-1327. DOI: 10.1007/s10579-022-09606-3

Porjazovski, D., Leinonen, J., & Kurimo, M. (2021, August). Attention-based end-to-end named entity recognition from speech. In International Conference on Text, Speech, and Dialogue (pp. 469-480). Cham: Springer International Publishing. DOI: 10.1007/978-3-030-83527-9_40

Porjazovski, D., Leinonen, J., & Kurimo, M. (2020, October). Named entity recognition for spoken finnish. In Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery (pp. 25-29). DOI: 10.1145/3422839.3423066

Researcher of the Month: Inka Rantakallio

Inka Rantakallio
Photo: AJ Savolainen

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Inka Rantakallio tells us about her research on Finnish female and non-binary rap artists.

Who are you?

I am Inka Rantakallio, PhD, a researcher and lecturer in musicology at the University of Helsinki. Until the end of July 2025, I worked on the Suoni research association’s “Music Scholars in Society” project. I am the editor-in-chief of Mene ja tiedä, an online magazine published by the Young Academy Finland, and one of the three editors-in-chief of the journal Musiikki.

What is your research topic?

From 2021 to 2024, I was funded by the Research Council of Finland as a postdoctoral researcher. My project focused on Finnish female and non-binary rap artists and themes of gender, feminism, race, and whiteness. I was interested in how feminism, gender, and race/ethnicity affect artist identity and artistic expression, and how the norm of whiteness affects Finnish rap music. My interest stemmed strongly from my own background, as I have worked as a music journalist and DJ alongside my research career and have thus become acquainted with and performed alongside several female and non-binary rappers.

My research data consisted of music and music videos, participatory observation at concerts, and artist interviews. My project produced information on how female and non-binary rappers are carving out space in the rather heterosexist and male-dominated hip hop genre, and how white and non-white artists negotiate race and gender norms in relation to Finnish and international hip hop culture. My project also brought visibility to non-male artists and the norm of whiteness, which had previously received very little attention in hip hop research. I also critically reflected on my own position as an “insider” in the research articles I published within the project.

How is your research related to Kielipankki – the Language Bank of Finland?

This Research Council of Finland project was the first project focusing on Finnish female and non-binary rappers, so I wanted to deposit the interviews produced in the project for possible future research projects. The Language Bank offers reliable long-term storage for interview transcripts.

Selected publications

Rantakallio, Inka (2021). Femcees Finland, NiceRap ja vastatilojen voima: Suomiräpin naisten vertaisverkostojen historiaa. Etnomusikologian Vuosikirja 33: 67–93. DOI: https://doi.org/10.23985/evk.103019

Rantakallio, Inka (2023) Who Is Heard and Who Gets to Belong in Hip-Hop? The Counterspaces of Women and Gender Minority Rappers in Finland. In P. Dale, P. Burnard, & R. Travis (eds.), Music for Inclusion and Healing in Schools and Beyond: Hip Hop, Techno, Grime, and More. Oxford: Oxford University Press, 356–382.

Rantakallio, Inka (2025). Researcher as Minority and Majority: Hip Hop Feminist Epistemologies. In K. Ramstedt, S. Välimäki, K. Ahlsved, S. Mononen (eds.), Music, Research and Activism: Prospects and Projects in Northern Europe. Bristol: Intellect, 17–28.

Rantakallio, Inka & Andrea Dankić (2025). Ethnography and Researcher Positionality – Reflections on Feminist Fieldwork in Hip Hop Scenes in Sweden and Finland. IASPM@journal 15(1): 133–150. DOI: 10.5429/2079.387(2025)v15i1.9en

Rantakallio, Inka (2025). ‘Being a woman is the only thing considered questionable. But not the whiteness.’ Gender and race in normatively white hip hop scenes. Global Hip Hop Studies 6(1): 21–41. DOI: 10.1386/ghhs_00101_1

Researcher of the Month: Idastiina Valtasalmi

Idastiina Valtasalmi
Photo: Juhani Valtasalmi

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Idastiina Valtasalmi tells us about her research on linguistic affect, i.e. the expression of emotions and attitudes in language. The topic stemmed from her earlier research on Easy Finnish.

Who are you?

I am Idastiina Valtasalmi, a postdoctoral researcher in Finnish language at Tampere University. I currently research linguistic affect and bias-free language in a project called Tampere in emotions, funded by the Kone Foundation. Before that, I completed my doctoral thesis, which examines the vocabulary of Easy Language, and especially Easy Finnish, from text and user perspectives. Easy Language is a simplified form of language that varies by situation. It can be used to remove accessibility barriers in communication.

What is your research topic?

I study linguistic affect, i.e. the expression of emotions and attitudes in language. My research focuses on bias-free language and on referring to different groups of people in an inclusive, appreciative and respectful manner. Biased expressions are the flip side of my research, because non-discrimination is examined in relation to discrimination. Language is also constantly changing, and expressions previously considered neutral can become discriminatory in tone over time. Bias-free Finnish is an important research topic, as it has received little attention in research, and English has been used as a model for practical writing guidelines. However, the structures of Finnish and English are different, and guidelines written for English are not suitable for use in Finnish.

My current research topics are related to my earlier research, as I have already observed bias-free expressions used in Easy Finnish texts in my doctoral thesis. It can be said that my current research topics partly stemmed from the research results of my doctoral thesis.

How is your research related to Kielipankki – the Language Bank of Finland?

I have used corpus research, questionnaires and linguistic tests as research methods. The corpora available via the Language Bank of Finland have been valuable, as they are of high quality and easily accessible. The corpora have also been useful, for example, in preparing questionnaires and tests, for which I have selected words based on their frequency. Of all the corpora of the Language Bank, I am particularly fond of resources which contain Easy Finnish texts of current affairs and news: Leija, Selkosanomat/Selkouutiset and Easy Finnish texts from the Yle News Archive. I find that such extensive collections of Easy Language texts are relatively rare, even worldwide. The corpora therefore provide Easy Finnish researchers with excellent opportunities for text-based research.
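Frequency-based word selection of the kind mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `word_frequencies` and `select_by_frequency`, the threshold and the three sample sentences are invented here, and the sketch counts raw word forms, whereas a real study would probably work with lemmas from an annotated corpus.

```python
import re
from collections import Counter
from typing import Iterable, List


def word_frequencies(texts: Iterable[str]) -> Counter:
    """Count lowercased word forms across a collection of texts."""
    counts: Counter = Counter()
    for text in texts:
        # \w+ matches runs of letters and digits, including ä and ö in Python 3.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def select_by_frequency(counts: Counter, min_count: int) -> List[str]:
    """Return, in sorted order, the words that occur at least min_count times."""
    return sorted(word for word, n in counts.items() if n >= min_count)


# Invented sample sentences (not from any Language Bank corpus).
sample_texts = [
    "Hallitus päätti asiasta tänään.",
    "Hallitus kertoi, että asiasta päätetään pian.",
    "Sää on tänään kaunis.",
]

counts = word_frequencies(sample_texts)
frequent_words = select_by_frequency(counts, min_count=2)
print(frequent_words)  # ['asiasta', 'hallitus', 'tänään']
```

The same two steps, counting and then filtering by a threshold, also work for picking rare words: replace the `>=` comparison with an upper bound.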

Selected publications

Valtasalmi, Idastiina – Siltaloppi, Satu – Wacklin, Vilma – Mustanoja, Liisa 2025: Kymmenen havaintoa syrjimättömästä kielestä. In Kaisa Jänis & Iiris Salminen (eds.), Kieli ja kirjallisuus muuttuvassa yhteiskunnassa, pp. 99–129. Äidinkielen opettajain liiton vuosikirja 2025. Äidinkielen opettajain liitto.

Valtasalmi, Idastiina 2024: Teksti- ja käyttäjänäkökulmia selkokielen sanastoon. Tampere: Tampereen yliopisto. https://urn.fi/URN:ISBN:978-952-03-3538-0.

Valtasalmi, Idastiina 2023: Essiivin funktiot ja käyttö perustason selkokielessä. Virittäjä, 127(1), pp. 4–27. https://doi.org/10.23982/vir.111948.

Researcher of the Month: Rea Peltola

Rea Peltola
Photo: Jocelyn Parot

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Rea Peltola tells us about her research on the semantics of animacy.

Who are you?

I am Rea Peltola, Professor of Finnish language and culture and Head of the Department of Nordic Studies at the University of Caen in Normandy, France. I am a member of the CRISCO research group. I am also a docent of the Finnish language at the University of Helsinki.

What is your research topic?

My roots are in the study of modal meaning structures, particularly in what is known as post-modality. It investigates the fading of modal meanings, or rather their intersubjective reorientation. Gradually, studying the expressions of permission and ability brought me to reflect on the semantics of animacy. I became interested in how grammar describes the characteristics of living beings, especially in terms of embodiment. For more than ten years now, I have been studying how human language deals with being an animal: How do we talk about other animals and their bodily experiences? How is human language used when interacting with another animal?

How is your research related to Kielipankki – the Language Bank of Finland?

In the thematic issue on interspecies pragmatics, co-edited with Mika Simonen, I studied reported animal inner speech in dialect data. At the time, I went through all the interviews in the Eastern dialects of the Finnish Dialect Corpus of the Syntax Archive, and collected passages where the speakers verbalised the thoughts of other species (e.g., se kuuloo hirvi että tuolla se mennöö se vihamies ’the moose can hear, like there he goes, the enemy’, Suomussalmi). These occurred especially when talking about hunting practices or work with another animal (usually a dog or horse). In general, these passages described how animals reasoned based on their sensory perception.

Together with Outi Duvallon, I have used the Language Bank of Finland’s corpora to investigate the modal meaning and use of the Finnish reduplicative construction of the type sai kuin saikin (’V1 kuin V1=kin’) in text corpora from different time periods, particularly Early Modern Finnish, as well as in contemporary language use in the Yle News Archive and the Suomi24 online discussions. We noticed, for example, that the epistemic use of the construction, which is already present in older material although it was perhaps still taking shape at the time (e.g. tässä minä sinun nuorin poikasi olen, kuin olenki ’here I am, your youngest son, as sure as I stand’, Salmelainen 1863), has gained new momentum in the chain-like structures of online conversations. The reduplicative construction can be used to align with the position that another participant has expressed in an earlier message (e.g., laulajalla on kuin onkin loistava ääni ’the vocalist indeed has an excellent voice’, Suomi24).
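A surface pattern such as V1 kuin V1=kin lends itself to a simple string search. The following Python sketch is only an illustration and not the method used in the study: the function name `find_reduplications` and the regular expression are assumptions made here. The pattern looks for a word, an optional comma, kuin, and the same word again followed by the clitic -kin (or the older spelling -ki). It cannot tell verbs from other word classes, so the hits would still need manual checking.

```python
import re
from typing import List

# (\w+)  : the first occurrence of the word (V1)
# ,?     : optional comma, as in the 1863 example
# \1     : the same word repeated, followed by the clitic -kin or -ki
REDUPLICATION = re.compile(r"\b(\w+),?\s+kuin\s+\1kin?\b", re.IGNORECASE)


def find_reduplications(text: str) -> List[str]:
    """Return every substring of text that matches the 'X kuin X=kin' pattern."""
    return [match.group(0) for match in REDUPLICATION.finditer(text)]


# The first two examples are quoted in the interview; the last two are invented.
examples = [
    "laulajalla on kuin onkin loistava ääni",
    "tässä minä sinun nuorin poikasi olen, kuin olenki",
    "hän sai kuin saikin työn valmiiksi",
    "hän on kuin kotonaan",
]

for sentence in examples:
    print(sentence, "->", find_reduplications(sentence))
```

The first three sentences each yield one hit (on kuin onkin, olen, kuin olenki, sai kuin saikin), while the last one yields none, because kuin is not followed by a repetition of the preceding word.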

Recently, Arnaud Godet and I have been investigating Finnish modal verbs that express rather specific, circumstance-related abilities, such as tarjeta (’to not be cold’), jaksaa (’to have enough energy’), malttaa (’to have patience’) and raaskia (’to have the heart to do something’). We analysed their grammar and uses in the classics of literature, in the journals of the 1990s and 2000s, and in the Finnish Dialect Corpus of the Syntax Archive and the Digital Morphology Archives. We also obtained a small amount of data from the ArkiSyn conversations. We compared the selected verbs in terms of complement constructions, person reference and negative affinity. Our aim was, on the one hand, to shed light on the shared force-dynamic structure underlying their meanings and, on the other hand, to understand the mutual relationships and division of labour between these verbs.

Selected publications

Duvallon, Outi & Peltola, Rea. 2025. La construction réduplicative finnoise V1 kuin V1=kin : une ressource modale et discursive. Études finno-ougriennes. In press.

Peltola, Rea. 2023. Verbalizing animal inner speech. Journal of Pragmatics 217, 109–122. DOI: 10.1016/j.pragma.2023.09.005.

Peltola, Rea. 2021. Unfolding constructions: Postmodal auxiliaries in mirative complement patterns. In Hilpert, Martin & Cappelle, Bert & Depraetere, Ilse (eds.), Modality and Diachronic Construction Grammar, 149–184. Amsterdam: John Benjamins. DOI: 10.1075/cal.32.06pel.

Researcher of the Month: Jörg Tiedemann

Jörg Tiedemann
Photo: Linda Tammisto

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Jörg Tiedemann tells us about his work with resource development and OPUS, the world’s largest collection of openly available parallel translation datasets with wide language coverage.

Who are you?

My name is Jörg Tiedemann and I lead the language technology research group at the University of Helsinki. We are part of the Department of Digital Humanities, and our students have a study track in the BA in Languages and the MA in Linguistic Diversity and Digital Humanities. My own background is in computer science, from my undergraduate studies in Germany, and in computational linguistics, from my doctoral studies in Uppsala, Sweden. My appointment as professor of language technology in Helsinki began in 2015, and since then I have enjoyed the multidisciplinary environment in our team.

What is your research topic?

My main research interests are connected with multilingual natural language processing from various perspectives. A lot of my work has been devoted to application-oriented research, in particular in the field of machine translation (MT). Resource development has been a big part of my life and, already during my PhD, much of my time went into the collection and alignment of large, multilingual parallel corpora. For more than two decades, I have maintained OPUS, the world’s largest collection of openly available parallel translation datasets with wide language coverage. This collection has been a main source for the development of translation technology worldwide, and its language coverage is unique and invaluable for research on inclusive NLP.

In recent years, we have extended the OPUS ecosystem to cover all aspects of MT development, from data to tools and deployment. Pre-trained translation models are available from OPUS-MT, and software packages have been released for data manipulation and for training, distilling, deploying and evaluating models. Web interfaces, applications, dashboards and professional translation toolkits such as OPUS-CAT support research, development and use, and our resources are among the most popular ones on the Hugging Face model and data hub.

Another line of research is related to basic research on multilingual and cross-lingual NLP. The ERC project FoTran focused on representation learning with massively multilingual data and we investigated transfer learning capabilities, modularity and interpretability of large neural translation models. We also looked at uncertainty modeling in another research project and currently focus, among other things, on efficiency of NLP in order to reduce the ever-growing carbon footprint of language technology (within the GreenNLP project).

Finally, our research group is also devoting time to the development of large language models as part of the European projects HPLT and OpenEuroLLM. Our contribution to those projects is mostly connected to multilinguality and evaluation, two very important and challenging topics in the field as a whole. Our goal is to push support for otherwise under-represented languages, to improve multilingual evaluation, and to reduce the effect of so-called “hallucinations” of generative AI.

How is your research related to Kielipankki – the Language Bank of Finland?

Most of our research is data-intensive and heavily depends on data collections, empirical evaluation and iterative training of models with compute-heavy machine learning. Language resources are essential in this process, and we are both providers and users of the Language Bank of Finland. Even though most of our work is focused on machine learning and model development, we are also very much interested in making our resources available for research in the humanities. Many of the datasets we curate are directly interesting for linguistic research or, for example, translation studies. Similarly, linguistic resources are essential for training, tuning and evaluating neural language models. Furthermore, such language models are becoming essential tools in the humanities as well, and their influence will steadily grow in linguistic studies, the social sciences and various fields of traditional humanities.

Selected publications

Tiedemann, J., Aulamo, M., Bakshandaeva, D. et al. 2024. Democratizing neural machine translation with OPUS-MT. In Lang Resources & Evaluation 58, 713–755 (2024). https://doi.org/10.1007/s10579-023-09704-w

Mikko Aulamo, Nikolay Bogoychev, Shaoxiong Ji, Graeme Nail, Gema Ramírez-Sánchez, Jörg Tiedemann, Jelmer van der Linde, and Jaume Zaragoza. 2023. HPLT: High Performance Language Technologies. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 517–518, Tampere, Finland. European Association for Machine Translation. https://aclanthology.org/2023.eamt-1.61/

Jörg Tiedemann and Ona de Gibert. 2023. The OPUS-MT Dashboard – A Toolkit for a Systematic Evaluation of Open Machine Translation Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 315–327, Toronto, Canada. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/2023.acl-demo.30

Tiedemann, J 2022, From open parallel corpora to public translation tools: The success story of OPUS. In E Volodina, D Dannélls, A Berdicevskis, M Forsberg & S Virk (eds.), LIVE and LEARN : Festschrift in honor of Lars Borin. Research Reports from the Department of Swedish, Multilingualism, Language Technology, No. GU-ISS-2022-03, University of Göteborg, Göteborg, pp. 133-138. http://hdl.handle.net/10138/351496

Researcher of the Month: Daniela Piipponen

Daniela Piipponen
Photo: Therese Lindström Tiedemann

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Daniela Piipponen tells us about her research on historical linguistics and introduces the Digisvenska project.

Who are you?

I am Daniela Piipponen, a doctoral student in Scandinavian languages at the University of Helsinki.

What is your research topic?

Much of my own research concerns historical linguistics and the variety of Swedish used in Finland during the 19th and early 20th centuries, focusing on issues related to the standardisation of the written language. In my thesis, I investigate orthographic and morphological variation in the reading book Boken om vårt land (’The Book of Our Country’) by Zacharias Topelius in relation to the contemporary language norms.

In addition to my thesis, I have also researched (modern) Learner Swedish, and I have participated in the Digisvenska project (funded by the Swedish Cultural Foundation in Finland 2022–2024), a collaboration between the Faculty of Educational Sciences and the Faculty of Arts at the University of Helsinki (project leader Raili Hildén; the Faculty of Arts’ part was led by Therese Lindström Tiedemann). The overall aim of the project was to study fairness aspects in the B-Swedish matriculation examination (see also the project blog).

How is your research related to Kielipankki – the Language Bank of Finland?

In my research on language history, including parts of my thesis, I have often turned to the Newspaper and Periodical Corpus of the National Library of Finland, available in the Language Bank, to examine the language used in Swedish-language Finnish newspapers in the 19th century. Newspaper language is a relatively standardised type of text that can be investigated over a longer period of time. In addition, comparisons can be made with the corresponding Swedish newspaper corpora maintained by Språkbanken Text in Gothenburg.

Within the Digisvenska project, we have also worked to develop two Learner Swedish corpora: the Digisvenska corpus and Digisvenska Norm. Both corpora will also be available to other researchers via the Language Bank (however, use requires permission from the Matriculation Examination Board of Finland). The corpora are based on the free-writing tasks of the digital matriculation examination in B-Swedish from eight test rounds between spring 2018 and autumn 2021. The Digisvenska corpus includes all written responses from these test rounds and contains over 10 million tokens in total. Digisvenska Norm is a smaller subcorpus of 96 texts from two test rounds, in which the texts have been manually normalized according to the norms of the standard language. The normalized corpus has been realized as a parallel corpus, allowing the normalized text to be compared with the original.

Within the project, we have used the corpora to investigate the linguistic breadth and accuracy of the texts and how these relate to the assessment. For example, together with Therese Lindström Tiedemann, I have analysed the verb conjugation in the material to see which tense forms are used at different skill levels, as well as whether the forms have been used according to the norms. I have also looked at the orthography and where it causes problems. In this case, I was also able to use the Studentsvenska 79/80 corpus to compare the results with those of older Swedish matriculation examinations. Finally, we also hope to continue to develop and use the material in the future. We are investigating the possibility of funding for further research, and have also worked to add correction annotations to the normalized material to improve the analysis tools.
 

Publications

Piipponen, Daniela. 2025. ”Låt din penna vara sig sjelf trogen” Variation och norm i Zacharias Topelius läsebok Boken om vårt land, med fokus på ortografi och morfologi. Helsingfors universitet. PhD Thesis. http://urn.fi/URN:ISBN:978-952-84-1317-2

Piipponen, Daniela, Lindström Tiedemann, Therese & Axelson, Erik. 2024. Digisvenska-korpusen: en inlärarkorpus baserad på studentprovet i B-svenska. In Kolu et al. (eds.): Svenskan i Finland 20, pp. 140–154. http://urn.fi/URN:ISBN:978-952-61-5327-8

Piipponen, Daniela. 2023. Herrarne och damerna. Variationen i den plurala definita substantivböjningen i Sverige och i Finland på 1800-talet. In Språk och stil NF 33, pp. 71–106. https://doi.org/10.61965/sos.33.2023.18946

Researcher of the Month: Pekka Posio

Pekka Posio
Photo: Maarit Kytöharju

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Pekka Posio tells us about a research project that explores the link between gender and language use in the Spanish-speaking world. The extensive CoLaGe Corpus compiled during the project will be available via the Language Bank.

Who are you?

I am Pekka Posio, Professor of Ibero-Romance Languages at the Faculty of Arts, University of Helsinki. I focus on Spanish and Portuguese, and examine sociolinguistics, pragmatics and language change and variation. Currently, I am the head of discipline for Portuguese, Galician and Basque languages.

I studied Romance Languages and General Linguistics at the University of Helsinki, where I obtained my PhD in 2012. The topic of my dissertation was the expression of subject pronouns in Spanish and Portuguese. During my postdoctoral phase, I worked in Salamanca, Berlin, Cologne and Ghent, studying impersonal constructions in Spanish and Portuguese. I also worked for three years as a university lecturer in Spanish at Stockholm University before returning to Helsinki in 2019 as an associate professor. In 2024, I was appointed professor.

What is your research topic?

Currently, my research focuses on language and gender in the Spanish-speaking world, and I lead the research project Gender, Society, and Language Use: Evidence from Mexico and Spain (2021–2025), funded by the Kone Foundation. Language and gender is a well-established area of research in English linguistics, but it has received less attention in Spanish studies.

In this project, we are particularly interested in the mechanisms that link society and gender to language use, and whether there are differences in the relationship between gender and language in different societies that use the same language. These questions will be approached through both sociolinguistics and social psychology. We have collected a wide range of data, including both spoken and transcribed language and socio-psychological data on our informants. By combining these data, we will be able to explore the links between language and gender in a completely new way and at the same time renew the concept of gender as a sociolinguistic variable. In addition to the traditional comparison of female and male speech, we use scalar variables such as speakers’ perceptions of their own masculinity and femininity, and gender-related attitudes and perceptions.

We study different phenomena of language use – for example, the prevalence of different grammatical persons and ways of interacting in speech – in two societies that share the same language but differ in terms of gender roles and norms. We collected data between 2022 and 2023 in Guadalajara, Mexico, and Valencia, Spain. The research data generated by this project will help to broaden and diversify our understanding of gender and its manifestations, particularly in the societies we studied.

The postdoctoral researchers in this project are Gloria Uclés Ramada, Sven Kachel, Andrea Carcelén Guerrero and Fien de Latte. The project has also employed a number of students as data collectors, transcribers and coders in Finland, Spain, Mexico and Germany.

How is your research related to Kielipankki – the Language Bank of Finland?

We have produced a corpus called Corpus for the Study of Language and Gender in Mexico and Spain (CoLaGe), which contains 111 hours and over one million words of recorded and transcribed speech from 127 informants. The corpus is divided into sub-corpora for Valencia (CoLaGe-V) and Guadalajara (CoLaGe-G), and a smaller CoLaGe-D(iversity) corpus collected in Guadalajara, with informants representing gender and/or sexual minorities. In collecting the data, we have tried to make the data as comparable as possible, with speakers from two age groups (30–40 and 60–70) and two countries. The data include sociolinguistic interviews, role-plays simulating conflict situations and material elicited for phonetic research in which informants describe images they have seen.

In addition to comparability, the collection of data was guided by the need to make all the extensive material available to other researchers, which is why a great deal of attention has been paid to issues such as pseudonymisation. The majority of the speech material has also been recorded on studio equipment, which allows it to be used for phonetic analysis. The Language Bank of Finland has been a natural location for the CoLaGe corpus since its inception. The social psychology data from the project will also be made available to researchers via the Finnish Social Science Data Archive.

Selected publications from the project

Carcelen Guerrero, A., Posio, P., Kachel, S. & Uclés Ramada, G. (Accepted 2025). CoLaGe: Corpus for the study of language and gender in two varieties of Spanish. Corpora. https://researchportal.helsinki.fi/files/328418218/CoLaGe-accepted.pdf

Uclés Ramada, G., Kachel, S. & Posio, P., 2025. Conflict, gender, and amount of talk: Gender differences in Spanish role play data. Pragmatics and Society. DOI: 10.1075/ps.23144.ucl

Posio, P., Kachel, S., & Uclés Ramada, G. 2024. Morphosyntactic stereotypes of speakers with different genders and sexual orientations: an experimental investigation. Linguistics. DOI: 10.1515/ling-2022-0143

More publications from Pekka Posio: https://researchportal.helsinki.fi/en/persons/pekka-posio

Corpus

Corpus for the Study of Language and Gender in Mexico and Spain (CoLaGe)
 

Researcher of the Month: Simo Määttä

Simo Määttä
Photo: Veikko Somerpuro

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Simo Määttä tells us about his research that is based on sociological translation studies, critical sociolinguistics and critical discourse studies.

Who are you?

I am Simo Määttä, Assistant Professor of Translation Studies at the Faculty of Arts, University of Helsinki. I am Head of the Translation Studies Research Community TRAST and hold a title of docent in French Studies. I teach in the Master’s programme in Translation and Interpreting at the University of Helsinki. I am Chair of the Board of the Register of Legal Interpreters.

I received my PhD from the University of California, Berkeley in 2004 and have since worked at several universities in Finland, and since 2014 at the University of Helsinki.

What is your research topic?

My research is based on sociological translation studies, critical sociolinguistics and critical discourse studies. I am interested in how language use and other interactions are represented and what meanings are given to linguistic interactions – especially multilingual communication and linguistic variation.

One of my main research interests is public service (or community) and legal interpreting. In this field, I examine language ideologies, accuracy of interpreting, multimodality, the agency of participants in the interpreter-mediated encounter, the expression of empathy and the realisation of linguistic rights. In particular, I have studied lingua franca interpreting, where both the interpreter and the client speaking a foreign language communicate in a language that is not their first language. This is common, for example, when an asylum seeker, a migrant, or a foreign national who is suspected of a crime or is the victim of one communicates with an interpreter in French or English.

I lead the Translation, Immigration and Democracy project (2022–2025) funded by the Kone Foundation, in which our research team analyses translation policies and practices in multilingual communication targeted at migrant populations. The research focuses on organisations (e.g. municipalities, organisations, companies, universities, media) operating in the Helsinki metropolitan area (Helsinki, Espoo and Vantaa) and in Tallinn. The project combines theories and methods of functionalist and sociological translation studies and critical linguistics.

The project is founded on the idea that multilingualism constitutes not only an opportunity for democracy, but also a challenge: the language barrier prevents migrants from participating in social, cultural and political life and from becoming full members of their local community and society. Translation aims to promote migrants’ access to information and participation, but it does not reach all migrants. The project approaches translation as a practice of governmentality, through which power is exercised and produced. One of the objectives is to propose new solutions, together with different actors, to improve the quality of translation policies and practices.

I am also involved in the EU Horizon-funded project ARENAS (Analysis of and Responses to Extremist Narratives), coordinated by Professor Julien Longhi (Cergy Paris Université), in which our international, multidisciplinary consortium analyses the extremist narratives affecting and threatening European political and social life. We explore the nature of extremist narratives and seek to understand them, in particular those concerning science, gender and the Nation. By understanding how these narratives work, we aim to find ways to counter extremist narratives and thus contribute to the harmonious development of Europe.

Within the ARENAS project, I am involved in a work package related to the circulation of extremist narratives, coordinated by historian Steven Forti from the Autonomous University of Barcelona. The ARENAS team in Helsinki is led by Dr. Katalin Miklóssy, Jean Monnet Professor and Associate Professor of Political History. I am responsible for a qualitative research task on how extremist narratives circulate between political discourse, traditional media and new media. The qualitative data for the study are selected on the basis of the quantitative data produced and analysed in the other tasks of the work package.

I also analyse the theory of discourse, ideology (especially language ideology), performativity, and hate speech. My previous research has focused on the translation of sociolinguistic variation in literature and language policies related to regional and minority languages.

How is your research related to Kielipankki – the Language Bank of Finland?

In the part of the ARENAS project that I am responsible for, we use the corpora available in the Language Bank on speeches made in the Finnish Parliament, especially in plenary sessions. These data have allowed us to see exactly how the topics discussed in traditional and new media correspond to the political debate in Parliament. In addition, our research has made use of the ParlaMint corpus and a corpus compiled for the ARENAS project, which consists of social media posts by politicians in different countries.

I also used the Suomi24 corpus from the Language Bank in a study co-authored with Yrjö Lauranto to examine how online discussants express dissenting and sympathetic opinions about gender and sexual minorities. We also used Suomi24 data in articles written with Ulla Tuomarla and Karita Suomalainen in Finnish and English, analysing discussions on immigration.

Selected publications

Määttä, S. & Kinnunen, T. 2024. The Interplay between Linguistic and Non-verbal Communication in an Interpreter-mediated Main Hearing of a Victim’s Testimony. Multilingua: Journal of Cross-Cultural and Interlanguage Communication 43(3), 299–330. DOI: 10.1515/multi-2023-0153

Määttä, S., Kinnunen, T., Kuusi, P. & Probirskaja, S. 2024. Kohderyhmätietous monikielisen kriisiviestinnän asiantuntijatyössä koronapandemian aikana. Työelämän tutkimus 22(4), 555–587. https://journal.fi/tyoelamantutkimus/article/view/142675

Määttä, S. 2023. Linguistic and Discursive Properties of Hate Speech and Speech Facilitating the Expression of Hatred: Evidence from Finnish and French Online Discussion Boards. Internet Pragmatics 6(2), 156–172. DOI: 10.1075/ip.00094.maa

Määttä, S. & Wiklund, M. 2023. Resolving Comprehension Problems in a Telephone-interpreted Screening Interview. In: E. de Boe, J. Vranjes & H. Salaets (eds.) Interactional Dynamics in Remote Interpreting: Micro-analytical Approaches. New York: Routledge, 42–65. https://www.routledge.com/Interactional-Dynamics-in-Remote-Interpreting-Micro-analytical-Approaches/Boe-Vranjes-Salaets/p/book/9781032213286

Määttä, S. & Hall, M. 2022. Ideology and Discourse: Convergent and Divergent Developments. In: S. Määttä & M. Hall (eds.) Mapping Ideology in Discourse Studies. Boston & Berlin: De Gruyter Mouton, 1–20. DOI: 10.1515/9781501513602-001

Määttä, S. & Lauranto, Y. 2022. Eriävän ja myötämielisen mielipiteen esittäminen sukupuoli- ja seksuaalivähemmistöjä koskevissa Suomi24-keskusteluissa. Virittäjä 126(2), 205–230. https://journal.fi/virittaja/article/view/100240

Määttä, S., Puumala, E. & Ylikomi, R. 2021. Linguistic, Psychological, and Epistemic Vulnerability in Asylum Procedures: An Interdisciplinary Approach. Discourse Studies 23(1), 46–66. DOI: 10.1177/1461445620942909

Määttä, S., Suomalainen, K. & Tuomarla, U. 2021. Everyday Discourse as a Space of Citizenship: The Linguistic Construction of In-groups and Out-groups in Online Discussion Boards. Citizenship Studies 25(6), 773–790. DOI: 10.1080/13621025.2021.1968715

Vernet, S. & Määttä, S. 2021. Modalités syntaxiques et argumentatives du discours homophobe en ligne : chroniques de la haine ordinaire. Mots – Les langages du politique 125, 35–51. https://journals.openedition.org/mots/27943

Määttä, S., Suomalainen, K. & Tuomarla, U. 2020. Maahanmuuttovastaisen ideologian ja ryhmäidentiteetin rakentuminen Suomi24-keskustelussa. Virittäjä 124(2), 190–216. https://journal.fi/virittaja/article/view/81931


Researcher of the Month: Marko Jouste

Marko Jouste
Photo: Sigga-Marja Magga

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Marko Jouste tells us about his research on Sámi culture and about his work with the Giellagas Corpus of Spoken Sámi languages.

Who are you?

I am Marko Jouste, a university lecturer and associate professor (title of docent) specializing in Sámi culture at the Giellagas Institute for Sámi Studies, University of Oulu. Since the early 2010s, I have been an active member of the Giellagas Institute, where my research focuses on various aspects of Sámi culture, including music, history, and heritage. Additionally, I serve as the main developer of the Sámi Cultural Archive, located within the institute. Beyond my academic duties, I also work as a musician, performing with music groups such as Ulla Pirttijärvi & Ulda and Suõmmkar.

What is your research topic?

My research focuses primarily on Sámi music, culture and history, with a particular emphasis on historical audio recordings. Currently, I am leading several active research projects, including The Northern Sámi Fairy Tale Book 1956 – Returning Historical Archive Material to the Community and Developing Ethical and Legal Practices for Open Access (funded by the Kone Foundation), and Skolt Sámi Dance: The Transformative Journey of Tradition, Resilience, and the Arctic Quadrille, a collaborative project with dance researcher Petri Hoppu (funded by the Jenny and Antti Wihuri Foundation). Another project is Jaakko Sverloff’s Life Story – From Suonikylä in Petsamo Through the World Wars to a Leader of the Skolt Sámi (also supported by the Jenny and Antti Wihuri Foundation).

Additionally, I have contributed to the Research Council of Finland’s key project, the Skolt Sámi Memory Bank. This pilot project, operational between 2016 and 2018, focused on the management and cultural revitalization of Skolt Sámi music, language, and cultural materials preserved in sound archives in Finland. Through these projects, I aim to promote community engagement, advance ethical practices in archival work, and contribute to the revitalization and preservation of Sámi cultural heritage.

How is your research related to Kielipankki?

Kielipankki – The Language Bank of Finland is closely connected to my research through its integration with archival work. Since the 2010s, the Sámi Cultural Archive has collaborated with the Language Bank to develop Sámi language materials for broader use in both academic research and Sámi language communities. The Giellagas Institute’s Corpus of Spoken Sámi Languages currently includes three Sámi languages spoken in Finland: Northern Sámi, Aanaar (Inari) Saami, and Skolt Saami. Notably, the first subcorpus added to the Language Bank was the Northern Sámi sample corpus. In spring 2025, this collection will be expanded with the inclusion of the Aanaar Saami spoken language corpus.

The FIN-CLARIN consortium, which is the organization behind the Language Bank of Finland, has also provided funding for corpus development work at the Sámi Cultural Archive in 2014, 2019, and 2022. This collaboration significantly enhances the accessibility, preservation, and usability of Sámi language materials, aligning with my broader focus on Sámi culture and heritage. In my research, I extensively use language technology tools such as the Korp service, which facilitates the analysis and exploration of linguistic data, particularly in the context of Sámi languages.

Publications

Petri Hoppu & Marko Jouste (2025). Skolt Saami Dance: The Transformative Journey of Tradition, Resilience, and the Arctic Quadrille. London: Bloomsbury. [In press]

Jouste, Marko (2022) ”Skolt Saami Leuʹdd. Tradition as a medium of individual and collective remembrance”. The Sámi World. Edited by Sanna Valkonen, Áile Aikio, Saara Alakorva and Sigga-Marja Magga. London: Routledge, pp. 53–71.

Jouste, Marko & Mettovaara, Jukka & Morottaja, Petter & Partanen, Niko (2022). Archive Infrastructure and Spoken Language Corpora for Saami Languages in Finland. The 6th Digital Humanities in the Nordic and Baltic Countries 2022 Conference (DHNB 2022), Uppsala, Sweden, March 15-18, 2022. CEUR Workshop Proceedings. Aachen: RWTH Aachen University, pp. 269–278. https://ceur-ws.org/Vol-3232/paper25.pdf

Jouste, Marko & Lehtola, Veli-Pekka & Juutinen, Markus & Tanhua, Sonja (2022). ”Jääkk Sverloff johtajana ja kulttuuritulkkina – Kolttasaamelaisten historian käänteitä 1900-luvulla”. [Jääkk Sverloff as a Leader and a Cultural Interpreter – Turning points of Skolt Saami history in 20th century]. Suomen rajaseutujen kolonialismi. [Colonialism of Finnish Borderlands]. Eds. Rinna Kullaa, Janne Lahti and Sami Lakomäki. Helsinki: Gaudeamus.

Jouste, Marko (2020). ”Suonikylän kolttasaamelainen itkuperinne 1900-luvulla”. [The Skolt Saami Lament Tradition of Suonikylä in the 20th Century]. Etnomusikologian vuosikirja Vol 32. Eds. Janne Mäkelä, Kaj Ahlsved, Viliina Silvonen. Helsinki: Suomen etnomusikologinen seura, pp. 10–45. https://doi.org/10.23985/evk.90118

Marko Jouste, Markus Juutinen, Eino Koponen (2020). ”Kolttasaamelaisen Näskk Moshnikoffin leuʹdd-kielen idiolekti”. [The Idiolect of leuʹdd Language of Skolt Saami Näskk Moshnikoff]. Kulttuurintutkimus Vol 37, 1–2, pp. 32–56. Eds. Janne Saarikivi, Pirjo Kristiina Virtanen. Joensuu: Kulttuurintutkimuksen seura ry. https://journal.fi/kulttuurintutkimus/article/view/98099

Taarna Valtonen, Kati Kallio, Marko Jouste (2019). ”Olaus Sirman runojen vertailevaa luentaa -runojen poetiikka suhteessa suullisiin ja kirjallisiin lähikulttuureihin”. [Comparative Reading of Poems by Olaus Sirma. The Poetics of Poems in Relation to Oral and Literal Cultures Nearby]. Suomalais-Ugrilainen Seuran Aikakauskirja 97. Helsinki: Suomalais-Ugrilainen Seura, pp. 109–152. https://doi.org/10.33340/susa.75266

Marko Jouste, Markus Juutinen, Miika Lehtinen (2019). ”Isak Saba ja Paččjogas 1919:s čohkejuvvon nuortalaš leuʹddat. Isak Saba og de skoltesamiske leuʹddene som ble samlet inn i Paččjokk i 1919”. [Isak Saba and the Skolt Saami Leuʹdds Collected in Paččjogg in 1919]. Optegnelser. Isak Sabas folkeminnesamling. Čállosat. Isak Saba álbmotmuitočoakkáldat, Norsk Folkeminnelags skrifter 173. Oslo: Scandinavian Academic Press, pp. 283–301.

Jouste, Marko (2017). ”Áillohaš ja uuden joiun synty”. [Nils-Aslak Valkeapää and the Birth of the New Yoik]. Minä soin. Mun čuojan: Kirjoituksia Nils-Aslak Valkeapään elämäntyöstä. Eds. Valtonen, Taarna; Valkeapää, Leena. Rovaniemi: Lapland University Press, pp. 233–258.

Marko Jouste (2011). Tullâčalmaaš kirdâččij ’tulisilmill lenteli’ – Inarinsaamelainen 1900-luvun alun musiikkikulttuuri paikallisen perinteen ja ympäröivien kulttuurien vuorovaikutuksessa. [The One Who Flew with the Fire eyes – The Musical Culture of the Aanar Sámi People in the Interaction of the Local Tradition and the Neighbouring Cultures]. Acta Universitatis Tamperensis 1650. Tampere: Tampere University Press. http://urn.fi/urn:isbn:978-951-44-8551-0


Researcher of the Month: Tamás Grósz

Tamás Grósz
Photo: Szabina Korbai

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Tamás Grósz tells us about his research on speech technology.

Who are you?

My name is Tamás Grósz and I am a Research Fellow in the Speech Recognition group at the Department of Information and Communications Engineering of Aalto University.

What is your research topic?

During my PhD years, my research was focused on Speech Technology, specifically on developing new deep-learning-based solutions for Automatic Speech Recognition (ASR). Although my main interest was acoustic modelling, I was also active in other areas. Paralinguistics, in particular, piqued my interest, and I worked on a wide variety of tasks. I regularly participated in the Interspeech ComParE challenges and won several times over the years. Perhaps the most notable of our systems is the one that automatically assesses the condition of patients suffering from Parkinson’s disease. Besides the competitions, I was also part of a project that concentrated on developing a speech-based solution for early detection of mild cognitive impairment. In the last years of my studies, my focus shifted towards silent speech interfaces. I had the pleasure of working with state-of-the-art prototypes and developing new systems that could generate speech from ultrasound tongue movement videos.

After graduation, I joined Mikko Kurimo’s lab as a postdoc, where I had an opportunity to work on other topics, including language modelling and AI explainability. Initially, I worked on subword-based language models for agglutinative languages like Hungarian and Finnish. While working with various models, I noticed the importance of curriculum learning. As a spin-off project, I have started investigating different ways of estimating the difficulties of training samples and constructing new curriculums for AI models.

Simultaneously, working on projects like Teflon, AASIS and Kielibuusti enabled me to learn more about children’s ASR, speech assessment and tools that can aid language learners. Our best models have been successfully integrated into a mobile application that can aid immigrants in learning the Finnish language.

In 2022, we developed a system that can recognize different kinds of stuttering (e.g. word/phrase repetition, prolongation, sound repetition and others) and won the INTERSPEECH 2022 Stefan Steidl Computational Paralinguistics Award. Later, we investigated how the emotional state of speakers can be recognized from non-verbal vocal expressions (such as laughter, cries, moans, and screams). Our system achieved first place for both tasks in the ACM MM ComParE competition. Since then, I have also worked on multimodal solutions for emotion and humor detection.

My current work mainly focuses on training and understanding self-supervised foundation models as part of our Extreme-scale LUMI project and the LAREINA project. Explainable AI and model interpretation have been long-term interests of mine, and these new models and computational resources have given me the opportunity to explore new techniques. Recently, I have developed ways to find the relevant subspaces inside large foundation models and explore the concepts discovered by the models during pre-training, as well as understand the changes caused by the fine-tuning process. These techniques enabled us to better understand our models and guided us in designing new, better training algorithms.

How is your research related to Kielipankki?

As modern speech recognizers require a considerable amount of data, it became a priority to collect and annotate suitable corpora. In 2020, I joined the team creating the Donate Speech datasets (puhelahjat). This corpus, with its approximately 3200 hours of donated speech, enabled various other projects, including our FinW2V2 project at LUMI. Using this dataset and Aalto’s Finnish Parliament ASR Corpus 2008-2020, we have developed numerous Finnish ASR systems over the years.

Currently, I am also involved in the LAREINA project, building large speech foundation models and making them available to industrial partners.

Recent publications

Getman, Y., Grósz, T., Hiovain-Asikainen, K. & Kurimo, M. (2024), Exploring adaptation techniques of large speech foundation models for low-resource ASR: a case study on northern Sámi, in Proc. of Interspeech. DOI: 10.21437/Interspeech.2024-479

Karakasidis, G., Kurimo, M., Bell, P. & Grósz, T. (2024), Comparison and analysis of new curriculum criteria for end-to-end ASR, Speech Communication p. 103113. DOI: 10.1016/j.specom.2024.103113

Moisio, A., Porjazovski, D., Rouhe, A., Getman, Y., Virkkunen, A., AlGhezi, R., Lennes, M., Grósz, T., Linden, K. & Kurimo, M. (2023), Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks, Language Resources and Evaluation 57(3), 1295–1327. DOI: 10.1007/s10579-022-09606-3

Phan, N., von Zansen, A., Kautonen, M., Grósz, T. & Kurimo, M. (2024), CaptainA a self-study mobile app for practising speaking, in Proc. of Interspeech. https://www.isca-archive.org/interspeech_2024/phan24b_interspeech.pdf

Virkkunen, A., Sarvas, M., Huang, G., Grósz, T. & Kurimo, M. (2024), Investigating the clusters discovered by pre-trained AV-Hubert, in Proc. of IEEE ICASSP 2024, pp. 11196–11200. DOI: 10.1109/icassp48485.2024.10447434

Getman, Y., Phan, N., Al-Ghezi, R., Voskoboinik, E., Singh, M., Grósz, T., Kurimo, M., Salvi, G., Svendsen, T., Strömbergsson, S. et al. (2023), Developing an AI-assisted low-resource spoken language learning app for children, in IEEE Access. DOI: 10.1109/access.2023.3304274

Grósz, T., Getman, Y., Al-Ghezi, R., Rouhe, A. & Kurimo, M. (2023), Investigating wav2vec2 context representations and the effects of fine-tuning, a case-study of a Finnish model, in Proc. of Interspeech. DOI: 10.21437/interspeech.2023-837

Grósz, T., Virkkunen, A., Porjazovski, D. & Kurimo, M. (2023), Discovering relevant sub-spaces of Bert, wav2vec 2.0, Electra and ViT embeddings for humor and mimicked emotion recognition with integrated gradients, in Proc. of the 4th Multimodal Sentiment Analysis Challenge and Workshop, pp. 27–34. DOI: 10.1145/3606039.3613102

Porjazovski, D., Getman, Y., Grósz, T. & Kurimo, M. (2023), Advancing audio emotion and intent recognition with large pre-trained models and Bayesian inference, in Proc. of the 31st ACM International Conference on Multimedia, pp. 9477–9481. DOI: 10.1145/3581783.3612848

The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps the researchers of Social Sciences and Humanities to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides the language materials and tools for the research community.

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.

Researcher of the Month: Sofoklis Kakouros

Sofoklis Kakouros
Photo: Sofoklis Kakouros

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Sofoklis Kakouros tells us about his research on prosody and its associated phenomena.

Who are you?

I am Sofoklis Kakouros, a postdoctoral researcher with the Phonetics and Speech Synthesis Research Group in the Department of Digital Humanities at the University of Helsinki. Before joining this group, I held research positions at different universities across Finland and in the Netherlands, and I also worked in the industry as a speech scientist. My background centers on signal processing, cognitive science, and phonetics.

What is your research topic?

My research interests are rooted in speech and language, with a particular emphasis on understanding prosody and its associated phenomena. Prosody is about how we say something rather than what we say; it adds meaning beyond the words themselves. This includes elements like intonation and timing. Over the years, I have explored various aspects of prosody, focusing on information-theoretic processes within this domain. Overall, my work enhances our comprehension of how acoustic and linguistic variations are statistically organized into the prosody we perceive. For the past few years, I have been working on the Research Council of Finland project titled “Computational Modeling of Prosody in Speech”, aiming to understand the statistical organization in speech acoustics and its connections to prosodic dimensions such as prominence and emotions. This research has numerous applications, including the prosodic analysis of dialectal varieties and parliamentary speech.

How is your research related to Kielipankki?

To effectively analyze and train computational models for speech, an increasing amount of data is required. Kielipankki offers a diverse platform that provides access to the essential resources needed for my research, including materials for speech and language studies. In a recent project conducted by our group, I utilized the Finnish ASR corpus from Kielipankki to analyze recordings of Finnish parliamentary speeches.

Recent publications

Vainio, M., Suni, A., Šimko, J., and Kakouros, S. (2024). The Power of Prosody and Prosody of Power: An Acoustic Analysis of Finnish Parliamentary Speech. In Proceedings of the Conference of the Speech Prosody Special Interest Group (SProSIG) of the International Speech Communication Association – Speech Prosody (SpeechPro-2024), Leiden, The Netherlands, pp. 662–666. 10.21437/SpeechProsody.2024-134

Kakouros, S., Šimko, J., Vainio, M., and Suni, A. (2023). Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody. In Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW-2023), Grenoble, France, pp. 127–133. 10.21437/SSW.2023-20

Kakouros, S. and O’Mahony, J. (2023). What does BERT learn about prosody? In R. Skarnitzl, & J. Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS-2023) (pp. 1454-1458). GUARANT International spol. s r.o., Prague, Czechia. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2023/full_papers/622.pdf

Kakouros, S., Stafylakis, T., Mošner, L., and Burget, L. (2023). Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2023), Rhodes, Greece, pp. 1–5. 10.1109/ICASSP49357.2023.10094673

Researcher of the Month: Katri Hiovain-Asikainen

Katri Hiovain-Asikainen
Photo: Kai Lukander

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Katri Hiovain-Asikainen describes her research on spoken Sámi languages and speech synthesis.

Who are you?

I am Katri Hiovain-Asikainen, now in my fourth year as a speech and language technologist in the Divvun group at the Arctic University of Norway. Our group develops language and speech technology applications especially for the Sámi languages, but also for other minority languages. I am responsible for the design and implementation of speech technology projects, in which collecting different types of audio data and building speech corpora for different Sámi languages is essential.

This year, our team has published the world’s first speech synthesis for Lule Sámi, and updated North Sámi speech synthesis to modern standards. In October, we also published the world’s first South Sámi synthesis. All the software and tools that we have developed are free and easily accessible to all.

My background is in linguistics and phonetics, and I received my PhD from the University of Helsinki in autumn 2023. The topic of my dissertation was the influence of the majority languages on the spoken North Sámi language. The aim of the research was to investigate the variations of prosodic features, such as quantity and intonation, in the regional spoken varieties of North Sámi, where the contacts with the majority languages (Finnish and Norwegian) are very close and multidimensional.

What is your research topic?

Currently, I focus on the development of speech synthesis and automatic speech recognition for three Sámi languages: North, Lule and South Sámi, all of which are official languages in Norway. There is a great need for speech technology applications in the Sámi-speaking communities, as written forms of the Sámi languages are relatively new, and not all Sámi speakers have had the opportunity to learn the written language in school in the same way as the speakers of the majority languages. Speech technology enables the oral use of minority languages in new contexts: for example, as a reading assistant at school, for learning pronunciation, as an easy-to-use tool for dyslexic or visually impaired people, and in general, even for listening to the news instead of reading. Audio books and other spoken language content are also becoming more common, allowing you to listen to books while doing something else with your hands. Today, a smart home and smart speakers can speak Lule Sámi in a home where the language of the family is Lule Sámi. This strengthens the role of the language and supports the revitalisation of Sámi languages at a new level.

An automatic speech recognizer, on the other hand, enables different speech interfaces, for example in the car and at home, and of course on smart devices. It will soon be possible to dictate texts in Sámi languages and, for example, to produce automatic transcriptions for old archival recordings so that researchers can make better use of them. The possibilities are endless.

The focus of my research is strongly related to speech technology, and I am currently a visiting researcher in the Phonetics and Speech Synthesis Research Group at the University of Helsinki. In collaboration with other researchers in the group, we have been working on automatic dialect recognition, where the aim is to automatically identify the speaker’s dialect based, among other things, on various prosodic features. In addition, I am interested in different methods of speech synthesis evaluation, for example, how well the speech synthesis learns to produce complex and rare prosodic features such as quantity.

How is your research related to Kielipankki?

In the Divvun group, we are currently preparing various Sámi speech corpora for publication via Kielipankki. There are Sámi archive recordings in different countries, but they are relatively scattered or not necessarily processed for publication, and transcriptions are not always available. We believe that making these existing materials more accessible would help many researchers and developers of speech technologies without the need to make new recordings.

I have also gained access to a North Sámi speech corpus (Giellagas) in Kielipankki for research purposes, and the corpus has been very useful because of its versatility, especially in the study of automatic dialect recognition. Our aim at Divvun is to make similar corpora available as soon as possible. However, in the case of indigenous and minority languages, the publication of the corpora should be treated with caution, which we respect in our work.

Recent publications

Hiovain-Asikainen, K. (2023). Prosodic change and majority language influence in spoken North Sámi varieties. Helsingin yliopisto, Humanistinen tiedekunta, Digitaalisten ihmistieteiden osasto. Helsingin yliopisto. http://urn.fi/URN:ISBN:978-951-51-9406-0

Kakouros, S., & Hiovain-Asikainen, K. (2023). North Sámi dialect identification with self-supervised speech models. arXiv Preprint arXiv:2305.11864. In Proceedings of the 24th INTERSPEECH Conference (pp. 5306–5310). https://doi.org/10.48550/arXiv.2305.11864

Pirinen, F., Moshagen, S., & Hiovain-Asikainen, K. (2023, May). GiellaLT—a stable infrastructure for Nordic minority languages and beyond. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) (pp. 643-649). https://aclanthology.org/2023.nodalida-1.63/

Hiovain-Asikainen, K., & de la Rosa, J. (2023). Developing TTS and ASR for Lule and North Sámi languages. In Proceedings of the 2nd Annual Meeting of the Special Interest Group on Under-resourced Languages (SIGUL). http://dx.doi.org/10.21437/SIGUL.2023-11

Corpora and Tools

  • Giellagas, Samples of Northern Saami
  • Borealium – tools for the small languages of the Nordic countries.

Researcher of the Month: Elina Vaahensalo

Elina Vaahensalo
Photo: Elina Vaahensalo

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Elina Vaahensalo tells us about her research on confrontation and otherness in online discussions.

Who are you?

I am Elina Vaahensalo, doctoral researcher in Digital Culture at the Faculty of Humanities, University of Turku, in the Degree Programme in Digital Culture, Landscape and Cultural Heritage. In addition, at the beginning of October I will start as a researcher in the Academy project SoliPro (“Solidariteetit käytäntöön – Nuorten arkiyhteisöt tunnustuksen lähteenä ja ehkäisevän sosiaalityön areenana”), coordinated by the University of Tampere.

What is your research topic?

In my dissertation, I examine online discussion that produces otherness, especially from the perspective of anonymous Finnish-language online communities. I am interested in how confrontation, alienation and even violent hostility are constructed in Finnish-language online discussion cultures, and what different forms the concept of otherness takes in these cultures. Otherness is a fruitful conceptual starting point for research on online discussions because it can be used in a variety of ways to outline descriptions of community, group identities, and the sense of being an outsider or downgraded and different. In Finnish-language online discussions, otherness takes very different – and also contradictory – forms: the other can be an enemy who is violently and dehumanisingly opposed, but also a relatable fellow sufferer with whom one shares common, peer-based experiences of marginalisation.

In addition, my colleague Lilli Sihvonen and I have studied online cultures from the framework of media archaeology. In particular, we are interested in what happens when a cybercultural phenomenon or object – a meme that has gone viral or a social media platform – dies, and what kind of afterlife can be associated with it. Our interest is driven by the perception of the vulnerability of digital phenomena. In our view, online phenomena in Finnish, for example, are particularly vulnerable because they often do not spread globally and are therefore not stored very widely online. In storing Finnish-language online cultural phenomena, Kielipankki has therefore done a valuable job by depositing online discussions from both the Suomi24 forum and the Ylilauta forum.

In my research for the SoliPro project, I will continue my work on othering, but from an even more robust perspective of community and solidarity. My aim is to examine the descriptions of community, otherness and solidarity shared by young people on social media.

How is your research related to Kielipankki?

In my more recent research, I have used qualitative and ethnographic online discussion data that I collected myself, but the Suomi24 data from Kielipankki also played an important role at the beginning of my research career. In 2017, I started as a research assistant in the “Citizen Mindscapes” consortium project, funded by the Research Council of Finland. The project, in which I also wrote my Master’s thesis, was built around the Suomi24 data from Kielipankki. Already then, I developed the concept of othering online discourse, and tested its identification and quantitative measurement using the Suomi24 data. Experimenting with corpus-based research was quite a dive into the unknown for a cultural researcher such as myself. However, with all its challenges, it was a valuable lesson to see how working on a Master’s thesis provides opportunities to try out different research tools, including ones outside one’s own comfort zone.

From time to time, I also teach digital culture students, and my teaching focuses on the tools and methods that can be used for conducting qualitative research on online discussions. I always encourage my students to use the online discussion corpora in Kielipankki, as they are unique collections of Finnish online culture, and they also prove that the language used online is worth saving and remembering.

Recent publications

Vaahensalo, E., & Sihvonen, L. (2022). Elävät, kuolleet ja elävät kuolleet keskustelufoorumit: verkkoyhteisöjen elämänvaiheet ja niiden tutkiminen. In R. Mähkä, M. Ahonen, N. Heikkilä, S. Ollitervo, & M. Räsänen (Eds.), Kulttuurihistorian tutkimusmenetelmät (pp. 411-429). Turun yliopisto.

Vaahensalo, E. (2022). ”Uuniin siitä” – Väkivaltainen ja toiseuttava verkkokeskustelu Ylilaudalla. Lähikuva – audiovisuaalisen kulttuurin tieteellinen julkaisu, 35(3), 29–44. https://doi.org/10.23994/lk.121893

Vaahensalo, E. (2022). Organisaatiot ja toiseuttava verkkokeskustelu. In H. Kantanen & M. Koskela (Eds.), Procomma Academic 2022: Poikkeuksellinen viestintä. ProCom – Viestinnän ammattilaiset ry. https://doi.org/10.31885/2022.00001

Vaahensalo, E. (2021). Samanlaista toiseuttamista, erilaisia toisia: Toiseuttavan verkkokeskustelun muodot anonyymeissä suomenkielisissä keskustelukulttuureissa. Media & Viestintä, 44(3), 1–29. https://doi.org/10.23983/mv.111507

Vaahensalo, E. (2021). Kontekstualisointimalli sosiaalisen median lähdekritiikin avaimena. Informaatiotutkimus, 40(3), 110–141. https://doi.org/10.23978/inf.107897

Vaahensalo, E. (2021). Creating the other in online interaction: Othering online discourse theory. In J. Bailey, A. Flynn, & N. Henry (Eds.), Handbook on technology-facilitated violence and abuse: International perspectives and experiences (pp. 227-246). Emerald Studies on Digital Crime, Technology & Social Harms. https://doi.org/10.1108/978-1-83982-848-520211016

Suominen, J., Saarikoski, P., & Vaahensalo, E. (2019). Digitaalisia kohtaamisia: Verkkokeskustelut BBS-purkeista sosiaaliseen mediaan. Helsinki: Gaudeamus.

Researcher of the Month: Aku Rouhe

Aku Rouhe
Photo: Jasmine Gustafsson

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Aku Rouhe tells us about his research on speech recognition. His current work includes, among other things, fine-tuning large language models that are optimized for Finnish and Nordic languages. These openly available LLMs have been created through successful academia-enterprise collaboration.

Who are you?

I am Aku Rouhe. For several years, I did research in the Aalto University Speech Recognition research group, and defended my doctoral thesis there this past February. After Aalto, I moved to Silo AI (now owned by AMD), where I work with large language models (LLMs) – I have moved from speech to text. My interest in language also carries over into my free time, in the form of creative writing.

What is your research topic?

In my doctoral thesis, I compared end-to-end models with more traditional multi-model decomposed systems. In recent years, both academia and commercial deployments in speech recognition have largely moved to end-to-end models. However, my work showed that multi-model decomposed systems remain a competitive alternative, for instance, in terms of recognition accuracy. Indeed, the main advantage of end-to-end models is probably their simplicity.

End-to-end models often require vast training resources. Thus, it was important for me to study end-to-end models applied to under-resourced languages as well.

My current work at Silo is on fine-tuning large language models such as Poro and Viking, which are models optimized for Finnish and Nordic languages. These LLMs were developed in a collaborative research project between Silo and TurkuNLP.

How is your research related to Kielipankki?

End-to-end models hunger for data, so large corpora are needed. I was involved in compiling the Aalto Finnish Parliament ASR Corpus 2008-2020, which consists of Finnish Parliament plenary session recordings, and also in the Lahjoita Puhetta project, where volunteers donated their speech to produce the Puhelahjat corpus. I got to combine both of these large speech corpora in an article that was published when I was finalizing my PhD, at a time when I was involved with the LAREINA project. Nowadays, the Finnish speech recognition resources are respectable for a language spoken by so few.

Recent publications

Rouhe, A., Grósz, T., Kurimo, M. 2024. Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 623-638, 2024. doi: 10.1109/taslp.2023.3336517

Virkkunen, A., Rouhe, A., Phan, N. et al. 2023. Finnish parliament ASR corpus. Lang Resources & Evaluation 57, 1645–1670 (2023). doi: 10.1007/s10579-023-09650-7

Moisio, A., Porjazovski, D., Rouhe, A. et al. 2023. Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks. Lang Resources & Evaluation 57, 1295–1327 (2023). doi: 10.1007/s10579-022-09606-3

Rouhe, A., Virkkunen, A., Leinonen, J., Kurimo, M. 2022. Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0. Proc. Interspeech 2022, 3543–3547. doi: 10.21437/Interspeech.2022-11318

Researcher of the Month: Tuukka Törö

Tuukka Törö
Photo: Riina Kiianmies

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Tuukka Törö tells us about his research on Finnish speech synthesis. Neural network models, which are trained with large amounts of audio data from varied datasets, enable researchers to analyze speech in new ways.

Who are you?

I am Tuukka Törö. I have been working as a doctoral researcher at the University of Helsinki’s Phonetics and Speech Synthesis Research Group since the beginning of this year. My background is in linguistics and phonetics, and I hold a BA in English studies from the University of Malmö and an MA in Phonetics from the University of Helsinki. After writing my Master’s thesis on controlling speaking styles in speech synthesis, I spent some time working with YLE on AI radio projects where we created synthetic ‘actors’ for radio features.

In my current position, I work in the Academy of Finland funded project Predictive Processing Approach to Modelling Prosodic Hierarchy for Speech Synthesis. The project’s aim is to develop text-to-speech (TTS) synthesis inspired by the predictive processing theory of human cognition.

While my focus has become more technically inclined, the primary inspiration behind my work stems from a fascination with how social structures influence speech, from macro-level variation to how people convey social dynamics in specific contexts.

What is your research topic?

Currently, I am researching macro-level language variation using neural-network models built for TTS and speech recognition. While the models were originally built for technological applications, they enable us to analyze speech in new ways. As the models are trained with large amounts of audio, they can be used to model ‘wild’ data of varying quality on a large scale instead of picking apart specific acoustic features from small, professionally recorded datasets.

Within the academy project, my aim is to tie together sociolinguistic variation with the predictive processing and speech synthesis side of things. Hopefully, in the coming years we will learn something new about how humans perceive social cues in speech and how high-level social predictions can be utilized to improve speech synthesis.

How is your research related to Kielipankki?

I often use corpora from Kielipankki such as Samples of Spoken Finnish (SKN), FinSyn (to be available in Kielipankki), and most of all Donate Speech (Lahjoita puhetta). In order to train speech synthesizers that can be controlled for social variables – such as age, gender, and dialect – we need a large amount of audio data from people with a rich variety of backgrounds. With Finnish being a relatively small language, it is vital to have a concerted effort for building large datasets like the Donate Speech corpus.

Recent publications

Törö, T., Suni, A. and Šimko, J. (2024). Analysis of regional variants in a vast corpus of Finnish spontaneous speech using a large-scale self-supervised model, Proceedings of Speech Prosody 2024, Leiden, Netherlands. DOI: 10.21437/SpeechProsody.2024-8

Šimko, J., Törö, T., Vainio, M., and Suni, A. (2023). Prosody under control: Controlling prosody in text-to-speech synthesis by adjustments in latent reference space, Proceedings of the 20th International Congress of Phonetic Sciences, Prague, Czech Republic. http://hdl.handle.net/10138/565382

Researcher of the Month: Heidi Niva

Heidi Niva
Photo: Emmi Pollari

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Heidi Niva tells us about her research on Finnish grammatical phenomena and introduces a Vepsian-Finnish dictionary project. In a joint research project, she also aims to evaluate a corpus of online discussions as a source for language researchers.

Who are you?

I am Heidi Niva, a postdoctoral researcher of the Finnish language. I am currently a substitute lecturer of Finnish language and culture at the University of Helsinki. I am also actively involved in the LOST DOC collective, a community for postdoc language researchers.

What is your research topic?

Both in my dissertation and afterwards, grammatical phenomena have been the focus of my research. Among other things, I have studied the structures that are used to express futurity in Finnish. Now I am involved in a joint project where we study the structures expressing avertivity, i.e. the non-realization of events. I am also working on a project where we aim to compile a Vepsian-Finnish dictionary. Vepsian, also known as Veps, is a related but endangered language spoken south of Lake Onega (Ääninen). In addition to the dictionary project, I am also doing research on adpositional structures in the Veps language.

How is your research related to Kielipankki?

In my research on Finnish grammar, I am less interested in normativity than in how people actually use linguistic structures, and what types of meanings and connotations these structures can convey. For this purpose, I have used the resources in Kielipankki: the Suomi24 Sentences Corpus 2001-2020 for the study of Modern Finnish, and the corpora of Early Modern Finnish and Old Literary Finnish for the study of the older forms of the language. I am also currently using the Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s and the Finnish News Agency Archive Corpus.

In fact, the Suomi24 Sentences Corpus 2001-2020 is itself the subject of our joint research with Max Wahlström and Olli Silvennoinen. What is interesting about this corpus is that it largely represents informal language use but is still different from spoken language in terms of its linguistic features. In addition, the corpus is a diverse source in terms of the formality of language use and the occurrence of linguistic phenomena as they seem to be influenced by the various topics of discussion and their styles of expression. In our forthcoming article, we will critically examine what kind of source the Suomi24 corpus actually is for a language researcher.

Publications

Niva, Heidi 2022: Suomen progressiivirakenne intentioiden ja ennakoinnin ilmaisuissa. Helsinki: Helsingin yliopisto. Available: http://urn.fi/URN:ISBN:978-951-51-8727-7

Niva, Heidi 2024: Tulen muistamaan hänet aina. Tulla V-mAAn vääjäämättömän tulevaisuuden ilmaisukeinona. Virittäjä 128(2), 238–263. DOI: 10.23982/vir.126878

Researcher of the Month: Krister Lindén

Krister Lindén
Photo: Juhani Jokinen

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Krister Lindén, the Director of the Language Bank, describes how researchers in Humanities can benefit from the use of artificial intelligence in their corpus-based research.

Who are you?

I am Krister Lindén. At the University of Helsinki, I am Research Director for Language Technology at the Department of Digital Humanities, and Deputy Team Leader at the Centre of Excellence for Ancient Near Eastern Empires. For national research infrastructures, I am the Director of the Language Bank of Finland, the National Coordinator of FIN-CLARIN, and the PI of FIN-CLARIAH. At the EU level, I am Chair of the National Coordinators Forum of CLARIN, a research infrastructure for the humanities and social sciences, and a member of the CLARIN Legal Issues Committee (CLIC).

What is your research topic?

I have always been interested in language technology and its application and, due to my involvement in the Language Bank, increasingly also in the prerequisites for developing and applying technology:

  • How can we use data to answer a broad range of research questions in the humanities and social sciences?
  • Where can we obtain development and test data to develop and evaluate our data processing methods?
  • Under what conditions can data be shared with other researchers so that they can verify the proclaimed performance of the methods?

An independent evaluation of methods is important to ensure progress and to find the best method in each case. If only a preliminary evaluation is needed and a small-scale experiment is sufficient, you can give ChatGPT a few examples to see how it copes with the task. If there is too little data to use a statistical method reliably and the task requires a high-precision method, it may be quicker to use manually developed methods. On the other hand, if there is enough data, a suitable machine learning method is available, and the performance of the processing environment is sufficient, this combination often provides the most reproducible development path.

All the above development paths are data-driven and require data to be shared with other researchers for replication. In previous years, there has been strong enthusiasm for completely open source datasets. While this is still a desirable goal, there are many datasets that, for one reason or another, cannot be made available to everyone. Gradually, our community of researchers, together with lawmakers, has succeeded in developing a legal framework for data access that is open enough for academic researchers to study the data and verify the results in a relatively straightforward way, while keeping the audience small enough that personal data is not put at risk and copyrights are not infringed.

A new development need is a method that lets researchers in the humanities and social sciences discuss with an AI the content of the datasets they deposit in the Language Bank.

How is your research related to Kielipankki?

The Language Bank provides both a platform for tool development and an opportunity to show how different types of research-oriented datasets can be shared with other researchers in a safe and legal way.

Recent publications

Jauhiainen, T., Zampieri, M., Baldwin, T. C., & Linden, K. (2024). Automatic Language Identification in Texts. (Synthesis Lectures on Human Language Technologies). Springer. https://doi.org/10.1007/978-3-031-45822-4

Jauhiainen, T., Piitulainen, J., Axelson, E., Dieckmann, U., Lennes, M., Niemi, J., Rueter, J., & Linden, K. (2024). Investigating Multilinguality in the Plenary Sessions of the Parliament of Finland with Automatic Language Identification. In D. Fišer, M. Eskevich, & D. Bordon (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (pp. 48-56). (International conference on computational linguistics), (LREC proceedings). European Language Resources Association (ELRA). https://researchportal.helsinki.fi/files/312866811/ArtikkeliJulkaistu.pdf

Sahala, A., & Linden, K. (2023). BabyLemmatizer 2.0 – A Neural Pipeline for POS-tagging and Lemmatizing Cuneiform Languages. In A. Anderson, S. Gordin, B. Li, Y. Liu, & M. C. Passarotti (Eds.), Proceedings of the Ancient Language Processing Workshop associated with the 14th International Conference on Recent Advances in Natural Language Processing, RANLP 2023 (pp. 203-212). INCOMA. https://aclanthology.org/2023.alp-1.23

Linden, K., Niemi, J., & Kontino, T. (Eds.) (2023). CLARIN Annual Conference Proceedings 2023. (CLARIN Annual Conference Proceedings). CLARIN ERIC. https://researchportal.helsinki.fi/files/298353929/CE-2023-2328_CLARIN2023_ConferenceProceedings.pdf

Lindén, K., Ruokolainen, T., Hämäläinen, L., & Harviainen, J. T. (2023). Ethically Archiving a Hard-to-Access Massive Research Data Set in the Language Bank of Finland: The Finnish Dark Web Marketplace Corpus (FINDarC). In M. M. Rantanen, S. Westerstrand, O. Sahlgren, & J. Koskinen (Eds.), Proceedings of the Conference on Technology Ethics 2023 – Tethics 2023 (pp. 114-131). (CEUR Workshop Proceedings; Vol. 3582). CEUR-WS.org. https://researchportal.helsinki.fi/files/295005165/FP_10.pdf

Kamocki, P., Linden, K., Puksas, A., & Kelli, A. (2023). EU Data Governance Act: Outlining a Potential Role for CLARIN. In T. Erjavec, & M. Eskevich (Eds.), Selected papers from the CLARIN Annual Conference 2022 (pp. 57-65). (Linköping Electronic Conference Proceedings; No. 198). CLARIN ERIC. https://doi.org/10.3384/ecp198006

Linden, K., Jauhiainen, T., & Hardwick, S. (2023). FinnSentiment: A Finnish Social Media Corpus for Sentiment Polarity Annotation. Language Resources and Evaluation, 57(2), 581-609. https://doi.org/10.1007/s10579-023-09644-5

Axelson, E., Hardwick, S., & Linden, K. (2023). HFST Training Environment and Recent Additions. In A. Hurskainen, K. Koskenniemi, & T. P. (Eds.), Rule-Based Language Technology (pp. 60-69). (NEALT Monograph Series; No. 2[1]). Northern European Association for Language Technology. http://hdl.handle.net/10062/89595

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.

Researcher of the Month: Juraj Šimko

Juraj Šimko
Photo: Veikko Somerpuro

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Juraj Šimko tells us about his research on speech articulation and prosody. The Phonetics and Speech Synthesis Research Group at the University of Helsinki also aims to use large language models for finding answers to certain theoretical questions related to speech.

Who are you?

I am a University Lecturer in Phonetics and have worked at the University of Helsinki since 2013. Prior to that, I studied and worked at several universities in Slovakia, Ireland and Germany, and I spent several years as a Language Specialist at Microsoft. I currently also hold an Honorary Professorship at the Indian Institute of Technology in Guwahati. My background is in Maths, Cognitive Science and Phonetics.

I am a member of the Phonetics and Speech Synthesis Research Group at the Department of Digital Humanities, but I am currently also involved in an ERC Advanced grant (to Professor Alice Turk) called Planning the Articulation of Spoken Utterances at the University of Edinburgh, where we investigate and model cognitive processes behind speech production and articulation.

What is your research topic?

I am passionate about human speech research. Besides speech articulation, my own and our Group’s main research interest is speech prosody, that is, essentially all the melodic, rhythmic and emotional aspects of speech that go beyond the linguistic message we pass on when we speak. In our current project, Predictive Processing Approach to Modelling Prosodic Hierarchy for Speech Synthesis, we are working on a novel speech synthesis architecture inspired by Predictive Processing, an influential theoretical and modelling paradigm of human cognition. Of course, the first obvious aim is to produce world-class speech synthesis, and our team has indeed been creating state-of-the-art Finnish and Finland Swedish synthesis systems. But we also want to treat the huge language models that drive technological applications as statistical representations of the speech material used for their training, and use them to answer theoretical questions related to speech. These questions include, among others, the distribution and evolution of accents and dialects, the relationship between sociolinguistics and prosody, and prosodic patterns in politicians’ parliamentary speeches.

How is your research related to Kielipankki?

In order to do all that, we need quite a lot of data. Some of it we create ourselves, with invaluable assistance from Kielipankki experts: we have designed and recorded the FinSyn corpus of high-quality speech material intended for speech technology applications, primarily speech synthesis. The corpus contains ~75 hours of studio-quality recordings from three voice talents, two of them speaking Finnish and one Finland Swedish. This corpus will appear as part of the Kielipankki collection. Our work on dialects and sociolinguistics relies heavily on other Kielipankki corpora, primarily the groundbreaking Donate Speech (Lahjoita puhetta) Corpus and the Aalto Finnish Parliament ASR Corpus.

Recent publications

Törö, T., Suni, A. and Šimko, J. (2024). Analysis of regional variants in a vast corpus of Finnish spontaneous speech using a large-scale self-supervised model, Proceedings of Speech Prosody 2024, Leiden, Netherlands. DOI: 10.21437/SpeechProsody.2024

Vainio, M., Suni, A., Šimko, J. and Kakouros, S. (2024). The Power of Prosody and Prosody of Power: An Acoustic Analysis of Finnish Parliamentary Speech, Proceedings of Speech Prosody 2024, Leiden, Netherlands. DOI: 10.21437/SpeechProsody.2024

Elie, B., Šimko, J. and Turk, A. (2024). Optimization-based modeling of Lombard speech articulation: Supraglottal characteristics. JASA Express Letters, 4(1). https://doi.org/10.1121/10.0024364

Kakouros, S., Šimko, J., Vainio, M. and Suni, A. (2023). Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody, Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, France. https://doi.org/10.21437/SSW.2023-20

Šimko, J., Törö, T., Vainio, M. and Suni, A. (2023). Prosody under control: Controlling prosody in text-to-speech synthesis by adjustments in latent reference space, Proceedings of the 18th International Congress of Phonetic Sciences, Prague, Czech Republic. http://hdl.handle.net/10138/565382

Šimko, J., Adigwe, A., Suni, A. and Vainio, M. (2022). A Hierarchical Predictive Processing Approach to Modelling Prosody, Proc. 11th International Conference on Speech Prosody, Lisbon, Portugal. https://doi.org/10.21437/SpeechProsody.2022-86

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.

Researcher of the Month: Lotta Leiwo

Lotta Leiwo
Photo: Veikko Somerpuro

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Lotta Leiwo tells us about her research in folkloristics, digging into the life and work of Finnish-American T-Bone Slim.

Who are you?

I am Lotta Leiwo, a doctoral researcher at the University of Helsinki, where I am studying for a PhD in history and cultural heritage. My dissertation in Folklore Studies examines the political role and nature-related rhetoric of Finnish-American women in the Finnish Socialist Federation (FSF) in the early 20th century. My main research data consists of FSF documents and a newspaper called Toveritar. The Toveritar, a mouthpiece of the FSF, targeted women and was edited and written mainly by women.

Prior to my doctoral project, I worked for two years as a research assistant on the project T-Bone Slim and the transnational poetics of the migrant left in North America (Kone Foundation 2022–2023). My main responsibility in this international project was the construction of the T-Bone Slim corpus and database. During the project, I wrote my Master’s thesis on Finnish socialist women in North America and found the topic for my dissertation.

What is your research topic?

In the T-Bone Slim project, an international research team studied the life and literary works of the second-generation Finnish American Matti Valentinpoika Huhta (1882–1942), also known as T-Bone Slim. Huhta was born in Ashtabula, Ohio, to a Finnish family that had emigrated from Kälviä, Central Ostrobothnia. He spent his childhood and youth in Finnish communities in the US, working as a dock worker and as a correspondent for the local chapter of the temperance movement. In the 1910s, Huhta abandoned his family and took up life as a ’hobo’, or itinerant worker. By the 1920s, Huhta had become radicalised, joining the Industrial Workers of the World (IWW) and becoming a columnist for IWW newspapers and periodicals. He continued his writing career under the pen name T-Bone Slim until his death. Huhta lived his last years in New York, where he worked as a deck scow captain. In May 1942, he was found drowned in New York’s East River and was almost forgotten for several decades. For further exploration of the unresolved questions surrounding T-Bone Slim’s death, please visit our project blog and read Saku Pinta’s two-part text ”Who Killed T-Bone Slim” Part I and Part II.

In the late 2010s, musician John Westmoreland, a relative of Slim’s, discovered his ”Uncle Matt’s” writing career as T-Bone Slim. Around the same time, academic interest in Slim, who had a Finnish background, began to grow, and his relatives and researchers found each other over T-Bone Slim Studies. The research continued in a project funded by the Kone Foundation, which brought together John Westmoreland and scholars from Finland, the UK, the US, Canada, and Australia. Kirsti Salmi-Niklander is the Principal Investigator of the project. We brought together the T-Bone Slim materials that the researchers had gathered from various archives and organized them into a corpus to enhance accessibility for others interested in the subject. In total, the materials come from 14 archives in five countries on three continents: the United States, Canada, Finland, Sweden and Australia.

The corpus encompasses a total of 1294 texts written by T-Bone Slim and published in English in IWW periodicals. However, Slim also wrote in Finnish on occasion and sometimes used Swedish. The corpus also includes the surviving manuscripts written by Slim.

The texts written by T-Bone Slim are a gold mine for researchers. Slim used language cleverly, combining different genres and means of expression. In addition, the historical, literary and cultural references found in the texts provide an opportunity to examine the IWW movement, transnational migration and history in the United States from diverse perspectives. The language employed in the texts is rich, insightful, and even playful, and may be of interest to linguists. As the material comprises both published and unpublished texts, it offers insights into both the editorial processes of political publishing and the writing practices of an individual author.

Within the framework of the project, I have examined the literary practices and literacy acquisition of Finnish migrant-settlers, as well as Slim’s use of genres from a semiotic perspective. Notably, Slim’s texts exhibit multilingualism in both background and content, incorporating intertextuality and multimodality across various genres and oral-literary practices. Such practices are evident, for example, in his song lyrics. In typical IWW style, Slim wrote lyrics addressing social injustices and set them to popular song tunes known to readers. The lyrics were thus written to be sung, with the aim of provoking the reader/singer to reflect on their message. As Owen Clayton, a collaborator on our project, has observed, T-Bone Slim sought to activate and engage readers through language and words. I, too, am continually amazed and delighted by Slim’s skilful written expression.

How is your research related to Kielipankki?

In the early stages of the project, we thought long and hard about a suitable repository for the T-Bone Slim corpus and database. Our priority was to find a long-term storage solution that would keep the materials widely accessible. Equally important was that the corpus could be explored and analysed using digital humanities methods.

The T-Bone Slim corpus and database will be published in April 2024 in Kielipankki, which fulfills all our storage and access requirements. The collection consists of photographic and microfilm scans of the original materials (newspapers, periodicals and manuscripts) with transcriptions and a database. The database includes all the texts in the corpus accompanied by metadata (date of publication, publication, title of the text, archive from which the material was collected, language, etc.). Additionally, we have experimented with abstracting data from a subset of the materials. For example, the abstracted data lists the people and places mentioned by T-Bone Slim, as well as information about the poems or songs contained in the texts. The purpose of the database is to facilitate data navigation and serve as a foundation for more detailed abstraction of the data by other researchers.

T-Bone Slim Corpus and Database Launching Event

Welcome to the Resurrection – T-Bone Slim Corpus and Database Launching Event on Monday May 20th, 2024 at 15:00–17:00. The launching event is open to the public and the program can be followed both via Zoom and on-site at the Finnish Literature Society (Hallituskatu 1, Helsinki). More information and registration for remote participants.

Publications

Apajalahti, Eeva-Lotta et al. (2022). ”Ihmistieteelliset näkökulmat metsiin tuottavat tietoa moninaisista metsäsuhteista ja niiden tulevaisuuksista.” Vuosilusto 14(2022): 13–51. Available: https://lusto.fi/wp-content/uploads/2022/12/Lusto-Vuosilusto14.pdf.

Leiwo, Lotta (2024). ”When One’s Life Becomes the Field. Assessing the Field in Collaborative Autoethnography.” Marburg Journal of Religion 25(1). https://doi.org/10.17192/mjr.2024.25.8693.

Leiwo, Lotta (2023). ”Luontokin näkyy olevan köyhälistöä vastaan” Luonto kolmantena tilana Toveritar-lehden paikkakuntakirjeissä 1916–1917. Master’s thesis. Helsinki: University of Helsinki. http://urn.fi/URN:NBN:fi:hulib-202305302306.

Leiwo, Lotta (2023). ”Suomen koloniaalin osallisuuden kontekstit haltuun: Hoegaerts, Josephine, Tuire Liimatainen, Laura Hekanaho ja Elizabeth Peterson (toim.). 2022. Finnishness, Whiteness and Coloniality.” Elore, 30(2), 142–147. Book review. https://doi.org/10.30666/elore.137470.

Mäkelä, Heidi Henriikka, Leiwo, Lotta, Linkola, Hannu and Rinne, Jenni (2023). ”The spiritual forest: an ethnographic exploration on Finnish forest yoga and the forest landscape.” Landscape Research. https://doi.org/10.1080/01426397.2023.2268550.

Entries from the Research Project’s Blog

Leiwo, Lotta (2023). ”T-Bone Slim Database – Final Steps.” ’T-Bone Slim and the transnational poetics of the migrant left in North America’ Research Project’s Blog. 18.12.2023. https://blogs.helsinki.fi/tboneslim/2023/12/18/t-bone-slim-database-final-steps/.

Leiwo, Lotta (2023). ”T-Bone Slim Database – Next Steps.” ’T-Bone Slim and the transnational poetics of the migrant left in North America’ Research Project’s Blog. Published 22.6.2023. https://blogs.helsinki.fi/tboneslim/2023/06/22/t-bone-slim-database-next-steps/.

Salmi-Niklander, Kirsti (2023). ”’T-Bone Slim’ eli Matti V. Huhta ajatteli ja kirjoitti kahdella kielellä kulkurielämästä ja työläisten oikeuksista” ’Vähäisiä lisiä’ Blog. Published 12.5.2023. https://www.finlit.fi/ajankohtaista/blogi/t-bone-slim-eli-matti-v-huhta-ajatteli-ja-kirjoitti-kahdella-kielella-kulkurielamasta-ja-tyolaisten-oikeuksista/.

Clayton, Owen (2023). ”Technocracy and T-Bone Slim’s Break with Ralph Chaplin” ’T-Bone Slim and the transnational poetics of the migrant left in North America’ Research Project’s Blog. Published 1.3.2023. https://blogs.helsinki.fi/tboneslim/2023/03/01/technocracy-and-t-bone-slims-break-with-ralph-chaplin/.

Dalbello, Marija (2022). ”From my Archival ‘Digs’, part I. Finding Slim!” ’T-Bone Slim and the transnational poetics of the migrant left in North America’ Research Project’s Blog. Published 12.12.2022. https://blogs.helsinki.fi/tboneslim/2022/12/12/finding-slim/.

Pinta, Saku (2022). ”T-Bone Slim’s Forgotten Finnish-Language Writings in the IWW Press” ’T-Bone Slim and the transnational poetics of the migrant left in North America’ Research Project’s Blog. Published 20.10.2022. https://blogs.helsinki.fi/tboneslim/2022/10/20/t-bone-slims-forgotten-finnish-language-writings-in-the-iww-press/.

Leiwo, Lotta (2022). ”T-Bone Slim Database – First Steps.” ’T-Bone Slim and the transnational poetics of the migrant left in North America’ Research Project’s Blog. Published 5.10.2022. https://blogs.helsinki.fi/tboneslim/2022/10/05/t-bone-slim-database-first-steps/.

 

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.
