
Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Dejan Porjazovski tells us about his research on systems for spoken language understanding (SLU).
I am Dejan Porjazovski. I came to Finland back in 2018 to pursue a Master’s degree in Machine Learning, Data Science, and Artificial Intelligence at Aalto University. My interest in language technologies led me to join the Automatic Speech Recognition Group at Aalto, first as a summer intern and later as a Master’s thesis worker and a PhD student. I defended my PhD thesis in May 2025.
After graduation, I joined Aivot Labs as a Machine Learning Engineer, where I work with speech-to-text, text-to-speech, and large language models (LLMs) to build Finnish conversational agents for the medical domain.
My doctoral thesis focused on spoken language understanding for low-resource languages. Spoken language understanding is an umbrella term that covers different speech and language technologies that allow computers to comprehend human speech.
In the thesis, I explored different speech embedding methods, how the amount of data affects their performance, and whether they have language-agnostic capabilities (which is very important for low-resource languages).
Furthermore, I compared two paradigms for building spoken language understanding systems: cascading and end-to-end (E2E). E2E models require a large amount of data to learn the task. Cascading systems, while more data-efficient, are more complex. I compared the two approaches on various SLU tasks, such as named entity recognition and topic identification, focusing predominantly on Finnish but also covering other languages.
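As a rough illustration of how the two paradigms differ in structure, the sketch below contrasts a cascade (a speech recogniser followed by a text-based named entity recogniser) with a single end-to-end model. It is not the implementation from the thesis: every function name, the fixed transcript, and the one-entry gazetteer are hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import Callable, List, Tuple

Audio = List[float]                 # placeholder waveform: a list of samples
Entities = List[Tuple[str, str]]    # (entity text, entity label) pairs


def toy_asr(audio: Audio) -> str:
    """Hypothetical recogniser stub: always returns the same transcript."""
    return "the committee met in Helsinki"


def toy_text_ner(text: str) -> Entities:
    """Hypothetical text NER stub: looks up names in a tiny gazetteer."""
    gazetteer = {"Helsinki": "LOC"}
    return [(name, label) for name, label in gazetteer.items() if name in text]


def cascaded_ner(
    audio: Audio,
    asr: Callable[[Audio], str],
    ner: Callable[[str], Entities],
) -> Entities:
    """Cascade: two separate components chained through a transcript."""
    transcript = asr(audio)   # stage 1: speech -> text
    return ner(transcript)    # stage 2: text -> entities


def toy_e2e_ner(audio: Audio) -> Entities:
    """Hypothetical E2E stub: one model maps audio directly to entities."""
    return [("Helsinki", "LOC")]


audio = [0.0, 0.1, -0.1]      # dummy samples; the stubs ignore their values
cascade_out = cascaded_ner(audio, toy_asr, toy_text_ner)
e2e_out = toy_e2e_ner(audio)
print(cascade_out, e2e_out)
```

In a cascade of this shape, each stage can in principle be trained on its own kind of data (transcribed audio for the recogniser, labelled text for the tagger), which is one reason such systems tend to be more data-efficient, at the cost of maintaining two components. The single E2E function has a simpler structure but would need audio paired directly with entity labels.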
The last area of my research relates to out-of-distribution generalisation of E2E spoken language understanding models. As hands-free devices become more popular, it is important for these systems to perform reliably when presented with data not seen during training.
As part of the research, I used the Aalto Finnish Parliament ASR Corpus 2008-2020 to develop cascading and end-to-end named entity recognition models for spoken Finnish.
I was also involved in the collection of the Donate Speech datasets (puhelahjat). The corpora include over 3000 hours of speech, annotated with various metadata, such as age, gender, and topic. I used the corpora to develop a topic identification system for spontaneous Finnish speech, as well as models for extracting metadata from speech. During this period, I was part of the LAREINA project.
Porjazovski, D., Grósz, T., & Kurimo, M. (2024). From raw speech to fixed representations: A comprehensive evaluation of speech embedding techniques. IEEE/ACM Transactions on Audio, Speech, and Language Processing. DOI: 10.1109/TASLP.2024.3426301
Porjazovski, D., Grósz, T., & Kurimo, M. (2023, September). Topic identification for spontaneous speech: Enriching audio features with embedded linguistic information. In 2023 31st European Signal Processing Conference (EUSIPCO) (pp. 396-400). IEEE. DOI: 10.23919/EUSIPCO58844.2023.10289822
Moisio, A., Porjazovski, D., Rouhe, A., Getman, Y., Virkkunen, A., AlGhezi, R., … & Kurimo, M. (2023). Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks. Language Resources and Evaluation, 57(3), 1295-1327. DOI: 10.1007/s10579-022-09606-3
Porjazovski, D., Leinonen, J., & Kurimo, M. (2021, August). Attention-based end-to-end named entity recognition from speech. In International Conference on Text, Speech, and Dialogue (pp. 469-480). Cham: Springer International Publishing. DOI: 10.1007/978-3-030-83527-9_40
Porjazovski, D., Leinonen, J., & Kurimo, M. (2020, October). Named entity recognition for spoken Finnish. In Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery (pp. 25-29). DOI: 10.1145/3422839.3423066
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in the social sciences and humanities to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.