Researcher of the Month: Tommi Kurki


Photo: Kaisla Kurki

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Adjunct Professor, Senior Lecturer Tommi Kurki from the University of Turku tells us about how he makes use of the resources provided by Kielipankki.

Who are you?

I am Adjunct Professor in the Finnish language at the University of Turku and work there as a senior lecturer. My fields of expertise include sociolinguistics, especially language variation and change (in the Finnish language), and methodology in sociolinguistics. Currently, I am the principal investigator in Digilang, an infrastructure project where the digital linguistic research materials of the School of Languages and Translation Studies at the University of Turku are collected, organized and developed (see Kurki & al. 2018).

What is your research topic and how is it related to the Language Bank of Finland?

I am interested in several linguistic topics, most of which are connected with language change. In my early undergraduate years, I became familiar with longitudinal corpora, and this is probably why I have been interested in many types of Finnish corpora, especially longitudinal ones, ever since. I have used at least the Follow-up Study of Dialects of Finnish corpus, the Finnish Dialect Syntax Archive, Samples of Spoken Finnish and the Digital Morphology Archives. When examining variation in Finnish, I have usually dealt with phonological, morphophonological and morphological features, but during the past few years I have tried to extend my scope to prosodic features as well.

Overall, linguistic corpora have been an essential part of my career: collecting and processing material, compiling and developing corpora. In the 1990s, I was recruited as a trainee to the Finnish Dialect Follow-up Project conducted by Kotus (the former Research Institute for the Languages of Finland, currently the Institute for the Languages of Finland). In the project, I wrote my MA thesis (1998a) and two research reports (1998b, 1999) as a young researcher at Kotus. As part of the Follow-up Project, I also completed my doctoral thesis (2005), which dealt with the mechanisms of language change as well as the methodology of studying language change.

To date, all the projects directed by me have been connected with spoken language and linguistic corpora. “Linguistic Variation in the Province of Satakunta in the 21st Century” is a sociolinguistic project funded by the Finnish Cultural Foundation. In this project, over 200 local speakers were recorded, representing various age groups and 16 municipalities in Satakunta. Currently, this data is being morphologically and syntactically annotated. The corpus is to be made available in the Language Bank of Finland during the next few years. The data from this project and from the Samples of Spoken Finnish corpus (available in the Language Bank) have been analyzed, for instance, in Kurki & al. 2011.

The Regional and Social Variation in Finnish Prosody Project is funded by the Kone Foundation and the Digilang project, and it was started in 2013 by me and my colleague Tommi Nieminen, PhD (see for example Kurki & al. 2014). In this project, we compiled a sociophonetic corpus where speakers recorded their voices over the Internet in elicitation tasks. Representative sets of data from this corpus are being segmented and annotated. The objective of this project is to examine the prosody of Finnish and to pay more attention to regional and social variation than before. This corpus will also be available in the Language Bank of Finland in a few years.

Apart from my research projects, the Language Bank of Finland has been an integral part of my work as a lecturer and supervisor. When I was working in the Syntax Archive, one of my most important tasks was to introduce students to different linguistic corpora and to help them find good material for their BA and MA theses. Suitable examples and materials for my students were easy to find when I was giving courses on Finnish dialects and dialectology or on corpus linguistics. All the corpus projects I am running at the moment were planned from the start so that the collected data would be made available via the Language Bank of Finland. As a speech and language research expert, I have also participated in designing the Donate Speech campaign (by Vake) in collaboration with Professor Mikko Kurimo (from Aalto University) and the Language Bank of Finland.

Publications related to the resources:

Kurki, Tommi 1998a: Kui Kuivlahdel puhuta? Eurajoen vanhan murteen ja puhekielen vertailua sekä ikäryhmittäisten ja sukupuolikohtaistan erojen tarkastelua. Pro gradu ja suomen murteiden seuruuhankkeen osatutkimus (118 sivua + 39 liitesivua). Turun yliopisto, suomen kieli.

Kurki, Tommi 1998b: Kielellinen vaihtelu ja muutos Alastaron murteessa. Kotimaisten kielten tutkimuskeskuksen seuruuhankkeen tutkimusraportti. (79 sivua + 35 liitesivua). Helsinki: Kotus.

Kurki, Tommi 1999: Kielellinen vaihtelu ja muutos Pälkäneen murteessa. Kotimaisten kielten tutkimuskeskuksen seuruuhankkeen tutkimusraportti.  (114 sivua + 51 liitesivua). Helsinki: Kotus.

Kurki, Tommi 2005: Yksilön ja ryhmän kielen reaaliaikainen muuttuminen. Kielenmuutosten seuraamisesta ja niiden tarkastelussa käytettävistä menetelmistä. SKST 1036. SKS, Helsinki.

Kurki, Tommi, Siitonen, Kirsti, Väänänen, Milja, Ivaska, Ilmari & Ekberg, Jari 2011: Ensi havaintoja Satakuntalaisuus puheessa ‐hankkeesta. Sananjalka 53, 83–108. DOI: https://doi.org/10.30673/sja.86706.

Kurki, Tommi – Nieminen, Tommi – Kallio, Heini & Behravan, Hamid 2014: Uusi puhesuomen variaatiota tarkasteleva hanke. Katse kohti prosodisia ilmiöitä. – Sananjalka 56 s. 186–195. URN: http://urn.fi/urn:nbn:fi:ele-1733815.

Kurki, Tommi – Inaba, Nobufumi – Kaivapalu, Annekatrin – Koponen, Maarit – Laippala, Veronika – Leblay, Christophe – Luutonen, Jorma – Mutta, Maarit – Nikulin, Markku & Reunanen, Elisa 2018: Digilang – Turun yliopiston digitaalisia kieliaineistoja kehittämässä. – Proceedings of the Research Data and Humanities (RDHum) 2019 Conference: Data, Methods and Tools, p. 41–56. Studia Humaniora Ouluensia 17. Oulu: University of Oulu. URN: http://urn.fi/urn:isbn:9789526223216.

 

 

The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps the researchers in Finland to use, to refine, to preserve and to share their language resources. The Language Bank of Finland is the collection of services that provides the language materials and tools for the research community.

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive.

Researcher of the Month: Jenny Tarvainen

Photo: Inka Huuskonen

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Jenny Tarvainen, a graduate of the University of Jyväskylä, tells us about how she makes use of the resources International Corpus of Learner Finnish (ICLFI) and The Suomi 24 Corpus provided by Kielipankki.

Who are you?

I am Jenny Tarvainen. In January 2019, I graduated from the University of Jyväskylä with the Finnish language as the major subject in my master’s degree. At the moment I teach the Finnish language to immigrants, and I intend to start my doctoral studies in the near future. I was drawn into corpus research already during my bachelor’s studies, and I do not expect this interest to change. The Language Bank of Finland, Kielipankki, has become quite familiar to me over the years.

What is your research or development work topic?

My Master’s thesis (Tarvainen 2018) presented a comparative corpus study on the phraseological features of the verb SAADA (to gain) in native Finnish and learner Finnish. The aim was to find out, with Contrastive Interlanguage Analysis (CIA), how the usage of the verb SAADA by Finnish language learners differs from how native speakers use this verb. To address these differences, I focused on the word forms and the meanings in the cotext of the verb. I also studied the correlation between these forms and meanings with statistical methods. An interesting finding was that the correlation between the forms and the meanings was stronger in the usage of those studying Finnish as a foreign language than in the texts by native speakers, i.e. a specific form of the verb SAADA appeared in the learner language more often with a specific meaning found in the cotext. For example, the discussion around the verb form saavat (they get) most probably focuses on family or people in general, whereas the themes found around the base form saada are place, direction and area.

During my studies and after graduating I have also worked as a research assistant in research projects led by Jarmo Jantunen, professor of Finnish language at the University of Jyväskylä. The research projects study how homosexual and heterosexual people are discussed in the media (Jantunen 2018) and what kinds of discourses arise when the discussion concerns different cities in the Metropolitan area (forthcoming). During these research projects I have learned about Computer Assisted Discourse Studies (CADS). At the moment I am working on the research plan for my application for doctoral studies in the autumn.
Corpora will provide data for my research in the future, too: I intend to use machine learning to study discourses in The Suomi 24 Corpus, related to the Metropolitan area.

How is your research related to Kielipankki?

For the Master’s thesis I compiled the data from the International Corpus of Learner Finnish (ICLFI). The corpus comprises texts written by students of Finnish as a foreign language, which have been categorized according to the reference levels of the Common European Framework of Reference for Languages (CEFR). I used the texts of the advanced students because the reference data was compiled from texts by native Finnish speakers. The variety of texts (essays, summaries, emails, job applications…) made it possible to study learner language widely instead of studying features that are typical of only a specific genre, or the impact of a specific native tongue.

The Suomi 24 Corpus provided by Kielipankki has offered data for the other studies. It has been possible to sample smaller subcorpora from the data based on the search results, such as the subcorpora of homos and heteros and the subcorpora of the different cities in the Metropolitan area to provide access to discourses in these subcorpora.

Publications related to the resource

Tarvainen, Jenny 2018: SAADA-verbin fraseologiaa: vertaileva korpustutkimus oppijan- ja natiivikielestä. Master’s thesis. University of Jyväskylä. https://jyx.jyu.fi/handle/123456789/59273?show=full
Jantunen, Jarmo H. 2018: Homot ja heterot Suomi24:ssä: analyysi digitaalisista diskursseista. Puhe ja kieli, 38(1), 3–22. https://doi.org/10.23997/pk.65488

 


Researcher of the Month: Anna Puupponen

Photo: Tapio Laitinen

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Anna Puupponen, postdoctoral researcher at the University of Jyväskylä, tells us about how she makes use of the resources Corpus of Finnish Sign Language and ProGram data: The stories Snowman and Frog, where are you? in her research.

Who are you?

I am Anna Puupponen and I am working as a postdoctoral researcher at the Sign Language Centre at the University of Jyväskylä. I finalized my PhD in May 2019 and at the moment I am continuing my postdoctoral research on Finnish Sign Language (FinSL).

What is your research topic?

My doctoral research focused on a relatively understudied area of sign language linguistics: signers’ head and body movements. In the PhD project I studied actions of the signer’s head and body and the role that these actions play in the structuring of language, in interaction and in the transfer of meanings.

I am currently doing research within various projects at the Sign Language Centre, focusing on embodied communication in signed situations, the similarities and differences between the signing of adults and children, sign language processing studied through neuroimaging, and the signing fluency of native signers and sign language learners.

How is the research work related to Kielipankki?

Several multimodal resources that I have helped compile and have used in my research have been published in the Language Bank of Finland, Kielipankki. A corpus comprising signed stories, the Snowfrog corpus (ProGram data: The stories Snowman and Frog, where are you?), was published in 2016, and the first part of the Corpus of Finnish Sign Language (Corpus FinSL) in 2019. In linguistic research on sign languages, corpus data can be seen as having an especially central role. Sign languages often have a weak status in society, they lack well-developed institutional standards, and their transmission from one generation to the next is disrupted. In building descriptions and grammars of sign languages, it is important to study language-internal variation from extensive data sets. Sign language corpora are also important for the development of sign language teaching.

This data-driven approach played a central role in my PhD project. I used the sign language corpora published in Kielipankki in studies where I focused on the sequences of actions of the head and body, and the semiotic features of these sequences, in signed narratives and conversations. As the Snowfrog corpus and Corpus FinSL are very similar to the corresponding published corpora of Swedish Sign Language with respect to the principles of compilation, I could also conduct a comparative study between Finnish and Swedish sign languages in my doctoral research.

Currently I’m using Corpus FinSL in a research project where we focus on the depictive language use of signers of different ages. The first part of The Corpus of Finnish Sign Language published in Kielipankki comprises signed narratives and discussions from 21 signers aged between 18 and 29 years. In the project we analyse Corpus FinSL data as well as data from children using FinSL collected in the VIKKE project hosted by the Sign Language Centre.

Publications related to the resource:

Puupponen, A. (2019). Understanding nonmanuality: A study on the actions of the head and body in Finnish Sign Language. PhD dissertation. University of Jyväskylä.
Puupponen, A. (2019). Towards understanding nonmanuality: A semiotic treatment of signers’ head movements. Glossa: a journal of general linguistics 4(1): 39. 1–39. DOI: https://doi.org/10.5334/gjgl.709
Jantunen, T.; Mesch, J.; Puupponen, A. & Laaksonen, J. (2016). On the rhythm of head movements in Finnish and Swedish Sign Language sentences. In Proceedings of Speech Prosody 2016 [organized at Boston University, May 31–June 3, 2016], pp. 850–853
Press release of Anna Puupponen’s dissertation on the website of the University of Jyväskylä.

The developer’s point of view on the Corpus of Finnish Sign Language was presented in the interview with Juhana Salonen in May 2020.

 


Researcher of the Month: Markus Mattila

Photo: Markus Mattila

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Markus Mattila, an MA graduate of Åbo Akademi University, tells us about how he makes use of the resource The Suomi 24 Sentences Corpus (2017H2) (beta).

Who are you?

I am Markus Mattila. I graduated last year from Åbo Akademi University with Master’s degrees in Finnish language and English language and literature. I also have a Master of Economic Sciences degree from before. At the moment, I am working as a substitute teacher and planning to take up postgraduate studies.

What is your research topic?

In my MA thesis in Finnish language, I studied language change, focusing on situational idioms containing possessive suffixes and, to be more precise, the agreement between the possessive suffix and the subject of the clause. In standard usage, the possessive suffix should agree with the subject, as in olen huolissani vs. *olen huolissaan. The research questions investigated in my thesis were:
• How common is the use of non-agreeing possessive suffixes in the first person singular in certain situational idioms?
• Have there been any changes in the proportion of non-agreeing forms – i.e. forms contrary to the usual norms – during the period under investigation in this study?
• Do the studied idioms differ from one another with respect to how common the use of the non-agreeing possessive suffix is?
Based on a pilot study, the expressions selected for further investigation were olla huolissaan [to be worried], olla pahoillaan [to be sorry] and olla innoissaan [to be excited]. In order to answer my research questions, I conducted a corpus study comprising three time periods: 2001–2006, 2007–2011 and 2012–2017. The statistical significance of the results was tested using cross tabulation and Pearson’s chi-square (χ²) test.
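As an illustration of the kind of test mentioned above, the following is a minimal sketch of Pearson's chi-square statistic computed from a cross tabulation. The counts are invented for the example and are not taken from the thesis; the function name and the table layout (rows for agreeing and non-agreeing forms, columns for the three periods) are assumptions made here.

```python
def chi_square(table):
    """Return Pearson's chi-square statistic and degrees of freedom
    for a cross tabulation given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if form and period were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts, NOT results from the study:
# row 0 = agreeing forms, row 1 = non-agreeing forms,
# columns = periods 2001-2006, 2007-2011, 2012-2017.
counts = [
    [90, 85, 80],
    [10, 15, 20],
]

statistic, dof = chi_square(counts)

# Standard critical value of the chi-square distribution
# for 2 degrees of freedom at the 0.05 significance level.
CRITICAL_05_DF2 = 5.991
significant = statistic > CRITICAL_05_DF2

print(f"chi-square = {statistic:.3f}, df = {dof}, significant at 0.05: {significant}")
```

With real data, the same table would hold the observed frequencies of each form in each period, and a statistics package would normally also report the exact p-value.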

How is the research work related to Kielipankki?

A corpus-based study was the best possible method for researching such a rare phenomenon. Since language change takes place more often in spoken language than in the more controlled and stable written language, I chose to take my research data from the vast Suomi24 corpus provided by Kielipankki. The corpus consists of all discussions in the discussion forum Suomi24 between 2001 and 2017. These discussions, which are unofficial and written under a pseudonym, are a lot closer to spoken language than texts in official documents, news articles or literature, and are thus a very useful resource for investigating a research topic of this kind.
The specific resource in my research was The Suomi 24 Sentences Corpus (2017H2) (beta), which I first used as a whole to gain an overall picture of the data. After that, I divided the messages into the aforementioned time periods in order to study possible changes. The corpus data was retrieved with the web-based Korp concordancer tool available at Kielipankki, which I found simple and pleasant to use. One factor contributing to this positive experience was the excellent technical support provided, for which I would like once more to express my gratitude to the personnel concerned.

Publications related to the resource you have used:

Mattila, M. (2019): ”Olen pahoillani ja huolissaan”: Tutkimus persoonakongruenssista olotilanilmausidiomeissa Suomi24-korpuksessa 2001–2017. Master’s thesis (Pro gradu). Åbo Akademi. http://www.urn.fi/URN:NBN:fi-fe2019062421760

 


Researcher of the Month: Anita Nuopponen

Photo: Harri Huusko

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Anita Nuopponen, professor in technical communication at the University of Vaasa, tells us about how she makes use of the resource The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version.

Who are you?

I am Anita Nuopponen, professor in technical communication from the School of Marketing and Communication, Communication Studies, University of Vaasa.

What is your research topic?

I have once again returned to terminology research, the subject of my dissertation in 1994. My special interest still concerns relations between concepts. The typology I have created for them is still relevant, since there is a need to distinguish between various types of concept relations in information systems and digitalization initiatives. Some of the relation types in the classification are going to be included in the next version of the international terminology standard ISO 704. The second current research area I am focusing on, also related to conceptual relations, is developing a systematic concept analysis method that makes use of the relations. I am currently working on an article on conceptual analysis as an aid to research work and also on a collaborative article on terminological methods in teaching special languages to students in various fields. Both will appear in the VAKKI Symposium series.

How is the research work related to Kielipankki?

At the moment I am on research leave and work partly on FIN-CLARIN initiative funding, with the aim of creating content for Kielipankki that is similar to the work I have done on my Terminology Forum site since 1994. I have thus returned to continue the work I started years ago! I am now looking for online vocabularies and glossaries available in Finnish in various fields and creating a link list of them, but the aim is to deposit glossaries in Kielipankki’s collections when possible. Interested parties in various fields, such as teachers, enterprises, associations and other organizations, have compiled glossaries covering their own fields and published them online. Many people could benefit from these if only they were available. Not all glossaries end up in TSK’s TEPA term bank or the Helsinki Term Bank for the Arts and Sciences. Many valuable resources disappear when the creator of the vocabulary changes jobs or retires, or when the website of a company is renewed.

I became familiar with Kielipankki in the context of my presentation “Vaikeasti käsitettävä käsitteen käsite” [The concept of the concept is difficult to comprehend] at the Annual Conference on Linguistics in 2015. I used the data from the year 2000 included in The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version to study the definitions of concept, how the word concept is used and how concepts are addressed. I focused mainly on general language. The word concept is frequently used in general language, and its function follows the personal intuition of each writer. Often that intuition is identical to the definition used in terminological research and given in dictionaries of general language. However, as early as the next sentence it can be mixed up with word, term or even phenomenon. This often happens in scientific writing, too.

Publications related to the resource you have used:

The paper on concepts mentioned above is yet to be published. The present project on Terminology Forum has not yet resulted in related publications, but there are presentations and articles from various contexts on making use of the internet in collecting and disseminating terminological resources. (Publication list: http://lipas.uwasa.fi/~atn/AnitaNuopponen/index.html)

 


Researcher of the Month: Emma Sepänaho

Photo: Sofia Tikanmäki

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Emma Sepänaho, graduate student at the University of Helsinki, tells us about how she makes use of the resource Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2.

Who are you?

I am Emma Sepänaho, a fourth-year student of the Finnish language at the University of Helsinki. I am currently working on my pro gradu thesis on easy-to-read Finnish.

What is your research topic?

In my pro gradu thesis I study long words in easy-to-read Finnish media texts. At the moment I intend to concentrate on lemmas with 20 characters or more. I aim to carry out a morphological analysis and to study inflection and word formation, multiple morphemes and the frequencies of morphemes, as well as the semantic fields of such long words. Given that the recommendations for easy-to-read Finnish state that long words should be avoided, it is interesting to find that even one corpus of easy-to-read Finnish can contain more than a thousand tokens of this length. Easy-to-read language has so far been studied only a little in Finland, and my study will hopefully provide valuable information about the nature of easy-to-read Finnish.

How is the research work related to Kielipankki?

I have collected the data for my thesis from the subcorpus Selkosanomat/Selkouutiset included in the Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2, provided by Kielipankki. The subcorpus comprises easy-to-read Finnish media texts published in Selkosanomat magazine (previously Selkouutiset) in 2006–2013. The corpus is valuable for my research because the concordance tool Korp allows me to focus on specific tokens extracted with a search query, instead of searching for long words in easy-to-read texts manually. Initially, my intention was to search for complex words with more than three syllables, but the parsing method does not currently allow such searches with satisfactory results. Fortunately, a search query defining the number of characters in the tokens produces solid data for my research.
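The idea of selecting words by their number of characters can be sketched outside Korp as well. The snippet below is only an illustration in Python and does not reproduce Korp's own query syntax; the function name, the threshold constant and the sample lemmas are assumptions made for the example, not data from the Selkosanomat/Selkouutiset subcorpus.

```python
MIN_LENGTH = 20  # threshold mentioned in the interview: 20 characters or more


def long_lemmas(lemmas, min_length=MIN_LENGTH):
    """Return the lemmas that have at least min_length characters,
    in their original order."""
    return [lemma for lemma in lemmas if len(lemma) >= min_length]


# Invented sample lemmas, chosen only to show the length boundary.
sample = [
    "uutinen",                   # 7 characters
    "ympäristöministeriö",       # 19 characters, just below the threshold
    "pääkaupunkiseutulainen",    # 22 characters
    "sosiaaliturvajärjestelmä",  # 24 characters
]

for lemma in long_lemmas(sample):
    print(len(lemma), lemma)
```

Counting characters rather than syllables keeps the selection criterion purely formal, which is why it works without any linguistic parsing of the tokens.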

 


Researcher of the Month: Katri Leino

Photo: Katri Leino

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Katri Leino, doctoral candidate at Aalto University, tells us about how she makes use of the resources Yle News Archive Easy-to-read Finnish 2011-2018, source and The Suomi 24 Corpus in her research.

Who are you?

I am Katri Leino and I am a PhD student in Mikko Kurimo’s Speech Recognition group at Aalto University. I earned my master’s degree (tech.) from Aalto University. My master’s thesis was about adapting speech recognition models to a certain environment or speaker. User experience has always been one of my interests. For my PhD, I wanted to combine the technical knowledge and methods of the natural language processing (NLP) field with the human perspective of the human-computer interaction (HCI) field. Therefore, I asked Antti Oulasvirta, leader of the User Interfaces group at Aalto University, to be my instructor while Mikko Kurimo supervises my studies. Oulasvirta’s group is highly focused on user modelling, which provides a new perspective on NLP research.

What is the research topic?

In my main research project, I study how Finns type on their smartphones. The project is funded by the Emil Aaltonen Foundation. Finns often complain that typing with a touch keyboard is frustrating because it is difficult to hit the right keys and because predictive methods such as auto-correction do not work well for Finnish. I want to find ways to improve the typing experience and also see how the style of the language affects typing.

We are currently collecting typing samples with the Typing Test at http://typingtest.aalto.fi (the Finnish version: kirjoitustesti.aalto.fi). The Typing Test runs in a browser. The task for the participants is to type given sentences as correctly and as fast as possible. All keypresses and timestamps are saved as a data set, which will be published in 2020. The English data set was published this year (https://userinterfaces.aalto.fi/typing37k/). Our aim for the Finnish data is to collect typing samples from a wide variety of skill levels in order to better understand successful strategies and challenges.
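To give an idea of what can be computed from keypresses and timestamps, here is a minimal sketch of a typing speed calculation. The data format (a list of key and millisecond-timestamp pairs), the function name and the sample log are assumptions made for this illustration and do not describe the actual Typing Test data set; the formula follows a common convention in which one word equals five characters and the first keypress only starts the clock.

```python
def words_per_minute(keypresses):
    """Estimate typing speed from a list of (key, timestamp_ms) pairs.

    Convention assumed here: one "word" is five characters, and time is
    measured from the first keypress to the last one.
    """
    if len(keypresses) < 2:
        raise ValueError("need at least two keypresses")

    elapsed_ms = keypresses[-1][1] - keypresses[0][1]
    if elapsed_ms <= 0:
        raise ValueError("timestamps must increase")

    typed_chars = len(keypresses) - 1  # the first key starts the clock
    minutes = elapsed_ms / 60_000
    return (typed_chars / 5) / minutes


# Hypothetical log: the 11 characters of "hei maailma",
# typed at a steady pace of one key every 200 ms.
log = [(char, index * 200) for index, char in enumerate("hei maailma")]

wpm = words_per_minute(log)
print(f"{wpm:.1f} words per minute")
```

This sketch ignores corrections such as backspaces; an analysis of real typing data would also need to compare the typed text with the target sentence to measure errors.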

How is the research work related to Kielipankki?

In the Finnish Typing Test, we sampled the sentences from two Kielipankki resources: Yle’s easy-to-read news corpus, Yle News Archive Easy-to-read Finnish 2011-2018, source, and The Suomi 24 Corpus. When measuring typing speed, easy-to-remember sentences are recommended so that the participant does not have to check the sentence many times while typing. The easy-to-read news texts were suitable for the test for that reason. The Suomi24 corpus was selected because I also wanted to include real conversational sentences to see whether text style affects typing speed.

 


Researcher of the Month: Niko Partanen

Photo: Sonja Holopainen, Kotus

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Niko Partanen, researcher in the Kone Foundation funded project Language Documentation meets Language Technology: The Next Step in the Description of Komi, tells us about his ongoing research, in which he will produce new resources for the Language Bank of Finland.

Who are you?

I am Niko Partanen, a researcher in the project Language Documentation meets Language Technology: The Next Step in the Description of Komi, funded by the Kone Foundation. When working as a senior adviser at the Institute for the Languages of Finland last year, I was able to get to know several language resources archived in Finland in a unique way, and I will continue working on questions of archiving and digitalization in the future. The scope and quantity of language resources in Finland are very good, but there are still many open questions, especially concerning current web publishing practices and providing accessibility that is appropriate for different user groups. I will spend this summer as a visiting researcher at the University of Helsinki.

What is your research topic?

My research topic is variation and change in Komi-Zyrian dialects, which I study using digital resources from different periods. My research focuses on certain interesting features of the dialects that are known but inadequately described, and I am currently working on articles on phonological and morphological subjects.

Researchers have been collecting resources on Komi dialects for over a hundred years, which makes it possible to compare data over a long period. There is never too much data on endangered languages, and this is also the case with the Komi dialects. This has led me to study various resources that were previously collected and published in different formats. I have worked, for example, with text recognition related to these activities, which is one of the most effective means of transforming handwritten texts into digital format.

I aim to develop and make use of language technology in speech data research. Our research project Language Documentation meets Language Technology: The Next Step in the Description of Komi, led by Rogier Blokland and Michael Rießler and funded by the Kone Foundation, will continue for some years. It focuses on developing the morphosyntactic analysis of Komi, and the project has regularly published articles on different solutions that make use of language technology. In practice, we can take a text in a Komi dialect and run it through the analyzer developed in the Giellatekno environment, with relatively good results for each word. However, it is not entirely clear how good the analysis needs to be for solving different kinds of research questions in a realistic way. In this respect I can serve as a test subject myself when I try to answer specific research questions using this resource. Our project will also produce a wider description of Komi syntax, and my doctoral research will also be finalised during the project period.

How is your research related to Kielipankki?

My research project is in the process of transferring its corpora of Komi to the infrastructure provided by the Language Bank. The corpora compiled and transcribed during the earlier project between 2014 and 2016 will be available in the Korp interface, which is very important for researchers. It is of utmost importance that resources are made available to the whole research community as quickly as possible, and practices that make this as easy as possible should be actively developed.

At the moment I am working on scripts for the Language Bank for analyzing the data in the Komi corpus and for converting it into the format required by the Korp interface. The files are also checked at the same time. Since the transcripts are the result of five years of manual work, they include many minor non-standard structures, which are now searched for with an automated process and fixed appropriately. Otherwise these non-standard structures or anomalies would cause various problems for the user: for example, part of the contents of the corpus would not be visible through Korp, or the data would be located in the wrong place. All solutions and experiences gathered within the project will naturally be published in accordance with the principles of open science.

So far I have not used the Language Bank resources in my work, but I am interested in using the resources of Finnish and Karelian that are available in the Language Bank.

 


Researcher of the Month: Mietta Lennes

Mietta Lennes
Photo: Hanna Westerlund

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Mietta Lennes tells us about her PhD study and about her work in FIN-CLARIN.

Who are you?

My name is Mietta Lennes and I work as a Project Planning Officer for the FIN-CLARIN consortium that is coordinated by the University of Helsinki. I help and advise researchers and students in their various problems related to managing, analyzing and publishing language corpora. In addition, I teach online courses in corpus linguistics, speech analysis and data management. I am a phonetician by training.

What is your research topic and how is your research related to Kielipankki?

My forthcoming doctoral dissertation deals with the link between the phonetic variability and the frequencies of words in casual spoken Finnish. For instance, it is already known that, in any language, words that occur often tend to be shorter than words that occur rarely. However, the phonetic phenomena that may affect this situation can only be studied with a sufficiently large corpus. Furthermore, the speech recordings must be of high technical quality so as to allow for reliable acoustic-phonetic measurements.

For phonetic analysis, I have used a corpus called The FinINTAS Corpus of Spontaneous and Read-aloud Finnish Speech and especially the subcorpus FinDialogue that contains conversational speech. The corpus will be available in the Language Bank of Finland when I finish my PhD study. The FinINTAS corpus was mainly collected during the international INTAS 00-915 project and the associated Finnish projects in which the phonetic properties of reading aloud were compared with those of spontaneous speech. In practice, I was responsible for planning and coordinating the speech recordings and the annotation work of the corpus. Several students in Phonetics and Finnish from Helsinki as well as from St. Petersburg participated in these efforts. Together, we gradually managed to annotate the corpus comprehensively enough in order to produce some publications.

In my PhD study, I also needed information about the frequencies of word forms in spoken Finnish. The number of word tokens in the FinDialogue corpus alone was too small for this purpose, and there were no suitable corpora available in the Language Bank of Finland at that time. Fortunately, the transcripts of the 1970s subcorpus of what is now called the Longitudinal Corpus of Finnish Spoken in Helsinki (1970s, 1990s and 2010s) (Helpuhe1) happened to be available on the server of the Department of Linguistics of the University of Helsinki, and I was able to use these texts. The Helsinki spoken material was even somewhat similar in style to the FinDialogue corpus. However, the transcription practices of the material collected in the 1970s had varied a great deal, and so I needed to manually edit and harmonize the texts in order to be able to calculate at least approximate word frequencies. Looking back on this messy project, it feels great to know that all three subcorpora of the Longitudinal Corpus of Finnish Spoken in Helsinki – both the audio recordings and their aligned transcripts – have been more recently deposited in the Language Bank of Finland, thanks to the research group of Hanna Lappalainen.
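To give a rough idea of what calculating approximate word-form frequencies from harmonized transcripts can look like, here is a minimal Python sketch. It does not reproduce the editing that was actually done for the 1970s material, which was manual: the `harmonize` rules (lowercasing and stripping punctuation or bracket marks from token edges) and the example utterances are assumptions made purely for illustration.

```python
from collections import Counter


def harmonize(token: str) -> str:
    """Map a raw transcript token to a comparable word form.

    Hypothetical rules: the real transcription conventions varied and
    are not described here, so this only lowercases the token and
    strips punctuation and bracket marks from its edges.
    """
    return token.lower().strip(".,?!:;\"'()[]-")


def word_form_frequencies(transcripts):
    """Count harmonized word forms over an iterable of transcript strings."""
    counts = Counter()
    for text in transcripts:
        for raw in text.split():
            form = harmonize(raw)
            if form:  # skip tokens that were only punctuation
                counts[form] += 1
    return counts


# Invented utterances with inconsistent casing and marking.
transcripts = [
    "No mä olin sillon, sillon töissä.",
    "Mä OLIN (töissä) kans.",
]
freqs = word_form_frequencies(transcripts)
print(freqs.most_common(3))
```

Without the harmonization step, variants such as "olin" and "OLIN" or "töissä." and "(töissä)" would be counted as different word forms, which is why inconsistent transcription practices make raw counts unreliable.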

Why do you work for FIN-CLARIN?

A researcher who collects language material often runs into the fact that a huge mass of texts or a collection of audio recordings alone does not directly provide the desired answers. I have learned from experience in many projects that it is easy to make audio and video recordings of speech, but it takes a lot of planning and hard work to collect the material systematically and then to prepare, organize, transcribe and annotate the files, which tends to be much more time-consuming. The researcher should also carefully describe the data and the analysis methods. One needs to make sure that it will be possible to make gradual improvements to the study and to reuse the data later.

Even if the corpus has been properly created, some manual labour or tailored automatic methods may be necessary in order to perform a specific analysis that is required to answer the research question. In this detective work, a collection of services like Kielipankki, together with the entire network of researchers within FIN-CLARIN, can be extremely valuable. I believe that, in the future, versatile skills in data management will become a more and more important part of any researcher’s competence.

My own work in FIN-CLARIN is interesting and varied. It feels great to be able to help a student or a researcher solve a technical problem related to his or her research or to discover a tool that matches the purpose. Together with the entire Kielipankki team and the co-operating partners of FIN-CLARIN, we also brainstorm and develop new services that can be provided via Kielipankki for the researchers’ benefit.

 


Researcher of the Month: Krista Lagus

Photo: Linda Tammisto

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Krista Lagus, docent of the University of Helsinki, tells us about her research on The Suomi 24 Corpus.

Who are you?

I am Krista Lagus, professor in digital social science at the Centre for Research Methods, Faculty of Social Sciences, University of Helsinki. I also participate in HELDIG, the collaboration network in Digital Humanities.

What is your research topic?

At the moment I am doing research within the Citizen Mindscapes consortium, focusing on the emotional waves, types of interaction and topics of discussion in social media. We focus specifically on the Suomi24 discussions, with the aim of identifying different perspectives and means of research, especially those motivated by the social sciences. In time, we aim to develop interfaces and tools deriving from some of these means of research for social sciences researchers, as part of digital humanities. The latest result is Lääketutka, which sheds light on people’s discussions about medication, symptoms and their health from a completely novel perspective. It is available at www.laaketutka.fi.

I have also done research on adapting methods from machine learning and neural networks to modelling in different fields of language research. These include, for example, inducing morphological segmentation with adaptive methods, modelling concept systems, and modelling discussion topics. There have often been practical applications calling for these; topic modelling, for example, was applied to data mining and to exploring data in large text corpora. My background is in information technology, in which I obtained my doctorate at the Helsinki University of Technology in 2000.

How is your research related to Kielipankki?

When it became evident that we wanted to make the Suomi 24 discussions available to all researchers, the size of the resource, approximately 70 million messages, was both a special challenge and an opportunity. The Language Bank of Finland and FIN-CLARIN were a natural and sufficiently solid partner for our project. The collaboration led to the publication of the Suomi24 resource, owned by Aller, for all interested parties via the interactive user interface of the concordance tool Korp. The entire language resource is also licensed for download for academic non-commercial research purposes.

 


Researcher of the Month: Eero Voutilainen

Photo: Mika Federley

 

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Eero Voutilainen, PhD student at the University of Helsinki, tells us about his research on Plenary Sessions of the Parliament of Finland, Helsinki Korp Version.

Who are you?

I’m Eero Voutilainen, senior specialist at the Records Office of the Parliament of Finland and PhD student at the University of Helsinki.

What is your research topic?

In my forthcoming doctoral dissertation I am writing about linguistic norms and the regulation of interaction in the plenary sessions of the Parliament of Finland. I am interested in how norms interact, how they are negotiated and how they affect the linguistic choices of the Members of Parliament. Although it is commonly thought that a plenary session is just a collection of monologues, it actually involves quite a lot of interaction.

I have also done research on, for example, linguistic genres, ideologies of language planning, the relationship between spoken and written language, and the translative case in Finnish.

How is your research related to Kielipankki?

In my doctoral research I am using the Plenary Sessions of the Parliament of Finland, Helsinki Korp Version, which can be found in the Language Bank of Finland. Thanks to this resource I could, for example, retrieve from the minutes of the plenary sessions the utterances in which the chairman manages the flow of conversation. The resource also makes it comparatively convenient to search for MPs’ comments on the instructions and recommendations they have received.

In my research I also compare the discussions retrieved from the minutes to the video recordings of the plenary sessions. Various editorial changes have been made to the minutes in order to ensure their readability, which is something one should be aware of when using the minutes in order to study the language of the plenary sessions.

Thanks to FIN-CLARIN, I was also able to participate in the CLARIN workshop Working with Parliamentary Records, held in Sofia in March 2017. The workshop focused on parliamentary corpora collected in different countries, on the solutions used for collecting these resources and on the question of how to use them in research in the humanities and social sciences. Details on the program of the workshop and its presentations are available on the workshop’s web page.

 


Researcher of the Month: Paul-Thor Holmberg

 

Photo: Valtteri Airaksinen

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Paul-Thor Holmberg, Master of Arts of the University of Oulu, tells us about his research on the Finnish Tree Bank and the Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland.

Who are you?

I’m Paul-Thor Holmberg, Master of Arts of the University of Oulu. I majored in the Finnish language. My Master’s thesis was published in December 2016.

What is your research topic?

In my Master’s thesis I discuss how the route of the variable expressed by the verbs mennä (to go) and tulla (to come) is windowed in Finnish. Windowing means the way in which the language user’s conceptualisation of the situation is reflected in the language, that is, what (s)he decides to verbalize and what (s)he decides to omit.

How is your research related to Kielipankki?

The resources I am using in my research are the Finnish Tree Bank 3 and the Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, both of which can be found in Kielipankki.



 
