Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Jenny Tarvainen, a graduate of the University of Jyväskylä, tells us how she uses two resources provided by Kielipankki: the International Corpus of Learner Finnish, ICFLI, and the Suomi 24 Corpus.
I am Jenny Tarvainen. In January 2019, I graduated from the University of Jyväskylä with a master’s degree, majoring in the Finnish language. At the moment I teach Finnish to immigrants, and I intend to start my doctoral studies in the near future. I was drawn to corpus research already during my bachelor’s studies, and I do not expect that interest to change. The Language Bank of Finland, Kielipankki, has become quite familiar to me over the years.
My Master’s thesis (Tarvainen 2018) presented a comparative corpus study on the phraseological features of the verb SAADA (to gain) in native Finnish and learner Finnish. The aim was to find out, with Contrastive Interlanguage Analysis (CIA), how the usage of the verb SAADA by learners of Finnish differs from the way native speakers use it. To address these differences, I focused on the word forms of the verb and the meanings found in its cotext. I also studied the correlation between these forms and meanings with statistical methods. An interesting finding was that the correlation between forms and meanings was stronger in the texts of those studying Finnish as a foreign language than in the texts by native speakers. In other words, in learner language a specific form of the verb SAADA appeared more often together with a specific meaning in the cotext. For example, the discussion around the verb form saavat (they get) most probably focuses on family or people in general, whereas the themes found around the base form saada are place, direction and area.
During my studies and after graduating, I have also worked as a research assistant in research projects led by Jarmo Jantunen, Professor of Finnish at the University of Jyväskylä. The projects study how homosexual and heterosexual people are discussed in the media (Jantunen 2018) and what kinds of discourses arise when the discussion concerns different cities in the Metropolitan area (forthcoming). During these projects I have learned about Computer-Assisted Discourse Studies (CADS). At the moment I am working on a research plan so that I can apply for doctoral studies in the autumn.
Corpora will provide data for my research in the future, too: I intend to use machine learning to study discourses related to the Metropolitan area in the Suomi 24 Corpus.
For my Master’s thesis, I compiled the data from the International Corpus of Learner Finnish, ICFLI.
The corpus comprises texts written by students of Finnish as a foreign language, categorized according to the reference levels of the Common European Framework of Reference for Languages (CEFR). I used the texts of the advanced students because the reference data was compiled from texts by native Finnish speakers. The variety of texts (essays, summaries, emails, job applications…) made it possible to study learner language broadly, rather than only features that are typical of a single genre or that reflect the impact of a specific native language.
The Suomi 24 Corpus provided by Kielipankki has offered data for the other studies. It has been possible to sample smaller subcorpora from the data based on search results, such as subcorpora on homosexual and heterosexual people and on the different cities in the Metropolitan area, which give access to the discourses around these topics.
Tarvainen, Jenny 2018: SAADA-verbin fraseologiaa: vertaileva korpustutkimus oppijan- ja natiivikielestä. Master’s thesis. University of Jyväskylä. https://jyx.jyu.fi/handle/123456789/59273?show=full
Jantunen, Jarmo H. 2018: Homot ja heterot Suomi24:ssä: analyysi digitaalisista diskursseista. Puhe ja kieli, 38(1), 3–22. https://doi.org/10.23997/pk.65488
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive.