Researcher of the Month: Emma Sepänaho

Photo: Sofia Tikanmäki


Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Emma Sepänaho, a graduate student at the University of Helsinki, tells us how she makes use of the resource Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2.

Who are you?

I am Emma Sepänaho, a fourth-year student of the Finnish language at the University of Helsinki. I am currently working on my pro gradu thesis on easy-to-read Finnish.

What is your research topic?

In my pro gradu thesis I study long words in easy-to-read Finnish media texts. At the moment I intend to concentrate on lemmas with 20 characters or more. I aim to do morphological analysis, study inflection and word formation, multiple morphemes and the frequencies of morphemes, as well as the semantic fields of such long words. Given that the recommendations for easy-to-read Finnish state that long words should be avoided, it is interesting to find that even a single corpus of easy-to-read Finnish can contain more than a thousand tokens of this length. So far, easy-to-read language has been studied only a little in Finland, and my study will hopefully yield valuable information about the nature of easy-to-read Finnish.

How is the research work related to Kielipankki?

I have collected the data for my thesis from the subcorpus Selkosanomat/Selkouutiset included in the Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2, provided by Kielipankki. The subcorpus comprises easy-to-read Finnish media texts published in Selkosanomat magazine (previously Selkouutiset) in 2006-2013. The corpus is valuable for my research because the concordance tool Korp allows me to focus on the specific tokens extracted by a search query, instead of searching for long words in easy-to-read texts manually. Initially, my intention was to search for complex words with more than three syllables, but the parsing method does not currently allow such searches with satisfactory results. Fortunately, a search query defining the number of characters in the tokens produces solid data for my research.
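
Editor's illustration: the interview does not give the actual Korp query, so the following is only a minimal Python sketch of the same idea, namely selecting words by character count. The pattern `.{20,}`, the function name `long_words` and the sample word list are assumptions made for this example and are not taken from the corpus or from the thesis.

```python
import re

# Assumed pattern: any string of 20 or more characters, mirroring the
# "lemmas with 20 characters or more" criterion described above.
LONG_WORD = re.compile(r".{20,}")


def long_words(lemmas, pattern=LONG_WORD):
    """Return the lemmas that match the pattern in full, in input order."""
    return [lemma for lemma in lemmas if pattern.fullmatch(lemma)]


# Hypothetical sample lemmas (not drawn from the Selkosanomat subcorpus).
sample = [
    "uutinen",                # 7 characters
    "maahanmuuttovirasto",    # 19 characters, just below the threshold
    "jalkapallomaajoukkue",   # 20 characters, exactly at the threshold
    "eduskuntavaaliehdokas",  # 21 characters
]

for lemma in long_words(sample):
    print(len(lemma), lemma)
```

Matching on the whole string with `fullmatch` keeps the length condition explicit; a plain `len(lemma) >= 20` check would select the same words, but the regular expression is closer in spirit to a pattern-based corpus query.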


The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive.