Researcher of the Month: Viljami Haakana

Photo: Mika Federley


Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Viljami Haakana, a student and research assistant at the University of Helsinki, tells us how he makes use of the Finnish Dialect Syntax Archive.

Who are you?

I am Viljami Haakana, a sixth-year student of general linguistics at the University of Helsinki and a research assistant for Associate Professor Kaius Sinnemäki in a subproject within his larger ERC project Linguistic Adaptation (GramAdapt). My contribution to the project mostly consists of writing scripts for processing the data.

What is the research topic?

The aim of the project is to find out whether the probability of third person number agreement on the predicate differs between the two third person plural subject pronouns he and ne in Finnish (both mean “they”; in the standard language, he refers to humans and ne to nonhumans, but in spoken language the picture is more complicated, as ne is commonly used of people as well). In other words, we are trying to find out whether the third person plural ending -vat/-vät, or its dialectal counterpart, is more likely to occur after the pronoun he than after the pronoun ne. Variation in number agreement has been studied for decades in Finnish linguistics, but to our knowledge not from the point of view of whether agreement occurs more often with he or with ne.
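To make the comparison concrete, the question can be framed as two conditional proportions: the share of plural-marked verbs after he and the share after ne. The following is only a minimal sketch under stated assumptions, not the project's analysis: the function name `agreement_rates` is hypothetical, and the counts are invented placeholders, not figures from the corpus.

```python
def agreement_rates(counts):
    """Return the share of plural-agreeing verbs for each pronoun.

    `counts` maps (pronoun, agrees) to a number of instances, where
    `agrees` is True when the verb carries third person plural marking.
    """
    rates = {}
    for pronoun in ("he", "ne"):
        plural = counts.get((pronoun, True), 0)
        singular = counts.get((pronoun, False), 0)
        total = plural + singular
        # None signals that the pronoun has no instances at all.
        rates[pronoun] = plural / total if total else None
    return rates


# Invented placeholder counts, NOT results from the corpus.
toy_counts = {
    ("he", True): 30, ("he", False): 10,
    ("ne", True): 20, ("ne", False): 60,
}
rates = agreement_rates(toy_counts)
print(rates)
```

A difference between the two proportions would then still need a proper statistical test, ideally one that also takes the speakers' background into account.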

How is the research work related to Kielipankki?

The research is carried out on the Finnish Dialect Syntax Archive data, tentatively on the version published in the Korp interface (The Finnish Dialect Syntax Archive’s Helsinki Korp Version), and afterwards on the version available in the Language Bank of Finland’s Download service (The Finnish Dialect Syntax Archive’s Helsinki Download Version). We search the data for instances where the subject he or ne (or a dialectal equivalent, e.g. hyö, an eastern equivalent of he, or nee, a dialectal equivalent of ne) is followed by a predicate verb in the third person, singular or plural. The instances are then divided into four categories according to which pronoun occurs and whether or not the verb shows number agreement. I have written a Python script that searches for these instances in the version of the corpus available in the Download service and writes the results to a file. In addition, the script keeps track of sociolinguistic information for each instance, such as the speaker’s age, sex, and dialect. There are approximately 7,000 instances in the corpus. At a general level, our aim is to study those linguistic and sociolinguistic structures that may affect variation in number agreement on the predicate.
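The search and four-way categorisation described above could be sketched roughly as follows. This is an illustration only, not the script used in the project. The `Token` class, its tag values, the short pronoun lists, the function names and the sample data are all assumptions made for the example, and the corpus files in the Download service are not assumed to have this shape.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# Hypothetical form lists; real dialect data has many more variants.
HE_FORMS = {"he", "hyö"}
NE_FORMS = {"ne", "nee"}


@dataclass
class Token:
    word: str
    pos: str = ""     # assumed tag, e.g. "PRON" or "VERB"
    person: str = ""  # assumed tag, "3" for third person
    number: str = ""  # assumed tag, "SG" or "PL"


def find_instances(tokens, speaker):
    """Yield a record for each he/ne pronoun directly followed by a 3rd person verb."""
    for subject, verb in zip(tokens, tokens[1:]):
        form = subject.word.lower()
        if form in HE_FORMS:
            pronoun = "he"
        elif form in NE_FORMS:
            pronoun = "ne"
        else:
            continue
        if verb.pos != "VERB" or verb.person != "3":
            continue
        yield {
            "pronoun": pronoun,
            "agreement": "yes" if verb.number == "PL" else "no",
            "verb": verb.word,
            **speaker,  # carry the sociolinguistic metadata along
        }


def write_results(records, handle):
    """Write the records as CSV to an open text file handle."""
    fields = ["pronoun", "agreement", "verb", "age", "sex", "dialect"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)


# Made-up utterance and speaker metadata, for illustration only.
tokens = [
    Token("he", "PRON"), Token("tulevat", "VERB", "3", "PL"),
    Token("ja", "CONJ"),
    Token("nee", "PRON"), Token("tulee", "VERB", "3", "SG"),
]
speaker = {"age": "70", "sex": "F", "dialect": "example"}

records = list(find_instances(tokens, speaker))
# Tally the four categories: (he, yes), (he, no), (ne, yes), (ne, no).
categories = Counter((r["pronoun"], r["agreement"]) for r in records)

buffer = io.StringIO()  # stands in for a real output file
write_results(records, buffer)
```

The sketch only matches a verb that immediately follows the pronoun. Transcribed speech often has other words between subject and predicate, so a real search would need a looser matching window.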

Publications related to the corpora

Ikola, Osmo, Ulla Palomäki & Anna-Kaisa Koitto 1989. Suomen murteiden lauseoppia ja tekstikielioppia. Suomalaisen Kirjallisuuden Seuran Toimituksia 511. Helsinki: Suomalaisen Kirjallisuuden Seura.

Väänänen, Milja 2016. Subjektin ilmaiseminen yksikön ensimmäisessä persoonassa: Tutkimus suomen vanhoista murteista. Annales Universitatis Turkuensis C 430. Turku: Turun yliopisto.


The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive.