Researcher of the Month: Sam Hardwick

Photo: Bess Hardwick


Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Sam Hardwick, a project researcher at the University of Helsinki, tells us about developing some of the tools provided by Kielipankki.

Who are you?

I’m a freelance consultant, researcher and programmer. I started in language technology at the University of Helsinki in a research software project called HFST. We developed code for computational morphology, which ended up being used in, for example, inflecting dictionaries and spellcheckers for languages with extensive morphology (like Finnish, Sámi and Greenlandic). Since then I’ve worked on the technical side of various infrastructure and research projects, and done private consulting work.

What is your research or development work topic?

Right now I’m involved with publishing a sentiment corpus for Finnish. This is a collection of texts gathered from social media with their sentiment – whether they are positive, neutral or negative – annotated by humans. This will be the basis for automatic sentiment classification for future corpora and tools.

I’m also involved with the ANEE project, helping to make a treebank for Akkadian, which again will be the basis of an automatic annotation tool. Hopefully we’ll ultimately be able to automatically annotate more of the texts in this ancient language.

How is the development work related to Kielipankki?

I’ve done a lot of development work directly for Kielipankki. For example, right now I’m planning an API for accessing corpora directly from code. NLP applications are increasingly the domain of general machine learning practitioners, not just language experts, and there’s a lot of interest in our data and resources.

Publications related to the resources or tools:

Hardwick, S., Enqvist, E. J., Onikki-Rantajääskö, T. A., & Linden, B. K. J. (2018). Tieteen kansallinen termipankki (TTP) ja tiedonlouhinnan apuneuvot. Poster (in Finnish) at the Annual Conference of Linguistics, Helsinki, Finland.

I’ve published demonstrations for various bits of code and analysis, some of it perhaps comprehensible in English, here:


The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.

