Newsletter of the Language Bank of Finland 2/2023


Researchers of the Month in 2023

  1. Therese Lindström Tiedemann – Swedish as a second language, pseudonymisation of linguistic data
  2. Maria Sarhemaa – appellativization of first names in Finnish
  3. Noora Hoffrén – constructed action in Finnish Sign Language and Finnish
  4. Johanna Vaattovaara – language awareness, language attitudes
  5. Rosa González Hautamäki – within-speaker variation, human-induced voice modifications
  6. Mikael Varjo – zero-subject constructions in Finnish everyday conversation
  7. Niina Kunnas – corpus of spoken Meänkieli
  8. Nobufumi Inaba – language change, research on Old Literary Finnish
  9. Sampo Pyysalo – natural language processing (NLP), large language models (LLMs)
  10. Anna Dmitrieva – text simplification
  11. Aleksi Sahala – research on ancient text data
  12. Tiina Onikki-Rantajääskö – the Helsinki Term Bank for the Arts and Sciences

All previous researchers of the month can be found in the archive.

Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Could you be one of them? Let us know:

New, updated or extended corpora in 2023

Extensions of the Newspaper and Periodical Corpus of the National Library of Finland (KLK)

The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2, Korp (klk-fi-v2-korp) contains newspapers and periodicals from the digital collections of the National Library of Finland from the years 1771–2021. The corpus is available in Korp and contains over 22 billion words in total. The language of each sentence in the corpus has been identified with the HeLI-OTS language identifier. We are currently working on the corresponding Swedish sub-corpus in order to make it available via Korp during spring 2024. Read more about the KLK update here.

Korp is moving to a new server

The Korp service is currently moving to a new server, to increase performance. The process will be completed in January 2024. The available corpora and functionalities will remain the same, but searches will be faster.

In addition, a significant upgrade to the Korp software used at the Language Bank of Finland will follow during spring 2024. After the upgrade, the Korp system offered in Finland will be easier to keep in sync with the Korp system developed at Språkbanken in Sweden.

Would you like to offer your own resource to be distributed via Kielipankki?

Submit the basic details about your own resource to the Language Bank of Finland:

Instructions: Publishing a privacy notice of processing personal data for research purposes

When you obtain a resource containing personal data via the Language Bank of Finland and start processing it for a new purpose, you must prepare a privacy notice regarding the purpose of processing, publish the notice openly in electronic format, and provide a link to the notice to the Language Bank. The purpose of a privacy notice is to help data subjects understand the purposes for which their data is used. You should always primarily follow the data protection guidelines of your own organisation. In addition, the Language Bank offers some instructions to help you collect the pieces of information that are usually required for a privacy notice regarding research purposes. Read more

Courses and training materials

The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered again in January–February 2024 and can be taken in either Finnish or English. The course is open to all universities, and you can also participate from outside Finland. Course details

The FIN-CLARIAH research infrastructure received funding for 2024–25

FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, received FIRI funding from the Research Council of Finland to continue its work in 2024–25. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. On December 1st, the FIN-CLARIAH team gathered in Tampere to discuss their achievements during the past two years. You can find the deliverables produced in the FIN-CLARIAH project on the Language Bank website.

ParlaCLARIN IV in May 2024 (Turin, Italy): workshops, demos and a call for papers

During the past few years, extensive parliamentary datasets from different countries have been processed within CLARIN with the aim of compiling them in a format that allows for research in various disciplines. Researchers and developers of parliamentary resources are invited to join the ParlaCLARIN workshop to be held in Turin, Italy in May as part of the LREC2024 conference. Deadline for submissions: 19.2.2024. Read more:

CLARIN funding opportunities

Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls:

New video: “Introduction to the Language Bank of Finland”

The brief introductory video (4 min 40 s) summarises the corpora, tools and other services available via the Language Bank of Finland, as well as the opportunities for depositing your own resource. The video comes with Finnish and English subtitles and can be found on our YouTube channel. Another version of the video will soon be available with examples in English.

The Language Bank of Finland is on vacation from 23.12.2023 to 7.1.2024

We wish you a relaxing holiday season!

Mietta Lennes and Wilhelmina Dyster
Project Planners

