Newsletter of the Language Bank of Finland 1/2024


The ‘Researcher of the Month’ series reached a milestone

Since the spring of 2016, the Language Bank of Finland has regularly published the Researcher of the Month series online. In these blog texts, researchers from different fields present their work and describe the language resources they use. We are now celebrating the 100th researcher profile!

Researchers of the Month in 2024

  1. Liisa Mustanoja – sociolinguistics, sociophonetics, spoken language of Tampere
  2. Tanja Säily – variation and change in the English language, historical corpus linguistics
  3. Harri Uusitalo – historical linguistics, ecolinguistics
  4. Lotta Leiwo – folkloristics, the T-Bone Slim Corpus
  5. Juraj Šimko – phonetics, speech synthesis
  6. Krister Lindén – The Language Bank of Finland (100th researcher profile)

All previous researchers of the month can be found in the archive.

Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Could you be one of them? Let us know:

New, updated or extended corpora in 2024

New or updated tools in 2024

Stay tuned for upcoming corpora in the autumn

  • Version 2 of the Swedish Sub-corpus of The Newspaper and Periodical Corpus of the National Library of Finland is currently being prepared for publication via the Korp service.
  • An update with new sign language material will be added to the CFinSL corpus in the autumn.
  • The much-anticipated update to the Suomi 24 corpus is also in preparation, including forum posts from the years 2021-2023.
  • Corpora containing spoken language from Satakunta and Tampere, among others, are also upcoming.

Change in metadata platform

The Language Bank of Finland maintains metadata records of all the resources it distributes. Each individual resource version has its own metadata record with a persistent identifier.

To provide the metadata records, the Language Bank has been using a platform called META-SHARE, but the system is no longer supported. All our existing metadata records have been moved to COMEDI, a service hosted by a Norwegian CLARIN centre, CLARINO Bergen. The persistent identifiers of the metadata records curated by the Language Bank of Finland now point to the corresponding records in the COMEDI system.

Please note that, although the metadata records now look a bit different, the content and location of the actual language resources remain unchanged.

Korp upgrade in the final stage

To increase performance, the Korp service was moved to a new server early this year. A major upgrade to the Korp software used by the Language Bank of Finland is to be completed in autumn 2024. After the upgrade, the Korp instance in Finland will be easier to keep in sync with the Korp system that is developed at Språkbanken in Sweden.

Mylly service has been discontinued

Due to very low usage, the Mylly service has been shut down. If you still have data in Mylly or wish to use the Mylly tool scripts on other services, read the instructions here.

The file formats recommended by the Language Bank of Finland

In the CLARIN Standards Information System, the Language Bank of Finland now provides a list of the file formats that are supported and recommended when depositing language resources. We would be happy to receive your feedback.

Would you like to offer your own resource to be distributed via Kielipankki?

Submit the basic details about your own resource to the Language Bank of Finland:

Courses and training materials

The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered in Sep-Oct 2024 and again in early 2025. In the second period, we will organize the course Introduction to Speech Analysis. The Data Clinic kicks off in mid-November and is intended for students planning the data management for their MA or PhD thesis. These online courses are open to students from all universities, and you can also participate from outside Finland.

Recent events:

The summer meeting of the FIN-CLARIAH research infrastructure

FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. On June 10th, the FIN-CLARIAH community gathered in Helsinki to discuss the potential of artificial intelligence in research within the humanities and social sciences. In addition, the objectives of the FIN-CLARIAH project were outlined in posters, which can be found on the Kielipankki website.

Language Bank of Finland participated in the LREC-COLING 2024 conference workshops

The staff of the Language Bank of Finland gave presentations in the workshops of the LREC-COLING 2024 conference in Turin, Italy. Jussi Piitulainen attended the ParlaCLARIN IV workshop and presented “Investigating Multilinguality in the Plenary Sessions of the Parliament of Finland with Automatic Language Identification”. Tommi Jauhiainen presented at the SIGUL 2024 workshop with the title “Improving Language Coverage on HeLI-OTS”. The new languages and improved language models described in the presentations are part of the HeLI-OTS 2.0 language identifier, published this spring.

European Language Data Space (LDS) workshop was organized in Finland

The European Language Data Space and the University of Helsinki brought together experts from Finnish industry, public administration and research to discuss the importance of language data for the development of language technologies and AI-based tools in Finland. The LDS workshop aimed to raise awareness of the European Commission’s objectives as well as the new business opportunities in the commercialization of language data. The workshop programme can be found on the LDS event page, where links to the recordings and presentations will be added later.

The panel discussions in the LDS workshop featured representatives from the public and private sectors. Many of them are involved in the LAREINA project, which aims to develop speech technology for Finnish, Finland-Swedish and the Sámi languages.

CLARIN funding opportunities

Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls:

The Language Bank of Finland is on vacation 20.6.2024 – 4.8.2024

We wish you a relaxing summer!

Mietta Lennes and Wilhelmina Dyster
Project Planners


Subscribe/unsubscribe to this newsletter:

See also the CLARIN Newsflash:
