
D1.1.1: Updating LBF resource selection

Project: FIN-CLARIAH
Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months

WP 1.1: Report on Updating LBF resource selection
Date of reporting: 2022-09

Report author: Jussi Piitulainen (UHEL)
Contributors: Ute Dieckmann, Varpu Vehomäki, Krister Lindén, Mietta Lennes (UHEL)
Deliverable location: Corpora | Kielipankki

Description

The Kielipankki data sets are available through the appropriate channels: the download service, the Korp concordance engine, and a data directory in the Puhti computing environment. The data sets have persistent identifiers and are documented in public metadata records, resource family pages, and resource group pages.

We are currently adding Universal Dependencies (UD2) annotations to existing data sets (Suomi24, STT newswire), alongside the previous annotation model. We are also using automatic language identification to separate the Finnish and Swedish texts in a large new batch of the National Library newspaper corpus (KLK). Data sets in the ingestion pipeline are being documented and prioritized for release through the appropriate Kielipankki channels.

Why should you deposit your resource with the Language Bank of Finland?

When you deposit a corpus or a tool for distribution via the Language Bank of Finland, maintained by FIN-CLARIN, your work gains visibility and your resource becomes available to users. Many Finnish research funding organizations recommend that all research data containing language be deposited with the Language Bank of Finland.

If a corpus or tool is readily available, it will be used and cited more often. Each resource distributed via the Language Bank of Finland is assigned a unique, persistent identifier and citation instructions. This makes it easy for you and others to refer to your resource in publications. The language resources you have deposited can also be listed in your CV.

In some cases, it is not possible to make a resource openly available. We will agree with you on the terms and conditions for distributing your resource. If necessary, access can be restricted to identified users only, or to individual users who are granted access on the basis of a research plan they have submitted. In the latter case, access rights can be managed conveniently in our online service, Language Bank Rights.

Inform the Language Bank of Finland about a forthcoming language resource