Long-term preservation plan of the Language Bank of Finland

This plan outlines the Language Bank’s approach to preserving different types of language resources.


The Language Bank accepts and makes available language resources that contain a natural language component in their data or metadata and that have been produced either by researchers in Finland or by researchers of Finnish or other Fenno-Ugric languages.

FIN-CLARIN is responsible for the long-term preservation of the deposited data. The Language Bank does not delete deposited resources without their owner’s consent.

FIN-CLARIN employs a corpus production line framework for preparing, depositing and enriching language resources. FIN-CLARIN’s experts provide depositors with support and consultation throughout the different phases of a language resource’s lifecycle.

The Language Bank assigns each deposited resource one of three service levels:

  A. The resource is under active development. The Language Bank of Finland fixes any issues as soon as possible, curates the data, and migrates it to new formats when necessary.
  B. The resource is developed only upon user request. The Language Bank of Finland aims to fix issues concerning the resource, but may require external contributions to do so.
  C. The resource is available “as is”. The Language Bank of Finland neither fixes nor develops the resource.

The national coordinator of FIN-CLARIN makes the final decision on whether to accept a new resource for deposit and which service level to assign to it. The legal team for research support at the University of Helsinki participates in preparing new agreements and assigning appropriate license categories.


In addition to its service level, each deposited corpus also belongs to a preservation category. The preservation category is independent of the service level, although both may be influenced by the same factors.

  1. The corpus is preserved for five years and then reassessed by the FIN-CLARIN management group. Depending on the outcome of the reassessment, the corpus is either removed, preserved for another five years, or moved to category 2.
  2. The corpus is preserved indefinitely. In practice, this means 50 years or more.

Software used in the Language Bank is treated differently: the focus is on the function each piece of software serves rather than on the software itself. Individual pieces of software are generally not preserved as such; they may be updated or replaced by other software serving the same function. Migrations from one software version to another, or to entirely new software, are documented in the Language Bank Portal and communicated to users.

