Guidelines for processing corpora containing personal data in the Language Bank of Finland


URN for this page:

You are required to follow these guidelines when processing corpora from the Language Bank of Finland that contain personal data.

NB: This page contains a preliminary English translation of the corresponding Finnish guidelines. In case you find the content unclear, please contact FIN-CLARIN. However, if you have more specific questions related to local procedures at your university or to the content of your Privacy Notice, we advise you to consult the Data Protection Officer of your home organization.

How do I know if the corpus contains personal data?

If a corpus available via the Language Bank of Finland contains personal data, the license conditions will include the following tag:

PRIV: There are personal data in the resource.

The license information for an individual corpus can be found on the list of corpora of the Language Bank of Finland as well as in the metadata record of the resource in question. The metadata can be accessed via the persistent identifier of the corpus (i.e., the URN address included in the citation instructions).

Description of the personal data included in a corpus

The metadata of a corpus tagged with the PRIV condition may include a separate description of the personal data included in the corpus. Among other details, the description of personal data should provide the following information:

  • the Data Controller of the original corpus or data set
  • the types of personal data and the groups of data subjects that are included in the resource
  • the original legal basis of processing of the personal data
  • a description of the purposes for which access to the data can be granted via the Language Bank of Finland
  • any more detailed instructions for processing the data in question.

If you are unable to locate a description of the personal data for a specific corpus and cannot find the corresponding details in the general description of the resource, please contact the service address of FIN-CLARIN for more details: fin-clarin (ATT)

How should you process a corpus that includes personal data?

By using the corpora and other resources available in the Language Bank of Finland, you agree to the General Terms of Use of the Language Bank as well as to the corpus-specific license conditions.

When using a corpus with the license condition PRIV, you must commit to processing the personal data confidentially, carefully and only for the purpose for which you were granted permission to access the data.

  • In case you were granted access on the basis of a personal application and in case you presented a research plan or another description of the purpose of use, you may only use the material for the specified purpose. When applying for the use of a specific resource, you may also be notified about additional restrictions that apply to the processing of the resource.
  • In case you are granted access to the resource without a separate application but you are required to log in as a researcher or as a student, you are only allowed to process the data for research purposes or for your personal study purposes.

When processing personal data, apply sufficient safeguards according to the instructions provided by your home organization. Note that additional safeguards may be needed when you process sensitive personal data (data that belong to the so-called special categories of personal data).

Remember your duties as a Data Controller

When you start processing a corpus that contains personal data and that you obtained via the Language Bank of Finland, whether for a new research project or for some other purpose, you and/or your home organization become the Data Controller with regard to your purpose of use. Among other things, the Data Controller must be able to show, when requested, that the processing of the personal data has been lawful.

When processing personal data, you should primarily follow the instructions and guidelines given by your home organization. In case no such instructions are available, please refer to, e.g., the Data Management Guidelines (published by the Finnish Social Science Data Archive) when planning your data processing activities.

Remember to make a Privacy Notice

As a Data Controller, you are usually required to provide a Privacy Notice concerning the personal data processing conducted by you. Again, please follow the instructions and guidelines given by your home organization.

When you start using a corpus in the Language Bank of Finland and the corpus includes personal data, you should publish the Privacy Notice regarding your purpose of use for the resource. The Privacy Notice can be published on the website of your home organization, for instance.

When compiling the Privacy Notice of your project, you may need to refer to details in the original Privacy Notice of the corpus, or to the description of the personal data included in the metadata of the corpus.

Submit the brief title of your project and the link to the publicly available Privacy Notice to the Language Bank of Finland by using this form. The link will then be published on the Language Bank website, so as to make the information accessible for all interested parties.

Apply protective measures

Follow the instructions of your home organization. If required, you may also check out a few examples of safeguards, including some that are usually applied by the Language Bank of Finland (examples in Finnish only).

Personal data in scientific presentations and publications

When creating scientific publications and giving scientific presentations, you must process personal data responsibly and according to good ethical practices.

When reporting the results of scientific research, personal data must primarily be either completely removed or pseudonymized. This can be achieved, e.g., by grouping the ages of the research subjects, place names, etc. into larger categories. The aim is to prevent the participants from being identified either from the data included in the publication or presentation or by combining that data with other data.
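The grouping approach mentioned above can be illustrated with a minimal Python sketch. It is not an official Language Bank tool: the function names, the record fields (`name`, `age`, `municipality`), the ten-year band width and the small municipality-to-region table are all assumptions made for illustration. Whether such coarsening is sufficient for a given corpus still has to be assessed case by case.

```python
from typing import Mapping

# Illustrative lookup table (assumption): municipality -> larger region.
REGIONS = {
    "Espoo": "Uusimaa",
    "Helsinki": "Uusimaa",
    "Tampere": "Pirkanmaa",
}


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalize_place(place: str, regions: Mapping[str, str],
                     fallback: str = "other") -> str:
    """Replace a place name with its larger region, or a fallback label."""
    return regions.get(place, fallback)


def coarsen(record: Mapping[str, object],
            regions: Mapping[str, str] = REGIONS) -> dict:
    """Return a reporting copy of a record (hypothetical field names).

    The direct identifier 'name' is dropped entirely; age and municipality
    are replaced by coarser categories.
    """
    return {
        "age_group": age_band(int(record["age"])),
        "region": generalize_place(str(record["municipality"]), regions),
    }


if __name__ == "__main__":
    speaker = {"name": "Example Person", "age": 34, "municipality": "Espoo"}
    print(coarsen(speaker))
```

Places missing from the lookup table fall back to a generic label rather than being passed through unchanged, so that an unexpected place name does not end up in a publication by accident.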

However, in some cases it may be necessary to include personal data in scientific publications and presentations. For instance, brief samples of the data may need to be included in a research article, or a fragment may need to be played back for the audience at a scientific conference. You should carefully consider the potential risks and other effects this might have on the research subjects or on people close to them. It is important to include only the required content in the presented samples, and all unnecessary information and details should be removed or pseudonymized by using the appropriate methods.

Please note that in case the research subjects have been explicitly informed that none of their personal data will be published, and in case it is not possible to make the samples fully anonymous, you may not publish or present the personal data without contacting the research subjects again for their specific consent.

Several purposes of use?

In case a restricted +PRIV-tagged corpus needs to be processed for several different purposes – e.g., you notice later that you wish to conduct a new study that is not directly compatible with your original research – you need to apply for separate permission to use the corpus for each different purpose. Naturally, you will also need to mention all purposes in your Privacy Notice.

Errors and misconduct

In case you notice that a corpus or resource contains some personal data that you think it should not contain according to the resource description, you must notify, without delay, either the Language Bank of Finland or the Data Controller of the original resource. Similarly, in case you have reason to suspect that personal data may have fallen into the wrong hands, notify the Language Bank of Finland or the original Data Controller as soon as possible.

Privacy practices of the Language Bank of Finland

The Language Bank's technical support:
kielipankki (at)
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at)
tel. +358 29 4140599 / +358 29 4129317