Preliminary evaluation of data protection

If personal data are processed in your research project and the processing involves high risks, data protection regulations require you to carry out a data protection impact assessment (DPIA) before you start processing the personal data. The higher the risks involved in the processing, the more carefully you need to protect the data. Consider which protective measures and methods you can use to minimise or eliminate the risks.

The list of questions on this page is intended to help you plan your research project. You can use the questions to make a preliminary assessment of the risks that may be involved in processing personal data in your research. A data protection impact assessment is likely to be required if you answer "yes" to more than one of the ten questions. Please note that the questions may be interpreted differently in practice, and the individual criteria listed under each question are suggestions only.

When processing personal data, you should primarily follow the instructions given by the data controller. Therefore, you must always check with your home organization whether and how you are required to carry out the data protection impact assessment.

Further information on data protection impact assessments is available on the website of the Office of the Data Protection Ombudsman.

Preliminary evaluation questions

1. Will personal data be processed on a large scale?

Processing can be considered as large-scale processing if, for example:

  • There are more than 10 000 research participants/data subjects
  • A large amount of data about the same individual is collected
  • Data is collected about a large portion of the members of a specific group (for example, a large portion of the members of a small ethnic group or the employees of a certain employer)
  • The processing is permanent or long in duration
  • The processing is geographically extensive

2. Will sensitive or highly personal data be processed?

Sensitive or highly personal data includes:

  • Data concerning health
  • Location data (monitoring the movement of a person)
  • Genetic data
  • Biometric data for the purpose of identifying a person
  • Racial or ethnic origin
  • Political opinions
  • Religious or philosophical beliefs
  • Trade union membership
  • Sex life or sexual orientation
  • Data concerning criminal convictions or offences
  • Financial data that might be used for payment fraud
  • Electronic communication (such as emails)
  • Data otherwise considered as very personal (such as notes and diaries)

3. Will there be exceptions to any of the following rights of data subjects?

  • Right to be informed about the project
  • Right to receive copies of data processed about the participant
  • Right to rectify inaccurate personal data
  • Right to restriction of processing
  • Right to object to the processing of personal data (for example, if the processing takes place in a public place, discussion board etc. where data subjects cannot avoid the collection of data)

4. Will data from multiple datasets be combined in a way that is unpredictable to the data subjects?

  • For example, combining data collected for two different purposes or data held by two different data controllers

5. Will the research involve the processing of data concerning individuals who are in a vulnerable position and for whom it may be difficult to exercise the rights of data subjects?

  • e.g., children, the elderly, asylum seekers and patients

6. Will the processing involve automated decision-making (meaning a decision made without human involvement) and/or profiling that may produce significant effects on the participant?

  • Significant effects or legal effects may include exclusion, discrimination, significant impact on privacy, determining the compensation of a participant on the basis of automated decisions etc.

7. Will personal data be used for evaluation or scoring of participants?

  • For example, assessing or predicting disease/health risks or creating a profile based on an individual’s behavior

8. Does the research involve systematic monitoring of the participants?

9. Will new technology be used to process personal data in an innovative way?

  • Will data be collected or processed in a novel way?
  • Are the consequences of the use of the new technology unknown?

10. If the research material/data were published or leaked to the public, could it cause significant harm to the data subjects?

  • e.g., threat of violence or persecution
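The rule of thumb stated at the top of this page (a DPIA is likely to be required if more than one of the ten questions is answered "yes") can be sketched as a simple count. This is a minimal illustration only: the short question keys and the function name `dpia_likely_required` are hypothetical labels chosen for this example, and the result never replaces the instructions of your home organisation or the data controller.

```python
# Hypothetical shorthand keys for the ten preliminary evaluation questions.
# These labels are assumptions made for this sketch, not an official schema.
QUESTIONS = (
    "large_scale",            # 1. large-scale processing
    "sensitive_data",         # 2. sensitive or highly personal data
    "rights_exceptions",      # 3. exceptions to data subjects' rights
    "combined_datasets",      # 4. unpredictable combination of datasets
    "vulnerable_subjects",    # 5. vulnerable data subjects
    "automated_decisions",    # 6. automated decision-making / profiling
    "evaluation_or_scoring",  # 7. evaluation or scoring of participants
    "systematic_monitoring",  # 8. systematic monitoring
    "new_technology",         # 9. innovative use of new technology
    "harm_if_leaked",         # 10. significant harm if published or leaked
)


def dpia_likely_required(answers: dict[str, bool]) -> bool:
    """Return True if more than one question is answered "yes".

    Unanswered questions count as "no". Unknown keys raise an error so
    that typos do not silently lower the count.
    """
    unknown = set(answers) - set(QUESTIONS)
    if unknown:
        raise ValueError(f"unknown question keys: {sorted(unknown)}")
    yes_count = sum(1 for key in QUESTIONS if answers.get(key, False))
    return yes_count > 1
```

For example, a project answering "yes" only to question 2 would return `False`, while one answering "yes" to questions 1 and 2 would return `True`. Given that interpretations of the questions vary, a borderline count is a reason to consult your organisation rather than a final answer.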

Last updated 6.9.2021


Brochure about the Language Bank of Finland for research participants

Research participants should be given sufficient details regarding the study for which personal data are to be collected. It is recommended that the following brochure be used as a supplement to the rest of the information that is provided to the research participants. The brochure includes basic information on the Language Bank of Finland and on the process of storing research materials for further use in the long term.

Last updated 10.5.2022

Guidelines for processing corpora stored in the Language Bank of Finland that contain personal data



Always comply with these guidelines when processing corpora obtained from the Language Bank of Finland that contain personal data.

Does the corpus contain personal data?

Corpora stored in the Language Bank of Finland that contain personal data have the following label in their licence:

PRIV: There are personal data in the resource.

The licence details of individual corpora can be found in the corpora listing of the Language Bank of Finland next to the corpus in question as well as in its metadata, which can be accessed using the persistent identifier assigned to the corpus (i.e., the URN address included in the citation instructions).

Resource-specific data protection terms and conditions

All corpora labelled PRIV contain a separate description of the resource-specific data protection terms and conditions, including the following details:

  • Data controller of the personal data that are distributed via the Language Bank of Finland
  • Types of personal data and data subject groups included in the corpus 
  • Description of the purposes for which the corpus can be further distributed by the Language Bank of Finland
  • Restrictions regarding the location and transfer of the personal data to countries outside Finland
  • Further processing instructions pertaining to personal data in the specific corpus, if any

The creation of resource-specific data protection pages is currently in progress. If a separate description of the data protection terms and conditions for a specific corpus is not yet available and you cannot find the corresponding information in the metadata of the resource, please request clarification from the FIN-CLARIN service address: fin-clarin(at)

How to process corpora that contain personal data?

By using the corpora stored in the Language Bank of Finland, you undertake to comply with the general terms of use of the Language Bank of Finland as well as corpus-specific special terms. 

When using a PRIV-labelled corpus, you undertake to process the personal data included in it confidentially and carefully, and solely for the purpose for which you were granted access to the corpus. Further restrictions are described in the resource-specific data protection terms and conditions that are published along with the corpus-specific licence.

  • If you are granted access to a corpus on the basis of a personal application in which you presented a research plan or a similar description of the purpose, you may use the corpus only for the purpose stated. Additional restrictions that apply to individual corpora are stated in the resource-specific licence and data protection terms and conditions.
  • If you gain access to a corpus without a separate application, but access requires logging in as a researcher or student, the corpus may be processed only for research and teaching purposes. Additional restrictions that apply to individual corpora are stated in the resource-specific licence and data protection terms and conditions.

When processing corpora that contain personal data, please apply sufficient protective measures in accordance with the instructions provided by your own organisation. Special care is needed when processing corpora that contain sensitive personal data (also known as special categories of personal data).

Carry out your duties as the data controller

When you start processing a corpus that contains personal data and was obtained through the Language Bank of Finland, whether for new research or another purpose, you and/or your home organisation assume the role of data controller for the corpus. Among other responsibilities, the controller must be able to demonstrate, when necessary, that the personal data are processed lawfully.

The instructions provided by your own organisation must be observed in the first instance when processing personal data. If instructions provided by your home organisation are unavailable, you can familiarise yourself, for example, with the Data Management Guidelines published by the Finnish Social Science Data Archive when planning the processing.

Remember to draw up a privacy notice

As the controller, you must usually draw up a privacy notice on the processing of personal data. Comply with the instructions provided by your own organisation in this instance as well. When drawing up a privacy notice, you can utilise the privacy notice associated with the original corpus, or the description of the personal data included in it.

When you start using a corpus stored in the Language Bank of Finland that contains personal data, first publish the privacy notice for your purpose of processing, for example, on a website provided by your organisation. You can use this form to share a short title of your project that is understandable to the general public, together with a link to the openly available privacy notice. We publish this information on the Language Bank of Finland website so that anyone interested can see the purposes for which the corpus is used.

Apply proportionate protective measures

Comply with the guidelines of your own organisation. When necessary, you can view examples of protective measures employed by the Language Bank of Finland and other potential measures which you may need when processing personal data. 

Personal data in scientific presentations and publications

Personal data must also be processed responsibly and in compliance with good ethics when creating scientific publications and presentations based on corpora.

When reporting the results of scientific research, personal data must, as a rule, be removed or redacted, for example, by pseudonymisation and by classifying data subjects’ age, domicile and other details into broader categories, so that study participants cannot be identified on the basis of such details or by combining them with other data.
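The two techniques mentioned above, pseudonymisation and generalising details into broader categories, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Participant` record, the function names, the 20-year age bands and the `region_of` lookup (a researcher-supplied mapping from municipality to a broader region) are all hypothetical. Names are used as lookup keys only for brevity; in practice a stable internal identifier is safer. Note also that pseudonymised data are still personal data, so the key linking codes to names must be stored separately and protected.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record; real corpora will have their own metadata fields.
    name: str
    age: int
    municipality: str


def age_band(age: int, width: int = 20) -> str:
    """Generalise an exact age into a band such as '20-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pseudonymise(participants, region_of):
    """Replace names with running codes and coarsen age and domicile.

    Returns the redacted rows and the name-to-code key. The key is what
    makes re-identification possible, so keep it apart from the rows.
    Municipalities missing from region_of are reported as 'other'.
    """
    key = {}
    rows = []
    for p in participants:
        # The same person (same name here) always receives the same code.
        code = key.setdefault(p.name, f"P{len(key) + 1}")
        rows.append({
            "id": code,
            "age": age_band(p.age),
            "region": region_of.get(p.municipality, "other"),
        })
    return rows, key
```

Generalisation alone does not guarantee that participants cannot be identified: small groups, rare combinations of details or linkage with other data may still single someone out, so the band widths and region categories need to be judged case by case.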

In certain cases, presenting scientific research results requires presenting material that contains personal data. For example, it may be necessary to link short individual samples from the corpus to a scientific article, or to show a specific section in a conference presentation. However, carefully consider the potential impact on and risk to the study subjects, their family members and others close to them that publishing or presenting samples containing personal data may cause. The scope of the samples intended for publication must not exceed what the scientific purpose requires, and all unnecessary personal data must be removed from the samples or pseudonymised using appropriate means.

Please also note that if the study subjects have, for some reason, been clearly informed that no personal data associated with them will be published, and the sample to be published cannot be fully anonymised, separate consent for publishing the sample must be requested from the subjects.

Several purposes? 

If a PRIV-labelled corpus, which requires access rights, is to be processed for more than one purpose – for example, if at a later date there is a wish to carry out a new study not directly connected to the previous topic – access rights must be applied for from the Language Bank of Finland separately for each purpose. Naturally, all grounds for the processing must be stated in the privacy notice(s).

Errors and misconduct

If you come across personal data which you believe should not be included in a corpus based on its description, please report the matter immediately to the Language Bank of Finland and/or directly to the controller of the data. This also applies to instances where you suspect that personal data have, for some reason, fallen into the wrong hands.

Privacy practices of the Language Bank of Finland

Last updated 30.8.2021