Encrypting data at the Language Bank

Status: Approved

Revision 8560 approved 29.3.2019 by Urpo Kaila, CSC Head of Security. (Subsequent changes are only cosmetic; only major changes will require a new review.)

Last updated: 2.4.2019


This document describes how encryption is used at the Language Bank of Finland. The intended audience is administrators and users of the Language Bank who need to encrypt data to protect it from unauthorized viewing.

Encryption has been around for a long time and the basics of secure encryption are well understood. Less well understood are the implications of long-term encryption for archival purposes. Official guidelines [1] are often vague about which tool to use and how to handle key management. This document attempts to cover the whole lifecycle of encrypting and decrypting data.


In this document we assume that both the sender and the recipient of the encrypted data have safeguards in place to ensure that the data is accessible in decrypted form only to authorized personnel. We also assume that there are physical and organizational safeguards in place, such as secured computing environments, backups and antivirus software. The scope of this document is the secure storage and delivery of encrypted sensitive data between the sender (here the Language Bank of Finland) and the recipient (an authorized researcher). The authorization process of the researcher is not part of this document.

When to encrypt

Long-term encryption requires considerable administrative overhead to work safely. The challenges are organizational rather than technical. Therefore, only encrypt if the data cannot be protected by other means, for example access-restricted storage. Some data, such as data containing sensitive personal information, will likely need to be encrypted, but copyright-protected data might not have such high security requirements.

Secure encryption

Usage scenario

We assume that the encrypted data needs to be available for a long time (10+ years) and shareable among authorized users.

Encryption basics

There are two basic encryption methods available:

  • Symmetric encryption using a shared key
  • Asymmetric encryption using a private/public key pair

Symmetric encryption is straightforward: data is encrypted using a password, and the password is shared with the users who need to decrypt the data. The data is decrypted with the same password. The drawback of this method is that it does not scale well. The more users there are, the more likely it is that the password spreads to unintended audiences. Transmitting the password to the intended user is also difficult, since this has to happen via a channel that is itself encrypted and secured.

Asymmetric encryption does not have that drawback. Data is encrypted with the public keys of the authorized users, who decrypt it with their personal private keys. A user is deauthorized by re-encrypting the data with only the public keys that are still authorized, which removes that user's public key from the encrypted data. Public keys can be shared with anyone, since they can only be used to encrypt, not to decrypt. Only the private keys need extra protection.
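
To illustrate encryption to several public keys and deauthorization by re-encryption, the following is a minimal sketch in Python. It only builds gpg2 argument lists and does not run them. The key identifiers, the file names and the function name build_encrypt_command are assumptions made for this example and are not part of the Language Bank's actual setup.

```python
from typing import List


def build_encrypt_command(infile: str, outfile: str, recipients: List[str]) -> List[str]:
    """Return a gpg2 argument list that encrypts infile to all given public keys."""
    if not recipients:
        raise ValueError("at least one recipient key is required")
    cmd = ["gpg2", "--output", outfile]
    for key_id in recipients:
        # One --recipient option per authorized public key.
        cmd += ["--recipient", key_id]
    cmd += ["--encrypt", infile]
    return cmd


# Hypothetical key identifiers and file names, for illustration only.
authorized = ["alice@example.org", "bob@example.org", "carol@example.org"]
cmd_initial = build_encrypt_command("corpus.tar", "corpus.tar.gpg", authorized)

# Deauthorizing bob: encrypt the plaintext again, this time without his key.
remaining = [key for key in authorized if key != "bob@example.org"]
cmd_reencrypt = build_encrypt_command("corpus.tar", "corpus.tar.gpg", remaining)
print(" ".join(cmd_reencrypt))
```

On a machine where gpg2 is installed and the public keys are in the keyring, such a list could be passed to subprocess.run(cmd, check=True). Re-encryption starts from the decrypted data, so it has to take place in a secured environment.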

Software and algorithms

The usage scenario above demands public key cryptography. With the following software and parameters, this cryptography is sufficiently secure. The software is open source and widely used, the algorithms are well understood by the cryptographic community, and the key lengths are an adequate compromise between security and performance.

  • GnuPG version 2 (gpg2), installed from a trusted distribution such as Ubuntu or Fedora
  • Key length: 4096-bit RSA (secure enough and fast)
  • SHA256 hash algorithm

The suitability of the software and the algorithms needs to be checked regularly, once per year.
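
As an illustration of the parameters listed above, the following sketch builds a gpg2 command line that would create a 4096-bit RSA key pair and use SHA256 for the key's self-signature. It is an example under assumptions, not the procedure used at the Language Bank: the user ID is made up, the function name build_keygen_command is hypothetical, and the --quick-generate-key command requires a reasonably recent GnuPG 2 release.

```python
import shlex
from typing import List

KEY_ALGO = "rsa4096"    # 4096-bit RSA, as listed above
DIGEST_ALGO = "SHA256"  # hash algorithm, as listed above


def build_keygen_command(user_id: str) -> List[str]:
    """Return a gpg2 argument list that creates a personal RSA key pair."""
    return [
        "gpg2",
        "--cert-digest-algo", DIGEST_ALGO,  # digest used when signing keys
        "--quick-generate-key", user_id, KEY_ALGO,
    ]


# Hypothetical user ID, for illustration only.
keygen_cmd = build_keygen_command("Example User <user@example.org>")
print(shlex.join(keygen_cmd))
```

When such a command is run interactively, gpg2 asks for the password that protects the new private key.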

Key management

Access keys

The key pairs used to access the data need to be

  • personal (no shared keys!)
  • strong: key length and other parameters as above
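
Whether a public key meets the key length requirement can be checked from the machine-readable listing that gpg2 prints with --list-keys --with-colons. In that format the third field of a "pub" record is the key length and the fourth field is the public key algorithm number, where 1 stands for RSA. The following is a minimal sketch; the sample records are made up for this example and the function name key_meets_policy is hypothetical.

```python
MIN_RSA_BITS = 4096
RSA_ALGO_ID = "1"  # OpenPGP public key algorithm number for RSA


def key_meets_policy(colon_record: str) -> bool:
    """Check one 'pub' record of 'gpg2 --list-keys --with-colons' output."""
    fields = colon_record.split(":")
    if len(fields) < 4 or fields[0] != "pub":
        return False  # not a primary public key record
    length, algo = fields[2], fields[3]
    return algo == RSA_ALGO_ID and length.isdigit() and int(length) >= MIN_RSA_BITS


# Made-up sample records, for illustration only.
strong_key = "pub:u:4096:1:0123456789ABCDEF:1553817600:::u:::scESC:"
weak_key = "pub:u:2048:1:FEDCBA9876543210:1553817600:::u:::scESC:"
print(key_meets_policy(strong_key), key_meets_policy(weak_key))
```

This only covers algorithm and key length. That a key is personal and not shared cannot be checked technically and has to be ensured by the authorization process.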

Protecting private keys

The private keys need to be well protected, to make sure only the rightful owner has access to them.

  • Make sure the private keys are only readable by your user account.
  • Secure the private key with a password.
    • Use no less than 14 characters, see [2] for ideas.
    • Make sure the password is unique and not used elsewhere. Consider using a password manager.
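
The first two points above can be checked with a few lines of Python. This is a sketch for POSIX systems only: it tests that neither the group nor other users have any permission on a file, and that a password has at least 14 characters. The function names are hypothetical and the temporary file merely stands in for a private key file. By default, GnuPG keeps its files in the directory ~/.gnupg.

```python
import os
import stat
import tempfile

MIN_PASSWORD_LENGTH = 14


def is_owner_only(path: str) -> bool:
    """True if group and others have no permission bits set on path."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def password_long_enough(password: str) -> bool:
    """True if the password has at least the minimum number of characters."""
    return len(password) >= MIN_PASSWORD_LENGTH


# Demonstration on a temporary stand-in file, not on a real key.
with tempfile.TemporaryDirectory() as tmp:
    key_file = os.path.join(tmp, "example-secret.key")
    with open(key_file, "w") as handle:
        handle.write("placeholder")
    os.chmod(key_file, 0o600)  # readable and writable by the owner only
    private_ok = is_owner_only(key_file)
    os.chmod(key_file, 0o644)  # also readable by group and others
    shared_ok = is_owner_only(key_file)

print(private_ok, shared_ok)
```

Length is only a minimum requirement. Whether a password is unique cannot be verified by such a check.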

Authorized persons at the Language Bank

Encrypted data in long-term preservation needs

  • Access by at least 5 authorized persons at any given time.
  • An extra master key without a password, printed as a QR code and stored in a sealed envelope in a physical safe, to be used in emergency cases.

Authorization is checked once a year, and keys that are no longer valid are removed from the data. This requires re-encryption.
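
The yearly review can be thought of as computing a new recipient list and confirming that enough authorized persons remain before the data is re-encrypted. The following sketch shows this logic. The key identifiers and the function name recipients_after_review are assumptions made for this example.

```python
from typing import Iterable, List, Set

MIN_AUTHORIZED = 5  # at least 5 authorized persons at any given time


def recipients_after_review(current_keys: Iterable[str], still_authorized: Set[str]) -> List[str]:
    """Return the keys to re-encrypt to, or fail if too few would remain."""
    kept = sorted(key for key in set(current_keys) if key in still_authorized)
    if len(kept) < MIN_AUTHORIZED:
        raise ValueError(
            f"only {len(kept)} authorized keys remain, need {MIN_AUTHORIZED}"
        )
    return kept


# Hypothetical key identifiers, for illustration only.
current = [f"person{n}@example.org" for n in range(1, 7)]  # six keys
valid = set(current) - {"person3@example.org"}             # one person left
new_recipients = recipients_after_review(current, valid)
print(new_recipients)
```

If the check fails, new persons have to be authorized before the data is re-encrypted.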

Distribution of data

This section assumes that the Language Bank (“data provider”) has identified the recipient and that the recipient is authorized to receive a copy of the encrypted data. In this case the data is re-encrypted with the recipient’s verified public key and signed with the private key of an authorized person at the Language Bank. The signature allows the recipient to verify the integrity of the sent data.

The recipient must agree to policies defining security standards on the recipient’s end, for example where the unencrypted data is processed, how long it can be retained, etc.
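
The signing and encryption step described above could look as follows with gpg2. The sketch only builds the argument lists for the sender and the recipient and does not run them. The key identifiers, the file names and the function names are assumptions made for this example.

```python
from typing import List


def build_sign_encrypt_command(infile: str, outfile: str,
                               recipient_key: str, signer_key: str) -> List[str]:
    """Sender side: sign with signer_key and encrypt to recipient_key."""
    return [
        "gpg2",
        "--local-user", signer_key,    # private key used for the signature
        "--recipient", recipient_key,  # verified public key of the recipient
        "--output", outfile,
        "--sign", "--encrypt", infile,
    ]


def build_decrypt_command(infile: str, outfile: str) -> List[str]:
    """Recipient side: decrypt; gpg2 also verifies an included signature."""
    return ["gpg2", "--output", outfile, "--decrypt", infile]


# Hypothetical key identifiers and file names, for illustration only.
send_cmd = build_sign_encrypt_command(
    "corpus.tar", "corpus-for-researcher.tar.gpg",
    "researcher@example.org", "admin@example.org",
)
receive_cmd = build_decrypt_command("corpus-for-researcher.tar.gpg", "corpus.tar")
print(" ".join(send_cmd))
print(" ".join(receive_cmd))
```

For the signature check to be meaningful, the recipient needs a verified copy of the signer's public key.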

Measures against misuse

Protecting against intentional misuse of encrypted and potentially sensitive data is difficult, and most measures can be circumvented. Elaborate tracing methods like watermarking can make it harder for a malicious user to spread the data without being identified as the source, but they also increase the complexity of the distribution process and potentially decrease the scientific value of the data. The prevention of unauthorized copying of the decrypted data at the recipient’s end is outside the scope of this document.


[1] A collection of cyber security guidelines at the Finnish Communications Regulatory Authority (FICORA, in Finnish): https://www.viestintavirasto.fi/kyberturvallisuus/ncsa-fi.html

[2] 6 Techniques For Creating Strong Passwords: https://www.lifewire.com/8-character-password-2180969



