Processing of Sensitive Data: Table of Contents and Introduction

This page describes the current state (06/2026) of ”self-service” options for processing ”sensitive data” at CSC. It describes the following environments:

  • High Performance Computing (Supercomputers)
    • LUMI
    • Roihu
  • Self-service virtual machines
    • cPouta
    • Sensitive Data Services
  • Containers (Rahti)

The supercomputers Puhti and Mahti, which will soon be decommissioned, are not covered.

Definitions

Sensitive Data

This document follows CSC’s definition of Sensitive Data. We assume the processing of data that should not leak but where a leak is not catastrophic, for example subsets of copyrighted data as opposed to company or state secrets. The aim is to describe CSC’s systems in a way that gives the data owner the possibility to make an informed decision about the fitness for purpose of the various processing environments.

Principal Investigator

A Principal Investigator, or ”PI”, is the leader of a scientific research project.

Project

A ”project” is a set of resources managed by a Principal Investigator. This includes compute resources, storage, and authorized users.

The environments

CSC essentially provides three different environments for data processing in general.

  1. High Performance Computing (HPC) for massive parallel processing and very large datasets
  2. Self-service virtual machines which can support sensitive data processing
  3. The Rahti container service

These environments are described below.

High Performance Computing (HPC)

Common data security features of all HPC systems

CSC’s HPC systems are secured against unauthorized access and data leaks as described below.

Access

Principal Investigators can request Projects and manage their members. Each Project is assigned a space on the shared HPC file system that only its members can access. A Project cannot by itself grant another Project access to its space. PIs can, however, add any registered user of the HPC system to their Project. Project members must use SSH keys to access the systems, and these keys must be protected by a passphrase. Access with a password alone is not possible. Keys must be added to the Project via a web interface that can only be accessed using Multi-Factor Authentication (MFA).

Data security

As mentioned above, a project’s data is only accessible to the project in question. However, the data in the project space is not encrypted at the file system level, and the transmission of data between compute nodes is not encrypted either. If a malicious user is able to gain admin privileges, for example by exploiting a zero-day vulnerability, this user can read all other users’ data stored on the shared file systems. To mitigate this risk, CSC’s system administrators actively monitor the HPC systems and the operating system’s security announcements. CSC has a policy to make sure that physical disk drives are securely disposed of when they are replaced. Presently, CSC’s HPC systems are not suitable for the processing of very sensitive data, such as medical personal data or state secrets, and the processing of personal data requires caution.

The individual HPC systems are described below in more detail from a security perspective.

LUMI

The LUMI supercomputer has no security features beyond those described above.

Roihu

In addition to the features described above, the not yet released Roihu uses temporary SSH certificates that the user has to re-generate once per day using MFA.

Self-service virtual machines

Virtual machines are essentially operating systems running on top of software instead of hardware. This makes it possible to offer computing environments dynamically. CSC offers two types of virtual machines for self-service: cPouta and SD-Desktop, the latter as part of the Sensitive Data Services described below. Both offerings are based on OpenStack but are very different in terms of data security. The main differences are summarized below. For details, refer to the respective documentation. SD-Desktop is based on ePouta, a virtual private cloud that is also offered as a separate service. ePouta is out of scope for this document, since it is not ”self-service”.

cPouta

cPouta enables users to create and administer virtual machines (”VMs”) with full operating system access (”root”). The VMs are isolated from other users’ VMs, and shell access can be restricted to a subset of project members. The superuser of the VM can configure it freely. The downside of this flexibility is that the superuser is also responsible for any misconfigurations. CSC only takes responsibility for the underlying OpenStack management software. It should be noted that CSC’s cPouta admins have the theoretical ability to list all existing VMs and could take copies of them. This, however, is forbidden by policy. cPouta VMs can be configured to be accessible from the internet and via Secure Shell (SSH).

Sensitive Data Services

CSC’s Sensitive Data Services are a collection of services specifically designed to process sensitive data. Data is encrypted at upload and at rest and is only visible to the virtual machines created by the same project. The machines are not connected to the internet; they are only accessible through a virtual desktop connection, which requires multi-factor authentication. It is possible for the project’s PI to share the data with other projects. Only the PI can export result data from the isolated environment. Note that the encryption keys are managed by CSC, which means that in theory admins with very high privileges could get access to them. A user can mitigate this by adding an extra layer of encryption to the data where the user controls the keys. However, this also has drawbacks: data sharing between projects is no longer easy, key handling is manual, and lost keys can mean lost data.
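The extra, user-controlled encryption layer mentioned above can be illustrated with a minimal sketch. It assumes the third-party Python package cryptography and its Fernet recipe; this is a choice made here for illustration, not a tool the Sensitive Data Services prescribe. The function names and the sample data are hypothetical. The last part shows why key handling matters: without the original key the data cannot be recovered.

```python
from cryptography.fernet import Fernet, InvalidToken


def encrypt_for_upload(data: bytes, key: bytes) -> bytes:
    """Encrypt data with a key that only the user holds (hypothetical helper)."""
    return Fernet(key).encrypt(data)


def decrypt_after_download(token: bytes, key: bytes) -> bytes:
    """Decrypt data previously encrypted with the same key (hypothetical helper)."""
    return Fernet(key).decrypt(token)


# The user generates the key and must store it safely outside the service.
key = Fernet.generate_key()
secret = b"interview transcript, participant 17"  # made-up sample data

token = encrypt_for_upload(secret, key)
assert decrypt_after_download(token, key) == secret

# A different key stands in for a lost key: decryption fails.
wrong_key = Fernet.generate_key()
try:
    decrypt_after_download(token, wrong_key)
    recovered = True
except InvalidToken:
    recovered = False

print("recovered with wrong key:", recovered)
```

Fernet holds the complete data in memory, so this sketch suits small files only. For large datasets a streaming encryption tool would be a better fit.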

Rahti

Rahti is CSC’s container environment based on OKD. Containers are not virtual machines. Containers can be used for web services which in turn might process sensitive data. Note that the user is responsible for keeping containers up-to-date. Rahti provides a separate Security Guide.

Summary and Discussion

The overview below summarizes the advantages and disadvantages of the different environments for data processing in general and sensitive data processing in particular.

LUMI
  • Advantages
    • Massive parallel processing
    • International access via EuroHPC
  • Disadvantages
    • Data potentially more exposed

Roihu
  • Advantages
    • Massive parallel processing
    • National supercomputer, easy access via my.csc.fi
  • Disadvantages
    • Data potentially more exposed
  • Notes
    • Expected for general availability in June 2026

cPouta
  • Advantages
    • Virtual machines with full admin rights
    • Accessible via internet
  • Disadvantages
    • VM admin is fully responsible for security
    • Limited usability for resource-intensive computation

SD Desktop (VM of Sensitive Data Services)
  • Advantages
    • Virtual machines without admin rights
    • Not accessible via internet
    • Data can be moved only by the PI
  • Disadvantages
    • Consumes considerably more resources than cPouta
    • Software cannot be easily installed
    • Limited usability for resource-intensive computation

Rahti
  • Advantages
    • Containers use few resources when idle
    • Accessible via internet
  • Disadvantages
    • Accessible via internet
    • User is fully responsible for security on container level
    • Limited usability for resource-intensive computation
    • Steeper learning curve relative to cPouta

With the exception of Sensitive Data Services, CSC’s systems are not designed to process sensitive data. Depending on the use case, Sensitive Data Services might not be suitable for other reasons, e.g. if massive parallel processing is required. In that case the user has several options:

  • pre-process the data in SD Desktop in a way that reduces its sensitivity, then process the result on LUMI or Roihu
    • Example: convert spoken audio into text and remove names and places.
  • encrypt the raw data at rest, decrypt it only during processing, and delete already processed data immediately
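The pre-processing example above (removing names and places from a transcript) can be sketched as follows. This is a minimal illustration that assumes the names and places to remove are already known and listed by hand. The function name, the placeholder labels and the sample sentence are hypothetical.

```python
import re


def redact(text: str, terms: dict[str, list[str]]) -> str:
    """Replace every listed term with a placeholder such as [NAME]."""
    for label, words in terms.items():
        # Handle longer terms first so they are not partially replaced.
        for word in sorted(words, key=len, reverse=True):
            pattern = r"\b" + re.escape(word) + r"\b"
            text = re.sub(pattern, f"[{label}]", text, flags=re.IGNORECASE)
    return text


# Made-up transcript and term lists, for illustration only.
transcript = "Anna told Pekka that she moved from Oulu to Espoo."
terms = {"NAME": ["Anna", "Pekka"], "PLACE": ["Oulu", "Espoo"]}

redacted = redact(transcript, terms)
print(redacted)
# prints: [NAME] told [NAME] that she moved from [PLACE] to [PLACE].
```

A fixed word list misses unlisted names and inflected forms (for example Finnish case endings), so the result still needs review before it is moved to a less protected environment.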

The options above require a good understanding of encryption technology and place much of the burden on the user, who has to manage encryption keys, copy data, and so on. CSC is working on solutions to make data transfer between SD Desktop and Roihu (and later LUMI) easier and to ensure that the data is processed on Roihu using encryption. We are also piloting another solution which should make data transfer between a user-controlled environment (like a personal laptop) and LUMI more secure and manageable.
