Sensitive Data Planning Meeting

Time and place: 28.3.2018 13-14.30, CSC

Present: Satu Saalasti, Mietta Lennes (HY), Martin Matthiesen (CSC)

Agenda

General discussion

We discussed a possible use case of sensitive data to be shared. We looked at the issue from various angles:

What type of data?
How is it collected?
Can and should it be split into subcollections?
How can it be distributed?
How can secure access on the user’s machine be facilitated?

Concrete next steps

Data to be gathered

Satu proposed gathering verbal motor data from children aged 3-7 with and without speech impairments (the latter serving as a control group). Permission to distribute the data will be requested from the children’s guardians.

The data collection will first take place in a controlled environment. The following data will be gathered:

Audio
Video (of the face)
Sensor data from sensors attached to the face

The following derived data will be created:

Transcriptions of the utterances (already anonymised, since the personal data of the data subjects is kept separately)
The sensor data will be aligned with the video recording

The children will be asked to repeat a set of pre-defined words and utterances a few times.

Data distribution

We discussed several options as to what to distribute. It would be possible to distribute only the (anonymised) transcripts.

We decided to concentrate on the distribution of the full dataset (audio, video, sensor data, aligned transcriptions). Reasons:

This data is the most sensitive, so a working process for it will also serve as a fallback option for less sensitive data.
Sharing the full dataset allows other researchers either to reproduce results or to apply new analysis methods to the same data and compare results.

The data will be stored and distributed via the Language Bank of Finland’s Download service (https://korp.csc.fi/download). Access permissions will be handled using the Language Bank Rights system.

We had several ideas on how to distribute the data safely. One was to package the data into a VeraCrypt container and distribute the password separately. We also discussed DRM techniques that might make it possible to withdraw access at any given time.
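
As an illustration of the "password distributed separately" idea, the following is a minimal Python sketch, not something decided at the meeting. It assumes the container has already been created with VeraCrypt; the function names and the passphrase length are hypothetical. It generates a random passphrase to be sent over a separate channel and computes a SHA-256 checksum that a recipient could use to check that the downloaded container is intact.

```python
import hashlib
import secrets
import string


def generate_passphrase(length: int = 32) -> str:
    """Return a random passphrase of letters and digits (length is an assumption)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The checksum could be published next to the download, while the passphrase would only go to researchers whose application has been approved.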

We will also need to look at the application process via LBR/REMS:

How do we ensure secure identification of the researcher? Suomi.fi, bank credentials?
What information would the researcher have to provide apart from a research plan?
What would be the approval process?

Data usage

Using the data should be secure but also easy at the user’s end. Overly complicated usage conditions will lead users to copy the data out of the secured container. While such a breach would not be our responsibility, the risk should be minimized.

Action points

Satu: Planning the data collection

Martin/Mietta: Planning data distribution: DRM, VeraCrypt, credential distribution

All (later): Looking at the application process via LBR/REMS
