Sensitive Data services (CSC)

If your research data requires special protective measures, you may consider using the Sensitive Data (SD) services available at CSC. The SD services can help you process your research data securely. SD Connect (for storing and sharing data) and SD Desktop (for processing and computing) provide a completely isolated work environment that is accessible only to you and your project colleagues.

Before starting a Sensitive Data project, make sure that you have carefully assessed the risks concerning your dataset. SD services should only be used when necessary, since the possibilities for processing the data within SD Desktop are intentionally limited.

How to use the SD Services at CSC – step by step

  1. Create a CSC user account, unless you already have one. (mycsc.fi)
  2. Activate multi-factor authentication (MFA) for your CSC user account. (mycsc.fi)
  3. Create or join a CSC project whose members are allowed to access the same sensitive data. Take note of the project number. (mycsc.fi)
  4. The project manager needs to add or invite all new members to the project. (mycsc.fi)
  5. The project manager needs to activate SD services for the project. (mycsc.fi)
  6. Accept the terms of use of the SD Services. (mycsc.fi)
  7. Prepare the data that is to be uploaded and processed with the SD Services. Make sure you protect the data well until it is uploaded to SD Connect.
  8. Log in to SD Connect (using multi-factor authentication) to upload the sensitive data. Chrome is likely to work better than other browsers. (sd-connect.csc.fi)
    • Select the project that will be allowed to access the data. (sd-connect.csc.fi)
    • Select Upload. Give a name to the folder (i.e., the ’bucket’ that will contain your files). Note that the name of the folder/bucket cannot be changed after it has been created. Select the files to be encrypted and uploaded from your local device (this will work for files up to 100 GB). Then click Upload. (sd-connect.csc.fi)
    • If you need to upload files larger than 100 GB, you can use the command-line tool to automatically encrypt and transfer the data from Allas; see the instructions.
    • With SD Connect, it is also possible to share folders across different CSC projects. (sd-connect.csc.fi)
  9. If you need to use special software within SD Desktop, check the list of the tools currently available via SD Software Installer.
    • Some versions of, e.g., ELAN (for audio & video annotation), Praat (for speech analysis, annotation and signal processing) and Whisper (an automatic speech recognition tool) are already available for SD Desktop users.
    • You can also import your own software to SD Desktop in containers; see the instructions.
    • If you have any questions, please contact CSC Service Desk (subject: Sensitive Data).
  10. Log in to SD Desktop (using multi-factor authentication) to use and analyse the sensitive data in a secure desktop environment. (sd-desktop.csc.fi)
    • Click on Go to SD Desktop Management. (sd-desktop.csc.fi)
    • Again, select the correct project. To set up the virtual machine for the new desktop, select the operating system (e.g., Ubuntu 22.04). For the resources to be allocated, ’Small computation’ is often sufficient, unless you know that you will be performing heavy computation. Give a name to the desktop (you may set up several desktops/virtual machines for different purposes). (sd-desktop.csc.fi)
    • If you intend to use audio or video data in SD Desktop, you should select Add External Volume.
      • This will not affect, e.g., the amount of data that can be loaded from Allas to the desktop. However, using an external volume will make the connection faster when inspecting the files in SD Desktop.
      • Moreover, if the virtual machine gets completely stuck for some reason, the data on the external volume can be recovered, whereas the data stored within the main desktop volume cannot. (sd-desktop.csc.fi)
    • Make all of the above selections first, and then wait patiently for the desktop to be built. The list of desktops may not update immediately, even though the new desktop is already being built. Building may take up to 30 minutes. (sd-desktop.csc.fi)
    • When the new desktop appears ’active’ on the list, click on Go to Connections page and enter the desktop of your choice. Your new SD Desktop instance opens. (sd-desktop.csc.fi)
    • To make your data available in the desktop, look for Data Gateway in the menus. Double-click it and log in with your CSC credentials. Note that you cannot paste anything into SD Desktop, so you will need to type your password character by character.
    • To install specific software on your desktop, go to Data Gateway > SD Connect > Open folder…
      From the folder tools-for-sd-desktop, move ’sd-installer’ to the Desktop folder. The file is then copied to your virtual machine. Right-click sd-installer on the desktop and select Allow launching. Then double-click it. You can now select the tool you wish to install on your SD Desktop virtual machine.
    • Process and analyze your data in the desktop environment.
    • If you need to stop working, you can pause the desktop via Go to SD Desktop Management to avoid consuming resources. Pausing does not delete your data, but restarting the desktop later will take some time. (sd-desktop.csc.fi)
    • Note that if you delete a desktop entirely, all the data stored in that desktop instance will be deleted, and it will not be possible for anyone to get it back.
  11. For security reasons, data can only be exported from SD Desktop by the manager of the CSC project in question. See instructions. (sd-desktop.csc.fi)
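The 100 GB limit in step 8 decides whether a file can be uploaded through the SD Connect web interface or needs the command-line tool. As a preparation aid, the following minimal Python sketch lists the local files above that limit. It assumes that 100 GB means 100 * 10**9 bytes (CSC may count the limit differently), and the name `files_over_limit` is hypothetical; it is not part of any CSC tool.

```python
from pathlib import Path

# Assumption: the browser-upload limit of 100 GB is counted as 100 * 10**9 bytes.
BROWSER_UPLOAD_LIMIT_BYTES = 100 * 10**9


def files_over_limit(folder: str, limit: int = BROWSER_UPLOAD_LIMIT_BYTES) -> list[Path]:
    """Return the files under `folder` (searched recursively) that are larger than `limit` bytes.

    Hypothetical helper: the files it returns would have to be uploaded with
    the command-line tool instead of the SD Connect web interface.
    """
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.stat().st_size > limit
    )
```

For example, `files_over_limit("/path/to/prepared-data")` (a hypothetical path) would return the files that should go through the command-line route; an empty list suggests that everything fits within the browser upload limit.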

Instructions: Publishing a privacy notice for research purposes

When you obtain a resource containing personal data from the Language Bank of Finland (Kielipankki) and start processing it for a new purpose, you must prepare a privacy notice regarding the purpose of processing, publish the notice openly in electronic format, and provide a link to the notice to the Language Bank. The purpose of a privacy notice is to help data subjects understand the purposes for which their data is used.

The Language Bank offers guidelines to help you gather the information that is usually required in a privacy notice for research purposes. Please note, however, that you should always primarily follow the data protection guidelines of your own organisation.

Privacy notice – Instructions for researchers

How to cite individual corpora, the Language Bank of Finland and FIN-CLARIN


It is important to cite language resources in a coherent way. This will enable other researchers to replicate your research, and the authors or developers of the resource can receive credit for their work.

By providing a reference to the Language Bank of Finland and to its language resources, you can also help FIN-CLARIN keep track of the usage of its corpora and services and maintain the Language Bank of Finland.

References to individual resources available in the Language Bank of Finland

When you use a language resource (a corpus or a tool) that is available via the Language Bank of Finland, please adhere to the citation instructions provided by the Language Bank. This way, you provide an accurate reference to the exact version of the resource. In the Language Bank of Finland, every resource version has a unique persistent identifier that is always included in the reference. The identifier exists in order to ensure that the resource can be accessed and the study can be replicated in the future even if the location of the resource changes.

The license conditions of many corpora and tools require the users to provide a reference to the resource in question. In this case, the license terms will usually mention the BY condition (Attribution; Nimeä in Finnish). A reference is systematically required for all language resources that are licensed for academic use (CLARIN ACA) or for individual use (CLARIN RES). Even openly licensed language resources may require appropriate citation (e.g., Creative Commons Attribution and other open licenses).
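The rule in the paragraph above can be summarised as a small check. The following Python sketch is only an illustration under stated assumptions: the license labels are assumed to look like "CLARIN ACA +NC", "CLARIN RES +PLAN" or "CC BY 4.0", and the function name `citation_required` is hypothetical. A `False` result only means that these two rules do not apply; the resource's own license and citation instructions still decide.

```python
import re


def citation_required(license_label: str) -> bool:
    """Return True if the license label clearly calls for a reference.

    Hypothetical helper based on two rules: CLARIN ACA and CLARIN RES
    resources always require a reference, and so do licenses carrying
    the BY (Attribution) condition. The label format is an assumption.
    """
    label = license_label.upper()
    # Academic-use (ACA) or individual-use (RES) CLARIN licenses.
    if re.search(r"\bCLARIN\s+(ACA|RES)\b", label):
        return True
    # The BY (Attribution) condition, e.g. "CC BY 4.0" or "CC BY-NC-SA 4.0".
    return re.search(r"\bBY\b", label) is not None
```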

How to find the reference instructions of individual corpora

Reference instructions for individual corpus versions or variants can be found by clicking the quotation mark icon on the Corpora list of the Language Bank of Finland.

The reference instructions are also given in the metadata of each language resource. The metadata of the corpora available via the Language Bank of Finland are stored and distributed in the COMEDI service. The metadata record of a specific language resource can always be accessed with the persistent identifier included in the citation instructions, or by clicking the corpus title on the corpus list of the Language Bank. In the metadata record, the link to the reference instructions can usually be found in the Documentation section. In some cases, the citation instructions are given directly in the Attribution Details field. The metadata record also provides details on the corpus-specific license.

For corpus versions offered via the Korp concordancing service, the link to the citation instructions is available in two places: in the corpus information frame that pops up when you move the mouse cursor over a corpus title in the corpus selection menu, and in the corpus details shown in the information column on the right when you select an individual search result in the concordance view.

If the resource is available via the download service of the Language Bank of Finland, it includes a file called README that contains the persistent identifier of that particular resource version.
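When you have downloaded several resources, a short script can pick the persistent identifiers out of their README files. The following is a minimal Python sketch, assuming that the identifier appears as a URN of the form `urn:nbn:fi:lb-` followed by digits (as in the example reference on this page) and that it resolves through `http://urn.fi/` (as in the links on this page). The function names are hypothetical. A README may mention more than one identifier, so the sketch returns all of them and leaves it to you to check which one refers to the resource version itself.

```python
import re
from pathlib import Path

# Assumption: Language Bank identifiers look like "urn:nbn:fi:lb-" followed by
# digits, e.g. urn:nbn:fi:lb-2017091901.
URN_PATTERN = re.compile(r"urn:nbn:fi:lb-\d+")

# Resolver prefix as used in the links on this page.
RESOLVER_PREFIX = "http://urn.fi/"


def find_identifiers(text: str) -> list[str]:
    """Return the distinct Language Bank URNs in `text`, in order of first appearance."""
    found: list[str] = []
    for urn in URN_PATTERN.findall(text):
        if urn not in found:
            found.append(urn)
    return found


def resolver_url(urn: str) -> str:
    """Turn a URN into a web link that resolves to the resource's metadata."""
    return RESOLVER_PREFIX + urn


def identifiers_from_readme(readme_path: str) -> list[str]:
    """Read a downloaded README file and return the URNs it mentions."""
    text = Path(readme_path).read_text(encoding="utf-8", errors="replace")
    return find_identifiers(text)
```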

Reference format

As an example, here are the reference instructions to the language resource titled Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2:

University of Helsinki (2017). Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2 [text corpus]. Kielipankki. Retrieved from http://urn.fi/urn:nbn:fi:lb-2017091901

Note that the exact formatting practices of data references may vary between publications. In any case, try to include all the details given in the citation instructions provided by the Language Bank of Finland. When you are writing scientific journal articles or producing other research output, check the publication-specific instructions to see whether data sources are customarily included in the bibliography or listed separately.
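If you keep your data references in a script or reference manager, the pattern of the example reference can be written as a template: creator (year). title [resource type]. Kielipankki. Retrieved from the resolver link. The following Python sketch reproduces the example above. The `Reference` class and its field names are hypothetical, and the template only mirrors this single example, so always compare the result with the resource's own citation instructions.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """Hypothetical container for the fields seen in the example reference."""

    creator: str        # organisation or person credited, e.g. "University of Helsinki"
    year: int
    title: str          # including the version, e.g. "..., Version 2"
    resource_type: str  # e.g. "text corpus"
    urn: str            # persistent identifier of the resource version
    publisher: str = "Kielipankki"

    def to_text(self) -> str:
        # Mirrors the pattern of the example reference only; the style
        # required by a given publication may differ.
        return (
            f"{self.creator} ({self.year}). {self.title} [{self.resource_type}]. "
            f"{self.publisher}. Retrieved from http://urn.fi/{self.urn}"
        )


ref = Reference(
    creator="University of Helsinki",
    year=2017,
    title="Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2",
    resource_type="text corpus",
    urn="urn:nbn:fi:lb-2017091901",
)
```

Calling `print(ref.to_text())` prints the example reference exactly as shown above.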

References to the Language Bank of Finland, FIN-CLARIN or CLARIN

The address of the Language Bank of Finland (Kielipankki)

If you wish to refer to the Language Bank of Finland as a collection of services, please use the web address www.kielipankki.fi.

Refer to the FIN-CLARIN consortium

A presentation of the FIN-CLARIN consortium on the web portal of the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2014120212

Refer to CLARIN ERIC

The general reference instructions of CLARIN ERIC and CLARIN services can be found under CLARIN Frequently Asked Questions.

More information about citing data

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317

More contact information