The Donera Prat corpus of Finland-Swedish speech is available for research use

Donera Prat: The Corpus of Donated Finland-Swedish Speech (doneraprat) is a collection of speech recordings gathered during the Donera Prat campaign, which ran from 29.11.2021 to 5.3.2024. Donera Prat was the Swedish-language counterpart of the Finnish Lahjoita Puhetta (Donate Speech) campaign. The resource is now available to researchers under restricted terms and conditions via the download service of the Language Bank of Finland.

The Donera Prat corpus contains a total of about 89 hours of speech audio recordings, all of which have been manually transcribed into plain text files. The resource also includes information about the elicitation tasks for which each of the speech samples was donated in the original campaign, and the background details that were voluntarily provided by speech donors.

Applying for rights to use the Donera Prat resource for academic research

Access to the Donera Prat resource may be granted for purposes related to language research or to the research and development of artificial intelligence (AI). Academic researchers can now apply for access to the dataset. The license terms and conditions for academic research use, including the data protection terms and conditions, can be found at http://urn.fi/urn:nbn:fi:lb-2024111128.

Applications are submitted via the Language Bank Rights system. The researcher must present a research plan before the license can be granted. As the resource contains personal data, the researcher must also submit a link to a public privacy notice regarding the research purpose.

License agreement determines the use of the Donera Prat resource

In parallel with the license for research use, a license agreement will be offered to companies that wish to use the data. Details and instructions regarding commercial use will soon be available online. Interested companies may request further information at lahjoita-puhetta@helsinki.fi.
