(Updated on 29.10.2022: Added information about the license for commercial use)
The Donate Speech Corpus (Puhelahjat) is a collection of speech recordings gathered during the Donate Speech campaign between 16.6.2020 and 14.9.2021. The resource is now available through the download service of the Language Bank of Finland under restricted terms and conditions.
The Donate Speech Corpus contains a total of about 3200 hours of speech recordings, out of which about 1600 hours have been transcribed. The resource also includes information about the elicitation tasks for which each of the speech samples was donated in the original campaign, and the background details that were voluntarily provided by speech donors.
Access to the Puhelahjat resource may be granted for purposes related to language research or to the research and development of artificial intelligence (AI). Academic researchers can now apply for access to the dataset. In addition to the license for research use, a license agreement is offered to companies that wish to use the data. Details regarding commercial use will soon be available online. Interested companies may request further information at email@example.com.
For academic research use, the license terms and conditions (including the data protection terms and conditions) can be found at http://urn.fi/urn:nbn:fi:lb-2022020223.
Researchers can apply for access via the Language Bank Rights system. The researcher must present a research plan before the license can be granted. As the Puhelahjat resource contains personal data, the researcher must also submit to Kielipankki a notification containing the public information about how the personal data will be processed.
Donate Speech (Puhelahjat) datasets for research use (corresponding information for commercial use will soon be available)
Donate Speech (Puhelahjat) Corpus Metadata
Donate Speech (Lahjoita puhetta) campaign information in Kielipankki