
| Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level |
|---|---|---|---|---|---|---|---|---|
| Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level |
These resource versions are not yet available in the Language Bank of Finland.
| Shortname | Name and metadata | License | Formats | Support level | Contact Person | Resource group and help | Location | Other information |
|---|---|---|---|---|---|---|---|---|
| Shortname | Name and metadata | License | Formats | Support level | Contact Person | Resource group and help | Location | Other information |
This corpus is extracted from the Finnish parliament plenary session transcripts and videos by the
Aalto Speech Recognition group. The original session transcripts and videos are available at the web
portals of the Parliament of Finland (avoindata.eduskunta.fi and verkkolahetys.eduskunta.fi).
The Finnish corpus is split into three parts:
1. 2015-2020 set
2. 2008-2016 set
3. Development and evaluation sets
A non-overlapping combination of the 2008-2016 set and the 2015-2020 set form a training set of size:
– 1 422 318 sample pairs
– 3 130 hours of speech
– 19 356 831 word tokens
The Finland Swedish corpus contains:
– 3889 sample pairs
– 6.4 hours of speech
– 333 483 word tokens
All audio files in this corpus are single-channel wavs with sample rate 16 kHz and 16-bit precision.
The transcript files (.trn) are plain text files.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021081105
Last modified on 2025-09-30