Aalto Finnish Parliament ASR Corpus 2008-2020 (fi-parliament-asr)

Suomeksi


Currently available versions of this resource

ShortnameName and metadataLicenseLocationCiteResource group and helpApplyPublication yearSupport level
ShortnameName and metadataLicenseLocationCiteResource group and helpApplyPublication yearSupport level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

ShortnameName and metadataLicenseFormatsSupport levelContact PersonResource group and helpLocationOther information
ShortnameName and metadataLicenseFormatsSupport levelContact PersonResource group and helpLocationOther information

Resource information

This corpus is extracted from the Finnish parliament plenary session transcripts and videos by the
Aalto Speech Recognition group. The original session transcripts and videos are available at the web
portals of the Parliament of Finland (avoindata.eduskunta.fi and verkkolahetys.eduskunta.fi).

The Finnish corpus is split into three parts:
1. 2015-2020 set
2. 2008-2016 set
3. Development and evaluation sets

A non-overlapping combination of the 2008-2016 set and the 2015-2020 set form a training set of size:
– 1 422 318 sample pairs
– 3 130 hours of speech
– 19 356 831 word tokens

The Finland Swedish corpus contains:
– 3889 sample pairs
– 6.4 hours of speech
– 333 483 word tokens

All audio files in this corpus are single-channel wavs with sample rate 16 kHz and 16-bit precision.
The transcript files (.trn) are plain text files.

License and access

  • All versions of this resource are available publicly (PUB).
  • Click on the license image to see the resource-specific license text.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021081105

Last modified on 2025-09-30