Finnish Broadcast Corpus

Suomeksi


Currently available versions of this resource

ShortnameName and metadataLicenseLocationCiteResource group and helpApplyPublication yearSupport level
ShortnameName and metadataLicenseLocationCiteResource group and helpApplyPublication yearSupport level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

ShortnameName and metadataLicenseFormatsSupport levelContact PersonResource group and helpLocationOther information
ShortnameName and metadataLicenseFormatsSupport levelContact PersonResource group and helpLocationOther information

Content corresponding to the previous LAT version of the material is now available in the Language Bank download service

The Language Bank LAT platform was discontinued at the end of 2020, and this material is no longer accessible via the LAT service. The corresponding content is available in downloadable format. The data can therefore be further explored and processed using tools such as ELAN and Praat.

Resource information

The Finnish Broadcast Corpus is divided into two main parts: FBC-1 and FBC-2.

The Finnish Broadcast Corpus 1, FBC-1 contains 65 radio and tv recordings broadcast by YLE – the Finnish Broadcasting Company during the year 2003. Parts of the audio and video material have been annotated either manually or automatically in various levels: e.g., utterance (orthographic transcript), word, phone. FBC-1 was compiled under an initiative called Integrated Resources for Speech Technology and Spoken Language Research in Finland, funded by the Academy of Finland. It is CSC’s first multimodal corpus.

Details of the size of FBC-2 are being updated.

The material in the FBC-1 represents four categories:
* Radio monologues
– broadcast telegraph news (24 × 3 minutes, Nov. 2003)
– broadcast lectures of the week (8 × 14 minutes).
* Radio dialogues
– unfinished recordings of the Moninaisuusfoorumi event (5 × 1h).
* TV monologues
– broadcast main news read by Arvi Lind ja Eeva Polttila (15 × 30 minutes, September – November 2003), including the very last news telecast by Arvi Lind on October 15, 2003
* TV dialogues
– broadcast Aamu-TV programs (13 × ca. 12 minutes, 2003).

Formats:
* WAV audio format
* HQ_Pure audio format (44,1–48 KHz) (supported by the Puh-Editor, which is now obsolete)
* HQ_Pure audio format (16 KHz) (supported by the Puh-Editor, which is now obsolete)
* MPEG2 video

Funding Project:
Puheteknologian ja puheentutkimuksen yhteiset resurssit Suomessa, Integrated Resources for Speech Technology and Spoken Language Research in Finland (SA-Puhe)
Funding Type: National Funds
Funder: Academy of Finland
Funding Country: Finland
Project duration: 01/01/2002 – 12/31/2004

License and access

  • This resource requires you to apply for individual access rights (RES). Apply
  • Click on the license image to see the resource-specific license text.

This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2025032701

Last modified on 2025-05-14