LAT service to be discontinued in November 2020


For technical reasons, the LAT service (lat.csc.fi) will be discontinued in the Language Bank of Finland as of November 30, 2020.

The LAT platform itself is no longer developed in its present form by MPI, and the media browser component Annex (part of LAT) is based on the deprecated Adobe Flash technology, which will stop working at the end of 2020.

What will happen to the corpora that used to be available in LAT?

A replacement service for LAT has not yet been selected by the Language Bank of Finland. However, all the speech and sign language corpora that were previously available in LAT will be made available by alternative means.

All of the LAT corpora can be offered as downloadable packages that can be used and studied directly on the user’s local computer. In the download service, the corpora can be accessed under the same conditions as via LAT. For some corpora, more advanced solutions might already be available.

Which corpora are affected by LAT shutting down?

The LAT instances of the following corpora will be affected:

  • aku-egg: Speech and EGG (Electroglottography) Simultaneous Recordings
  • cfinsl-conv: Corpus of Finnish Sign Language: conversations
  • cfinsl-elicit: Corpus of Finnish Sign Language: elicited narratives
  • eduskunta-v1-lat: Plenary Sessions of the Parliament of Finland, Kielipankki LAT Version 1
  • elfa-lat: The Helsinki LAT Version of the ELFA Corpus
  • fbc-lat: The Helsinki LAT Version of the Finnish Broadcast Corpus
  • ffe: a single unpublished video file by an unknown creator, access restricted to the owner (this data will be archived temporarily but will be removed quite soon unless the owner turns up!)
  • finka: The Corpus of Border Karelia, Kielipankki LAT version
  • giellagas-north: Samples of Northern Saami
  • helpuhe1: The Longitudinal Corpus of Finnish Spoken in Helsinki (1970s, 1990s and 2010s)
  • kipo: The 2010 Language Policy Program of the Sign Languages of Finland Corpus (versions 1 and 2)
  • la-murre: The Finnish Dialect Corpus of the Syntax Archive, Helsinki LAT Version
  • PeWi-corpus (the original authoritative copy is offered by MPI; the identical version will be removed from the Language Bank of Finland)
  • puheen-analyysi: Learning material for speech analysis
  • reittidemo-lat: The Helsinki LAT Version of the Route to A wing Corpus
  • seuruu: Follow-up Study of Dialects of Finnish
  • skn-lat: The Helsinki LAT Version of Samples of Spoken Finnish
  • snowfrog: ProGram data. The stories Snowman and Frog, where are you?
  • ssdc-2016: Skolt Saami Documentation Corpus (2016)

To see where each corpus will be located in the future, please refer to its metadata page, where this information will be updated. The relevant metadata links are provided in the list above.

I came to this page via a PID. How do I know where the file is?

LAT assigned 25000 PIDs to individual files. We have no automatic mapping of these PIDs, but we can help you find the file if you need it. We aimed to structure the downloadable packages similarly to the dataset structure on LAT. To locate a file, look at the URL of this page in your browser's address bar, where you will find a "?path=..." parameter. Example: ?path=demo/TRASH/2017-01/526/v7556__.C_4.4_Viittomakielisten_kielelliset_oikeudet.imdi

This should help you locate your file. Please contact us if you have any questions.
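As a minimal sketch of reading the "?path=..." parameter programmatically, the following Python snippet extracts it from a page address and splits it into a folder part and a file name. The host and page name (`https://example.org/tombstone`) and the function name `lat_path_from_url` are hypothetical; only the `path` value is taken from the example above.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def lat_path_from_url(url: str) -> Optional[PurePosixPath]:
    """Return the value of the 'path' query parameter, or None if it is absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("path")
    return PurePosixPath(values[0]) if values else None


# Hypothetical page address: only the ?path=... part comes from the example.
url = (
    "https://example.org/tombstone"
    "?path=demo/TRASH/2017-01/526/"
    "v7556__.C_4.4_Viittomakielisten_kielelliset_oikeudet.imdi"
)

path = lat_path_from_url(url)
print(path.parent)  # folder structure as it was on LAT
print(path.name)    # name of the individual file
```

Because the downloadable packages were meant to mirror the LAT dataset structure, the folder part is likely the most useful clue when searching inside a package.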

Schedule

August-September 2020:

  • The (numerous) persistent identifiers assigned by LAT to individual files will be redirected to stopover/tombstone pages. There will be one tombstone page per dataset.
  • The LAT corpora that are not yet offered in Download will be moved there and their persistent identifiers will be updated to point to their new home.
  • Provided that not too many errors are detected, this process should be complete by October.

September-October 2020:

  • Decisions will be made on what kind of streaming services the Language Bank can implement for audio and video materials in 2021.

30th November 2020:

  • The support for the server where LAT is located will be discontinued and the service will be shut down.
  • All the corpora that were previously offered via LAT will continue to be offered at least for download.

Year 2021 (and later):

  • Provided that sufficient resources are available, more functionality for browsing, searching, and analyzing speech and sign language corpora can be added.

Further details on the schedule of the aforementioned process will be updated on this page. In case you need additional information at this point, please contact FIN-CLARIN directly.

Plenary Sessions of the Parliament of Finland


The latest versions:
Plenary Sessions of the Parliament of Finland, Kielipankki Korp Version 1.5
Metadata and license
How to cite this version
Open the corpus in Korp
Plenary Sessions of the Parliament of Finland, Downloadable Version 1.5
Metadata and license
How to cite this version
Download the corpus
Locate other versions of the same resource

Plenary Sessions of the Parliament of Finland contains audio and video recordings of the parliamentary sessions and the transcripts that have been aligned with the audio. Both the media files and the original transcripts have been obtained directly from the online public services of the Parliament. The content is openly available via the Language Bank of Finland without logging in.

Via the Korp service in the Language Bank of Finland, it is possible to perform various kinds of content searches on the corpus and to calculate statistics from the results. The turns of different speakers have been separated in the text. In the Extended search tab in Korp, it is possible to delimit searches on the basis of the speaker’s name, the parliamentary group or the role of the speaker.

In the search results of this corpus version in Korp, there are also links to the corresponding utterances in the original video. If you wish, you may download the ELAN/EAF annotation files and the audio files in the downloadable version of the corpus for further processing. Moreover, the original videos and transcripts can also be located in the online services of the Parliament of Finland.
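For further processing of the downloaded ELAN/EAF annotation files, a minimal Python sketch along the following lines may be a starting point. EAF is an XML format in which annotations refer to time slots given in milliseconds. The sample document, the tier name `speaker-1`, and the function name `read_annotations` are invented for illustration; the tiers actually used in this corpus are not described here and should be checked in the files themselves.

```python
import xml.etree.ElementTree as ET

# Invented miniature EAF document, for illustration only.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker-1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_annotations(eaf_xml: str):
    """Return (tier_id, start_ms, end_ms, text) for each time-aligned annotation."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its value in milliseconds (unaligned slots are skipped).
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows


annotations = read_annotations(SAMPLE_EAF)
for tier_id, start, end, text in annotations:
    print(tier_id, start, end, text)
```

For a real file, the same function can be applied to the text read from the `.eaf` file. The start and end times can then be used to cut the matching segment from the corresponding audio file.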

The text in the original transcripts has been aligned with the audio recordings by automatic methods. The technological expertise in the alignment process was provided by Aalto University. In those audio portions where a matching text was not found in the transcript, an automatic speech recognizer was used in order to provide a tentative transcript. Thus, it is important to remember that the text in the Korp version of the corpus is not error-free and it may not always fully correspond to the original transcript.

Further information about the contents of the different corpus versions can be found in their metadata records.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-201407305
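The identifier above consists of a resolver address followed by a URN. As a small sketch, assuming that other Language Bank URNs follow the same pattern as the one shown here, the conversion between the bare URN and its resolvable address can be written as follows. The function names are hypothetical.

```python
RESOLVER = "http://urn.fi/"    # resolver prefix as it appears in the identifier above
URN_PREFIX = "urn:nbn:fi:"     # namespace prefix of the URN shown above


def urn_to_url(urn: str) -> str:
    """Turn a bare URN into a resolvable address."""
    if not urn.lower().startswith(URN_PREFIX):
        raise ValueError(f"not a {URN_PREFIX} identifier: {urn!r}")
    return RESOLVER + urn


def url_to_urn(url: str) -> str:
    """Strip the resolver prefix from a resolvable address."""
    if not url.startswith(RESOLVER):
        raise ValueError(f"address does not start with {RESOLVER}: {url!r}")
    return url[len(RESOLVER):]


print(urn_to_url("urn:nbn:fi:lb-201407305"))
print(url_to_urn("http://urn.fi/urn:nbn:fi:lb-201407305"))
```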

Workshop “Digital Parliamentary data and research”

Friday 3 May at 12.00
Aalto University (Otaniemi), CS-Building, Room T4 / A238 (Konemiehentie 2)

The aim of the workshop is to discuss the novel digital parliamentary datasets (in particular those of the Parliament of Finland), their use in research, the related research resources and tools, and their future development for researchers as well as for citizens and the media. FIN-CLARIN and the Korp version 1.1 of the Plenary Sessions of the Parliament of Finland, available in the Language Bank of Finland, will be presented during the afternoon.

Mietta Lennes: FIN-CLARIN and Parliamentary Data in Kielipankki – the Language Bank of Finland (PowerPoint / PDF slides)

Further information including the programme of the workshop can be found at https://www.helsinki.fi/en/helsinki-centre-for-digital-humanities/workshop-digital-parliamentary-data-and-research.

Plenary Sessions of the Parliament of Finland (Eduskunnan täysistunnot)


The latest versions:
Plenary Sessions of the Parliament of Finland, Kielipankki Korp Version 1.5
Metadata and license
How to cite this version
Example searches
Open the corpus in Korp
Plenary Sessions of the Parliament of Finland, Downloadable Version 1.5
Metadata and license
How to cite this version
Download the corpus
Locate other available versions

The Plenary Sessions of the Parliament of Finland corpus contains audio recordings and videos of the plenary session debates of the Parliament of Finland, together with the session transcripts aligned with them. Both the media recordings and the transcripts were obtained directly via the public services of the Parliament. The corpus is openly available via the Language Bank of Finland (Kielipankki), and using it does not require logging in (see the details of the licenses that apply to the corpus).

In the Korp service of the Language Bank of Finland, it is possible to run many kinds of searches on the corpus and to compile statistics from the results. The turns of different speakers are marked separately in the corpus. In the Extended search tab in Korp, searches can be restricted, for example, by the speaker, the parliamentary group the speaker represents, or the speaker’s role.

Nearly all Korp search results also include a link to the corresponding point in the original video (the video links were added in Korp version 1.5 of the corpus). In addition, users can, if needed, download the VRT-format text files corresponding to Korp version 1.5, as well as the audio recordings of the debates and the EAF-format annotation files, from the downloadable version of the corpus, and retrieve the original videos from the Parliament’s server.

The text of the transcripts has been aligned with the audio recordings by automatic methods. Aalto University was technically responsible for the alignment work. For passages where the transcript contained no text matching the audio, automatic speech recognition was attempted and the suggested text was added to the transcription. For this reason, note that the recognized text is not error-free in all respects. The text has also been parsed with a parser for Finnish, so the Swedish-language passages in the original transcripts are usually tagged as foreign words in the part-of-speech annotation.

More detailed information on the contents of the different corpus versions can be found in their metadata records.

Example searches in the Korp version of the corpus

Image of Korp search results from the parliament corpus
Simple search in Korp: all occurrences of all forms of the word ‘maahanmuuttaja’ (‘immigrant’) in the whole corpus
Location of the video link in the Korp search results (bottom right corner of the page)

Extended search in Korp: all forms of the word ‘maahanmuuttaja’ that occur in speeches by members of the Centre Party (Keskusta) or National Coalition Party (Kansallinen kokoomus) parliamentary groups and that are followed, in the same sentence and at most 10 words later, by any form of the word ‘opetus’ (‘teaching’) or the word ‘koulutus’ (‘education’).

Extended search in Korp: all mentions of the place name ‘Pori’.
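The extended search that combines ‘maahanmuuttaja’ with ‘opetus’ or ‘koulutus’ is a proximity condition on base forms (lemmas) within one sentence, restricted by the speaker’s parliamentary group. The Python sketch below only illustrates that logic on a hand-made sentence. It is not how Korp is implemented, and the group labels, the token layout, and the function name `find_hits` are assumptions. In particular, "at most 10 words later" is interpreted here as one of the next 10 tokens.

```python
def find_hits(sentence, speaker_group, groups, head, followers, max_distance=10):
    """Return indices of tokens whose lemma is `head` and which are followed,
    within `max_distance` tokens in the same sentence, by a lemma in `followers`.

    `sentence` is a list of (word, lemma) pairs; sentences spoken by a member
    of a group outside `groups` never match.
    """
    if speaker_group not in groups:
        return []
    hits = []
    for i, (_, lemma) in enumerate(sentence):
        if lemma != head:
            continue
        # Tokens at positions i+1 .. i+max_distance of the same sentence.
        window = sentence[i + 1 : i + 1 + max_distance]
        if any(next_lemma in followers for _, next_lemma in window):
            hits.append(i)
    return hits


# Hand-made example sentence with (word, lemma) pairs; not taken from the corpus.
SENTENCE = [
    ("Maahanmuuttajien", "maahanmuuttaja"),
    ("koulutusta", "koulutus"),
    ("on", "olla"),
    ("lisättävä", "lisätä"),
    (".", "."),
]
GROUPS = {"Keskusta", "Kokoomus"}   # hypothetical group labels
FOLLOWERS = {"opetus", "koulutus"}

print(find_hits(SENTENCE, "Keskusta", GROUPS, "maahanmuuttaja", FOLLOWERS))
print(find_hits(SENTENCE, "Vihreät", GROUPS, "maahanmuuttaja", FOLLOWERS))
```

Matching on lemmas rather than word forms is what allows a single search to cover all inflected forms, such as ‘maahanmuuttajien’ and ‘koulutusta’ in the sample sentence.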

Persistent identifier of this page: urn:nbn:fi:lb-2021111721