Data protection terms and conditions

Title of Resource: Speech and EGG (Electroglottography) Simultaneous Recordings (aku-egg)

Metadata: urn:nbn:fi:lb-2020112923
License: urn:nbn:fi:lb-2015041301

This page describes the specific conditions regarding the processing of the personal data in this Resource. In addition to these conditions, see the guidelines for processing personal data in the Language Bank of Finland.

Controller of the data stored in the Language Bank of Finland

University of Helsinki
PO Box 3
00014 University of Helsinki
Phone: 02941 911

For further details on the data protection of the resources in the Language Bank of Finland, please contact FIN-CLARIN helpdesk.

Data Protection Officer of the University of Helsinki

Email: tietosuoja@helsinki.fi

Description of the personal data

Types of personal data in the Resource

The Resource was collected in December 2005 in the anechoic chamber of the Laboratory of Acoustics and Audio Signal Processing at Helsinki University of Technology. In each session, the speaker's speech signal was recorded with a microphone, simultaneously with the associated electroglottography (EGG) signal, which was recorded with electrodes attached near the larynx. Three kinds of speech samples were recorded: the word “hääyöaie”; the vowels “a e i o u y ä ö”, produced with different phonation types (pressed, normal, breathy) in varying order; and so-called natural text, consisting of Finnish sentences about weather phenomena. It may be possible to identify individual speakers from the speech recordings, but the Resource does not contain any other identifying information about them.

Categories of data subjects

The speakers who participated in the study were adult volunteers. There were 12 speakers in total (6 women, 6 men).

Data protection terms and conditions for this Resource

In these data protection terms and conditions, End-User means the party acting as the Controller for the Resources received, in accordance with the General Data Protection Regulation (EU) 2016/679. Depending on the case and the purpose of Resource use, End-User may therefore mean the CLARIN service user’s employer or organisation (e.g., a university, university of applied sciences or other research organisation) or the service user personally.

The End-User understands that when receiving the Resource, it becomes a Controller, as referred to in the data protection legislation. The End-User must ensure that it complies with the applicable data protection legislation when processing personal data.

Prior to making this agreement and starting to process the Resource, the End-User must ensure that it is authorized by its home organization to approve these data protection terms and conditions.

The purpose of use of personal data

  • The Resource may only be used for non-commercial research or teaching purposes.

Location and transfer of the personal data

The Resource is old, so it has not been possible to inform the research participants about the practices concerning the location of their personal data in the manner required by the General Data Protection Regulation. Since 2014, the Resource has been available and usable only to Finnish and foreign users whose work duties include research and who have logged in with trust federation credentials.

Publish a link to your Privacy Notice

When you start using this Resource, use this form to share a title for your project that is understandable to the general public, together with a link to your publicly available privacy notice. This information will be published on the website of the Language Bank of Finland.

Updates

This page was last updated on 21.6.2021.

Persistent identifier of this page: urn:nbn:fi:lb-2021062229

Persistent identifier of the Finnish version of this page: urn:nbn:fi:lb-2021062230

North Saami Corpus (Literature) (UHLCS)

The North Saami Corpus contains Kerttu Vuolab’s novel Cheppari cháráhus written in Northern Sami. The corpus is a part of the UHLCS corpus collection.

UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).

Latest versions/subcorpora:
North Saami Corpus (Literature) (UHLCS)
Metadata and license
Attribution instructions
The data is available upon request via CSC’s computing environment
North Saami Corpus (Literature) (UHLCS), Helsinki Korp Version
Metadata and license
Attribution instructions
The resource will be available soon
Search for all versions in META-SHARE

Different versions or subcorpora of this language corpus are, or may in the future be, published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021061605
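
Some identifiers on these pages are given as bare URNs (for example urn:nbn:fi:lb-2020112923), while others appear as full URLs under http://urn.fi/. The following minimal Python sketch turns a bare URN into the URL form, assuming, as the URLs on these pages suggest, that prefixing the URN with http://urn.fi/ gives a resolvable address. The function name and the validation pattern are hypothetical and cover only identifiers of the shape shown here.

```python
import re

# Assumed resolver prefix, taken from the URLs shown on these pages.
RESOLVER_PREFIX = "http://urn.fi/"

# Assumed shape of the identifiers as they appear here: urn:nbn:fi:lb-<digits>.
URN_PATTERN = re.compile(r"^urn:nbn:fi:lb-\d+$")


def urn_to_url(urn: str) -> str:
    """Return the URL form of a Language Bank URN (hypothetical helper)."""
    urn = urn.strip()
    if not URN_PATTERN.match(urn):
        raise ValueError(f"not a recognised identifier: {urn!r}")
    return RESOLVER_PREFIX + urn


if __name__ == "__main__":
    print(urn_to_url("urn:nbn:fi:lb-2020112923"))
```

The sketch only builds the address; it does not check that the resolver actually knows the identifier.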

Finnish Corpus (Literature) (UHLCS)

The Finnish Corpus is a part of the UHLCS corpus collection.

UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).

Latest versions/subcorpora:
Finnish Corpus (Literature) (UHLCS)
Metadata and license
Attribution instructions
The data is available upon request via CSC’s computing environment
Finnish Corpus (Literature) (UHLCS), Helsinki Korp Version
Metadata and license
Attribution instructions
The resource will be available soon
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021061604

Testipiste Corpus

Testipiste is a language assessment centre for adult migrants. This corpus contains texts written by 2397 different persons, 3 texts from each person. It also contains assignments and other related texts. The essays contain, among other things, information on the starting level of their authors, as defined by Testipiste.

Latest versions/subcorpora:
Testipiste Corpus, source
Metadata and license
Attribution instructions
The resource will be available soon
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021061603

DIALUKI – Diagnosing reading and writing in a second or foreign language

The project studies the diagnosis of reading and writing abilities in a second or foreign language. It seeks to identify the cognitive features which predict a learner’s strengths and weaknesses in those areas. The project brings together scholars from applied linguistics, psychology and assessment to engage in multidisciplinary work and to develop innovative ways of diagnosing the development of second and foreign language abilities.

More information on the corpus: https://www.jyu.fi/dialuki

Latest versions/subcorpora:
DIALUKI – Diagnosing reading and writing in a second or foreign language
Metadata and license
Attribution instructions
The resource will be made available in Korp
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021061602

CEFLING Project Corpus

The corpus contains writing performances in Finnish as a second language and English as a foreign language, collected from comprehensive school students (grades 7–9) in the project CEFLING – Linguistic Basis of the Common European Framework for L2 English and L2 Finnish. It includes data from several hundred learners, with 4–5 writing tasks from each learner, as well as background information and self-assessments of proficiency.

More information:
https://www.jyu.fi/hytk/fi/laitokset/kivi/tutkimus/hankkeet/paattyneet-tutkimushankkeet/cefling/en/cefling

Latest versions/subcorpora:
CEFLING Project Corpus
Metadata and license
Attribution instructions
The resource will be available soon
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021061601

Corpus of Historical American English

The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books.

For general terms and conditions for this and other corpora from BYU please see https://www.corpusdata.org/restrictions.asp

More information on the BYU corpora at Kielipankki

Latest versions/subcorpora:
Corpus of Historical American English – Kielipankki Korp version 2017H1
Metadata and license
Attribution instructions
Select the corpus in Korp
Corpus of Historical American English – Kielipankki download version 2017H1
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2017061924

Corpus of Contemporary American English

The Corpus of Contemporary American English (COCA) contains about 440 million words and 190 000 texts from the years 1990-2012. The corpus is evenly divided into spoken, fiction, magazine, newspaper, and academic genres (~88 million words each).

For general terms and conditions for this and other corpora from BYU please see https://www.corpusdata.org/restrictions.asp

More information on the BYU corpora at Kielipankki

Latest versions/subcorpora:
Corpus of Contemporary American English – Kielipankki Korp version 2017H1
Metadata and license
Attribution instructions
Select the corpus in Korp
Corpus of Contemporary American English – Kielipankki download version 2017H1
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2017061921

Corpus of Global Web-Based English

The Corpus of Global Web-Based English (GloWbE) contains about 1.8 billion words and 1 800 000 texts from web pages in the United States, Great Britain, Australia, India, and 16 other countries. About 60 % of the texts come from blogs.

For general terms and conditions for this and other corpora from BYU please see https://www.corpusdata.org/restrictions.asp

More information on the BYU corpora at Kielipankki

Latest versions/subcorpora:
Corpus of Global Web-Based English – Kielipankki Korp version 2017H1
Metadata and license
Attribution instructions
Select the corpus in Korp
Corpus of Global Web-Based English – Kielipankki download version 2017H1
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2017061927

SFNET Corpus

The corpus contains written discussion in the SFNET Internet discussion forum in Finnish from 2002-2003.

Latest versions/subcorpora:
SFNET Corpus
Metadata and license
Attribution instructions
Download the resource
SFNET Corpus, Helsinki Korp Version
Metadata and license
Attribution instructions
The resource will be available soon
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021052501

The Corpus of Beserman Udmurt, Kielipankki Version

The Corpus of Beserman Udmurt comprises 65 000 tokens. The Beserman dialect of Udmurt is used in daily communication by approximately 2 000 speakers (according to the 2010 census). The Beserman live in the basin of the Cheptsa river in the Republic of Udmurtia and in the Kirov Oblast of the Russian Federation. In the scientific literature, Beserman is considered a dialect of the Udmurt language, characterized by an unusual combination of specifically Beserman phenomena (concentrated in vocabulary and phonetics) with certain traits of Northern and Southern Udmurt dialects, mostly morphological and phonological. The dialect remains the main means of everyday communication in Beserman villages, at least for the older generation.

The texts contained in the corpus have been collected in the villages of Shamardan (109 texts of 117), Vortsa (4 of 117), Malaya Yunda (1 of 117) and Zhuvam (3 of 117) in the Republic of Udmurtia in the years 2003-2015. There are 33 informants in total. The texts have been recorded, transcribed and grammatically annotated in the SIL FieldWorks software. The corpus contains narratives, life stories, dialogues, recipes, and recordings of psycholinguistic experiments. Each sentence is provided with interlinear glossing (according to the Leipzig Glossing Rules) and translation. Both the full text version with audio files and the corpus version are available at http://beserman.ru/corpus/search/?interface_language=en

Latest versions/subcorpora:
The Corpus of Beserman Udmurt, Kielipankki Version
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021052406

Corpus of Age-related Voice Disguise

This corpus includes normal and age-related disguised speech uttered by 60 native Finnish speakers (31 females and 29 males). The speakers were asked to read the same text fragments several times: in their modal voice and in two disguised voices, first pretending to be an elderly speaker and then pretending to be a child. The texts consisted of the Finnish translations of The Rainbow Passage and The North Wind and the Sun, and two selected English sentences from the TIMIT[1] corpus (SA1, SA2). The corpus includes samples of 78 different sentences per speaker (66 Finnish, 12 English). The speech was recorded simultaneously with a portable recorder with a close-talking microphone and with two smartphone applications, yielding a total of 14040 audio files (3 * 4680). The material was recorded in summer 2015 in order to study the effect of voice disguise on automatic speaker recognition.

Data protection policy for this corpus: http://urn.fi/urn:nbn:fi:lb-2018121021

Guidelines for processing corpora containing personal data in the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2020081522

Latest versions/subcorpora:
Corpus of Age-related Voice Disguise
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021052405

ArkiSyn Database of Finnish Conversational Discourse

The ArkiSyn corpus contains Finnish everyday conversations which have been morphologically and syntactically annotated. The data comes from the Conversation Analysis Archive at the University of Helsinki and the Finnish language Recording Archive at the University of Turku.

Latest versions/subcorpora:
ArkiSyn Database of Finnish Conversational Discourse, Helsinki Korp Version
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2014073026

The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora

The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora are:

The Helsinki Korp JRC-Acquis Finnish-English Corpus
The Helsinki Korp JRC-Acquis Finnish-Swedish Corpus
The Helsinki Korp JRC-Acquis Finnish-German Corpus
The Helsinki Korp JRC-Acquis Finnish-French Corpus
The Helsinki Korp JRC-Acquis Finnish-Spanish Corpus
The Helsinki Korp JRC-Acquis Finnish-Italian Corpus
The Helsinki Korp JRC-Acquis Finnish-Estonian Corpus
The Helsinki Korp JRC-Acquis Finnish-Hungarian Corpus
The Helsinki Korp JRC-Acquis Finnish-Polish Corpus

The corpora contain texts of the JRC-Acquis Multilingual Parallel Corpus. The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States.

For more information on the JRC-Acquis Multilingual Parallel Corpus see http://urn.fi/urn:nbn:fi:lb-20140730162 or https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis

Latest versions/subcorpora:
The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora
Metadata and license
Attribution instructions
Select the corpus in Korp
The Finnish Sub-corpus of the JRC-Acquis Multilingual Parallel Corpus
Metadata and license
Attribution instructions
Select the corpus in Korp
The Finnish Sub-corpus of the JRC-Acquis Multilingual Parallel Corpus, Downloadable Version
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021052404

The Helsinki Korp Europarl Bilingual Corpora

The Helsinki Korp Europarl Bilingual Corpora are:

The Helsinki Korp Europarl Finnish-English Corpus
The Helsinki Korp Europarl Finnish-Swedish Corpus
The Helsinki Korp Europarl Finnish-German Corpus
The Helsinki Korp Europarl Finnish-French Corpus
The Helsinki Korp Europarl Finnish-Spanish Corpus
The Helsinki Korp Europarl Finnish-Estonian Corpus

The corpora contain texts of the Europarl Parallel Corpus v7.

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. The goal of the extraction and processing was to generate sentence aligned text for statistical machine translation systems. For this purpose matching items were extracted and labeled with corresponding document IDs. By using a preprocessor, sentence boundaries were identified. The data was sentence aligned by using a tool based on the Church and Gale algorithm.
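
To give a rough idea of what length-based sentence alignment involves, here is a minimal Python sketch in the spirit of the Gale and Church method. It is not the tool used for Europarl. The function names are hypothetical, and the prior probabilities and variance constant are the values commonly quoted for the method, treated here as assumptions. The idea is that sentences which are translations of each other tend to have similar character lengths, so dynamic programming can pick the cheapest sequence of 1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 groupings.

```python
import math

# Assumed priors for each alignment type: (source sentences, target sentences).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0    # assumed expected target characters per source character
S2 = 6.8   # assumed variance of the length difference per character


def match_cost(len_src, len_tgt):
    """Negative log-probability that spans of these character lengths match."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-sided tail probability of a standard normal variable.
    tail = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(tail, 1e-300))


def align(src, tgt):
    """Return a list of (source_indices, target_indices) groups."""
    n, m = len(src), len(tgt)
    slen = [len(s) for s in src]
    tlen = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = match_cost(sum(slen[i:ni]), sum(tlen[j:nj]))
                total = cost[i][j] + step - math.log(prior)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (di, dj)
    # Walk back from the end to recover the chosen groupings.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        groups.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    groups.reverse()
    return groups
```

Calling `align(source_sentences, target_sentences)` on two lists of sentence strings returns index groups such as `([0, 1], [0])` when two source sentences correspond to one target sentence. Real aligners add refinements (for example handling of paragraph boundaries) that this sketch leaves out.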

For more information on the Europarl Parallel Corpus see http://urn.fi/urn:nbn:fi:lb-20140730195 and http://www.statmt.org/europarl/

Latest versions/subcorpora:
The Helsinki Korp Europarl Bilingual Corpora
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021052403

Opus, Helsinki Korp Version

The Helsinki Korp version of the Opus open parallel corpus (http://opus.lingfil.uu.se/), containing scrambled sentences, has been published in Kielipankki.

Latest versions/subcorpora:
Opus, Helsinki Korp Version
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for all versions in META-SHARE

The subcorpora of Opus, Helsinki Korp Version are:

OPUS Finnish–Czech
OPUS Finnish–Danish
OPUS Finnish–Dutch
OPUS Finnish–English
OPUS Finnish–Estonian
OPUS Finnish–French
OPUS Finnish–German
OPUS Finnish–Greek
OPUS Finnish–Hungarian
OPUS Finnish–Italian
OPUS Finnish–Polish
OPUS Finnish–Portuguese
OPUS Finnish–Russian
OPUS Finnish–Swedish
OPUS Finnish–Spanish
OPUS Finnish–Turkish

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021052402

The Karelian Finnish Newspaper Corpus

The corpus contains issues of the ’Karjalan Sanomat’ newspaper published in 2012-2014.

Latest versions/subcorpora:
The Karelian Finnish Newspaper Corpus
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021052401

Data protection terms and conditions

Title of resource: Yves Montand in the USSR interviews (MONTINT)

Metadata and license: http://urn.fi/urn:nbn:fi:lb-2020081501

This page describes the specific conditions regarding the processing of the personal data in this resource. In addition to these conditions, see the guidelines for processing personal data in the Language Bank of Finland.

Controller of the data stored in the Language Bank of Finland

University of Helsinki
PO Box 3
00014 University of Helsinki
Phone: 02941 911

For further details on the data protection of the resources in the Language Bank of Finland, please contact FIN-CLARIN helpdesk.

Data Protection Officer of the University of Helsinki

Email: tietosuoja@helsinki.fi

Description of the personal data

Types of personal data in the resource

Depending on what information each interviewee agreed to provide, the resource includes the first and last names and the patronyms of some interviewees; only the first and last name of some others; and just the first name or an alias for the rest. For each interviewee, the (approximate) year of birth is also included. In addition, the location where the interview took place and the places of residence of the interviewees may be disclosed in the resource.

Even though the interviewees may be identifiable in the resource, the topic of the interviews is very general and does not reveal any sensitive or private matters concerning the participants, and it is not possible to perform surveillance of the interviewees. Thus, it is unlikely that any significant risks or harm could be caused to the participants as a result of the processing of their personal data.

Categories of data subjects

The resource includes interviews of Russians who had either personal or “inherited” memories of the Yves Montand tour. The interviewees were found either through the social networks of the students who contributed to the resource as interviewers or via social media.

Data protection terms and conditions

In these data protection terms and conditions, Licenceholder means the party acting as the Controller for the Resources received, in accordance with the General Data Protection Regulation (EU) 2016/679. Depending on the case and the purpose of Resource use, Licenceholder may therefore mean the CLARIN service user’s employer or organisation (e.g., a university, university of applied sciences or other research organisation) or the service user personally.

The Licenceholder understands that when receiving the Resources, it becomes a Controller, as referred to in the data protection legislation. The Licenceholder must ensure that it complies with the applicable data protection legislation when processing personal data.

The purpose of use of personal data

  • This resource may only be used for non-commercial research or teaching purposes.

Location and transfer of the personal data

  • Any personal data processing outside the European Economic Area must comply with the requirements laid out in Chapter V of the General Data Protection Regulation.

Information security

The requirements for information security are based on the severity of risks that personal data processing may cause to data subjects and on the sensitivity of data. No specific information security conditions are provided for this resource.

Other conditions for data processing

Publish a link to your Privacy Notice

When you start using this resource, use this form to share a title for your project that is understandable to the general public, together with a link to your publicly available privacy notice. This information will be published on the website of the Language Bank of Finland.

Updates

This page was last updated on 21.6.2021.

Persistent identifier of this page: urn:nbn:fi:lb-2021050624

The HS.fi News and Comments Corpus

The HS.fi News and Comments Corpus contains the domestic news of the Helsingin Sanomat website and their comments from 5.9.2011 to 4.9.2012. The corpus starts with the first news item of 5.9.2011 and ends with a news item published in the morning of 3.9.2012 and the comments published on the website by 5.9.2012.

Latest versions/subcorpora:
The HS.fi News and Comments Corpus
Metadata and license
Attribution instructions
Select the corpus in Korp
Search for all versions in META-SHARE

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021051910