Forthcoming resources

>> See the published corpora

100suom
Hundred Finnish Linguistic Life Stories
a
b
MP4, DOCX, XLSX
b
MP4, DOCX, XLSX
a
B
Hanna Lappalainen
https://blogs.helsinki.fi/100suomalaista/
Akkala
The Corpus of Spoken and Written Akkala Saami
a
a
a
VRT
a
Korp
Michael Riessler
amph-korp
amph-Corpus, Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
Antti Arppe
coca-korp-2020
Corpus of Contemporary American English - Kielipankki Korp version 2020
ACA
A
a
Korp
B
FIN-CLARIN
coronavirus-2021-05-src
The Coronavirus Corpus - Kielipankki version 2021-05, source
RES
R
a
B
FIN-CLARIN
DIALUKI
DIALUKI - Diagnosing reading and writing in a second or foreign language
RES
R
c
a
TXT
a
VRT
a
Korp
Ari Huhta
dma-v2
Digital Morphology Archives, new version
PUB
P
c
VRT
a
Korp
dma-wn-fn-src
The Word Notes of the Morphology Archives with field reports, source
RES
R
c
PDF
dma-wn-src
The Word Notes of the Digital Morphology Archives, source
RES
R
c
a
PDF
a
DSPCON2013-2015-korp
Aalto University DSP Course Conversation Corpus 2013-2015, Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
Mikko Kurimo, Seppo Enarvi
eduskunta-v2-dl
Plenary Sessions of the Parliament of Finland, Downloadable Version 2
a
a
MP4, WAV, TXT
a
WAV, ELAN, VRT
a
eduskunta-v2-korp
Plenary Sessions of the Parliament of Finland, Kielipankki Korp Version 2
a
c
TXT
a
WAV, ELAN, VRT
a
Korp
enets
Enets Corpus
a
a
MP4, WAV, ELAN
a
MP4, WAV, ELAN
a
Download, Korp
Olesya Khanina
english-uhlcs-korp
English Corpus (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
erme-dl
ERME Erzya and Moksha Extended Corpora, full text/download version
c
b
XML
b
VRT
Jack Rueter
Ersä
Corpus of Colloquial Erzya
c
c
a
ELAN
a
Riho Grünthal
erzya-moksha-komi-uhlcs-korp
Corpus of Erzya and Moksha Mordvin Literature and Journals and Komi Zyrian Literature (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
erzya-moksha-uhlcs-korp
Erzya and Moksha Mordvin Word List Corpus (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
estonian1-uhlcs-korp
Estonian Corpus 1 (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
estonian2-uhlcs-korp
Estonian Corpus 2 (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
fcaa
Finnish Conversation Analysis Archive
a
a
WAV, MP3, MP4, RTF, PDF
a
Mari Siiroinen
https://metashare.csc.fi/repository/browse/finnish-conversation-analysis-archive/65669f5eb7e611eb9cdefa163ec5ae3e69c8f5f510064ad999f16144700b1156/
fedidi
Citation Database of Fennistic Dialect Dissertations
a
c
TXT
c
TXT
a
findarc
Finnish Dark Web Marketplace Corpus
RES
R
a
c
JSONLINE
a
VRT
a
Tuomas Harviainen
finears
Finnish electroacoustic music interviews
a
b
WAV, DOCX
b
WAV, TXT
a
Mikko Ojanen
https://blogs.helsinki.fi/finnish-electroacoustic-resources/
FinIntas
The FinINTAS Corpus of Spontaneous and Read-aloud Finnish Speech
a
c
wav + Praat
a
ELAN
a
Mietta Lennes
finlangus
Spoken language and linguistic tasks of Finnish-American immigrants and controls
a
Nana Lehtinen
finnish-bibles-uhlcs-korp
Finnish Corpus (Bibles) (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
finnish-literature-uhlcs-korp
Finnish Corpus (Literature) (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
FinnTreeBank1-korp
The Helsinki Korp Version of the Finnish TreeBank 1
PUB
P
c
c
TXT
a
VRT
a
Korp
ha-korp
Ha Language Corpus, Helsinki Korp Version
PUB
P
c
c
TXT
a
VRT
a
Korp
Lotta Aunio
hanty-uhlcs-korp
Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
helpuhe-2010txt
The Longitudinal Corpus of Finnish Spoken in Helsinki (2010 in text form)
c
c
TextGrid
a
VRT
a
Korp
Hanna Lappalainen
helpuhe-v2-korp
The Longitudinal Corpus of Finnish Spoken in Helsinki (1970s, 1990s and 2010s), Helsinki Korp Version 2
RES
R
c
a
a
a
Hanna Lappalainen
helpuhe-v2-lat
The Longitudinal Corpus of Finnish Spoken in Helsinki (1970s, 1990s and 2010s), Helsinki LAT Version 2
RES
R
c
c
TextGrid
b
ELAN
a
B
Hanna Lappalainen
HS
The Helsingin Sanomat Archive Corpus
c
a
a
VRT
a
Korp
Jarkko Rahkonen
ingrian-uhlcs-korp
Ingrian Corpus (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
Inkerin murteet
The Corpus of Ingrian Finnish
a
a
WORD-DOC, MP3, WAV
a
VRT, ELAN
a
Marjatta Palander
iweb-src
The Intelligent Web Corpus - Kielipankki version, source
a
B
FIN-CLARIN
kikosa-haa
University of Oulu Kikosa Collection: Group interviews
a
c
WAV, EAF, TXT
c
WAV, EAF, TXT
a
Maria Frick
kikosa-kok
University of Oulu Kikosa Collection: Student meetings
a
c
WAV, EAF, TXT
c
WAV, EAF, TXT
a
Maria Frick
Kiltinänsaame
The Corpus of Written Kildin Saami
PUB
P
a
a
a
VRT
b
Korp
Michael Riessler
Kiltinänsaame (UHLCS)
Kildin Saami Corpus (UHLCS)
c
c
PDF
b
PDF
a
Pirkko Suihkonen
komi-ikdp
Spoken Komi Corpus: IKDP
a
b
MP4, WAV, ELAN
a
MP4, WAV, ELAN
a
Niko Partanen
komi-uhlcs-korp
Komi Zyrian Corpus (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
kra-korp
Jyväskylä Corpus of Middle French, Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
latin-uhlcs-korp
Latin Corpus (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
long-second
The Long Second Corpus: LONGitudinal Classroom Data about Children’s Development in Finnish as a SECOND Language
b
c
ELAN, MP4
a
VRT, ELAN
a
Download, Korp
Maria Ahlholm
Lönnrot
Elias Lönnrot Letters Online
PUB
P
c
a
XML
a
VRT
a
Korp
Kirsi Keravuori
lude-uhlcs-korp
Lude (Ludian) Corpus (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
mlcca
MLCCA, Multilingual Corpus of Contracts and Agreements
RES
R
a
c
XML, VRT
c
XML, VRT
a
A
Mikhail Mikhailov
movie-src
The Movie Corpus - Kielipankki version, source
a
B
FIN-CLARIN
mutable-src
Multimodal Translation with the Blind
a
c
MP4, EAF, TXT
b
MP4, EAF, TXT
a
B
Maija Hirvonen
https://projects.tuni.fi/mutable/the-mutable-corpus/
nenets-uhlcs-korp
Nenets Corpus (Tundra Nenets) (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
Nganasan
Nganasan Speech Corpus
c
a
ELAN
a
VRT, ELAN
a
Larisa Leisiö
nmk-korp
Changes in Place Names Corpus, Helsinki Korp Version
ACA
A
c
a
a
VRT
a
Korp
Elisa Stenvall
nmk-lat
Changes in Place Names Corpus, Helsinki LAT Version
ACA
A
c
a
a
ELAN
a
Elisa Stenvall
NorDiga
The Nordica Digital Archive
a
a
a
VRT
a
Korp
Jan Lindström
north-saami-literature-uhlcs-korp
North Saami Corpus (Literature) (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
north-saami-report-uhlcs-korp
North Saami Corpus (Sámikultuvradoaibmagotti smiehttamush) (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
now-2021-05-src
News on the Web - Kielipankki version 2021-05, source
a
B
FIN-CLARIN
nzadi
Nzadi Corpus
a
a
WAV, PDF, TXT
a
WAV, PDF, TXT
a
Download, Korp
Thera Marie Crane
ona
The Audio Recordings Archive of Oulu (ONA)
RES
R
c
b
a
ELAN
a
Niina Kunnas
Opus ECB
Opus ECB Corpus
PUB
P
a
a
a
Jörg Tiedemann
Opus EU
Opus EU Corpus
PUB
P
a
a
a
Jörg Tiedemann
Opus Localization
Opus Localization Corpus
a
a
a
Jörg Tiedemann
Opus Subtitles
Opus Subtitles Corpus
PUB
P
a
a
a
Jörg Tiedemann
oulu-korp
Oulu Corpus, Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
parole-fi-korp
The Finnish Parole Corpus, Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
PERSO
PERSO Databases for Finnish Speech Synthesis
c
c
TXT, WAV
a
ELAN
a
Martti Vainio, Heini Kallio
ProoF
ProoF - Pronunciation of Finnish by Immigrants in Finland
a
a
wav + Praat
a
ELAN
a
Mietta Lennes
Prosodiakorpus
Corpus of Prosodic Variation of Finnish
a
a
a
ELAN
a
Tommi Kurki, Tommi Nieminen
puhelahjat-annotated
Donate Speech: Annotated dataset (for commercial use)
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
FIN-CLARIN
https://www.kielipankki.fi/lahjoita-puhetta/
puhelahjat-complete
Donate Speech: Complete dataset (version 1, for commercial use)
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
FIN-CLARIN
https://www.kielipankki.fi/lahjoita-puhetta/
puhelahjat-dev
Donate Speech Corpus: Development data (10h)
RES
R
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
Anssi Moisio
https://www.kielipankki.fi/lahjoita-puhetta/
puhelahjat-korp
Donate Speech Corpus, Korp
RES
R
a
c
TXT, TextGrid
a
VRT
a
Korp
A
FIN-CLARIN
puhelahjat-sample
Donate Speech Corpus: Sample (for commercial use)
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
Anssi Moisio
https://www.kielipankki.fi/lahjoita-puhetta/
puhelahjat-selected
Donate Speech: Selected dataset (for commercial use)
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
FIN-CLARIN
https://www.kielipankki.fi/lahjoita-puhetta/
puhelahjat-test
Donate Speech Corpus: Test data (10h)
RES
R
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
Anssi Moisio
https://www.kielipankki.fi/lahjoita-puhetta/
puhelahjat-test-mtr
Donate Speech Corpus: Multi-transcriber test data (1h)
RES
R
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
Anssi Moisio
https://www.kielipankki.fi/lahjoita-puhetta/
puhelahjat-test-mtrs
Donate Speech Corpus: Test data from multi-transcriber speakers (10h)
RES
R
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
Anssi Moisio
https://www.kielipankki.fi/lahjoita-puhetta/
puhelahjat-train
Donate Speech Corpus: Training data (100h)
RES
R
c
c
WAV, FLAC, JSON
c
FLAC, CSV, TXT, TextGrid, ELAN
b
A
Anssi Moisio
https://www.kielipankki.fi/lahjoita-puhetta/
quantlang-uhlcs-korp
Quantifiers and Quantification in Finnish and Languages Spoken in the Central Volga–Kama Region (UHLCS), Helsinki Korp Version
c
c
PDF
a
VRT
a
Korp
Saamen kielen korpus
Giellagas Corpus of Spoken Saami Languages
c
c
a
ELAN
b
Marko Jouste
sapu
The Corpus of Sociolinguistic Variation in the Province of Satakunta
RES
R
a
b
WAV, TextGrid, TXT
b
WAV, TextGrid, TXT
a
Tommi Kurki
sfnet-korp
SFNET Corpus, Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
SignWiki
The SignWiki Project of the Sign Languages in Finland
a
a
a
ELAN
a
Leena Savolainen
skk-vrt
Classics of Finnish Literature, VRT
PUB
P
c
VRT
Petri Lauerma
soap-src
Corpus of American Soap Operas - Kielipankki version, source
a
B
FIN-CLARIN
stat-fi-en
Statistics Finland Translation Memory Finnish-English
c
TMX
b
TMX
a
stat-fi-sv
Statistics Finland's Finnish to Swedish Translation Memory
c
TMX
b
TMX
a
stt-fi-1992-2018-korp
Finnish News Agency Archive 1992-2018, Kielipankki Korp Version
PUB
P
c
a
VRT
a
Korp
Olli Viitala
sus-fieldwork
The Finno-Ugrian Society Fieldwork Corpus
PUB
P
c
a
a
VRT
a
Korp
Jack Rueter
Suvi
Suvi Finnish Sign Language Online Dictionary
a
a
a
ELAN
a
Leena Savolainen
TAITO
Written and Oral Data of the TAITO-project
a
a
TXT
a
ELAN
a
Marjo Vesalainen
tampuhe
Longitudinal data of Tampere spoken language
RES
R
a
b
WAV, TextGrid
b
WAV, TextGrid
a
Liisa Mustanoja
tboneslim-src
T-Bone Slim Corpus, source
PUB
P
a
b
PDF, JPG, TIFF
a
PDF, TXT
a
A
Kirsti Salmi-Niklander
https://blogs.helsinki.fi/tboneslim
testipiste
Testipiste Corpus
c
a
VRT
a
Korp
Janne Laitinen
Turjansaame
The Corpus of Spoken and Written Ter Saami
PUB
P
a
a
TXT
a
VRT
a
Korp
Michael Riessler
tv-src
The TV Corpus - Kielipankki version, source
a
B
FIN-CLARIN
tver-1980
The Corpus of Tver Karelian 1957-1971
a
c
WAV, EAF, TXT
c
WAV, EAF, TXT
a
B
Marjatta Palander
tver-2020
The Corpus of Tver Karelian 2016-2019
a
c
WAV, EAF, TXT
c
WAV, EAF, TXT
a
B
Marjatta Palander
ume-saami-uhlcs-korp
Ume Saami Corpus (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
uralic-uhlcs-korp
Uralic, Turkic, Indo-Iranian and Mongol languages; languages of Siberia and Caucasia (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
uzbek-uhlcs-korp
Uzbek-English Dictionary (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
VVKS
Virtual Old Literary Finnish (VVKS) - Kielipankki Korp version
PUB
P
c
a
a
VRT
a
Korp
Mari Siiroinen
wikipedia-fi-2017-korp
Finnish Wikipedia 2017, Korp
PUB
P
c
c
VRT
b
VRT
a
Korp
Tatu Huovilainen
wordlists-uhlcs-korp
Lists of Words Corpus (UHLCS), Helsinki Korp Version
c
c
TXT
a
VRT
a
Korp
Yle-subtitle
The Finnish Broadcasting Company Corpus of Subtitles
a
a
TXT
a
VRT
a
Korp
Jukka Mäkisalo
ylenews-fi-2019-2020-selko-par-src
Parallel Corpus of Finnish and Easy-to-read Finnish from the Yle News Archive 2019-2020, source
a
b
XLSX, TXT
a
CSV, TXT
a
Anna Dmitrieva
ylenews-fi-2019-2021-selko-korp
Yle News Archive Easy-to-read Finnish 2019-2021, Korp
ACA
A
a
Korp
A
ylenews-fi-2019-2021-selko-s-korp
Yle News Archive Easy-to-read Finnish 2019-2021, scrambled, Korp
PUB
P
a
Korp
A