Finnish conversational chat corpus (finchat)

Currently available versions of this resource

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact person | Resource group and help | Location | Other information

Resource information

The corpus contains 85 Finnish chat dialogues collected during 2019-2020. The 62 participants were university staff, university students and high school students. For more detailed information, see the article cited below.

Please cite the following paper when using the corpus: K. Leino, J. Leinonen, M. Singh, S. Virpioja and M. Kurimo. "FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics." INTERSPEECH 2020.

Link: https://github.com/aalto-speech/FinChat

License and access

  • This resource is available publicly (PUB).
  • Click on the license image to see the resource-specific license text.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2022060901

Suomi 24 resource group

Currently available versions of this resource

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact person | Resource group and help | Location | Other information

Resource information

The resource consists of the discussions posted on the Suomi 24 discussion forum. The content was parsed and annotated with automatic methods.

Further details of each version of the resource are maintained in the metadata record, findable via the persistent identifier (see the link at the resource title).

License and access

  • Some versions of this resource are available publicly (PUB), whereas others require you to log in as an academic user (ACA). Click on the license icon to see the resource-specific license text.
  • Some or all versions of this resource may contain personal data (license condition +PRIV). The license may then include additional data protection terms and conditions that you must follow. If you process personal data, maintain a public Privacy Notice for your project and provide its link to the Language Bank of Finland; see the instructions.
  • A copy of some versions of this resource may be readily available in the computing environment (see the column 'Location').

Important notes

The Language Bank of Finland aims to expand the Suomi 24 resource approximately every two years. The material update published in 2025 includes messages up to the end of 2023. Please note that when updates are made, some messages or message threads from previous versions may be deleted if they have also been deleted from the original Suomi24 platform.

For details of the changes made in version updates, please see the Change history.

Additional instructions

The Korp service lets you run versatile search queries on the content and obtain various statistics and visualizations (see Korp instructions).

If you use Korp without logging in, you can see the items matching your search criteria as brief excerpts only. Each word token in the concordance links to the original message and discussion thread on the original Suomi 24 discussion platform, if they are still available there. Researchers who need to view the wider context around the matching items can log in.

In addition to the corpus versions available in Korp, the corresponding full-text documents are available to logged-in researchers in VRT format, either in the CSC computing environment or as downloadable packages via the Kielipankki download service (see the column 'Location'). Using the computing environment requires a CSC user account. Note, however, that using the full-text data efficiently usually requires some technical and programming skills. Korp offers many opportunities for studying and analyzing the Suomi 24 corpus, so it is advisable to first check whether Korp is suitable for your purpose.
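Working with the VRT files directly means parsing a line-oriented format. The minimal sketch below assumes the common VRT layout: one token per line with tab-separated positional attributes, structural elements such as <text> and <sentence> as tags on their own lines, and special characters escaped as XML entities. The function name read_sentences and the default column names in DEFAULT_ATTRS are hypothetical. The actual attributes differ between corpus versions, so check the metadata record, or the positional-attributes header comment if the file contains one.

```python
import html
import io
import re
from typing import Dict, Iterator, List, Optional, Sequence, TextIO

# Hypothetical default column order. Real VRT files differ between corpus
# versions, so check the metadata record or the file's own header comment.
DEFAULT_ATTRS = ("word", "ref", "lemma", "pos", "msd", "dephead", "deprel")

# Matches a header comment such as
# <!-- #vrt positional-attributes: word ref lemma -->
ATTR_COMMENT = re.compile(r"<!--\s*#vrt positional-attributes:\s*(.*?)\s*-->")


def read_sentences(
    stream: TextIO, attrs: Sequence[str] = DEFAULT_ATTRS
) -> Iterator[List[Dict[str, str]]]:
    """Yield each <sentence> element as a list of {attribute: value} dicts."""
    names = list(attrs)
    sentence: Optional[List[Dict[str, str]]] = None
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        header = ATTR_COMMENT.match(line)
        if header:
            # Prefer the column names declared in the file itself.
            names = header.group(1).split()
        elif line.startswith("<sentence"):
            sentence = []
        elif line.startswith("</sentence>"):
            if sentence is not None:
                yield sentence
            sentence = None
        elif line.startswith("<"):
            # Other structural tags (<text>, <paragraph>, ...) and comments.
            continue
        elif sentence is not None:
            # Token line: tab-separated values, with &lt; &amp; etc. decoded.
            values = [html.unescape(v) for v in line.split("\t")]
            sentence.append(dict(zip(names, values)))


if __name__ == "__main__":
    sample = (
        "<!-- #vrt positional-attributes: word lemma -->\n"
        '<text id="t1">\n'
        '<sentence id="1">\n'
        "Hyvää\thyvä\n"
        "huomenta\thuomen\n"
        "</sentence>\n"
        "</text>\n"
    )
    for sent in read_sentences(io.StringIO(sample)):
        print(" ".join(tok["lemma"] for tok in sent))  # prints: hyvä huomen
```

Because the function reads the file as a stream and yields one sentence at a time, memory use stays low even for large VRT packages; passing an open file object (opened with encoding="utf-8") instead of the sample string works the same way.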

 

 


This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2022011221

FinnSentiment (finsen)

Currently available versions of this resource

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact person | Resource group and help | Location | Other information

Resource information

FinnSentiment is a Finnish social media corpus annotated for sentiment polarity. It consists of 27,000 sentences, each annotated independently with sentiment polarity by three native-speaker annotators. The corpus and its creation are documented in https://arxiv.org/pdf/2012.02613.pdf.
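Because each sentence carries three independent polarity judgements, users often need a single label per sentence. One simple option is a strict majority vote, sketched below. The label encoding (-1 negative, 0 neutral, 1 positive) and the function name majority_polarity are assumptions for illustration; consult the paper linked above and the corpus documentation for how FinnSentiment actually encodes and aggregates its annotations.

```python
from collections import Counter
from typing import Optional, Sequence

# Assumed label encoding for illustration: -1 negative, 0 neutral, 1 positive.
VALID_LABELS = {-1, 0, 1}


def majority_polarity(labels: Sequence[int]) -> Optional[int]:
    """Return the label given by a strict majority of annotators, or None."""
    for label in labels:
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected polarity label: {label!r}")
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    # Strict majority: more than half of the annotators chose this label.
    return label if count * 2 > len(labels) else None


# Hypothetical annotations for three sentences, three annotators each.
example = [(1, 1, 0), (-1, 0, 1), (0, 0, 0)]
print([majority_polarity(row) for row in example])  # prints: [1, None, 0]
```

With three annotators and three classes, a majority is missing only when all three labels differ; returning None leaves it to the user to decide whether to drop such sentences or analyze them separately.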

License and access

  • The versions of this resource are available publicly (PUB).
  • Click on the license image to see the resource-specific license text.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021081106

Corpus of Global Web-Based English (GloWbE)

Currently available versions of this resource group

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource group

These resource versions are not yet available in the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact person | Resource group and help | Location | Other information

Resource information

The Corpus of Global Web-Based English (GloWbE) contains about 1.8 billion words from web pages in the United States, Great Britain, Australia, India, and 16 other countries. About 60% of the texts come from blogs. It is unique in that it allows you to compare different varieties of English. The original, frequently updated version of GloWbE is provided by Mark Davies via the corpus interface at english-corpora.org. The Language Bank of Finland offers a "snapshot" version of GloWbE under a restricted academic license that is available to users affiliated with a university in Finland.

More information about all corpora from english-corpora.org that are available via the Language Bank

License and access

For the license text of an individual corpus, click on the license image in the corpus list, or see the metadata record (click on the link at the corpus title). Note that specific additional terms and conditions apply to this and other corpora from BYU; see https://www.corpusdata.org/restrictions.asp. The link is included in the official license.

Korp versions

  • Some of the corpus versions are available for searching via the Korp concordancer tool (click on the link under 'Location').
  • Access to the Korp versions requires academic login via a university in Finland.

Downloadable versions

  • Access to the downloadable corpora mentioned above is restricted to researchers affiliated with member universities of the FIN-CLARIN consortium in Finland. Download access can usually also be granted to graduate or postgraduate students if they need the corpora for an MA thesis or a PhD dissertation.
  • To obtain access to restricted corpora, please submit an application via Language Bank Rights (LBR): after logging in to the LBR service, search the catalogue for 'Mark Davies' downloadable corpora at Kielipankki'.
  • To access the download service, click on the link under 'Location', or see the metadata record for the link.

This page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2017061927

 

SFNET Corpus

Currently available versions of this resource

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact person | Resource group and help | Location | Other information

Resource information

The corpus contains written Finnish discussions from the SFNET Internet discussion forum from 2002-2003.

License and access

  • This resource requires you to log in as an academic user (ACA).
  • Click on the license image to see the resource-specific license text.
  • Some versions of this resource are available in the computing environment (see the column 'Location').

 


This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2021052501

The HS.fi News and Comments Corpus (HS.fi)

Currently available versions of this resource

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact person | Resource group and help | Location | Other information

Resource information

The HS.fi News and Comments Corpus contains the domestic news of the Helsingin Sanomat website and the comments on them from 5 September 2011 to 4 September 2012. The corpus starts with the first news item of 5 September 2011 and ends with a news item published on the morning of 3 September 2012, together with the comments published on the website by 5 September 2012.

Important: pseudonyms should be anonymized in publications referring to the corpus.
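The page does not prescribe how pseudonyms should be anonymized. As one possible approach, assuming you have already collected the pseudonyms that occur in the examples you intend to quote, the hypothetical helpers pseudonym_map and anonymize below map each pseudonym to a stable placeholder (User1, User2, ...) and replace whole-token occurrences in the quoted text, so that the same writer keeps the same placeholder throughout a publication.

```python
import re
from typing import Dict, Iterable


def pseudonym_map(pseudonyms: Iterable[str]) -> Dict[str, str]:
    """Give each distinct non-empty pseudonym a placeholder: User1, User2, ..."""
    mapping: Dict[str, str] = {}
    for name in pseudonyms:
        if name and name not in mapping:
            mapping[name] = f"User{len(mapping) + 1}"
    return mapping


def anonymize(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole-token occurrences of known pseudonyms in text."""
    if not mapping:
        return text
    # Longest names first, so that e.g. "kissa.fi" is matched before "kissa".
    names = sorted(mapping, key=len, reverse=True)
    # Lookarounds instead of \b, because pseudonyms may begin or end with
    # punctuation; re.escape makes characters such as "." match literally.
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Hypothetical pseudonyms and comment text.
placeholders = pseudonym_map(["kissa", "kissa.fi"])
print(anonymize("kissa.fi vastasi: kissa on oikeassa.", placeholders))
# prints: User2 vastasi: User1 on oikeassa.
```

The replacement only matches exact forms: an inflected pseudonym such as kissalle (from kissa) is left untouched, so quoted excerpts should still be checked manually before publication.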

License and access

  • The versions of this resource require you to log in as an academic user (ACA).
  • Click on the license image to see the resource-specific license text.

 


This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021051910

Ylilauta Corpus

Currently available versions of this resource

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact person | Resource group and help | Location | Other information

Resource information

The corpus contains text from discussions on the Ylilauta online discussion board from 2012 to 2014. Short fragments of the discussions, such as sentences or paragraphs, are publicly available in Kielipankki, the Language Bank of Finland.

License and access

  • All versions of this resource are available publicly (PUB).
  • Click on the license image to see the resource-specific license text.
  • Some versions of this resource are available in the computing environment (see the column 'Location').

 


This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021042602

Suomi 24 resource group

Currently available versions of this resource

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource

These resource versions are not yet available via the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact person | Location | Resource group and help | Other information

Resource information

The resource consists of discussions collected from the Suomi 24 forum. The content has been parsed using automatic methods.

More detailed information on each version of the resource is updated in the metadata record, which can be found via the persistent identifier (see the link at the resource title).

License and access

  • Some versions of this resource are available publicly (PUB), whereas others require you to log in as an academic user (ACA). Click on the license icon to see the exact resource-specific license.
  • Versions of this resource may contain personal data (license condition +PRIV). The license may then also include resource-specific data protection terms that you must follow. If you process personal data, maintain a public privacy notice for your project and provide its link to the Language Bank of Finland; see the instructions.
  • A copy of some versions of this resource may also be available directly in the computing environment (see the column 'Location').

Important notes

The Language Bank of Finland aims to expand the Suomi 24 resource approximately every two years. The update published in 2025 contains messages up to the end of 2023. Note that when updates are made, some messages or message threads included in earlier versions may be removed if they have also been removed from the original Suomi24 platform.

Details of the changes made in version updates can be found in the Change history.

Additional instructions

The Suomi 24 corpus offered via Korp can be searched in versatile ways, and the search results can be compiled into statistics or visualized in various ways (see the Korp service instructions).

For users who are not logged in, Korp shows the search hits found in the text content as short excerpts. Each search hit links to the original message and discussion thread on the Suomi 24 server, if these still exist. If needed, researchers can also view a wider context by logging in to Korp.

In addition to the corpus version shown in Korp, the corresponding full-text material in VRT format is available to logged-in researchers in the CSC computing environment, or it can be downloaded to one's own computer from the Language Bank of Finland's download service (see the column 'Location'). Using the computing environment requires a user account granted by CSC. Note that managing and efficiently processing the full-text material usually requires some technical and programming skills. The Korp service also offers many possibilities for studying the Suomi 24 material, so it is worth first checking whether it suits your purpose.

This page has a persistent identifier: http://urn.fi/urn:nbn:fi:lb-2017021630

Last modified on 2025-10-29

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317
