D1.3.1: Corpora of non-standard language

Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months

WP 1.3: Report on Corpora of non-standard language
Date of reporting: 2022-09

Report author: Veronika Laippala (UTU)
Contributors: Veronika Laippala, Filip Ginter, Sampo Pyysalo, Anni Eskelinen, Anna Salmela (UTU)
Deliverable location: turkunlp.org | github.com/TurkuNLP


1) Text quality data

2) Register (genre) annotations for OSCAR

3) Toxic language use in Finnish

  • Toxic language can be defined as rude, disrespectful language that is likely to make someone leave a discussion
  • Toxic language data and models for Finnish will be published in early 2023 (paper to be submitted to NoDaLiDa)
  • Will be available at github.com/TurkuNLP and as a Hugging Face dataset