
D1.3.2: System for detecting toxic language

Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months

WP 1.3: Report on System for detecting toxic language
Date of reporting: 2023-05-24

Report author: Veronika Laippala (UTU)
Contributors: Anni Eskelinen, Laura Silvala, Filip Ginter, Sampo Pyysalo, Veronika Laippala (UTU)
Deliverable location: https://github.com/TurkuNLP/toxicity-classifier


This deliverable provides data and a fine-tuned model for detecting toxic language in Finnish. The datasets comprise a machine-translated version of the English Jigsaw dataset and a native Finnish dataset of discussion forum comments with toxicity annotations.

The datasets are available at https://github.com/TurkuNLP/toxicity-classifier and as Hugging Face datasets at https://huggingface.co/datasets/TurkuNLP/jigsaw_toxicity_pred_fi and https://huggingface.co/datasets/TurkuNLP/Suomi24-toxicity-annotated.

The fine-tuned model is available at https://huggingface.co/TurkuNLP/bert-large-finnish-cased-toxicity.
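
As a rough illustration of how the published model might be used, the sketch below loads it through the Hugging Face `transformers` text-classification pipeline. Only the model identifier comes from the URL above. The helper names `load_classifier`, `labels_above`, and `classify_texts`, the 0.5 threshold, and the assumption that the model returns one score per label are illustrative and not taken from the deliverable; the model card should be consulted for the actual label set and recommended usage.

```python
from typing import Dict, List

# Model identifier taken from the URL given in this report.
MODEL_ID = "TurkuNLP/bert-large-finnish-cased-toxicity"


def load_classifier():
    """Build a text-classification pipeline; downloads the model on first use."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def labels_above(scores: List[Dict], threshold: float = 0.5) -> List[str]:
    """Return, in sorted order, the labels whose score reaches the threshold.

    Hypothetical helper: the 0.5 threshold is an assumption, not a value
    recommended by the authors.
    """
    return sorted(s["label"] for s in scores if s["score"] >= threshold)


def classify_texts(texts: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Score each text and keep the labels that pass the threshold."""
    classifier = load_classifier()
    # top_k=None asks the pipeline for a score for every label, per text.
    all_scores = classifier(texts, top_k=None)
    return [labels_above(scores, threshold) for scores in all_scores]
```

Calling `classify_texts(["Hyvää huomenta kaikille!"])` would download the model weights and return one list of labels per input text. Which labels appear depends on the model configuration, so no output is shown here.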

The work is presented and evaluated in the following article:

Anni Eskelinen, Laura Silvala, Filip Ginter, Sampo Pyysalo, and Veronika Laippala. 2023. Toxicity Detection in Finnish Using Machine Translation. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). https://openreview.net/forum?id=X5DCw7mXz4
