D1.3.2: System for detecting toxic language

Project: FIN-CLARIAH
Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months

WP 1.3: Report on System for detecting toxic language
Date of reporting: 2023-05-24

Report author: Veronika Laippala (UTU)
Contributors: Anni Eskelinen, Laura Silvala, Filip Ginter, Sampo Pyysalo, Veronika Laippala (UTU)
Deliverable location: https://github.com/TurkuNLP/toxicity-classifier

Description

This deliverable provides data and a fine-tuned model for detecting toxic language in Finnish. The datasets comprise a machine-translated version of the English Jigsaw dataset and a native Finnish dataset of discussion forum comments with toxicity annotations.

The datasets are available at https://github.com/TurkuNLP/toxicity-classifier and as Hugging Face datasets at https://huggingface.co/datasets/TurkuNLP/jigsaw_toxicity_pred_fi and https://huggingface.co/datasets/TurkuNLP/Suomi24-toxicity-annotated.
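
As a rough illustration of how the Hugging Face copies might be inspected, the following Python sketch loads a dataset by its identifier and counts positive annotations per label. The dataset identifiers come from the URLs above; everything else is an assumption made for the example, in particular that label columns share the prefix "label_" and store 1 for a positive annotation. The helper names load_all_splits and count_positive_labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Dataset identifiers taken from the URLs in this report.
JIGSAW_FI = "TurkuNLP/jigsaw_toxicity_pred_fi"
SUOMI24 = "TurkuNLP/Suomi24-toxicity-annotated"


def load_all_splits(dataset_id: str):
    """Download every split of a dataset from the Hugging Face Hub.

    Needs network access and the `datasets` package, so the import is
    kept local to this function.
    """
    from datasets import load_dataset

    return load_dataset(dataset_id)


def count_positive_labels(
    rows: Iterable[Mapping[str, object]], prefix: str = "label_"
) -> Dict[str, int]:
    """Count rows with a positive annotation for each label column.

    ASSUMPTION: label columns start with `prefix` and hold 1 for a
    positive annotation. Check the dataset card for the real schema.
    """
    counts: Counter = Counter()
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix) and value == 1:
                counts[key] += 1
    return dict(counts)
```

A possible use would be to call load_all_splits(JIGSAW_FI), pick one of the returned splits, and pass it to count_positive_labels to get a quick view of the label distribution.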

The fine-tuned model is available at https://huggingface.co/TurkuNLP/bert-large-finnish-cased-toxicity.
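
The model could be applied to new text roughly as in the Python sketch below. Only the model identifier comes from this report. The sketch assumes that the model repository also provides a tokenizer, and that the model is a multi-label classifier in the style of the Jigsaw task, so each label is scored independently with a sigmoid and compared to a threshold. The threshold of 0.5 and the helper names score_text and labels_above_threshold are illustrative choices, not values from the article.

```python
import math
from typing import Dict, List

# Model identifier taken from the URL in this report.
MODEL_ID = "TurkuNLP/bert-large-finnish-cased-toxicity"


def sigmoid(x: float) -> float:
    """Map a raw logit to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def labels_above_threshold(
    logits: Dict[str, float], threshold: float = 0.5
) -> List[str]:
    """Return the labels whose sigmoid score reaches the threshold.

    ASSUMPTION: multi-label setup, so labels are decided independently
    and a text may receive several labels or none.
    """
    return sorted(
        name for name, logit in logits.items() if sigmoid(logit) >= threshold
    )


def score_text(text: str) -> Dict[str, float]:
    """Download the model and return one raw logit per label.

    Needs network access plus the `torch` and `transformers` packages,
    so the imports are kept local to this function.
    ASSUMPTION: the model repository also contains the tokenizer files.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    # Label names are read from the model configuration, not hard-coded.
    return {model.config.id2label[i]: float(v) for i, v in enumerate(logits)}
```

For example, labels_above_threshold(score_text("...")) would list the labels predicted for one comment. A different threshold may suit a given application better; the article cited below describes how the model was evaluated.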

The work is presented and evaluated in the following article:

Anni Eskelinen, Laura Silvala, Filip Ginter, Sampo Pyysalo, and Veronika Laippala. 2023. Toxicity Detection in Finnish Using Machine Translation. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). https://openreview.net/forum?id=X5DCw7mXz4
