Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level |
---|---|---|---|---|---|---|---|---|
These resource versions are not yet available in the Language Bank of Finland.
Shortname | Name and metadata | License | Formats | Support level | Contact Person | Resource group and help | Location | Other information |
---|---|---|---|---|---|---|---|---|
FinEst BERT is a multilingual Bidirectional Encoder Representations from Transformers (BERT) model trained from scratch on three languages: Finnish, Estonian, and English. It can be used for various NLP classification tasks in these languages, supporting both monolingual and multilingual/cross-lingual (knowledge transfer) settings. Whole-word masking was used during data preparation and training; the model was trained for 40 epochs with a sequence length of 128 and for a further 4 epochs with a sequence length of 512. The FinEst BERT model published here is in PyTorch format.
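The whole-word masking mentioned above can be illustrated with a minimal sketch in pure Python. This is a hypothetical helper, not the actual preprocessing code used to train FinEst BERT: it assumes WordPiece-style tokens where a leading `##` marks a continuation of the previous word, and when a word is selected for masking, all of its sub-tokens are masked together.

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Mask whole words: when a word is chosen for masking, every one of
    its WordPiece sub-tokens (continuations start with '##') is replaced."""
    rng = random.Random(seed)
    # Group token indices into words: a '##' token continues the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked

# With mask_prob=1.0 every word is masked, sub-tokens included:
tokens = ["kieli", "##pankki", "on", "hyödyllinen"]
print(whole_word_mask(tokens, mask_prob=1.0))
```

The key difference from token-level masking is the grouping step: "kieli" and "##pankki" are treated as one word, so they are always masked (or left intact) together, which is what makes the masked-token prediction task harder and the learned representations stronger.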
Corpora used:
- Finnish: STT articles, CoNLL 2017 shared task, Ylilauta downloadable version
- Estonian: Ekspress Meedia articles, CoNLL 2017 shared task
- English: English Wikipedia
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091402
Last modified on 2025-05-15