ELG-compatible HeLI-OTS language identifier released on hub.docker.com in a collaboration between Lingsoft and the University of Helsinki

HeLI-OTS is a general-purpose language identifier that automatically detects the language in which a text is written, selecting the best match from a repertoire of 200 languages. The current Docker version is based on HeLI-OTS version 1.3, which was released last month and is available on Zenodo.

Compared with the first version, 1.0 (released in June last year), the latest version adds the following features:
– A confidence score can be printed for each language identification.
– A list of the most likely languages can be printed instead of only the single most likely one.
– The identifier can be used as one step in a text processing pipeline: the text to be identified does not have to be read from a file.
– The repertoire of languages used for identification can be restricted both at start-up and later.
– Several language models for dialectal Finnish are included.
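
To make the first, second and fourth features above more concrete, here is a minimal toy sketch in Python. It is not the HeLI-OTS implementation or its interface: the training sentences, the PENALTY value, the function names and the definition of confidence as the gap between the two best scores are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy training data; a real identifier needs far more text.
TRAINING = {
    "eng": "the quick brown fox jumps over the lazy dog and the cat",
    "fin": "nopea ruskea kettu hyppää laiskan koiran yli ja kissa nukkuu",
    "swe": "den snabba bruna räven hoppar över den lata hunden och katten",
}

# Assumed cost for a trigram that a language model has never seen.
PENALTY = 7.0


def trigrams(text):
    """Return the character trigrams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Build one model per language: trigram -> negative log10 relative frequency."""
    models = {}
    for lang, text in samples.items():
        counts = Counter(trigrams(text))
        total = sum(counts.values())
        models[lang] = {g: -math.log10(c / total) for g, c in counts.items()}
    return models


def identify(text, models, top_n=1, allowed=None):
    """Rank languages by average trigram cost (lower is better).

    top_n   -- how many of the best-scoring languages to return
    allowed -- optional subset of language codes to consider
    Returns (ranked list of (language, score), confidence).
    """
    grams = trigrams(text)
    if not grams:
        return [], 0.0
    candidates = allowed if allowed is not None else list(models)
    scores = {
        lang: sum(models[lang].get(g, PENALTY) for g in grams) / len(grams)
        for lang in candidates
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    # Assumed definition: confidence is the gap between the two best scores.
    confidence = ranked[1][1] - ranked[0][1] if len(ranked) > 1 else 0.0
    return ranked[:top_n], confidence


MODELS = train(TRAINING)
print(identify("the dog and the cat", MODELS, top_n=2))
print(identify("the dog and the cat", MODELS, top_n=2, allowed=["fin", "swe"]))
```

The second call restricts the repertoire to Finnish and Swedish, so English can no longer be returned even though it would otherwise score best.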

Lingsoft produced the Docker release as part of the ”Microservices at your service” project.

On 22 and 25 March, the project will organize two workshops under the title ”ELG, a bridge for NLP development” to introduce the European Language Grid (ELG) and its potential to NLP tool developers and users.

HeLI-OTS has been developed in a collaborative project on text and speech recognition between the University of Helsinki and Lingsoft, funded by the Finnish Research Impact Foundation.

Links:
HeLI-OTS language identifier on hub.docker.com
HeLI-OTS version 1.3
HeLI-OTS version 1.0
Lingsoft
”Microservices at your service” project
European Language Grid (ELG)
Finnish Research Impact Foundation