
| Short name | Name and metadata | License | Location | Reference | Resource family and guide | Apply for access | Publication year | Support level |
|---|---|---|---|---|---|---|---|---|

These versions of the resource are not yet available through Kielipankki (the Language Bank of Finland).

| Short name | Name and metadata | License | Format | Support level | Contact person | Location | Resource family and guide | Other information |
|---|---|---|---|---|---|---|---|---|

ERME contains predominantly original Erzya and Moksha literature. It consists of publications from several media, dating from the 19th and 20th centuries. Work on ERME was carried out in Saransk in 1997–2004 and has continued in Helsinki since 2004. The most basic format used is XML, with granularity extending down to the chapter level. The goal is to create corpora annotated down to the word level, with bibliographic references at the sentence level.
The new version contains the literature included in the older version and has grown markedly. The old version consisted merely of text divided into sentences. The new version adds lemmatization and dependency annotation. At the sentence level, a contextual translation (English or Finnish) may be present, while at the word level each token carries a morphological encoding corresponding to its context. Preliminary morphosyntactic analysis is carried out using HFST-based transducers and Constraint Grammar disambiguation, function tagging and dependency tagging, all developed in the Giellatekno infrastructure of the University of Tromsø.
The grammatical analysis and labeling comply with the practices developed in the Giellatekno infrastructure of the University of Tromsø. These practices are applied in the documentation of several Uralic languages.
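The annotation layers described above (sentence-level translations, and word-level lemma, morphology and dependency information) can be pictured as a small data structure. The sketch below is an illustration only: the element names (`<sentence>`, `<translation>`, `<w>`), the attributes (`lemma`, `morph`, `head`, `deprel`), the tag strings and the placeholder tokens are all hypothetical, not the actual ERME schema. It assumes a head index of 0 marks the sentence root.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# HYPOTHETICAL markup: element names (<sentence>, <translation>, <w>) and
# attributes (lemma, morph, head, deprel) are illustrative, not the ERME schema.
SAMPLE = """<text>
  <sentence id="s1">
    <translation lang="en">Example translation.</translation>
    <translation lang="fi">Esimerkkikäännös.</translation>
    <w lemma="LEMMA1" morph="N Sg Nom" head="2" deprel="subj">form1</w>
    <w lemma="LEMMA2" morph="V Ind Prs Sg3" head="0" deprel="root">form2</w>
  </sentence>
</text>"""


@dataclass
class Token:
    form: str
    lemma: str
    morph: str   # morphological tag string for this token in its context
    head: int    # index of the governing token; 0 marks the root (assumed)
    deprel: str  # dependency relation label


@dataclass
class Sentence:
    sid: str
    translations: dict[str, str]  # language code -> contextual translation
    tokens: list[Token] = field(default_factory=list)


def parse_sentences(xml_text: str) -> list[Sentence]:
    """Collect sentence-level translations and word-level annotations."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("sentence"):
        translations = {
            t.get("lang", ""): (t.text or "").strip()
            for t in s.findall("translation")
        }
        tokens = [
            Token(
                form=(w.text or "").strip(),
                lemma=w.get("lemma", ""),
                morph=w.get("morph", ""),
                head=int(w.get("head", "0")),
                deprel=w.get("deprel", ""),
            )
            for w in s.findall("w")
        ]
        sentences.append(Sentence(s.get("id", ""), translations, tokens))
    return sentences


if __name__ == "__main__":
    for sent in parse_sentences(SAMPLE):
        print(sent.sid, sent.translations.get("en"))
        for i, tok in enumerate(sent.tokens, start=1):
            print(i, tok.form, tok.lemma, tok.morph, tok.head, tok.deprel)
```

Keeping translations on the sentence and analyses on each token mirrors the division of labour described above. The translation is optional per sentence, while every word carries its own context-dependent analysis.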
The amount of processed material will be increased in the future.
While ERME contains predominantly, if not solely, original Erzya and Moksha literature, ERME-PSLA (Paragraph segmentation, low annotation) contains both original and translated texts. The most basic format used is XML, with granularity set at the level of the individual piece and then automatically extended to the sentence level. The goal is to create corpora whose source metadata record authors, titles, translators, genre, collectors, etc., with geo-identifiers and time stamps where possible. This allows the language of each individual piece (article) to be readily compared with fieldwork documentation of these language forms from various eras.
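The following is a minimal sketch of how such per-piece metadata could support that comparison. The field names, the use of ISO 639-3 codes (`myv` for Erzya, `mdf` for Moksha), and the helper `comparable_pieces` are hypothetical and do not describe the actual ERME-PSLA metadata. The sketch also assumes that a piece with recorded translators is a translated text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PieceMeta:
    """Source metadata for one piece (article); field names are hypothetical."""
    title: str
    authors: list[str]
    language: str                 # ISO 639-3: "myv" Erzya, "mdf" Moksha
    year: Optional[int] = None    # time stamp, if known
    geo_id: Optional[str] = None  # geo-identifier, if known
    genre: Optional[str] = None
    translators: list[str] = field(default_factory=list)
    collectors: list[str] = field(default_factory=list)

    @property
    def is_translation(self) -> bool:
        # Assumption: a piece with recorded translators is a translated text.
        return bool(self.translators)


def comparable_pieces(pieces, language, start, end):
    """Original pieces in `language` dated within [start, end] that also carry
    a geo-identifier, i.e. those that could be set against fieldwork data
    from the same period and area."""
    return [
        p for p in pieces
        if p.language == language
        and not p.is_translation
        and p.geo_id is not None
        and p.year is not None
        and start <= p.year <= end
    ]


if __name__ == "__main__":
    # Placeholder records, not actual ERME-PSLA data.
    pieces = [
        PieceMeta("Title A", ["Author A"], "mdf", year=1958, geo_id="GEO-1"),
        PieceMeta("Title B", ["Author B"], "mdf", year=1975, geo_id="GEO-2",
                  translators=["Translator B"]),
        PieceMeta("Title C", ["Author C"], "myv", year=1990),
    ]
    for p in comparable_pieces(pieces, "mdf", 1956, 2000):
        print(p.year, p.title, p.geo_id)
```

Optional fields reflect the "where possible" caveat above. Pieces lacking a time stamp or geo-identifier stay in the collection but are simply skipped by this kind of filter.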
Content of ERME-PSLA:

- Moksha-language texts from the Mokša journal
  - Time range: 1956–2000
  - Download a list of all works
- Erzya-language texts from the Surań tolt and Sâtko journals
  - Time range: 1956–2001
  - Download a list of all works

Permanent identifier of this page: http://urn.fi/urn:nbn:fi:lb-2025101702
Last modified 2026-06-02