ERME Erzya and Moksha Extended Corpus (erme)

Available versions

Abbreviation	Name and metadata	License	Location	Citation	Resource group and instructions	Apply for access	Publication year	Support level

Upcoming versions

These resource versions are not yet available through the Language Bank of Finland (Kielipankki).

Abbreviation	Name and metadata	License	Format	Support level	Contact person	Location	Resource group and instructions	Other information

About the resource

ERME

ERME contains predominantly original Erzya and Moksha literature. It consists of several media publications from the 19th and 20th centuries. The material was mapped in Saransk from 1997 to 2004, and mapping has continued in Helsinki since 2004. The most basic format used is XML, with a granularity extending to the chapter level. The goal is to create corpora with a granularity extending to the word level, with bibliographic references at the sentence level.

The new version contains the literature found in the older version and has grown markedly. While the old version was merely text segmented to the sentence level, the new version adds lemmatization and dependency annotation. At the sentence level, a contextual translation (English or Finnish) may be present; at the word level, there is morphological encoding specific to each context. Preliminary morpho-syntactic analysis is carried out using HFST-based transducers together with Constraint Grammar disambiguation and function and dependency tagging, all of which have been developed in the Giellatekno infrastructure of the University of Tromsø.

The grammatical analysis and labeling comply with the practices developed in the Giellatekno infrastructure of the University of Tromsø. These practices are applied in the documentation of several Uralic languages.
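
As a rough illustration of the layered annotation described above (an optional sentence-level translation plus word-level lemma, morphology and dependency information), the following Python sketch reads a small XML sample. The element and attribute names (`sentence`, `translation`, `w`, `lemma`, `msd`, `head`, `func`) and the tag strings are assumptions made up for this example; they are not the actual ERME schema, which should be checked against the distributed files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The element names, attribute names and tag strings
# below are invented for illustration and are NOT taken from the ERME files.
SAMPLE = """\
<text>
  <chapter n="1">
    <sentence id="s1">
      <translation lang="en">The house is big.</translation>
      <w lemma="кудо" msd="N+Sg+Nom+Def" head="2" func="SUBJ">кудось</w>
      <w lemma="покш" msd="A+Sg+Nom+Indef" head="0" func="ROOT">покш</w>
    </sentence>
  </chapter>
</text>
"""


def read_sentences(xml_text):
    """Return one dict per sentence with its translation and token annotations."""
    root = ET.fromstring(xml_text)
    sentences = []
    for sent in root.iter("sentence"):
        # The translation is optional, so a missing element maps to None.
        tr = sent.find("translation")
        tokens = [
            {
                "form": w.text,
                "lemma": w.get("lemma"),
                "msd": w.get("msd"),
                "head": int(w.get("head")),
                "func": w.get("func"),
            }
            for w in sent.findall("w")
        ]
        sentences.append(
            {
                "id": sent.get("id"),
                "translation": None if tr is None else tr.text,
                "tokens": tokens,
            }
        )
    return sentences


if __name__ == "__main__":
    for sentence in read_sentences(SAMPLE):
        print(sentence["id"], sentence["translation"])
        for token in sentence["tokens"]:
            print("  ", token["form"], token["lemma"], token["msd"],
                  token["head"], token["func"])
```

Keeping the translation optional mirrors the statement that a contextual translation "may be present"; the rest of the structure would need to be adapted to the real markup.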

The amount of processed material will be increased over time.

ERME-PSLA

While ERME contains predominantly, if not solely, original Erzya and Moksha literature, ERME-PSLA (paragraph segmentation, low annotation) contains both original and translated texts. The most basic format used is XML, with a granularity set at the piece level and then automatically extended to the sentence level. The goal is to create corpora with source metadata indicating authors, titles, translators, genres, collectors and so on, with geo-identifiers and time stamps where possible, so that the language of each individual piece (article) can be readily compared to fieldwork documentation of these language forms from various eras.

Content of ERME-PSLA:
Moksha-language texts from the Mokša journal
Time range 1956 – 2000
Download a list of all works

Erzya-language texts from the Surań tolt and Sâtko journals
Time range 1956 – 2001
Download a list of all works

License and access to the resource

The versions of this resource are publicly available (PUB).
Click the license icon to see the exact resource-specific license.

Persistent identifier of this page: http://urn.fi/urn:nbn:fi:lb-2025101702

Last modified 2026-07-13