UHLCS corpus collection

The University of Helsinki Language Corpus Server (UHLCS) is a multilingual data bank and data server which has been located at the Department of General Linguistics, University of Helsinki. In September 2007, the UHLCS was moved to CSC (the Finnish IT Center for Science). The UHLCS, which is maintained by the University of Helsinki, was founded late in 1980. At present, the UHLCS contains computer corpora in more than 50 languages, including samples of minority languages and extensive corpora representing different text types. In 2000, the corpora of the Uralic, Turkic, Tungusic, Mongolic, Chukotko-Kamchatkan, Iranian and North-East Caucasian languages were edited for public use with the financial support of the Max Planck Institute for Evolutionary Anthropology, Leipzig. In summer 2003, the basis for the metadata descriptions of the corpora was prepared with the financial support of the ECHO project (ECHO = European Cultural Heritage Online). The UHLCS also provides tools that can be used for analyzing the corpora. The use of most of the corpora is restricted to research and teaching.

The following corpora are available in Kielipankki – the Language Bank of Finland (puhti.csc.fi; see the access rights instructions).

Latest versions/subcorpora:  

Chuvash Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

English Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Corpus of Erzya and Moksha Mordvin Literature and Journals and Komi Zyrian Literature (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Erzya and Moksha Mordvin Word List Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Estonian Corpus 1 (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Estonian Corpus 2 (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Finnish Corpus (Bibles) (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Finnish Corpus (Literature) (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

The Helsinki Korp Version of the Finland-Swedish Text Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Korp

The Taito Version of the Finland-Swedish Text Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Ingrian Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Komi Zyrian Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Latin Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Lude (Ludian) Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Nenets Corpus (Tundra Nenets) (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

North Saami Corpus (Literature) (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

North Saami Corpus (Sámikultuvradoaibmagotti smiehttamush) (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Quantifiers and Quantification in Finnish and Languages Spoken in the Central Volga–Kama Region (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

The Susanne Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Ume Saami Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Uralic, Turkic, Indo-Iranian and Mongol languages; languages of Siberia and Caucasia (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Uzbek-English Dictionary (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti

Lists of Words Corpus (UHLCS)
Metadata and license
Attribution instructions

Access the corpus in Puhti
Search for all versions in META-SHARE  

 

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2023030901
