Collection of corpora from The University of Helsinki Language Corpus Server (UHLCS)

The University of Helsinki Language Corpus Server (UHLCS) is a multilingual data bank founded in late 1980. The UHLCS collection includes text corpora of more than 50 languages, including minority languages and various text types. There are also tools specifically developed for analyzing the UHLCS corpora. The use of most corpora is restricted to research and teaching.

Subcorpora:
Chuvash Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
English Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Corpus of Erzya and Moksha Mordvin Literature and Journals and Komi Zyrian Literature (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Erzya and Moksha Mordvin Word List Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Estonian Corpus 1 (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Estonian Corpus 2 (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Finnish Corpus (Bibles) (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Finnish Corpus (Literature) (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
The Helsinki Korp Version of the Finland-Swedish Text Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Korp
The Finland-Swedish Text Corpus (UHLCS), source
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Ingrian Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Komi Zyrian Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Latin Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Lude (Ludian) Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Nenets Corpus (Tundra Nenets) (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
North Saami Corpus (Literature) (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
North Saami Corpus (Sámikultuvradoaibmagotti smiehttamush) (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Quantifiers and Quantification in Finnish and Languages Spoken in the Central Volga–Kama Region (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
The Susanne Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Ume Saami Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Uralic, Turkic, Indo-Iranian and Mongol languages; languages of Siberia and Caucasia (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Uzbek-English Dictionary (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Lists of Words Corpus (UHLCS)
Metadata and license
Attribution instructions
Apply for access rights
Access the corpus in Puhti
Search for all versions in META-SHARE

Corpus contents

The University of Helsinki Language Corpus Server (UHLCS) is a multilingual data bank founded in late 1980 and maintained by the Department of General Linguistics at the University of Helsinki until September 2007. When the old server was taken out of use, the UHLCS corpora were moved to servers maintained by CSC – IT Center for Science, and the corpora were made available via the Language Bank of Finland.

At present, the UHLCS collection includes text corpora of more than 50 languages, including samples of minority languages and extensive corpora representing different text types. There are also tools specifically developed for analyzing the UHLCS corpora.

The use of most corpora is restricted to research and teaching. Resource-specific information and license conditions can be found in the metadata record of the corpus in question.

In 2000, the corpora of the Uralic, Turkic, Tungusic, Mongolic, Chukotko-Kamchatkan, Iranian and North-East Caucasian languages were edited for public use with the financial support of the Max Planck Institute for Evolutionary Anthropology, Leipzig. In summer 2003, the basis for the metadata descriptions of the corpora was prepared with the financial support of the ECHO project (ECHO = European Cultural Heritage Online).


Last updated: 28.2.2024

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2023030901

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317
