Collection of corpora from The University of Helsinki Language Corpus Server (UHLCS)

Collection of corpora from The University of Helsinki Language Corpus Server (UHLCS)

The University of Helsinki Language Corpus Server (UHLCS) was a multilingual data bank founded in the late 1980s. The UHLCS collection includes text corpora of more than 50 languages, including minority languages and various text types. There are also tools specifically developed for analyzing the UHLCS corpora. The use of most corpora is restricted for research and teaching. Read more…

Currently available versions of this resource

Shortname	Name and metadata	License	Location	Cite	Resource group and help	Apply	Publication year	Support level
Shortname	Name and metadata	License	Location	Cite	Resource group and help	Apply	Publication year	Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname	Name and metadata	License	Formats	Support level	Contact Person	Resource group and help	Location	Other information
Shortname	Name and metadata	License	Formats	Support level	Contact Person	Resource group and help	Location	Other information

Resource information

The University of Helsinki Language Corpus Server (UHLCS) is a multilingual data bank founded in the late 1980s and maintained by the Department of General Linguistics at the University of Helsinki until September 2007. When the old server was taken out of use, the UHLCS corpora were moved to servers maintained by CSC – IT Center for Science, and the corpora were made available via the Language Bank of Finland.

At present, the UHLCS collection includes text corpora of more than 50 languages, including samples of minority languages and extensive corpora representing different text types. There are also tools specifically developed for analyzing the UHLCS corpora.

The use of most corpora is restricted for research and teaching. Resource-specific information and license conditions can be found in the metadata record of the corpus in question.

In 2000, the corpora from the Uralic, Turkic, Tungusic, Mongolic, Chukotko-Kamchatkan, Iranian and North-East Caucasian languages were edited for public use with the financial support of the Max Planck Institute for Evolutionary Anthropology, Leipzig. In summer 2003, the basis for the metadata descriptions of the corpora were prepared with the financial support of the ECHO project (ECHO = European Cultural Inheritance Online).

License and access

Some versions of this resource are available publicly (PUB), whereas others require you to log in as an academic user (ACA) or to apply for individual access rights (RES).
Click on the license image to see the resource-specific license text.
Some versions of this resource may contain personal data (license condition +PRIV). The license may then include additional data protection terms and conditions that you must follow. If processing personal data, maintain a public Privacy Notice regarding your project and provide the link to the Language Bank of Finland, see instructions.
All versions of this resource are available in the computing environment (see column ’Location’).

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2023030901

Last modified on 2025-12-18