Examples of the folder structures of downloadable resources

The Language Bank offers a wide variety of text and speech corpora. Many of them can be downloaded from our download service in their original source format and/or in VRT format (the text file format extracted from Korp). In addition, some of the downloadable resources are directly available in uncompressed form in CSC’s computing environment (see instructions for locating resources in the Language Bank).

On this page, you can find examples of the folder structures of downloadable resources. They should give you an idea of what to expect after downloading a resource or when accessing datasets in the computing environment. The examples may also help you design the structure of your own datasets if you wish to make them available via the Language Bank of Finland.
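
If you want to produce a similar tree view of a resource you have already unpacked, the following minimal Python sketch may help. It is only an illustration: the function names `tree_lines` and `print_tree` are made up for this example, and the ordering (directories before files, each sorted by name) is an assumption that may differ from the views shown on this page.

```python
from pathlib import Path


def tree_lines(root: Path, prefix: str = "") -> list[str]:
    """Return the lines of a simple tree view of the directory `root`."""
    # Assumption for this sketch: list directories first, then files,
    # each group sorted by name.
    entries = sorted(root.iterdir(), key=lambda p: (p.is_file(), p.name))
    lines = []
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        connector = "└── " if last else "├── "
        lines.append(prefix + connector + entry.name)
        if entry.is_dir():
            # Continue the vertical guide line unless this was the last entry.
            extension = "    " if last else "│   "
            lines.extend(tree_lines(entry, prefix + extension))
    return lines


def print_tree(root: Path) -> None:
    """Print the name of `root` followed by its tree view."""
    print(root.name)
    for line in tree_lines(root):
        print(line)
```

For example, after unpacking a downloaded package into a directory of your choice, calling `print_tree(Path("path/to/unpacked-resource"))` (a placeholder path) prints the directory name followed by its contents.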

A simple structure with only one data file and a README.txt:

Figure: A tree view of the contents of the Finnish sub-corpus of the Classics Library of the National Library of Finland – Kielipankki version, VRT (nlfcl-fi-vrt)


A complex structure with files in various formats, ordered by date:

Figure: A view of the top of the directory tree under the Plenary Sessions of the Parliament of Finland, Downloadable Version 1 (eduskunta-v1-dl)


Two variants of the same resource: the original, unannotated text documents, and the annotated version in VRT format, in which the individual text documents from four categories are collected and described in a smaller number of files:

Figures: Views of the top and bottom of the directory tree of Helsinki Corpus of Swahili 2.0 (HCS 2.0) Not Annotated Version (hcs-na-v2)


Figure: A tree view of the Helsinki Corpus of Swahili 2.0 (HCS 2.0) Downloadable Annotated Version (hcs-a-v2-dl)


Last updated: 2023-05-19
