VRT format

(Under construction)

This page will describe the contents of a VRT-formatted document in simple terms, for example for a user who downloads resources in VRT format and needs to use them.

VRT (VeRticalized Text) is the input format for the IMS Open Corpus Workbench (CWB), the software underlying Korp. VRT is a token-oriented columnar text format: each token (word) is on its own line, followed by its annotations (positional attributes) such as lemma, part of speech, morphological analysis and syntactic relation, with the fields separated by tabs. The structure of the text is represented with XML-style tags (structural attributes), each on its own line. Start tags may contain XML-style attributes that describe the structure. Unlike XML, VRT does not require a single root element, so a VRT input may consist of, for example, a sequence of texts.
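To make the description above concrete, the following is a minimal Python sketch that reads a small VRT fragment and separates structural tags from token lines. The sample data is invented for illustration: the structure names (text, sentence), the attribute names (title, id) and the column order (word, lemma, part of speech) are assumptions, and an actual resource may use different ones. The function name parse_vrt is likewise hypothetical. The sketch does not handle comments or character entities.

```python
import re

# Hypothetical sample: the structure names, attribute names and the
# column order (word, lemma, part of speech) are assumptions.
SAMPLE_VRT = (
    '<text title="Example">\n'
    '<sentence id="1">\n'
    'Dogs\tdog\tN\n'
    'bark\tbark\tV\n'
    '.\t.\tPunct\n'
    '</sentence>\n'
    '</text>\n'
)

# A start tag is a name followed by zero or more name="value" pairs.
START_TAG_RE = re.compile(r'^<([\w-]+)((?:\s+[\w-]+="[^"]*")*)\s*>$')
END_TAG_RE = re.compile(r'^</([\w-]+)>$')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_vrt(text):
    """Return a list of events in document order.

    Each event is one of:
      ("start", name, attrs)  for a structural start tag
      ("end", name)           for a structural end tag
      ("token", fields)       for a token line split on tabs
    """
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        start = START_TAG_RE.match(line)
        if start:
            attrs = dict(ATTR_RE.findall(start.group(2)))
            events.append(("start", start.group(1), attrs))
            continue
        end = END_TAG_RE.match(line)
        if end:
            events.append(("end", end.group(1)))
            continue
        # Anything else is treated as a token with tab-separated attributes.
        events.append(("token", line.split("\t")))
    return events


if __name__ == "__main__":
    for event in parse_vrt(SAMPLE_VRT):
        print(event)
```

Because there is no required root element, the same function can be applied to an input that contains several text structures one after another; each simply produces its own start and end events.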

(There is another, more technical description of VRT documents for internal use and for resource depositors at https://www.kielipankki.fi/development/korp/corpus-input-format/.)

This page has a persistent identifier: urn:nbn:fi:lb-2023020121