The Downloadable Version of the Finnish TreeBank 2

Suomen puupankki FinnTreeBank 2:n ladattava versio

shortname: finntreebank2-src

metadata: http://urn.fi/urn:nbn:fi:lb-2016042505

IPR holder: University of Helsinki

license: CC BY-SA 4.0 (Attribution-ShareAlike 4.0 International)
The complete license is available at http://urn.fi/urn:nbn:fi:lb-2023103101

A copy of the license is included in LICENSE-CC-BY-SA.txt. The license details
may be subject to change, so before downloading the resource, please
refer to the latest version of the license at the above link.

This resource is the downloadable version of the Finnish TreeBank 2 (FTB2).


The data consists of a sample from the Finnish Wikipedia,
manually annotated in the FinnTreeBank project around 2012,
to supplement the grammar examples that constitute FTB1:
- wikipedia-samples_tab.txt (57 sentences, 887 tokens)

(A couple of similarly annotated running-text samples had to be left
out due to uncertain or denied permissions.)

Each token line has 10 tab-separated fields that correspond to the
CoNLL-X format, with two deviations: fields 6, 9 and 10 are unused, and
the morphological tags are placed in field 5 (POSTAG in CoNLL-X) rather
than field 6 (FEATS), separated by vertical bars within the field.
Sentences are separated by empty lines, and each sentence is preceded
by a comment line that gives a VISK number.
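
The layout described above can be read with a few lines of Python. The
following is a minimal sketch, not an official reader for this resource.
It rests on assumptions that are not stated in this README: the function
name read_sentences and the "morph" key are hypothetical, unused fields
are assumed to hold an underscore placeholder, and any non-empty line
that does not split into exactly 10 tab-separated fields is treated as a
comment line (so no particular comment marker is assumed).

```python
from typing import Dict, Iterable, Iterator, List

# Column names follow the CoNLL-X specification (10 fields per token).
FIELDS = ["id", "form", "lemma", "cpostag", "postag",
          "feats", "head", "deprel", "phead", "pdeprel"]


def read_sentences(lines: Iterable[str]) -> Iterator[Dict[str, list]]:
    """Yield one dict per sentence with its comment lines and tokens.

    Assumption: a token line has exactly 10 tab-separated fields; every
    other non-empty line is kept as a comment (e.g. the VISK number).
    """
    comments: List[str] = []
    tokens: List[Dict[str, object]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # An empty line closes the current sentence, if there is one.
            if tokens:
                yield {"comments": comments, "tokens": tokens}
                comments, tokens = [], []
            continue
        cols = line.split("\t")
        if len(cols) == len(FIELDS):
            token: Dict[str, object] = dict(zip(FIELDS, cols))
            # Morphological tags sit in field 5, separated by vertical bars.
            # Assumption: "_" marks an empty value and is skipped.
            token["morph"] = [t for t in cols[4].split("|") if t and t != "_"]
            tokens.append(token)
        else:
            comments.append(line)
    # Emit the last sentence when the input does not end with an empty line.
    if tokens:
        yield {"comments": comments, "tokens": tokens}
```

To use it, open wikipedia-samples_tab.txt as UTF-8 text and pass the
file object to read_sentences. If the assumptions above hold, counting
the yielded sentences and their tokens should reproduce the figures
given earlier (57 sentences, 887 tokens), which makes a quick sanity
check for the reader.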

FinnTreeBank is a result of a collaborative effort by two parties:
- Language Technology Unit, Department of Modern Languages, University of Helsinki
- The Research Institute for the Languages of Finland.


For further information, contact fin-clarin@helsinki.fi .
