The Downloadable Version of the Finnish TreeBank 3
Suomen puupankki FinnTreeBank 3:n ladattava versio

shortname: finntreebank3-src
metadata: http://urn.fi/urn:nbn:fi:lb-2016042601
License: CC BY http://urn.fi/urn:nbn:fi:lb-2023111301

The Finnish TreeBank 3 shares data with two Finnish corpora: the Finnish part of The Helsinki Korp Europarl Bilingual Corpora (Europarl) and the Finnish part of The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora (JRC-Acquis). Please note that these two corpora have different rightholders.

Resource: The Helsinki Korp Europarl Bilingual Corpora (URN: urn:nbn:fi:lb-2015043011)
Rightholders: European Parliament, Philipp Koehn, University of Helsinki
License: CC BY http://urn.fi/urn:nbn:fi:lb-2023110701

Resource: The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora (URN: urn:nbn:fi:lb-2015061210)
Rightholders: European Commission - Joint Research Centre (JRC), University of Helsinki
License: CC BY http://urn.fi/urn:nbn:fi:lb-2023110703

A copy of the license is included in LICENSE.txt. The license details may change, so before downloading the resource, please check the latest version of the license at one of the links above.

This resource is the downloadable version of the Finnish TreeBank 3 (FTB3). It was created from an automatically annotated morpho-syntactic parsebank of Finnish sentences from 2012, with some light post-processing to make the annotations conform to a grammatical model.

* The data and the annotations

The treebank consists of 76 million (76,369,439) tokens in 4.4 million (4,366,955) sentences from two sources:

- 2.6 million (2,613,868) sentences from JRC-Acquis
- 1.8 million (1,753,087) sentences from Europarl

The format of the data is similar to the CoNLL-X format:
- sentences are delimited with XML-like tags (example below)
- each token record (10 tab-separated fields) is on its own line
- tokens are numbered within each sentence (ID, 1st field)
- POSTAG (5th field) is redundant with CPOSTAG (4th field)
- FEATS (6th field) is *space-separated* rather than |-separated as in standard CoNLL-X, and the features appear in a fixed order
- the projective dependency fields PHEAD and PDEPREL (9th and 10th fields) are not used
- most punctuation is not linked to the dependency tree

The annotations were intended to follow the model defined by the manually annotated Finnish TreeBank 1 and 2.

For further information, contact fin-clarin@helsinki.fi.
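The record layout described above can be read with a short parser. The Python sketch below is not an official FTB3 tool. It assumes that a line beginning with `</` closes a sentence and any other line beginning with `<` opens one (the actual tag names are not assumed), and that empty FEATS and non-numeric HEAD values are written as `_`, as in standard CoNLL-X. The names `Token`, `parse_token`, `read_sentences` and `count_corpus` are hypothetical.

```python
# Minimal reader for the FTB3 data format (a sketch, not an official tool).
# Assumptions, not stated in the README:
#   - a line starting with "</" closes a sentence; any other line starting with "<" opens one;
#   - empty FEATS and non-numeric HEAD values are written as "_", as in standard CoNLL-X.
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Token:
    id: int               # 1st field, numbered within the sentence
    form: str             # 2nd field
    lemma: str            # 3rd field
    cpostag: str          # 4th field; POSTAG (5th) is redundant and dropped
    feats: List[str]      # 6th field, split on spaces, fixed order preserved
    head: Optional[int]   # 7th field; None when it is not an integer (e.g. "_")
    deprel: str           # 8th field


def parse_token(line: str) -> Token:
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(cols)}: {line!r}")
    feats = [] if cols[5] == "_" else cols[5].split()
    head = int(cols[6]) if cols[6].isdigit() else None
    # Fields 9 and 10 (PHEAD, PDEPREL) are unused in FTB3 and ignored here.
    return Token(int(cols[0]), cols[1], cols[2], cols[3], feats, head, cols[7])


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    sentence: List[Token] = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue                          # skip blank lines
        if stripped.startswith("</"):         # closing tag: emit the sentence
            yield sentence
            sentence = []
        elif stripped.startswith("<"):        # opening tag: start a new sentence
            sentence = []
        else:
            sentence.append(parse_token(line))
    if sentence:                              # tolerate a missing final closing tag
        yield sentence


def count_corpus(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sentences, number of token records)."""
    n_sentences = n_tokens = 0
    for sentence in read_sentences(lines):
        n_sentences += 1
        n_tokens += len(sentence)
    return n_sentences, n_tokens
```

Because `read_sentences` accepts any iterable of lines, an open file object can be passed directly, e.g. `with open("ftb3.conll", encoding="utf-8") as f: print(count_corpus(f))`, where `ftb3.conll` is a hypothetical file name. The result could be compared with the sentence and token totals given above, assuming those totals count every token record, punctuation included.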