The Finnish Wikipedia 2017 source material corpus contains all Finnish articles from the online encyclopedia Wikipedia that were available on 1 January 2018. The text of the articles was extracted from Wikipedia Dumps with WikiExtractor.
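As a rough illustration of what WikiExtractor output typically looks like, the sketch below reads text in WikiExtractor's default layout, where each article is wrapped in a `<doc id="..." url="..." title="...">` element. This is an assumption about the tool's default output, not a description of the files in this corpus, which may be organized differently; the function name `iter_docs` and the sample text are hypothetical.

```python
import re

# Matches one <doc ...> ... </doc> block in WikiExtractor's default output.
# Assumption: attribute values contain no '>' or '"' characters.
DOC_RE = re.compile(r"<doc\s+([^>]*)>\n?(.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def iter_docs(text):
    """Yield (attributes, body) pairs for each <doc> block found in text."""
    for match in DOC_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        yield attrs, match.group(2).strip()


if __name__ == "__main__":
    # Made-up sample in the assumed layout, for illustration only.
    sample = (
        '<doc id="1" url="https://fi.wikipedia.org/wiki?curid=1" title="Suomi">\n'
        "Suomi\n\nSuomi on valtio.\n"
        "</doc>\n"
    )
    for attrs, body in iter_docs(sample):
        print(attrs["title"], len(body))
```

Check the metadata record and the files themselves before relying on this layout.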
Finnish Wikipedia 2017, source
Metadata and license
Attribution instructions
Download the resource
Access the data on Puhti
Search for all versions in META-SHARE
Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. Links to the different versions can be found in the list above.
How to access a specific corpus in the Language Bank of Finland
Detailed information on the content, user rights and license of each version can be found in its specific metadata record in META-SHARE.
A version of this corpus is directly available in an uncompressed form in CSC’s computing environment. The data can be found in the directory /appl/data/kielipankki. You can open a connection to the environment by using an ssh application on your local machine, or via a browser interface. See further instructions on connecting to the computing environment (CSC).
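Once connected, one way to get an overview of the data is to walk the directory tree and print a short summary. The following is a minimal sketch, assuming Python 3 is available in the environment; the helper `summarize_tree` is hypothetical, and the name of the corpus subdirectory under /appl/data/kielipankki is not given on this page, so the script only lists whatever it finds under the base path.

```python
import os
from pathlib import Path


def summarize_tree(root, max_depth=2):
    """Return indented lines describing directories and files under root."""
    root = Path(root)
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        rel = Path(dirpath).relative_to(root)
        depth = len(rel.parts)
        if depth >= max_depth:
            dirnames[:] = []  # do not descend any further
        indent = "  " * depth
        name = root.name if depth == 0 else rel.parts[-1]
        lines.append(f"{indent}{name}/ ({len(filenames)} files)")
        for fname in sorted(filenames):
            size = (Path(dirpath) / fname).stat().st_size
            lines.append(f"{indent}  {fname} [{size} bytes]")
    return lines


if __name__ == "__main__":
    # Base directory named on this page; the corpus subdirectory is an unknown here.
    base = "/appl/data/kielipankki"
    if os.path.isdir(base):
        for line in summarize_tree(base, max_depth=1):
            print(line)
    else:
        print(f"{base} not found; run this inside the CSC computing environment")
```

Raising `max_depth` shows more of the hierarchy, at the cost of a longer listing.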
Snapshot of the structure of the data on Puhti:
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091411