This resource contains a copy of the original News on the Web corpus (NOW), provided by Mark Davies on 4 June 2021 via the corpus service at https://www.english-corpora.org. The corpus contains data from web-based newspapers and magazines in 20 different English-speaking countries from January 2010 to 31 May 2021. The corpus is related to many other corpora of English, formerly known as the “BYU Corpora”.
More information on Mark Davies’ corpora at Kielipankki.
Latest versions/subcorpora: | |
News on the Web – Kielipankki version 2021-05, source Metadata and license Attribution instructions |
The corpus will be available soon |
Search for all versions in META-SHARE |
Different versions/subcorpora of this corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2022112405
This resource contains a copy of the original Coronavirus Corpus, provided by Mark Davies on 4 June 2021 via the corpus service at https://www.english-corpora.org. The corpus contains data on the medical, social, cultural, and economic impact of the coronavirus (COVID-19) from online magazines and newspapers in 20 different English-speaking countries from 1 January 2020 to 31 May 2021. The corpus is related to many other corpora of English, formerly known as the “BYU Corpora”.
More information on Mark Davies’ corpora at Kielipankki.
Latest versions/subcorpora: | |
The Coronavirus Corpus – Kielipankki version 2021-05, source Metadata and license Attribution instructions |
The corpus will be available soon |
Search for all versions in META-SHARE |
Different versions/subcorpora of this corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2022111705
The corpus contains selected volumes of four magazines: Suomen Kuvalehti, Historiallinen Aikakauskirja, Lakimies and Suomi.
Suomen Kuvalehti’s volumes: 1917, 1925, 1935, 1945, 1955, 1965, 1972 (approximately 5.4 million tokens).
Historiallinen Aikakauskirja’s volumes: 1917, 1920, 1925, 1935, 1945.
Lakimies’ volumes: 1917, 1920, 1925, 1935, 1945, 1955, 1965, 1972.
Suomi’s volumes: 1917, 1920, 1923, 1935, 1938.
The corpus is made up of two parts: one whose OCR (optical character recognition) output has been checked and one whose OCR output has not. The checked part contains 670,000 tokens and consists of one 1935 issue each of Historiallinen Aikakauskirja, Lakimies and Suomi, as well as four issues of Suomen Kuvalehti from each of the years listed above (1917, 1925, 1935, 1945, 1955, 1965 and 1972). These issues were chosen so that the texts would be evenly distributed across the year.
Latest versions/subcorpora: | |
The Magazine Corpus of the Institute for the Languages of Finland, revised Metadata and license Attribution instructions | Select the corpus in Korp |
The Magazine Corpus of the Institute for the Languages of Finland, unrevised Metadata and license Attribution instructions | Select the corpus in Korp |
The Downloadable Version of the Magazine Corpus of the Institute for the Languages of Finland, revised Metadata and license Attribution instructions | The resource will be available soon |
The Downloadable Version of the Magazine Corpus of the Institute for the Languages of Finland, unrevised Metadata and license Attribution instructions | The resource will be available soon |
Search for these versions in META-SHARE |
Different versions of this corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-201407301
The corpus contains Jehovah’s Witnesses’ Bible-based magazines: ‘Awake!’, ‘The Watchtower’, ‘The Watchtower – Study Edition’ and ‘The Watchtower – Study Edition (Simplified)’.
The texts were harvested from https://www.jw.org/en/library/magazines/ for the years 2010-2016, in all available languages.
Latest versions/subcorpora: | |
Jehovah’s Witnesses The Watchtower – Study Edition (Simplified) 2011-2016, Korp Metadata and license Attribution instructions | Resource will be available soon |
Jehovah’s Witnesses Magazines 2010-2016, Korp Metadata and license Attribution instructions | Resource will be available soon |
Search for all versions in META-SHARE |
Different versions/subcorpora of this corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021061821
The corpus contains issues of the ‘Karjalan Sanomat’ newspaper published in 2012-2014.
Latest versions/subcorpora: | |
The Karelian Finnish Newspaper Corpus Metadata and license Attribution instructions | Select the corpus in Korp |
Search for all versions in META-SHARE |
Different versions/subcorpora of this corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021052401
The HS.fi News and Comments Corpus contains the domestic news published on the Helsingin Sanomat website, together with the comments on them, from 5 September 2011 to 4 September 2012. The corpus starts with the first news item of 5 September 2011 and ends with a news item published on the morning of 3 September 2012; it includes the comments published on the website by 5 September 2012.
Latest versions/subcorpora: | |
The HS.fi News and Comments Corpus Metadata and license Attribution instructions | Select the corpus in Korp |
Search for all versions in META-SHARE |
Different versions/subcorpora of this corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021051910
This corpus contains newspapers and magazines from Finland starting from 1770, compiled by the National Library of Finland.
A list of the newspapers and magazines published in Finnish: https://www.kielipankki.fi/wp-content/uploads/klk-lehdet-fi.pdf
A list of the newspapers and magazines published in Swedish: https://www.kielipankki.fi/wp-content/uploads/klk-lehdet-sv.pdf
NB: The Finnish acronym for the Newspaper and Periodical OCR Corpus of the National Library of Finland used to be “Digilib”, but since 23 November 2021 the acronym “klk” and the short names klk-fi-1874-dl and klk-fi-1920-dl are recommended instead. These corpora can now also be found on this resource group page.
Latest versions/subcorpora: | |
The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version Metadata and license Attribution instructions |
Select the corpus in Korp |
The Swedish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version Metadata and license Attribution instructions |
Select the corpus in Korp |
The Newspaper and Periodical OCR Corpus of the National Library of Finland (1771-1874) Metadata and license Attribution instructions |
Download the resource |
The Newspaper and Periodical OCR Corpus of the National Library of Finland (1875-1920) Metadata and license Attribution instructions |
Download the resource |
The Newspaper and Periodical Corpus of the National Library of Finland, Swedish sub-corpus, 1771–1879, VRT Metadata and license Attribution instructions |
Download the resource |
The Newspaper and Periodical Corpus of the National Library of Finland, Swedish sub-corpus, 1880–1948, scrambled, VRT Metadata and license Attribution instructions |
Download the resource |
Search for these versions in META-SHARE |
Different versions/subcorpora of this corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021092404
NB: The Finnish acronym for this corpus used to be “Digilib”, but since 23 November 2021 the acronym “klk” and the short names klk-fi-1874-dl and klk-fi-1920-dl are recommended instead. These corpora can now be found on the resource group page of The Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version.
This corpus consists of the OCR results of the material in the corpus of publications digitized by the National Library of Finland.
The material published before 1875 is so old that any copyrights in it must have expired before 2015. For the material published from 1875 to 1920, note that parts of the resource are copyright-protected.
Latest versions/subcorpora: | |
The Newspaper and Periodical OCR Corpus of the National Library of Finland (1771-1874) Metadata and license Attribution instructions |
Download the resource |
The Newspaper and Periodical OCR Corpus of the National Library of Finland (1875-1920) Metadata and license Attribution instructions |
Download the resource |
Search for these versions in META-SHARE |
Different versions/subcorpora of this corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-202104142
This resource contains complete newspaper and magazine articles published in Finnish in the 1990s and 2000s. The goal was to create a contemporary dataset of magazines and newspapers of various origins, such as scientific journals, regional newspapers, internal company publications, and trade union member journals. A detailed list of all magazines and newspapers contained in this resource can be found here.
This resource contains easy-to-read content: ‘Leija’ and ‘Selkosanomat/Selkouutiset’.
Latest versions/subcorpora: | |
Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2 Metadata and license Attribution instructions |
Select the corpus in Korp |
Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Downloadable Version 2 Metadata and license Attribution instructions |
Download the resource |
Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s (VRT), Version 2 Metadata and license Attribution instructions |
Download the resource | A copy of this version is available in the computing environment. |
Search for these versions in META-SHARE |
Different versions/subcorpora of this corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021032304
The Finnish Language Text Collection (Suomen kielen tekstikokoelma) is a selection of electronic Finnish texts from the 1990s. The collection contains texts from newspapers, journals as well as books. See the content details in Finnish.
All of the material is available for academic research use. A large part of the texts is also available for commercial use.
The collection was compiled by the Institute for the Languages of Finland, the Department of General Linguistics of the University of Helsinki and the Foreign Languages Department of the University of Joensuu.
Latest versions/subcorpora: | |
The Downloadable Version of the Finnish Text Collection Metadata and license Attribution instructions | Download the resource |
The Downloadable Version of the Finnish Text Collection – Commercial Use Metadata and license Attribution instructions | Download the resource |
The Helsinki Korp Version of the Finnish Text Collection Metadata and license Attribution instructions | Select the corpus in Korp |
Search for these versions in META-SHARE |
Different versions/subcorpora of this corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the different versions are in the list above.
Detailed information on the content of each version, its user rights and licenses can be found in its metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-201403268
The news archive of the Finnish News Agency (Suomen Tietotoimisto, STT) contains the Finnish-language articles from the news feed that STT has sent to its media customers since 1992.
Most of the articles are news stories, ranging in length from very short one-line items (”viivauutiset”) through news wires to longer news stories. The articles are classified by section (domestic, foreign, economy, politics, culture, entertainment and sports) and include metadata (IPTC subject terms or keywords and, in part, location classifications). The archive also contains other material created or relayed by STT, such as news advisories sent to customers, sports results, guest columns and press releases.
Latest versions: | |
STT news archive 2019-2021, source material Metadata and license Attribution instructions |
Download the resource |
STT news archive 1992-2018, source material Metadata and license Attribution instructions |
Download the resource |
STT news archive 1992-2018, CoNLL-U, source material Metadata and license Attribution instructions |
Download the resource |
License for all full-text resources of the STT news archive | |
Search for the versions in META-SHARE |
Different versions of the resources are prepared in the Language Bank of Finland and are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the resources are in the version list above.
More detailed information on the content of each version can be found in its metadata record, which also describes the user rights and licenses.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2018121001
Yle’s Finnish-language news archive contains news articles from 2011 onwards, and its Swedish-language news archive from 2012 onwards. The resources are cumulative, and information on their latest versions is published on this resource page.
Versions 2011-2018: | |
Yle Finnish News Archive 2011-2018, Korp Metadata and license Attribution instructions |
Select the corpus in Korp |
Yle Finnish News Archive 2011-2018, scrambled, Korp Metadata and license Attribution instructions |
Select the corpus in Korp |
Yle Finnish News Archive 2011-2018, source material Metadata and license Attribution instructions |
Download the resource |
Yle Finnish News Archive 2011-2018, VRT Metadata and license Attribution instructions |
Download the resource |
Yle Finnish News Archive 2011-2018, scrambled, VRT Metadata and license Attribution instructions |
Download the resource |
Easy-to-read news of the Yle Finnish News Archive 2011-2018, Korp Metadata and license Attribution instructions |
Select the corpus in Korp |
Easy-to-read news of the Yle Finnish News Archive 2011-2018, scrambled, Korp Metadata and license Attribution instructions |
Select the corpus in Korp |
Easy-to-read news of the Yle Finnish News Archive 2011-2018, source material Metadata and license Attribution instructions |
Download the resource |
Easy-to-read news of the Yle Finnish News Archive 2011-2018, VRT Metadata and license Attribution instructions |
Download the resource |
Easy-to-read news of the Yle Finnish News Archive 2011-2018, scrambled, VRT Metadata and license Attribution instructions |
Download the resource |
Yle svenska webbartiklar 2012-2018, Korp Metadata and license Attribution instructions |
Select the corpus in Korp |
Yle svenska webbartiklar 2012-2018, scrambled, Korp Metadata and license Attribution instructions |
Select the corpus in Korp |
Yle svenska webbartiklar 2012-2018, source material Metadata and license Attribution instructions |
Download the resource |
Yle svenska webbartiklar 2012-2018, VRT Metadata and license Attribution instructions |
Download the resource |
Yle svenska webbartiklar 2012-2018, scrambled, VRT Metadata and license Attribution instructions |
Download the resource |
Search for the versions in META-SHARE |
Different versions of the resources are prepared in the Language Bank of Finland and are available through the Language Bank Download Service and/or the Korp concordance tool. Links to the resources are in the version list above.
More detailed information on the content of each version can be found in its metadata record, which also describes the user rights and licenses.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021020901