The rights holder has announced that the license for the full-text versions of the STT news archive ends on 21 February 2025. If you have obtained access to the full-text versions of the STT news archive through Kielipankki, the license terms require you to stop using these resources and delete them from your devices within a three-month transition period, that is, by 21 February 2025 (see the license link above). Users who were previously granted access have also been notified by email.
Please note that access ends only for the full-text versions of the STT news archive. Versions in which only limited contexts are available at a time (e.g. the Korp versions of the STT news archive in Kielipankki), or in which the sentence order of the text has been shuffled, may still be used.
A big thank you to all donors!
Since 16 June 2020, Yle, the former Vake Oy (Valtion kehitysyhtiö; currently Ilmastorahasto Oy) and the University of Helsinki have run the Lahjoita puhetta campaign to collect Finnish speech. In a smaller Donera prat campaign that started in 2021, Finland-Swedish speech has also been collected. During the first year of the Finnish campaign, more than 3000 hours of speech were donated. Recently, however, very few donations have come in.
The donation campaigns for Finnish and Finland-Swedish speech have now ended. The datasets will be organized and stored by the Language Bank of Finland (Kielipankki). Through the Language Bank of Finland, researchers and companies can obtain access to the Donate Speech datasets under specific conditions. We hope that the data will help both researchers and companies create better models of Finnish and Finland-Swedish speech and develop future services that are easy to use in Finnish and Finland-Swedish.
A big thank you to all donors!
Since 16 June 2020, Yle (The Finnish Broadcasting Company), the former Vake Oy (Valtion kehitysyhtiö; currently Ilmastorahasto Oy) and the University of Helsinki have run the Donate Speech campaign (Lahjoita puhetta) to collect Finnish speech. In a smaller campaign (Donera prat) that started in 2021, Finland-Swedish speech has also been collected. During the first year of the Finnish campaign, more than 3000 hours of speech were donated. More recently, however, very few donations have come in.
The Finnish and Finland-Swedish speech donation campaigns are now closed. The datasets will be organized and stored by the Language Bank of Finland (Kielipankki). Via the Language Bank, researchers and companies can obtain access to Donate Speech datasets under specific conditions. We hope that the data will help both researchers and companies in creating better models of Finnish speech and in developing future services that can be readily used in Finnish.
Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Terhi Ainiala tells us about her research on city and place names and how multidisciplinary onomastics research is done.
I am Terhi Ainiala, a researcher of onomastics and a university lecturer in Finnish language at the University of Helsinki. I am also the head of the Department of Finnish, Finno-Ugrian and Scandinavian Studies at the Faculty of Arts.
I have researched place names for most of my career, starting from my student days. Towards the 2000s, I started to focus on urban place names, as the earlier research had concentrated on rural place names. In particular, I have studied the layering of urban place names and the roles of names in the conceptualization of the environment and in the identities of urban dwellers. Urban place names have many layers: formal and informal names, names from different eras and from different languages, and names used by different groups. Names play an important part in guiding and conceptualizing the urban space, as well as in constructing urban meanings and mental images.
The official urban nomenclature is readily available to the researcher in town plans and other official documents. There are even published books on street names. However, the informal names used in city dwellers’ everyday speech have been collected only sporadically. In my own research, I have also wanted to access the contexts in which places are spoken about, not just the lists of names. I have therefore collected the data for my studies primarily through questionnaires and interviews. The main focus of my research has been on qualitative analysis.
And how do people talk about places on the numerous social media channels? I wanted to address this question with more extensive data, and this is where my collaboration with Professor Jarmo Jantunen has proved fruitful. We have combined the starting points and tools of onomastics, statistical methods and corpus-based discourse research to explore what kinds of city and other place names are used in digital discourses, and how these names are used. This kind of corpus-onomastic research is a new departure in onomastics.
In our first joint study (Ainiala, T. & Jantunen, J. H., 2019), we found that Hesa and Stadi, the common slang names for Helsinki, share some discourses but also have discourses of their own. Hesa is used when Helsinki is viewed from the outside, for example as a destination of migration or travel. Stadi, on the other hand, reflects a native and authentic Helsinki identity, which supports previous findings on the use of the name. Stadi is most often used when referring to “us Stadians” and looking at the city from the inside.
We have continued our research in an even more multidisciplinary way and as a group of four researchers, as geographer Salla Jokela and linguist Jenny Tarvainen have joined us. In our recent article (Jantunen, J. H., Ainiala, T., Jokela, S. & Tarvainen, J., 2022), we explore the ways in which Finns talk about the cities of the Helsinki metropolitan area and the meanings attached to them. According to our results, the most common discussion topics related to Espoo, Helsinki and Vantaa are places and directions, living, and mobility. However, there are differences between the cities. For instance, Helsinki and Helsinkians are often compared to Finland in general and even to the rest of Europe, but Espoo and Vantaa are not discussed in this way. In addition, the names of provinces and foreign cities, such as Savo, Lapland, Stockholm and London, are associated only with Helsinki in the data.
Our research provides more insight into the meanings associated with cities. The results can be used in urban planning, development and branding.
The data used in our corpus-onomastic research comes from the extensive Suomi24 corpus, which consists of about 2.7 billion words. The corpus is compiled from the Suomi24 discussion forum and is available in Kielipankki. As the data was not compiled for research purposes but consists of spontaneously generated online discussion, it provides a comprehensive view of civil discourses.
Ainiala, Terhi & Jarmo Harri Jantunen 2019: Korpusonomastinen tutkimus slanginimistä Hesa ja Stadi digitaalisissa diskursseissa. Sananjalka 61(61), 57–79. https://doi.org/10.30673/sja.80312
Ainiala, Terhi 2021: Nimet kaupunkimaisemassa: Kerrostumat, merkitykset ja mielikuvat. In T. Vahtikari, T. Ainiala, A. Kivilaakso, P. Olsson & P. Savolainen (eds.), Humanistinen kaupunkitutkimus, 119–142. Tampere: Vastapaino.
Ainiala, Terhi & Paula Sjöblom 2020: Nimistöntutkimus. In M. Luodonpää-Manni, M. Hamunen, R. Konstenius, M. Miestamo, U. Nikanne & K. Sinnemäki (eds.), Kielentutkimuksen menetelmiä I–IV, 800–830. Suomalaisen Kirjallisuuden Seuran Toimituksia. Suomalaisen Kirjallisuuden Seura. https://doi.org/10.21435/skst.1457
Jantunen, Jarmo Harri, Terhi Ainiala, Salla Jokela & Jenny Tarvainen 2022: Mapping Digital Discourses of the Capital Region of Finland: Combining Onomastics, CADS, and GIS. Names 70:1, 20–39. https://doi.org/10.5195/names.2022.2289
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.
The short name for the following two corpora, available in our download service, used to be digilib. Since the National Library has a service called “Digi” that offers partly the same content but in different forms or versions, we have decided to change the short name of the Kielipankki versions from digilib to klk.
All the currently available corpora included in the collection titled The Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version (KLK) can be found on the resource group page.
The name change applies to the following two data sets:
klk-fi-1874-dl (= former digilib-pub-1771-1874): The Newspaper and Periodical OCR Corpus of the National Library of Finland (1771-1874): corpus description, corpus in download
klk-fi-1920-dl (= former digilib-pub-1875-1920): The Newspaper and Periodical OCR Corpus of the National Library of Finland (1875-1920): corpus description, corpus in download
The abbreviation digilib was previously used for two data sets in the Kielipankki download service that contain Finnish-language newspapers and periodicals from the National Library of Finland. Because the National Library has its own service called Digi, which offers partly the same materials in slightly different versions, we have decided to change the abbreviation of the packages downloadable from Kielipankki to KLK in order to avoid confusion. From now on, we recommend using the abbreviation KLK for all corpora in Kielipankki that belong to the newspaper and periodical collection of the National Library.
All Finnish- and Swedish-language versions of the National Library's newspaper and periodical collection that are available through Kielipankki can now be found together on the KLK resource group page.
The change of abbreviation applies to the following corpus versions:
klk-fi-1874-dl (= formerly digilib-pub-1771-1874): The Newspaper and Periodical OCR Corpus of the National Library of Finland (1771-1874): corpus description, corpus in the download service
klk-fi-1920-dl (= formerly digilib-pub-1875-1920): The Newspaper and Periodical OCR Corpus of the National Library of Finland (1875-1920): corpus description, corpus in the download service
The Suomi24 collection has now been extended with discussions from the years 2018–2020 (the Suomi24 Sentences Corpus 2018–2020, Korp version).
Note that the extension is still in beta testing for the time being, so minor changes may still be made to it without separate notice.
Corpus description: http://urn.fi/urn:nbn:fi:lb-2021101521
Open the extension in Korp (in test use)
The entire current Suomi24 data in Korp (years 2001–2017 and 2018–2020) is included in the collection:
Corpus description: http://urn.fi/urn:nbn:fi:lb-2021101525
Open the data for the years 2001–2020 in Korp at the same time
Are you familiar with the scientific research tools available to you and your university? Together with CSC, the Ministry of Education and Culture provides researchers with tools, most of which are free of charge. Have a look and test them in practice at http://okm-palvelut.csc.fi/en.
Are you already familiar with the research tools and education support services that are supported by the Ministry of Education and Culture and available to your higher education institution? Take advantage of the many services and the expertise that make a researcher's life easier. You can find the services and further information at http://okm-palvelut.csc.fi.