Newsletter of the Language Bank of Finland 2/2023

Researchers of the Month in 2023

  1. Therese Lindström Tiedemann – Swedish as a second language, pseudonymisation of linguistic data
  2. Maria Sarhemaa – appellativization of first names in the Finnish language
  3. Noora Hoffrén – constructed action in Finnish Sign Language and the Finnish language
  4. Johanna Vaattovaara – language awareness, language attitudes
  5. Rosa González Hautamäki – within-speaker variation, human-induced voice modifications
  6. Mikael Varjo – zero-subject constructions in Finnish everyday conversation
  7. Niina Kunnas – corpus of spoken Meänkieli
  8. Nobufumi Inaba – language change, research on Old Literary Finnish
  9. Sampo Pyysalo – natural language processing (NLP), large language models (LLMs)
  10. Anna Dmitrieva – text simplification
  11. Aleksi Sahala – research on ancient text data
  12. Tiina Onikki-Rantajääskö – the Helsinki Term Bank for the Arts and Sciences

All previous researchers of the month can be found in the archive.

Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Are you one of them? Let us know: https://www.kielipankki.fi/support/contact-us/

New, updated or extended corpora in 2023

Extensions of the Newspaper and Periodical Corpus of the National Library of Finland (KLK)

The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2, Korp (klk-fi-v2-korp) contains newspapers and periodicals of the digital collections of the National Library of Finland from the years 1771–2021. The corpus is available in Korp and contains over 22 billion words in total. The language of the sentences of the corpus has been identified with the HeLI-OTS language identifier. We are currently working on the corresponding Swedish sub-corpus in order to make it available via Korp during spring 2024. Read more about the KLK update here.

Korp is moving to a new server

The Korp service is currently moving to a new server, to increase performance. The process will be completed in January 2024. The available corpora and functionalities will remain the same, but searches will be faster.

In addition, a significant upgrade to the Korp software used by the Language Bank of Finland is planned for spring 2024. After the upgrade, the Korp system offered in Finland will be easier to keep in sync with the Korp system developed at Språkbanken in Sweden.

Would you like to offer your own resource to be distributed via Kielipankki?

Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421

Instructions: Publishing a privacy notice of processing personal data for research purposes

When you obtain a resource containing personal data via the Language Bank of Finland and start processing it for a new purpose, you must prepare a privacy notice regarding the purpose of processing, publish the notice openly in electronic format, and provide a link to the notice to the Language Bank. The purpose of a privacy notice is to help data subjects understand the purposes for which their data is used. You should always primarily follow the data protection guidelines of your own organisation. In addition, the Language Bank offers some instructions to help you collect the pieces of information that are usually required for a privacy notice regarding research purposes. Read more

Courses and training materials

The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered again in Jan-Feb 2024 and it can be taken either in Finnish or in English. The course is open to all universities and you can also participate in it from outside Finland. Course details

The FIN-CLARIAH research infrastructure received funding for 2024–25

FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, received FIRI funding from the Research Council of Finland for continuing its work in the period of 2024–25. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. On December 1st, the FIN-CLARIAH people gathered together in Tampere to discuss their achievements during the past two years. You can find the deliverables produced in the FIN-CLARIAH project on the Language Bank website.

ParlaCLARIN IV in May 2024 (Turin, Italy): workshops, demos and a call for papers

During the past few years, extensive parliamentary datasets from different countries have been processed within CLARIN with the aim of compiling them in a format that allows for research in various disciplines. Researchers and developers of parliamentary resources are invited to join the ParlaCLARIN workshop to be held in Turin, Italy in May as part of the LREC2024 conference. Deadline for submissions: 19.2.2024. Read more: https://www.clarin.eu/ParlaCLARIN-IV

CLARIN funding opportunities

Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding

New video: “Introduction to the Language Bank of Finland”

The brief introductory video (4 min 40 s) offers a summary of the corpora, tools, other services and opportunities for depositing your own resource that are available via the Language Bank of Finland. The video comes with Finnish and English subtitles and it can be found on our YouTube channel. Another version of the video will soon be available with examples in English.

The Language Bank of Finland is on vacation during 23.12.2023–7.1.2024

We wish you a relaxing holiday season!

Mietta Lennes and Wilhelmina Dyster
Project Planners
fin-clarin@helsinki.fi

 


Subscribe/unsubscribe to this newsletter: https://www.kielipankki.fi/language-bank/newsletter-subscription/

See also the CLARIN Newsflash: https://www.clarin.eu/content/newsflash

 

Newsletter of the Language Bank of Finland 1/2023

Researchers of the Month in 2023

  1. Therese Lindström Tiedemann – Swedish as a second language, pseudonymisation of linguistic data
  2. Maria Sarhemaa – appellativization of first names in the Finnish language
  3. Noora Hoffrén – constructed action in Finnish Sign Language and the Finnish language
  4. Johanna Vaattovaara – language awareness, language attitudes
  5. Rosa González Hautamäki – within-speaker variation, human-induced voice modifications
  6. Mikael Varjo – zero-subject constructions in Finnish everyday conversation

All previous researchers of the month can be found in the archive.

Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Are you one of them? Let us know: https://www.kielipankki.fi/support/contact-us/

New, updated or extended corpora in 2023

The Newspaper and Periodical Corpus (KLK) has been considerably extended

The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2, Korp (klk-fi-v2-korp) is now available in Korp as a beta test version. The corpus contains newspapers and periodicals of the digital collections of the National Library of Finland from the years 1771–2021. The corpus contains over 22 billion words in total, which is over four times as many as in the previous version of the corpus. The language of the sentences of the corpus has been identified with the HeLI-OTS language identifier. Read more about the KLK update here.

Would you like to offer your own resource to be distributed via Kielipankki?

Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421

Instructions: Publishing a privacy notice of processing personal data for research purposes

When you obtain a resource containing personal data via the Language Bank of Finland and start processing it for a new purpose, you must prepare a privacy notice regarding the purpose of processing, publish the notice openly in electronic format, and provide a link to the notice to the Language Bank. The purpose of a privacy notice is to help data subjects understand the purposes for which their data is used. You should always primarily follow the data protection guidelines of your own organisation. In addition, the Language Bank offers some instructions to help you collect the pieces of information that are usually required for a privacy notice regarding research purposes. Read more

Courses and training materials

The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered again in Sep-Oct 2023 and it can be taken either in Finnish or in English. The course is open to all universities and you can also participate in it from outside Finland. Course details

FIN-CLARIAH research infrastructure meeting in June

FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, is funded by the Research Council of Finland. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. The FIN-CLARIAH people gathered for a workshop day held at CSC in Espoo on June 6th. On the Language Bank website, you can find a number of deliverables produced in the FIN-CLARIAH project.

CLARIN funding opportunities

Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding

The Language Bank staff take their summer vacations during 23.6.2023–13.8.2023

We remain at your service during the summer as well, and messages will be answered as soon as possible.

We wish you a relaxing summer!

Mietta Lennes
Project Planning Officer
fin-clarin@helsinki.fi

 


Subscribe/unsubscribe to this newsletter: https://www.kielipankki.fi/language-bank/newsletter-subscription/

See also the CLARIN Newsflash: https://www.clarin.eu/content/newsflash


Newsletter of the Language Bank of Finland 2/2022

Researchers of the Month in 2022

  1. Jussi Ylikoski – Finno-Ugric languages, grammar, etymology
  2. Tuisku Vilenius – online discussions related to the Saami people
  3. Ari Huhta – language assessment, foreign language learning
  4. Terhi Ainiala – urban place names, digital discourses
  5. Mika Hämäläinen – computational creativity, language technology for endangered languages
  6. Jack Rueter – morpho-syntactic description of minority languages
  7. Sampsa Holopainen – history of the Uralic languages, etymological dictionary
  8. Filip Ginter – language technology, deep learning, NLP
  9. Mikko Laitinen – sociolinguistics, language use in social networks
  10. Benjamin Schweitzer – special language of art music, corpus linguistics
  11. Marjatta Palander – Karelian language speech corpora
  12. Marja-Liisa Helasvuo – Finnish grammar, human interaction, resources for Finno-Ugric languages

All previous researchers of the month can be found in the archive.

Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Are you one of them? Let us know: https://www.kielipankki.fi/support/contact-us/

New corpora in 2022

Updated or extended corpora in 2022

Would you like to offer your own resource to be distributed via Kielipankki?

Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421

The first set of data accumulated in the Donate Speech campaign is available for academic research use – paid licenses to be offered to companies

The first version of the complete dataset includes the speech samples that were donated during 16.6.2020–14.9.2021. The total duration of the recordings in this version is approximately 3,200 hours, of which approximately 1,600 hours have been manually transcribed.

Researchers may already apply for access to Puhelahjat data. Research use in academic organizations is free of charge. Read more about using the data for research

Companies and other non-academic organizations may acquire a paid license for using one of the Puhelahjat datasets. Some of the data packages intended for commercial use are still in preparation. For further details, organizations and companies interested in using the data may already contact us by email at lahjoita-puhetta@helsinki.fi. Read more about commercial use of the data

The Donate Speech campaign is still running, and you can donate your speech in Finnish or in Swedish at https://lahjoitapuhetta.fi/. When the campaign ends, all of the data will be made available via the Language Bank.

Are you using a resource that contains personal data, obtained via Kielipankki? Remember to publish a Privacy Notice

If you are using a resource obtained via Kielipankki that contains personal data (the license includes a “+PRIV” tag), you are required to submit the title of your project and a public link to the Privacy Notice regarding the purpose for which you are using the resource. Submit the information via this e-form.

Write the Privacy Notice according to the instructions given by your home organization. It is a good idea to store the document in a place where you are able to update the information when needed.

See also the guidelines for processing corpora stored in the Language Bank of Finland that contain personal data.

New speech recognition service ready for test use – Tekstiks

A new automatic speech recognition service, Tekstiks, is now up and running for test users. The automated system can recognise spoken Estonian and Finnish and produce a transcript of the recording. The Tekstiks service is the result of a collaboration between the Tallinn University of Technology, the Language Bank of Finland and Aalto University. Read more about Tekstiks and try it out!

Kielipankki is now on Mastodon

Kielipankki – Language Bank of Finland has joined the open-source social network Mastodon. You are welcome to follow us: @kielipankki@toot.community

Workshop of the FIN-CLARIAH research infrastructure

FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, received funding from the Academy of Finland for the years 2022-23. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. The FIN-CLARIAH people gathered for a workshop day held in Jyväskylä on 18th November. On the Language Bank website, you can find the presentation materials and a number of deliverables produced in the FIN-CLARIAH project.

CLARIN funding opportunities

Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding

The Language Bank staff are on holiday during 23.12.2022-8.1.2023.

We wish you a relaxing holiday season!

Mietta Lennes
Project Planning Officer
fin-clarin@helsinki.fi

 


Subscribe/unsubscribe to this newsletter: https://www.kielipankki.fi/language-bank/newsletter-subscription/

See also the CLARIN Newsflash: https://www.clarin.eu/content/newsflash


Newsletter of the Language Bank of Finland

Researchers of the Month in 2022

  1. Jussi Ylikoski – Finno-Ugric languages, grammar, etymology
  2. Tuisku Vilenius – online discussions related to the Saami people
  3. Ari Huhta – language assessment, foreign language learning
  4. Terhi Ainiala – urban place names, digital discourses
  5. Mika Hämäläinen – computational creativity, language technology for endangered languages
  6. Jack Rueter – morpho-syntactic description of minority languages

All previous researchers of the month can be found in the archive.

Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Are you one of them? Let us know: https://www.kielipankki.fi/support/contact-us/

New corpora in 2022

Updated or extended corpora in 2022

Would you like to offer your own resource to be distributed via Kielipankki?

Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421

General-purpose HeLI-OTS language identifier released through industry-academia cooperation

HeLI-OTS is a general-purpose language identifier that can automatically detect the language used in a text. This ELG-compatible tool selects the most suitable option from a list of 200 languages. HeLI-OTS has been developed as part of a collaborative project between the University of Helsinki and Lingsoft on text and speech recognition, funded by the Finnish Research Impact Foundation. Read more

Major Korp update

Korp has been updated to version 9. In addition to bug fixes, the new Korp has some new features, although some of them will be activated only once the corpora have been updated to support them. Please report any bugs and deficiencies in the new Korp, as well as any feature requests, either via the feedback form or by email to fin-clarin (at) helsinki.fi.

The Donate Speech campaign data available via Kielipankki in autumn 2022

The Donate Speech campaign (Lahjoita puhetta) is still running. Of the 4,000 hours of Finnish speech that have been donated so far, 1,500 hours have been manually transcribed. The donated speech material will be made available for restricted research and development purposes via the Language Bank of Finland in autumn 2022.

LUMI supercomputer in Kajaani, hosted by CSC, is now in action

LUMI is owned by the EuroHPC Joint Undertaking, and it is run by a consortium of 10 countries with long traditions and knowledge of scientific computing. LUMI is an ecosystem for high-performance computing, artificial intelligence, and data-intensive research, which enables breakthroughs in several branches of academic research. In addition, a fifth of LUMI’s capacity is targeted to companies. Read more

COST Action “NexusLinguarum”: Virtual Mobility Grants to support research activities and networking in a virtual setting

Within the COST Action “NexusLinguarum”, centered around linguistic data science, a new call for Virtual Mobility Grants (VMGs) has been issued with collection date 30th of June. VMGs are a networking tool launched by the COST Association. They aim to help individual participants foster collaborative research activities, networking with other researchers and exchange of knowledge in a virtual setting. Moreover, you can still become a member of one of the Working Groups within the Action. Read more

Apply for CLARIN funding

Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding

FIN-CLARIAH infrastructure introduced its goals in a poster exhibition

FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, received funding from the Academy of Finland for the years 2022-23. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. We organized a kick-off event where posters were presented to introduce the goals of the infrastructure and the work it will carry out. See the posters here.

Read more about FIN-CLARIN: FIN-CLARIN
Read more about DARIAH-FI: DARIAH-FI
For the roadmap of FIN-CLARIAH, see also: FIN-CLARIAH

The Language Bank of Finland wishes you a relaxing summer!

Mietta Lennes
Project Planning Officer
fin-clarin@helsinki.fi

 


Subscribe/unsubscribe to this newsletter: https://www.kielipankki.fi/language-bank/newsletter-subscription/

See also the CLARIN Newsflash: https://www.clarin.eu/content/newsflash

 


Newsletter of the Language Bank of Finland

Researchers of the Month in 2021

  1. Mats Fridlund – research related to digital history
  2. Emmi Lahti – rhetoric and discourse studies
  3. Heikki Rasilo – speech production and its learning mechanisms
  4. Gwenaëlle Bauvois – research related to right-wing populism, countermedia, reinformation, hybrid media and post-truth
  5. Mila Oiva – cultural history
  6. Karita Suomalainen – interactional linguistics
  7. Olli Kuparinen – variation and change in spoken Finnish
  8. Okko Räsänen – computational modeling of infant language development
  9. Juho Leinonen – automatic speech recognition, speech alignment and chatbots
  10. Veronika Laippala – large language resources and computational methods
  11. Mikko Kurimo – automatic speech recognition
  12. Jutta Salminen – expressing negation in Finnish

All previous researchers of the month can be found in the archive.

Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Are you one of them? Let us know: https://www.kielipankki.fi/support/contact-us/

Updates to resource-specific licenses and data protection terms and conditions

The resource-specific license terms and conditions will be updated in the near future. The most prominent change is that resource-specific data protection terms and conditions will be included in the licenses of those resources that contain personal data. Information about the license updates will be published on the Language Bank website. Read more about what to expect: https://www.kielipankki.fi/news/updates-to-resource-specific-licenses-and-data-protection-terms-and-conditions/

New corpora in 2021

Those corpora that were previously available via the LAT platform (discontinued in 2020) have been moved to the download service. The content of the downloadable corpora is essentially the same as in LAT, and the samples can be studied with, e.g., Praat or ELAN. At a later stage, we intend to make some speech corpora accessible via Korp as well. The current status and access location of each corpus can be seen on its metadata record and on the page of the resource group in question.

Would you like to offer your own resource to be distributed via Kielipankki?

Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421

New: Resource group pages

A given resource may be available as several different versions or variants that are provided for different purposes. The new resource group pages provide an overview of all the available versions. Read more

Korp will be updated soon

The official Korp update has been postponed until January. However, many of the new features and improvements can already be tested in Korplab. Your feedback is welcome! Read more

New Aalto-ASR module for automatic speech recognition and for aligning text with speech

The upgraded Aalto-ASR 2.1 is available for testing in the Puhti environment at CSC. If required, it is also possible to install the system in a local environment from a Docker container. Read more

Courses and awarded training materials

The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered again in Jan-Mar 2022 and it can be taken either in Finnish or in English. The course is open to all universities and you can also participate in it from outside Finland. Course details

Apply for CLARIN funding

Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding

The Donate Speech campaign continues – Finnish and Finland-Swedish can be donated in parallel campaigns

The Donate Speech campaign (Lahjoita puhetta) is still running. You may now donate your speech in Swedish, too! Of the 4,000 hours of Finnish speech that have been donated so far, 1,500 hours have been manually transcribed. Starting from spring 2022, the donated speech material will be made available for restricted research and development purposes via the Language Bank of Finland.

Kielipankki – The Language Bank of Finland, Yle (Finnish Broadcasting Company) and the Donate Speech campaign received three awards in 2021

The new FIN-CLARIAH infrastructure receives funding from the Academy of Finland

FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, was granted 4.6 M€ by the Academy of Finland for the years 2022-23. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI.

Read more about FIN-CLARIN: FIN-CLARIN
Read more about DARIAH-FI: DARIAH-FI
For the roadmap of FIN-CLARIAH, see also: FIN-CLARIAH

The Language Bank of Finland wishes you a nice and relaxing Christmas season!

Mietta Lennes
Project Planning Officer
fin-clarin@helsinki.fi

 


Subscribe/unsubscribe to this newsletter: https://www.kielipankki.fi/language-bank/newsletter-subscription/

See also the CLARIN Newsflash: https://www.clarin.eu/content/newsflash

 

Contact information

Technical maintenance of the Language Bank:
kielipankki (at) csc.fi
tel. 09 4572001

Matters related to corpora and other content:
fin-clarin (at) helsinki.fi
tel. 029 4129317

More detailed contact information