Since spring 2016, the Language Bank of Finland has regularly published the Researcher of the Month series online. In these blog texts, researchers from different fields present their work and describe the language resources they use. We are now celebrating the 100th researcher profile!
All previous researchers of the month can be found in the archive.
Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Could you be one of them? Let us know: https://www.kielipankki.fi/support/contact-us/
The Language Bank of Finland maintains metadata records of all the resources it distributes. Each individual resource version has its own metadata record with a persistent identifier.
The Language Bank has previously provided its metadata records via a platform called META-SHARE, but the system is no longer supported. All our existing metadata records have been moved to COMEDI, a service hosted by the Norwegian CLARIN centre CLARINO Bergen. The persistent identifiers of the metadata records curated by the Language Bank of Finland now point to the corresponding records in COMEDI.
Please note that, although the metadata records now look a bit different, the content and location of the actual language resources remain unchanged.
To increase performance, the Korp service was moved to a new server early this year. A major upgrade to the Korp software in use via the Language Bank of Finland is scheduled to be completed in autumn 2024. After the upgrade, the Korp instance in Finland will be easier to keep in sync with the Korp system developed at Språkbanken in Sweden.
The Mylly service has been shut down due to very low usage. If you still have data in Mylly, or if you wish to use the Mylly tool scripts on other services, read the instructions here.
In the CLARIN Standards Information System, the Language Bank of Finland now provides a list of the file formats that are supported and recommended when depositing language resources. We would be happy to receive your feedback.
Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421
The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered again in Sep–Oct 2024 and once more in early 2025. In the second teaching period, we will organize the course Introduction to Speech Analysis. The Data Clinic kicks off in mid-November and is intended for students who are planning the data management for their MA or PhD thesis. These online courses are open to students from all universities, and you can also participate from outside Finland.
FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. On June 10th, members of FIN-CLARIAH gathered in Helsinki to discuss the potential of artificial intelligence in humanities and social sciences research. In addition, the objectives of the FIN-CLARIAH project were presented in posters, which can be found on the Kielipankki website.
The staff of the Language Bank of Finland gave presentations at the workshops of the LREC-COLING 2024 conference in Turin, Italy. Jussi Piitulainen attended the ParlaCLARIN IV workshop and presented ”Investigating Multilinguality in the Plenary Sessions of the Parliament of Finland with Automatic Language Identification”. Tommi Jauhiainen presented ”Improving Language Coverage on HeLI-OTS” at the SIGUL 2024 workshop. The new languages and improved language models described in the presentations are part of the HeLI-OTS 2.0 language identifier, published this spring.
The European Language Data Space and the University of Helsinki brought together experts from Finnish industry, public administration and research to discuss the importance of language data for the development of language technologies and AI-based tools in Finland. The LDS workshop aimed to raise awareness of the European Commission’s objectives as well as of new business opportunities for the commercialization of language data. The workshop programme can be found on the LDS event page, where links to the recordings and presentations will be added later.
The panel discussions in the LDS workshop featured representatives from the public and private sectors. Many of them are involved in the LAREINA project, which aims to develop speech technology for Finnish, Finland-Swedish and the Sámi languages.
Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding
We wish you a relaxing summer!
Mietta Lennes and Wilhelmina Dyster
Project Planners
Subscribe/unsubscribe to this newsletter: https://www.kielipankki.fi/language-bank/newsletter-subscription/
See also the CLARIN Newsflash: https://www.clarin.eu/content/newsflash
All previous researchers of the month can be found in the archive.
Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Could you be one of them? Let us know: https://www.kielipankki.fi/support/contact-us/
The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2, Korp (klk-fi-v2-korp) contains newspapers and periodicals from the digital collections of the National Library of Finland from the years 1771–2021. The corpus is available in Korp and contains over 22 billion words in total. The language of each sentence in the corpus has been identified with the HeLI-OTS language identifier. We are currently working on the corresponding Swedish sub-corpus, with the aim of making it available via Korp during spring 2024. Read more about the KLK update here.
To increase performance, the Korp service is currently being moved to a new server. The move will be completed in January 2024. The available corpora and functionalities will remain the same, but searches will be faster.
In addition, a significant upgrade to the Korp software in use via the Language Bank of Finland is planned for spring 2024. After the upgrade, the Korp system offered in Finland will be easier to keep in sync with the Korp system developed at Språkbanken in Sweden.
Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421
When you obtain a resource containing personal data via the Language Bank of Finland and start processing it for a new purpose, you must prepare a privacy notice regarding the purpose of processing, publish the notice openly in electronic format, and provide the Language Bank with a link to the notice. The purpose of a privacy notice is to help data subjects understand the purposes for which their data is used. You should always primarily follow the data protection guidelines of your own organisation. In addition, the Language Bank offers instructions to help you collect the information that is usually required in a privacy notice for research purposes. Read more
The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered again in Jan–Feb 2024, and it can be taken either in Finnish or in English. The course is open to students from all universities, and you can also participate from outside Finland. Course details
FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, received FIRI funding from the Research Council of Finland to continue its work in 2024–25. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. On December 1st, members of FIN-CLARIAH gathered in Tampere to discuss their achievements over the past two years. You can find the deliverables produced in the FIN-CLARIAH project on the Language Bank website.
During the past few years, extensive parliamentary datasets from different countries have been processed within CLARIN with the aim of compiling them in a format that supports research in various disciplines. Researchers and developers of parliamentary resources are invited to join the ParlaCLARIN workshop, to be held in Turin, Italy, in May as part of the LREC-COLING 2024 conference. Deadline for submissions: 19.2.2024. Read more: https://www.clarin.eu/ParlaCLARIN-IV
Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding
The brief introductory video (4 min 40 s) summarizes the corpora, tools and other services available via the Language Bank of Finland, as well as the opportunities for depositing your own resource. The video has Finnish and English subtitles and can be found on our YouTube channel. Another version of the video, with examples in English, will soon be available.
We wish you a relaxing holiday season!
Mietta Lennes and Wilhelmina Dyster
Project Planners
All previous researchers of the month can be found in the archive.
Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Could you be one of them? Let us know: https://www.kielipankki.fi/support/contact-us/
The Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2, Korp (klk-fi-v2-korp) is now available in Korp as a beta test version. The corpus contains newspapers and periodicals from the digital collections of the National Library of Finland from the years 1771–2021. It contains over 22 billion words in total, more than four times as many as the previous version of the corpus. The language of each sentence in the corpus has been identified with the HeLI-OTS language identifier. Read more about the KLK update here.
Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421
When you obtain a resource containing personal data via the Language Bank of Finland and start processing it for a new purpose, you must prepare a privacy notice regarding the purpose of processing, publish the notice openly in electronic format, and provide the Language Bank with a link to the notice. The purpose of a privacy notice is to help data subjects understand the purposes for which their data is used. You should always primarily follow the data protection guidelines of your own organisation. In addition, the Language Bank offers instructions to help you collect the information that is usually required in a privacy notice for research purposes. Read more
The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered again in Sep–Oct 2023, and it can be taken either in Finnish or in English. The course is open to students from all universities, and you can also participate from outside Finland. Course details
FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, is funded by the Research Council of Finland. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. Members of FIN-CLARIAH gathered for a workshop day at CSC in Espoo on June 6th. On the Language Bank website, you can find a number of deliverables produced in the FIN-CLARIAH project.
Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding
We are at your service in summer, too, and messages will be answered as soon as possible.
We wish you a relaxing summer!
Mietta Lennes
Project Planning Officer
All previous researchers of the month can be found in the archive.
Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Could you be one of them? Let us know: https://www.kielipankki.fi/support/contact-us/
Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421
The first version of the complete dataset includes the speech samples donated between 16.6.2020 and 14.9.2021. The total duration of the recordings in this version is approximately 3,200 hours, of which approximately 1,600 hours have been manually transcribed.
Researchers may already apply for access to Puhelahjat data. Research use in academic organizations is free of charge. Read more about using the data for research
Companies and other non-academic organizations may acquire a paid license for using one of the Puhelahjat datasets. Some of the data packages intended for commercial use are still in preparation. For further details, organizations and companies interested in using the data may already contact us by email at lahjoita-puhetta@helsinki.fi. Read more about commercial use of the data
The Donate Speech campaign is still on and you can still donate your speech in Finnish or in Swedish at https://lahjoitapuhetta.fi/. When the campaign ends, all of the data will be made available via the Language Bank.
If you are using a resource obtained via Kielipankki that contains personal data (the license includes a ”+PRIV” tag), you are required to submit the title of your project and a public link to the privacy notice describing the purpose for which you are using the resource. Submit the information via this e-form.
Write the Privacy Notice according to the instructions given by your home organization. It is a good idea to store the document in a place where you are able to update the information when needed.
See also the guidelines for processing corpora stored in the Language Bank of Finland that contain personal data.
A new automatic speech recognition service, Tekstiks, is now up and running for test users. The automated system can recognise spoken Estonian and Finnish and produce a transcript of the recording. The Tekstiks service is the result of a collaboration between the Tallinn University of Technology, the Language Bank of Finland and Aalto University. Read more about Tekstiks and try it out!
Kielipankki – Language Bank of Finland has joined the open-source social network Mastodon. You are welcome to follow us! @kielipankki@toot.community
FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, received funding from the Academy of Finland for the years 2022–23. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. Members of FIN-CLARIAH gathered for a workshop day in Jyväskylä on 18th November. On the Language Bank website, you can find the presentation materials and a number of deliverables produced in the FIN-CLARIAH project.
Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding
We wish you a relaxing holiday season!
Mietta Lennes
Project Planning Officer
All previous researchers of the month can be found in the archive.
Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Could you be one of them? Let us know: https://www.kielipankki.fi/support/contact-us/
Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421
HeLI-OTS is a general-purpose language identifier that can automatically detect the language used in a text. This ELG-compatible tool selects the most likely language from a list of 200 languages. HeLI-OTS has been developed as part of a collaborative project on text and speech recognition between the University of Helsinki and Lingsoft, funded by the Finnish Research Impact Foundation. Read more
Korp has been updated to version 9. In addition to bug fixes, the new Korp has some new features, although some of them will only be activated once the required support has been added to the corpora. Please report any bugs or shortcomings in the new Korp, as well as any wishes, either via the feedback form or by email to fin-clarin (at) helsinki.fi.
The Donate Speech campaign (Lahjoita puhetta) is still on. Of the 4,000 hours of Finnish speech donated so far, 1,500 hours have been manually transcribed. The donated speech material will be made available for restricted research and development purposes via the Language Bank of Finland in autumn 2022.
LUMI is owned by the EuroHPC Joint Undertaking, and it is run by a consortium of 10 countries with a long tradition and expertise in scientific computing. LUMI is an ecosystem for high-performance computing, artificial intelligence and data-intensive research, which enables breakthroughs in several branches of academic research. In addition, a fifth of LUMI’s capacity is targeted at companies. Read more
Within the COST Action ”NexusLinguarum”, which centres on linguistic data science, a new call for Virtual Mobility Grants (VMGs) has been issued with a collection date of 30 June. VMGs are a networking tool launched by the COST Association, and they aim to support individual participants in fostering collaborative research activities, networking with other researchers and exchanging knowledge in a virtual setting. Moreover, you can still become a member of one of the Working Groups within the Action. Read more
Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding
FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, received funding from the Academy of Finland for the years 2022–23. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI. We organized a kick-off event where posters were presented to introduce the goals of the infrastructure and the work it will carry out. See the posters here.
Read more about FIN-CLARIN: FIN-CLARIN
Read more about DARIAH-FI: DARIAH-FI
For the roadmap of FIN-CLARIAH, see also: FIN-CLARIAH
The Language Bank of Finland wishes you a relaxing summer!
Mietta Lennes
Project Planning Officer
All previous researchers of the month can be found in the archive.
Do you know researchers who use the Language Bank of Finland and who might be good candidates for Researcher of the Month? Could you be one of them? Let us know: https://www.kielipankki.fi/support/contact-us/
The resource-specific license terms and conditions will be updated in the near future. The most prominent change is that resource-specific data protection terms and conditions will be included in the licenses of those resources that contain personal data. Information about the license updates will be published on the Language Bank website. Read more about what to expect: https://www.kielipankki.fi/news/updates-to-resource-specific-licenses-and-data-protection-terms-and-conditions/
The corpora that were previously available via the LAT platform (discontinued in 2020) have been moved to the download service. The content of the downloadable corpora is essentially the same as in LAT, and the samples can be studied with tools such as Praat or ELAN. At a later stage, we intend to make some speech corpora accessible via Korp as well. The current status and access location of each corpus are shown in its metadata record and on the page of the corresponding resource group.
Submit the basic details about your own resource to the Language Bank of Finland: http://urn.fi/urn:nbn:fi:lb-2021121421
A given resource may be available as several different versions or variants that are provided for different purposes. The new resource group pages provide an overview of all the available versions. Read more
The official Korp update has been postponed until January. However, many of the new features and improvements can already be tested in Korplab. Your feedback is welcome! Read more
The upgraded Aalto-ASR 2.1 is available for testing in the Puhti environment at CSC. If required, the system can also be installed in a local environment from a Docker container. Read more
The online course Corpus Linguistics and Statistical Methods (5 ECTS) will be offered again in Jan–Mar 2021, and it can be taken either in Finnish or in English. The course is open to students from all universities, and you can also participate from outside Finland. Course details
Did you know that CLARIN offers grants for, e.g., researcher and teacher mobility, events and training activities? Check out the funding opportunities and current calls: https://www.clarin.eu/funding
The Donate Speech campaign (Lahjoita puhetta) is still on. You can now donate your speech in Swedish, too! Of the 4,000 hours of Finnish speech donated so far, 1,500 hours have been manually transcribed. Starting from spring 2022, the donated speech material will be made available for restricted research and development purposes via the Language Bank of Finland.
FIN-CLARIAH, the national research infrastructure for Social Sciences and Humanities in Finland, was granted 4.6 M€ by the Academy of Finland for the years 2022–23. FIN-CLARIAH consists of two components, FIN-CLARIN and DARIAH-FI.
Read more about FIN-CLARIN: FIN-CLARIN
Read more about DARIAH-FI: DARIAH-FI
For the roadmap of FIN-CLARIAH, see also: FIN-CLARIAH
The Language Bank of Finland wishes you a nice and relaxing Christmas!
Mietta Lennes
Project Planning Officer