Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Juhana Salonen, project researcher at the University of Jyväskylä, tells us about publishing the Corpus of Finnish Sign Language.
My name is Juhana Salonen and I work as a project researcher in the Sign Language Centre of the University of Jyväskylä. I’m responsible for the corpus work on Finland’s national sign languages (Finnish and Finland-Swedish Sign Language). Majoring in Finnish Sign Language, I graduated with an M.Phil. in the fall of 2012.
Together with our team, I am working on an infrastructure for research on the corpora of both sign languages. I have been working on the corpus project since 2014, and during that time we have filmed a total of 103 native sign language users from all over Finland. I acted as a guide in the filming sessions, where I was able to follow the informants’ conversations and narratives up close while they were recorded from seven different camera angles. The result was over 700 hours of video footage. After the data collection and editing, the video material was annotated using the ELAN program (EUDICO Linguistic Annotator). The annotation involved segmenting the continuous signing into units at both the sign level and the sentence level. The signs were identified with ID glosses that are linked online to the lexical database of the Finnish Signbank, and the sentences were translated into Finnish. We have tried to annotate this large dataset as systematically as possible, so that different researchers can use the data for a wide range of research objectives.
The primary goals of our corpus work are to preserve the data in the long term and to publish parts of it in accordance with the informants’ research consent and data protection legislation. The Language Bank has provided an excellent setting for achieving these goals, for which we are very grateful. The first subset of the Corpus of Finnish Sign Language (Corpus FinSL) was transferred to the Language Bank in March 2019. Corpus FinSL comprises approximately 14.5 hours of video material from 21 signers, together with textual annotations and metadata. The material is divided into two subcorpora, Corpus of Finnish Sign Language: elicited narratives and Corpus of Finnish Sign Language: conversations. The first is publicly available, while the second requires a research plan and personal access rights in accordance with the Language Bank’s RES license. The published data has already been used both in research on Finnish Sign Language and in teaching. This is only the prelude to a great leap forward in the field of sign language, for example in the development of learning materials and in the social status of the language.
· Salonen, J., Puupponen, A., Takkinen, R. & Jantunen, T. (2019). Suomen viittomakielten korpusta rakentamassa [Building the corpus of Finland’s sign languages]. In J. H. Jantunen, S. Brunni, N. Kunnas, S. Palviainen & K. Västi (Eds.), Proceedings of the Research data and humanities (RDHUM) 2019 conference: data, methods and tools (Studia Humaniora Ouluensia 17, pp. 83–98). Oulu: Oulun yliopisto. http://urn.fi/urn:isbn:9789526223216
· The Corpus of Finnish Sign Language (Corpus FinSL) in the Language Bank: http://urn.fi/urn:nbn:fi:lb-2019012321
· Homepage of the corpus work on Finland’s sign languages: http://r.jyu.fi/AB7
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive.