FIN-CLARIN board meeting 1/2019

Time: Wed 20.2.2019 at 12.15-15.15
Place: Sokos Hotelli Vaakuna, Asema-aukio 2, 00100 Helsinki, 10th floor, cabinet Loisto

DISCUSSION MEMO

Participants:
Sisko Brunni (University of Oulu)
Ulla-Maija Forsberg (Kotus)
Nobufumi Inaba (University of Turku)
Ari Huhta (University of Jyväskylä)
Tommi Jauhiainen (FIN-CLARIN)
Merja Koskela (University of Vaasa)
Mikko Kurimo (Aalto University, from 14.00 onwards)
Mikhail Mikhailov (Tampere University)
Stefan Werner (University of Eastern Finland)
Per Öster (CSC)
Krister Lindén, chair
Hanna Westerlund, secretary

1. Participants present and information dissemination

  • Krister Lindén declared the meeting open
  • Tommi Jauhiainen and Per Öster introduced themselves
  • The information on RDHum 2019 https://www.oulu.fi/suomenkieli/node/55261 was updated: https://www.oulu.fi/suomenkieli/node/56805 . There will be six workshops, and 60-70 abstracts are expected. (SB)
  • A questionnaire on using language resources in research and teaching will first be circulated for comments and then distributed further within the FIN-CLARIN consortium universities. (HW)
  • FIN-CLARIN administrative documents have been transferred to the Language Bank portal https://www.kielipankki.fi/organisaatio/fin-clarin/kokoukset-ja-asiakirjat/ . Documents older than three years will be available as PDF, and their attachments can be obtained on request.
  • ELEXIS https://elex.is/ (U-MF)
  • Crowdsourcing project TRANSBANK https://transbank.info/ (MM)
  • Some statistics: follow-up of the FIN-CLARIN board priority list for publication of the resources. 15 resources have been made public since last spring https://www.kielipankki.fi/aineistot/ . On the other hand, there are 35 upcoming new resources, so the backlog is increasing. 17 resources are most likely to be ready this year https://www.kielipankki.fi/aineistot/tulevat/ . There are 23 tools currently available, and 12 of them were added to the tools webpage https://www.kielipankki.fi/tyokalut/ during 2018.
  • Research Data Alliance meets in Finland in October 2019. Language research interest group. https://www.csc.fi/en/-/research-data-alliance-kokoontuu-suomessa-lokakuussa-2019 (PÖ)

2. Language Bank Roadshows 2019

  • Roadshow 26.2.2019 University of Turku
  • Roadshow in Joensuu in connection with the XLVI Annual Conference of Linguistics 16.-18.5.2019 (Former link: https://www.uef.fi/en/-/16-18-5-2019-xlvi-kielitieteen-paivat-joensuu)
  • Roadshow in Oulu in connection with RDHum 2019 on 14.-16.8.2019, with the workshop "Tools and services in the Language Bank of Finland"
  • Tampere University, University of Jyväskylä and University of Vaasa: possible during the autumn term. Dates will be finalized during the spring (at least 3 months in advance).

3. Funding available for procurement, ideas:

  • NoDaLiDa’s approach for funding. (KL)
  • Agreed: travelling expenses of the Finnish plenary speakers at the RDHum conference. (KL)
  • Terminology Forum created by Anita Nuopponen at UWasa. The general language resources could be made available via the Language Bank (special language dictionaries are updated as part of classes). Requires resources for updating the links. (MK)
  • Compiling a corpus of bilateral agreements, expanded into a corpus of treaties and agreements by means of web crawling and manual checking. (MM)
  • Prioritizing replacing LAT since Flash will not be available much longer. (SW)
  • Suggestions will be discussed at a Skype meeting, based on estimated budgets. A Doodle poll will follow.

4. FIN-CLARIN UPGRADE

  • After an overview by KL of the existing plan, the board gave some comments on the application.

Ideas for the application?

  • remote uploading of data, e.g. the University of Jyväskylä uploading datasets and maybe also parallel datasets; Korp interface, etc. Standardized metadata (KL)
  • speech data: how much data can be added? Interviews, everyday speech, etc. Dialogue? Depends on how the data is recorded and on its quality. Overlapping speech, overlap annotation; speaker segmentation also works pretty well nowadays (MiK)
  • video + transcription -> transcription of the image. Transcribed speech and transcribed image (MM). MeMAD (memad.eu), multimodal projects: some tools could be used; tools are shared on GitHub, but not all problems have been fixed. Forced alignment? Several languages; the models are typically quite big (MM)
  • audio description toolkit available, other sounds than human speech, blog entry tagging for sound at the project website (MK)
  • translation data?
  • learner assessment and audio data. Data on test performances exists: not annotated, but with the proficiency level marked, in 6-7 languages -> perhaps available for other purposes.
  • extracting whole texts in Korp
  • contemporary literature (SW) audio books (MiK)
  • more detailed search interface
  • making resources available through meta-share (MiK)
  • emergency calls, pathological speech (SW), air speech (MK), University lectures, online lectures -> transcription for training data (MiK)
  • E-learning
  • Timetable: decision from UH expected in mid-March, board Skype meeting before mid-April, and submission of the application to the Academy of Finland in mid-May.

5. No other issues

6. The meeting was declared closed at 15.15

Travelling expenses form (docx):
matkalaskulomake_fi-en_01112017
