Preferred file types

Data at the Language Bank of Finland is usually distributed using the file types and formats described below. We accept that incoming data will not always be in our preferred formats, but publication will be faster if the data is already fully or partially in them.

Text

  • UTF-8 for text encoding; combining character sequences are normalized to single precomposed characters where possible
  • VRT
  • PDF for rendered text (will be provided alongside original format if possible)
  • JSON and TEI are in preparation (will be generated from VRT)
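
As an illustration of the encoding item above, the following is a minimal Python sketch of how a depositor might prepare text before submission. It assumes that "normalized to single characters" corresponds to Unicode Normalization Form C (NFC); this page does not name the form, so treat that as an assumption. The function name to_nfc_utf8 and its source_encoding parameter are hypothetical and are not part of any Language Bank tool.

```python
import unicodedata


def to_nfc_utf8(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Decode raw bytes, compose combining sequences, and re-encode as UTF-8.

    Assumption: NFC is the intended normalization form.
    """
    text = raw.decode(source_encoding)
    # NFC replaces a base letter plus combining mark with the single
    # precomposed character whenever Unicode defines one.
    normalized = unicodedata.normalize("NFC", text)
    return normalized.encode("utf-8")


# 'a' followed by COMBINING DIAERESIS (U+0308): two code points for one letter
decomposed = "a\u0308iti"
composed = to_nfc_utf8(decomposed.encode("utf-8")).decode("utf-8")
```

Here composed holds the same word with the precomposed letter U+00E4 in place of the two-code-point sequence. Sequences that have no precomposed equivalent in Unicode are left as they are, which matches the "where possible" qualification.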

Time-aligned annotations of audio and video

  • Praat TextGrid (text/praat-textgrid)
  • EAF / ELAN (Eudico Annotation Format, IMDI document type: text/x-eaf+xml; MIME type: text/xml)

Audio

  • WAV for uncompressed audio
  • AAC for compressed audio

Video

  • MP4 / MPEG-4 for compressed video

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4140599 / +358 29 4129317