We use the CEAL parallel corpus here as an example: urn:nbn:fi:lb-2020012801
Some corpora already have their official name fixed in the agreement, so check the agreement before publishing a resource. The metadata in Metashare can be older than the agreement and may therefore carry an incorrect name. The name in the agreement can be adjusted to suit the version of the resource being published.
The name of the corpus has been decided in the agreement to be: ”Englantilaisen ja amerikkalaisen kirjallisuuden klassikoita Kersti Juvan suomentamina, englanti-suomi rinnakkaiskorpus”.
No English version of the name was given, so the name was translated as ”Classics of English and American Literature as translated by Kersti Juva, English-Finnish parallel corpus”.
The different versions of the resource get various fixed terms attached to the name. The four basic versions we provide are named as follows:
If the version of the resource is scrambled in some way to allow less restrictive licensing, we add the word ”scrambled”, ”sekoitettu”, or ”blandad” to the name before the word indicating the basic version. Example:
The shortname is used in Metashare, the Portal, the Download service, and IDA. Shortnames are written entirely in lowercase letters. The allowed characters are ”a–z”, ”0–9” and ”-”.
The source version is indicated by ”-src”, the Korp version by ”-korp”, and the VRT version by ”-vrt”.
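The shortname rules above can be checked mechanically. The following is a minimal sketch, not an official tool: the function names and the version labels used as dictionary keys are hypothetical, and ”ceal-par” is assumed to be the base shortname only because it appears as the IDA directory name in the example.

```python
import re

# Allowed characters per the rule above: a-z, 0-9 and "-".
SHORTNAME_RE = re.compile(r"[a-z0-9-]+")

# Suffixes named in the text; the dictionary keys are hypothetical labels.
VERSION_SUFFIXES = {"source": "-src", "korp": "-korp", "vrt": "-vrt"}


def is_valid_shortname(name: str) -> bool:
    """Return True if name uses only lowercase a-z, digits and hyphens."""
    return SHORTNAME_RE.fullmatch(name) is not None


def versioned_shortname(base: str, version: str) -> str:
    """Append the suffix for the given version to a valid base shortname."""
    if not is_valid_shortname(base):
        raise ValueError(f"invalid shortname: {base!r}")
    return base + VERSION_SUFFIXES[version]
```

For example, `versioned_shortname("ceal-par", "korp")` would give ”ceal-par-korp”, while a name containing uppercase letters or underscores is rejected.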
In Korp, make the relation to shortname as clear as possible, for example: ”ceal_par”. Korp source in IDA: ceal-par/ceal_par_korp_20150323.tgz
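The relation between the shortname, the Korp identifier and the IDA path in this example can be sketched as follows. The pattern is inferred from the single example above (hyphens become underscores, and the package name carries a YYYYMMDD date), so treat it as an assumption rather than a fixed rule; the function names are hypothetical.

```python
from datetime import date


def korp_id(shortname: str) -> str:
    """Derive a Korp identifier by replacing hyphens with underscores."""
    return shortname.replace("-", "_")


def ida_korp_path(shortname: str, packaged: date) -> str:
    """Build the IDA path of a Korp source package, following the example."""
    return f"{shortname}/{korp_id(shortname)}_korp_{packaged:%Y%m%d}.tgz"
```

With these assumptions, `ida_korp_path("ceal-par", date(2015, 3, 23))` reproduces the path given in the example.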
On social media, the hashtags begin with ”lb_”, indicating a Language Bank resource. ”lb_” is followed by the common name of the resource family, such as:
If a new version of the corpus is created by adding to or modifying the original texts, the version information is added after the name of the corpus. Example:
If we modify the attribute information in the VRT file, for example by re-parsing with a new parser and dropping the old parse, we add the version information after the word VRT (or Korp, etc.). Examples: