Status 12.11.2018: DRAFT


Publishing software is different from getting it to work on the programmer’s machine or on the machines of a small research group. This guideline defines the minimal steps necessary to prepare a software publication at the Language Bank of Finland.

Name, short name and version

The software needs

  • a name (e.g. ”Helsinki Finite State Technology”),
  • a short name (e.g. ”hfst”),
  • and a version (Major.Minor.Patch, e.g. 3.15.0).

If an older version of the same software exists, a decision needs to be made whether to update the metadata in the existing description or to create new metadata. An update is recommended if the new version only fixes bugs (a ”patch”). A separate metadata page is recommended if the new version offers new or concurrent functionality, i.e. if there is reason to keep the old version online.
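The recommendation above can be sketched in code. This is only a minimal illustration, assuming versions are written strictly as three numbers separated by dots; the names `Version`, `parse_version` and `metadata_action` are hypothetical and not part of any Language Bank tool. The final decision still rests with the person publishing the software.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a 'Major.Minor.Patch' string such as '3.15.0'."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected Major.Minor.Patch, got {text!r}")
    return Version(*(int(p) for p in parts))


def metadata_action(old: str, new: str) -> str:
    """Suggest how to handle the metadata when `new` supersedes `old`."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError("the new version must be higher than the old one")
    if (n.major, n.minor) == (o.major, o.minor):
        # Only the patch number changed: treat it as a bug fix.
        return "update existing metadata"
    # Major or minor number changed: new or concurrent functionality.
    return "create new metadata page"
```

For example, `metadata_action("3.15.0", "3.15.1")` suggests updating the existing metadata, while `metadata_action("3.15.0", "3.16.0")` suggests a new metadata page.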

The package

  • The package must contain only the relevant data: no .tmp directories, etc.
  • If it contains source code, the code should compile cleanly, with no warnings or as few as possible.
  • The code also needs to be tested. The idea is not to publish a package only to immediately publish a patch.
  • The format is zip.
  • The zip name is usually ”short” (e.g. ””).
  • The package must extract into a subdirectory named identically to the package, without the .zip extension (e.g. ”hfst-3.15.0/”).
  • Upload the ready package to and inform
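Some of the package rules above can be checked automatically before uploading. The following is a rough sketch using only Python's standard `zipfile` module; the function name `check_package` is hypothetical, and the check for ".tmp" is just one assumed reading of "no .tmp directories". It does not replace a manual look at the package contents.

```python
import os
import zipfile
from pathlib import PurePosixPath


def check_package(zip_path: str) -> list[str]:
    """Return a list of problems found in the package (empty if none)."""
    name = os.path.basename(zip_path)
    if not name.endswith(".zip"):
        return [f"{name}: package is not a .zip file"]
    # The package must extract into a directory named like the zip file.
    expected_root = name[: -len(".zip")]

    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Member names inside a zip archive always use forward slashes.
            parts = PurePosixPath(member).parts
            if not parts:
                continue
            if parts[0] != expected_root:
                problems.append(f"{member}: not inside {expected_root}/")
            if any(part.endswith(".tmp") for part in parts):
                problems.append(f"{member}: temporary data in package")
    return problems
```

An empty result means that every entry sits under the expected top-level directory and that no path component ends in ".tmp".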

The license

The package must have a license that tells users what they can and cannot do with the software. Less restrictive licenses are preferred. The license should be stated in the README.txt or in a LICENSE.txt file.


Manual

Software without a manual and a description cannot be published. Both can be short, but they have to be present. The manual describes how to install and use the software. Depending on the complexity of the software, the manual can be a set of files in the package, such as README.txt, INSTALL.txt and MANUAL.txt.

It should contain:

  • The intended audience
    • The operating systems the software runs on.
    • The level of expertise needed to run the software (e.g. compile from source in Linux vs. install in Mac/Windows from a package).
  • Installation instructions
    • Dependencies, if needed (e.g. compiler, other tools)
    • An installation and uninstallation script.
  • Instructions on how to run the software
    • All tools (if bundled)
    • All options of the tools (for example a man page or equivalent)
    • Examples of all tools
  • A reference to a tier 1 technical support address in case of problems.
  • A reference to the descriptive metadata in the form of a PID (URN and/or Handle).

Descriptive metadata

The descriptive metadata describes a specific instance of the software. It is not a manual, but helps a user searching for software to determine whether the software is worth downloading. The PID pointing to the metadata is the persistent identifier of the software version in question. The metadata in turn points to the download location of the software and explains where the manual can be found (e.g. inside the package or on a separate web page).

Every update gets a new version number. We follow ”Semantic Versioning”: Major.Minor.Patch. Patches can be published without changing the PID of the metadata. Major and minor updates usually require a new metashare page and the retirement of the now obsolete version.

The metadata should also contain the license information. The PID of the metadata needs to be mentioned in the README.txt of the downloadable file.
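Whether a README.txt mentions a PID can be roughly checked with a regular expression. The patterns below are assumptions for illustration, not an official specification: a URN is taken to start with "urn:", and a Handle is taken to be written either as "hdl:" followed by the identifier or as a hdl.handle.net URL. The function name `find_pids` and the identifiers in the usage example are hypothetical.

```python
import re

# Assumed PID notations (see the text above); adjust to the notation in use.
PID_PATTERN = re.compile(
    r"\burn:[a-z0-9][a-z0-9-]*:[^\s<>\"]+"
    r"|\bhdl:[^\s<>\"]+"
    r"|https?://hdl\.handle\.net/[^\s<>\"]+",
    re.IGNORECASE,
)


def find_pids(readme_text: str) -> list[str]:
    """Return all URN- or Handle-like strings found in the README text."""
    # Strip punctuation that merely ends the surrounding sentence.
    return [match.rstrip(".,;)") for match in PID_PATTERN.findall(readme_text)]
```

A README containing the made-up line "Metadata: urn:nbn:fi:lb-0000000." would yield one match, while an empty result indicates that the PID reference is probably still missing.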

Significant updates

To update the major or minor version, a new metashare page needs to be created that describes the new version and includes a change log relative to the currently published version. Two new PIDs need to be created, one pointing to the metadata and one to the new download location. The related versions need to be linked using Metashare’s relations feature, see the Language Bank’s Language Resource Life Cycle Model. An example from our corpora:

The older version should be kept for at least 5 years, either online for download or offline in IDA. Software older than 10 years can be deleted, unless it has historical value.

Bug fixes

If the new version has no new functionality and is only a patch (e.g. 1.1.1), no PIDs need to be updated, but the publication of the new version needs to be noted in the Change Log of the metadata. The non-patched version should be kept in IDA for 5 years, just in case.


Consider finnish-tagtools version 1.1: the metadata describes the software, the license, and where more information about using the software and technical support can be found. A rough update schedule is also given. The update to version 1.2 should happen as described in ”Significant updates” above: a new metashare page needs to be written, with a Change Log section in the description describing the main new features and bug fixes. If the old version is not to be kept online, the access location PID needs to be changed to point to a tombstone page stating that the software can be obtained from IDA.


A quick reminder of the topics above.

  • Name
  • Version
  • License
  • Intended audience
  • Manual
  • Installation instructions
  • Clean package in zip format
  • Descriptive metadata
  • PIDs (at least one to metadata)
  • README.txt, LICENSE.txt and INSTALL.txt (can be only README.txt), containing
    • License
    • PID to metadata
    • Manual or link to manual
    • Installation instructions
  • Change Log
  • Ready packages uploaded to /proj/clarin/download/preview



