Importing corpus data to Korp: technical documentation

Korp is the web-based corpus search interface used in Kielipankki – The Language Bank of Finland. Korp was created, and continues to be developed, by Språkbanken at the University of Gothenburg. It is built on top of the IMS Open Corpus Workbench (CWB) corpus search software.

End users, corpus owners and corpus compilers cannot add corpora to Korp or modify existing ones themselves; this is done by the staff of FIN-CLARIN and Kielipankki. The amount of work required depends on the format of the original corpus data: the closer the format is to the input format of CWB, the less work is required.
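To give a rough idea of what "close to the input format of CWB" means: CWB reads verticalized text (VRT), with one token per line, tab-separated token attributes, and XML-like tags marking structures such as texts and sentences. The following Python sketch renders already tokenized and annotated sentences in that shape. The function name `to_vrt`, the structure names `text` and `sentence`, and the attribute set word, lemma and part of speech are illustrative assumptions, not the conventions actually used for corpora in Korp.

```python
from xml.sax.saxutils import escape


def _attr(value):
    # Escape &, <, > and double quotes for use inside a quoted attribute value.
    return escape(value, {'"': "&quot;"})


def to_vrt(text_id, sentences):
    """Render tokenized sentences as verticalized text (VRT).

    `sentences` is a list of sentences; each sentence is a list of
    (word, lemma, pos) tuples. These three attributes are an assumption
    made for this sketch only.
    """
    lines = [f'<text id="{_attr(text_id)}">']
    for sentence in sentences:
        lines.append("<sentence>")
        for word, lemma, pos in sentence:
            # One token per line, attributes separated by tabs.
            lines.append("\t".join(escape(field) for field in (word, lemma, pos)))
        lines.append("</sentence>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [[("Kissa", "kissa", "N"), ("nukkuu", "nukkua", "V"), (".", ".", "PUNCT")]]
    print(to_vrt("demo", sample), end="")
```

Data that already has this one-token-per-line shape needs little conversion, whereas running text first has to be tokenized and, if desired, annotated before it can be encoded.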

The Korp production server, https://korp.csc.fi, is hosted at CSC, and new corpora are prepared in CSC’s computing environment. At present, corpora and corpus configurations can be installed on the production server by Kielipankki staff at CSC and by some members of the FIN-CLARIN staff at the University of Helsinki.

The following pages document how corpora are imported to the Korp corpus search service. Most of the documentation is fairly technical.

Note that this documentation is still under construction and is outdated in places.
