FIN-CLARIAH D4.1.4: R/Python module

Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months

WP 4.1: Report on R/Python module
Date of reporting: 22-11-2023

Report authors: Julia Matveeva (University of Turku), Leo Lahti (University of Turku)
Contributors: Pyry Kantanen (University of Turku), Akewak Jeba (University of Turku)
Deliverable location: https://github.com/fennicahub/fennica


  1. Python module: We developed a Python script, built on Pandas, that selectively extracts MARC fields from the raw data. Fields can be extracted individually or in batches, and the results are saved in CSV format. The Python module is available at https://github.com/fennicahub/fennica/tree/master/inst/examples/field_picking.
  2. R module: The Fennica-R package is an algorithmic toolkit designed for transparent quantitative analysis of the Finnish national bibliography, Fennica, and its metadata. Initially deployed to harmonize a subset of 70,000 entries, the module has recently been updated to support analysis of a larger dataset of 1 million entries, including a subset for the period 1809-1917. The CSV files generated by the Python module feed into further harmonization via the Fennica-R package.
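
The field-picking step described above can be illustrated with a minimal sketch. This is not the project's script. It assumes a hypothetical long-format input with one row per record and MARC field (the columns `record_id`, `field` and `value`, the helper `pick_fields`, and the toy values are all invented for illustration) and shows how Pandas could select one field or a batch of fields and write the result as CSV.

```python
import io

import pandas as pd

# Hypothetical toy input: one row per (record, MARC field) pair.
# The column names and values are assumptions, not the Fennica raw format.
RAW = (
    "record_id\tfield\tvalue\n"
    "1\t100\tExample Author\n"
    "1\t245\tExample title A\n"
    "1\t260\tExample place, 1850\n"
    "2\t245\tExample title B\n"
    "2\t260\tExample place, 1870\n"
)


def pick_fields(raw: pd.DataFrame, fields: list[str]) -> pd.DataFrame:
    """Return one row per record and one column per requested MARC field."""
    subset = raw[raw["field"].isin(fields)]
    wide = (
        subset.groupby(["record_id", "field"])["value"]
        # Join repeated fields within a record with '|' so no value is dropped.
        .agg(lambda values: "|".join(values))
        .unstack("field")
        # Keep the requested column order; fields absent everywhere become NaN.
        .reindex(columns=fields)
    )
    return wide.reset_index()


# Read everything as strings so MARC tags such as "008" keep leading zeros.
raw = pd.read_csv(io.StringIO(RAW), sep="\t", dtype=str)

single = pick_fields(raw, ["245"])         # one field
batch = pick_fields(raw, ["100", "245"])   # a batch of fields
batch_csv = batch.to_csv(index=False)      # pass a file path to write to disk
```

Records that lack a requested field get an empty cell in the CSV, so the output keeps one row per record for the downstream harmonization in R.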

The Fennica-R package is publicly accessible at https://github.com/fennicahub/fennica. See the package README for an up-to-date link to outputs generated by the package.
