Grant agreement: Academy of Finland no. 345610
Start date: 2022-01-01
Duration: 24 months
WP 3.1: Report on Ingestion framework
Date of reporting: 2022-12
Report authors: Johanna Lilja (National Library of Finland), Tuula Pääkkönen (National Library of Finland)
Contributors: Martin Matthiesen (CSC)
Deliverable location: https://github.com/CSCfi/kielipankki-nlf-harvester
The basic concept of how the data is downloaded is in place. Apache Airflow has been chosen as the workflow management technology. A script has been created for downloading first the METS XML files and then the ALTO XML files via Airflow. A CSC project has been created with the necessary data requests.
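In the two-step download, the METS file has to be fetched first because it is the document that lists where the ALTO files are. The snippet below is a minimal sketch of that link, using only the Python standard library. It assumes that the ALTO files are listed in a METS `fileGrp` whose `USE` attribute is `alto` and that each file location is given as an `xlink:href` on an `FLocat` element. The function name `alto_locations` and the sample paths are hypothetical and are not taken from the harvester repository. Airflow task wiring is left out, so the example runs without Airflow installed.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}


def alto_locations(mets_xml: str, use: str = "alto") -> list[str]:
    """Return the xlink:href of every file in a METS fileGrp whose USE
    attribute matches `use` (case-insensitive).

    Assumption (hypothetical): ALTO files are listed in such a fileGrp.
    """
    root = ET.fromstring(mets_xml)
    hrefs = []
    # ".//" also finds fileGrp elements nested inside other fileGrps.
    for group in root.iterfind(".//mets:fileGrp", NS):
        if group.get("USE", "").lower() != use.lower():
            continue
        for flocat in group.iterfind("mets:file/mets:FLocat", NS):
            href = flocat.get(f"{{{NS['xlink']}}}href")
            if href:
                hrefs.append(href)
    return hrefs


# Hypothetical METS fragment for illustration only.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="alto">
      <mets:file ID="ALTO1"><mets:FLocat LOCTYPE="URL" xlink:href="file://./alto/00001.xml"/></mets:file>
      <mets:file ID="ALTO2"><mets:FLocat LOCTYPE="URL" xlink:href="file://./alto/00002.xml"/></mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="image">
      <mets:file ID="IMG1"><mets:FLocat LOCTYPE="URL" xlink:href="file://./image/00001.jp2"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

if __name__ == "__main__":
    print(alto_locations(SAMPLE))
    # prints ['file://./alto/00001.xml', 'file://./alto/00002.xml']
```

In an Airflow deployment, this parsing step would sit between a task that downloads the METS file and a task that downloads the listed ALTO files. The ordering of the two downloads follows from this dependency.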
FIN-CLARIAH WP3.1 presentation from the DARIAH-FI workshop on 9 November 2022.