FIN-CLARIAH D4.3.2: Statistical overviews and bias detection

Project: FIN-CLARIAH
Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months

WP 4.3: Report on Statistical overviews and bias detection
Date of reporting: 26-05-2023

Report author: Ville Vaara (University of Helsinki)
Contributors: Eetu Mäkelä (University of Helsinki)
Deliverable location: https://github.com/hsci-r/elasticsearch-openshift

Description

During Q1 and Q2 of 2023, we tested Elasticsearch as a tool to replace the current prototype, as described in deliverable 4.3.1. Overall, we are satisfied with Elasticsearch: it provides most of the features we deem critical out of the box.

Certain features, such as support for more complex linguistic data, may require developing extensions to the standard Elasticsearch solution. The platform supports such extension development through a plugin system, which should make custom solutions relatively painless to maintain.

In addition to Elasticsearch, we have started testing Kibana, an analysis tool, for statistical overviews and bias detection. Kibana is developed to integrate with Elasticsearch as part of the “Elastic Stack” (https://www.elastic.co/elastic-stack/). The original plan was to develop a number of standard analysis and overview web tools that would use the Elasticsearch API, but Kibana's features have turned out to be so extensive that we currently judge it to be more than sufficient as a solution. Kibana provides an easy-to-use graphical interface for building statistical overviews, as well as comprehensive user and access management.
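To illustrate the kind of request that a statistical overview ultimately rests on, the following is a minimal sketch of an aggregation-only search body for the Elasticsearch search API. It is not taken from the project's code: the function name, the field names (`language.keyword`, `year`), and the bucket interval are assumptions chosen for illustration only.

```python
import json


def build_overview_query(field: str, year_field: str = "year", size: int = 10) -> dict:
    """Build an aggregation-only search body (hypothetical helper).

    `field` and `year_field` are assumed field names; they must match
    the mapping of whatever index the body is sent to.
    """
    return {
        # Return no individual documents, only the aggregation results.
        "size": 0,
        "aggs": {
            # Document counts for the most frequent values of `field`.
            "by_value": {"terms": {"field": field, "size": size}},
            # Document counts per decade, assuming a numeric year field.
            "by_year": {"histogram": {"field": year_field, "interval": 10}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_overview_query("language.keyword"), indent=2))
```

A body like this would be sent to the `_search` endpoint of an index. An uneven distribution in the resulting buckets, for example a few decades or languages dominating the counts, is the sort of skew that an overview is meant to make visible.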

The templates for deploying both Elasticsearch and Kibana in an OpenShift environment (CSC’s Rahti in this specific case) are available at https://github.com/hsci-r/elasticsearch-openshift. Deploying the applications this way makes copying and modifying the solution quick, easy, and comparatively hands-free.

We will continue to explore and develop the Elasticsearch and Kibana prototype we have taken into use, in order to mature it further.
