Project: FIN-CLARIAH
Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months
WP 4.3: Report on Representative Twitter dataset(s) of user-generated texts and metadata
Date of reporting: 25-11-2023
Report author: Mikko Laitinen (University of Eastern Finland)
Contributors: Masoud Fatemi, Mehrdad Salimi, Paula Rautionaho (all from the University of Eastern Finland)
Deliverable location: https://nts-csc.rahtiapp.fi/ The URL is currently open for researchers, and we will add authentication to it in the spring of 2024.
The WP’s main objective was to develop a representative dataset of Twitter social media data from the five Nordic countries. The underlying idea is that social media applications offer a promising and extremely large source of data for a range of disciplines in the social sciences and the humanities (SSH), but research is often hindered by a lack of the technical knowledge needed to collect, pre-process and analyse very large datasets. During the funding period, we expanded the data collection substantially once it became clear that the future of this data collection route was increasingly uncertain. All the materials were collected while the platform's academic application programming interface (API) was still open; later, after the company changed its name to X, the API was closed down. In hindsight, the decision to store large amounts of material from various geographic settings proved to be a wise move, because this subproject has now saved 12.5 years of material for future research.
The project activities so far have consisted of two parts:
This WP has reached its objectives and succeeded in creating a national niche within the Finnish DH sphere. Our team combines expertise in sociolinguistics and computer science, and we are able to develop digital tools for a range of audiences.
For 2024–2025, we aim to continue the work and to add a graphical interface for accessing network information and combining it with textual searches.