Greetings from the Transkribus User Conference in Vienna

Researchers at the University of Innsbruck have developed an application that can read even the most illegible scrawl and convert it into Unicode text. Naturally, it also handles text that is easier to read, including print.

The core function of the software is called HTR, which stands for Handwritten Text Recognition. The application is the most prominent result of the EU-funded READ project (Recognition and Enrichment of Archival Documents), which was launched in 2014. Fourteen partner organizations from seven EU countries and Switzerland contribute to the project. The Finnish partner is the National Archives of Finland. Other Finnish contributors include the Society of Swedish Literature in Finland, the Finnish Literature Society (SKS) and the Folklife Archives, which converted some of their archival material into text within the project.

I only became aware of the HTR application this year, and I attended the user conference in Vienna even though I am still a beginner as an HTR user. The conference offered one inspiring presentation after another on using the application to convert documents from the vaults of European archives into digital text.

The image shows how notes by the philosopher Michel Foucault are converted into digital text.

Image 1. From Marie-Laure Massot’s PPT presentation “Papers of French philosopher Michel Foucault”

In short, a human transcriber first transcribes about 7,000 words, and the application takes care of the rest: it trains a recognition model on the text transcribed by the human expert and then uses that model to read the remaining pages. The error rate of the final text naturally varies, but it can be astonishingly low.
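The post does not say how the error rate is measured. HTR accuracy is commonly reported as a character error rate (CER): the minimum number of character insertions, deletions and substitutions needed to turn the machine output into the human reference transcription, divided by the length of the reference. The following is a minimal Python sketch assuming that definition; the function names `levenshtein` and `character_error_rate` are our own illustration, not part of Transkribus.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    # prev[j] = edit distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one misread character in a 13-character line.
cer = character_error_rate("Archives 1850", "Archivcs 1850")
print(f"{cer:.1%}")  # prints 7.7%
```

A CER of a few percent means that only a handful of characters per hundred need correcting by hand.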

The remaining bottleneck is the limited amount of scanned material available. The READ project has come up with a scanning solution to help overcome that problem: a tent in which the book or pile of papers to be scanned is placed. The roof of the tent has an opening through which a smartphone takes a photograph every time it detects that there is no movement, i.e. when a page has been turned completely. So all you need is a phone with the app (called DocScan), which detects when movement has stopped, and you can start turning pages!
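The post only says that DocScan takes a photo when it detects that movement has stopped; how the app actually does this is not described. One simple way such a trigger could work, assumed here purely for illustration, is frame differencing: compare each camera frame with the previous one, and take a photo once the difference has stayed below a threshold for a few consecutive frames. In this Python sketch, frames are plain lists of grayscale pixel values, and the names `mean_abs_diff`, `StillnessTrigger`, `threshold` and `still_frames` are hypothetical, not taken from DocScan.

```python
from typing import List, Optional

Frame = List[int]  # hypothetical: a frame as a flat list of 0-255 grayscale values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel brightness change between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class StillnessTrigger:
    """Signals a capture when the view stays still, then waits for the next page turn."""

    def __init__(self, threshold: float = 5.0, still_frames: int = 3):
        self.threshold = threshold        # max mean change still counted as "no movement"
        self.still_frames = still_frames  # consecutive still frames required before capture
        self.prev: Optional[Frame] = None
        self.still_count = 0
        self.armed = True                 # ready to capture (first page needs no turn)

    def feed(self, frame: Frame) -> bool:
        """Return True when a photo should be taken for this frame."""
        if self.prev is None:
            self.prev = frame
            return False
        moving = mean_abs_diff(self.prev, frame) > self.threshold
        self.prev = frame
        if moving:
            self.armed = True             # a page turn re-arms the trigger
            self.still_count = 0
            return False
        self.still_count += 1
        if self.armed and self.still_count >= self.still_frames:
            self.armed = False            # one photo per page until movement is seen again
            self.still_count = 0
            return True
        return False


# Hypothetical usage: a still page, a page turn, then the next still page.
page, turning = [100] * 16, [180] * 16
trigger = StillnessTrigger(threshold=5.0, still_frames=3)
frames = [page] * 4 + [turning] + [page] * 4
print([trigger.feed(f) for f in frames])
# prints [False, False, False, True, False, False, False, False, True]
```

Requiring several still frames in a row, rather than one, keeps the trigger from firing while a hand briefly pauses mid-turn.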

Image 2. ScanTent. Image: Mari Siiroinen

The Transkribus application is free of charge for the time being, at least until the end of 2019. A charge may be introduced from 2020, so it is a good idea to start converting your handwritten texts now.

I believe the application has the potential to revolutionize historical research, but it is also useful for converting any old material into a machine-readable format and publishing it online for open use. I myself intend to use it to convert linguistic material on paper into digital form.

Mari Siiroinen

The application is downloadable for free here

For additional information about the project, see:

For material about the conference in November 2018, see: