Get to Know Our Research Projects – EnrichEuropeana+

Submitted on 20th July 2023

Painting of a woman from the 19th century writing a letter

The Digital Repository of Ireland (DRI) is a trusted digital repository (TDR), which provides reliable long-term preservation and access to Ireland’s humanities, cultural heritage, and social sciences digital data. DRI is also a research performing organisation engaged in a rich range of digital preservation-related research projects. In this new blog series, we invite you to get to know our research projects!

In the first blog in the ‘Get to Know Our Research Projects’ series, DRI’s Operations and Communications Manager Dr Áine Madden interviews DRI Senior Software Engineer Dr Kathryn Cassidy about DRI’s involvement in the EnrichEuropeana+ project (2021-2023), an exciting initiative that aimed to combine citizen science and artificial intelligence to unlock handwritten documents from the 19th century and make them available to researchers, students, amateur historians, and the public.

AM: Hi Kathryn, thank you for meeting with me today. Can you tell us a little bit about the EnrichEuropeana+ project?

KC: Hi Áine, thanks for giving me the opportunity to talk about the project. EnrichEuropeana+ is a recently finished project that ran from 2021 until 2023. It built upon the results of a previous project, EnrichEuropeana (2018-2020), and aimed to enhance the Transcribathon crowd-sourcing transcription platform created by the German company Facts&Files.

AM: What was DRI’s role in the project?

KC: Our role was two-fold. As a National Aggregator we were responsible for providing Irish content from the 19th century to Europeana. Over the course of the project we aggregated 20 collections from ten DRI members, coming to over 15,000 objects in total.

The second aspect of our role was to explore ways to integrate the Transcribathon platform with our Repository platform, and showcase the benefits of this. In this we were joined by the Polish National Aggregator Federacja Bibliotek Cyfrowych (FBC). Both aggregators implemented workflows within our respective platforms to retrieve transcriptions and metadata enrichments from the Transcribathon platform using Transcribathon’s Application Programming Interface (API). At DRI we developed a workflow to allow the collection owner to approve or reject these enrichments and add them to the metadata of the objects, giving us richer descriptive metadata. We hope to continue to enhance this functionality so that we can offer it as a service for other DRI members with handwritten content in DRI.
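As an illustration of the approve-or-reject step described above, here is a minimal sketch in Python. It is not DRI's implementation: the `Enrichment` and `RepositoryObject` classes, the field names, and the sample values are all hypothetical, and the enrichments are created in memory rather than fetched, because the interview does not describe the Transcribathon API itself.

```python
from dataclasses import dataclass, field


@dataclass
class Enrichment:
    """A crowd-sourced suggestion for one object (hypothetical structure)."""
    object_id: str
    kind: str              # e.g. "transcription", "place", "person", "date"
    value: str
    status: str = "pending"


@dataclass
class RepositoryObject:
    """A repository object with simple multi-valued metadata (hypothetical)."""
    object_id: str
    metadata: dict = field(default_factory=dict)


def review(enrichment: Enrichment, approve: bool) -> Enrichment:
    """Record the collection owner's decision on a single enrichment."""
    enrichment.status = "approved" if approve else "rejected"
    return enrichment


def apply_approved(obj: RepositoryObject, enrichments: list) -> RepositoryObject:
    """Add approved enrichments for this object to its metadata, skipping duplicates."""
    for e in enrichments:
        if e.object_id != obj.object_id or e.status != "approved":
            continue
        values = obj.metadata.setdefault(e.kind, [])
        if e.value not in values:
            values.append(e.value)
    return obj


# Usage with made-up sample data: the owner approves one suggestion and rejects another.
obj = RepositoryObject("obj-1", {"place": ["Dublin"]})
suggestions = [
    Enrichment("obj-1", "person", "Jane Doe"),
    Enrichment("obj-1", "place", "Cork"),
]
review(suggestions[0], approve=True)
review(suggestions[1], approve=False)
apply_approved(obj, suggestions)
print(obj.metadata)
```

Keeping a status on each enrichment, rather than writing suggestions straight into the metadata, means that nothing reaches the object's record until the collection owner has approved it.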

AM: As part of the project, DRI collaborated with our colleagues at Dublin City Library and Archive (DCLA) on ‘Transcription Week’, a crowdsourcing campaign that took place from 28 March to 1 April 2022 to transcribe and enrich handwritten documents and maps. Can you tell us about the aims and outputs of this campaign?

KC: This was a very exciting part of the project. DCLA holds a large collection of historical maps of Dublin, as well as collections related to the development of Dublin city, such as the Wide Streets Commission minutes and Jury Books and Dublin City Council Minute Books. There’s such a wealth of information in these materials, but before this project not all of them had been digitised and they weren’t all available online. As handwritten documents, the information in them is also hard to access. Transcribathon gave us an opportunity to transcribe and annotate a large part of these collections.

Transcription Week was originally planned as a week-long event where members of the public, school students and other volunteers could drop in to DCLA in person and join the crowd-sourced transcription effort. As it happened, the COVID pandemic meant that we had to move the campaign online, but we ran a series of ‘Transcribe along with us’ sessions on Zoom where people could ask questions and discuss the best ways to transcribe these documents and maps. We got a lot of interest, with many volunteers continuing to work on the collections after Transcription Week. We now have almost 400,000 pages at least partially transcribed, which is an amazing result! We also have a lot of metadata enrichments that users added. These include adding locations and dates, and tagging the pages with the names of people mentioned in them.

Image: Screenshot of an object on the Transcribathon Platform

AM: This project harnessed the power of ‘citizen science’ by inviting volunteers to transcribe and annotate digitised handwritten historical documents on the Transcribathon crowdsourcing platform. What role did handwriting recognition technology play in enhancing the Transcribathon platform during the EnrichEuropeana+ project?

KC: Yes, that was another area that the project explored. Facts&Files worked with other project partners, including Read-Coop SCE, which develops Transkribus, an AI-powered platform for handwritten text recognition and transcription, to integrate this technology into the Transcribathon platform.

In parallel with the crowd-sourcing activities, we tested using Artificial Intelligence to automate the transcription process. We were able to compare the results with the human-generated transcriptions, and in some cases the results are of very similar quality. We hope that in future crowd-sourcing campaigns we can start by performing an automated transcription and then get the volunteers to help to verify and correct it, which has the potential to speed up the whole transcription process.
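The interview does not say how the automated and human transcriptions were compared. One common way to make this kind of comparison concrete is the character error rate (CER): the number of single-character edits needed to turn the automated text into the human reference, divided by the length of the reference. The sketch below assumes that metric and uses invented sample strings; it is an illustration, not the project's evaluation method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between the two texts, relative to the length of the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: a human transcription and an automated one with a single wrong character.
human = "Wide Streets Commission"
automated = "Wide Streets Comnission"
print(character_error_rate(human, automated))
```

A lower value means the automated transcription is closer to the human one, and a value of 0 means the two texts are identical.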

AM: Thanks for your time today, Kathryn. Is there anything else that you would like to tell us about the project?

KC: As I said, we are continuing to develop the DRI and Transcribathon integration and hope to continue to offer Transcribathon services to our members. We’d be very happy to talk to anyone who has handwritten collections and is interested in using Transcribathon.

Find out more about DRI’s research projects on our projects page.

To find out more about DRI member benefits, please contact DRI’s Membership Manager Dr Maeve O’Brien at M.OBrien@ria.ie


DRI is funded by the Department of Further and Higher Education, Research, Innovation and Science (DFHERIS) via the Higher Education Authority (HEA) and the Irish Research Council (IRC).
