Building Workflows for HTR to TEI Up-Conversion and Enhancement

Authors: Cummings, James / Jakacki, Diane / Healey, Alexandra / Pirmann, Carrie / Johnson, Ian / Jeffrey, Evie / Flex, Valentina

Date: Wednesday, 6 September 2023, 9:15am to 10:45am

Location: Main Campus, L 2.202


At the TEI2022 conference, we introduced the Evolving Hands project, which is undertaking three case studies across a range of document forms to demonstrate how HTR-based workflows for developing TEI resources can be iteratively incorporated into curation. The case studies cover a wide range of textual forms: handwritten letters and diaries from the 19th- and 20th-century UNESCO-recognised Gertrude Bell Archive, 20th-century artistically printed feminist zines, and the highly structured, scholarly edited legacy printed material of the Records of Early English Drama (REED) project. The printed materials used as samples all fall into categories where traditional OCR would result in a substantial loss of intellectual content. By covering a wide variety of periods and document forms, the project aims to explore how cultural institutions might best use HTR to convert their collections into TEI files, and which further approaches to up-conversion and enhancement can turn these files into data-rich TEI editions.

Different approaches have been used across the case studies: some teams attempted to do as much tagging as possible in the Transkribus online app, some used LEAF-Writer to add enhancements such as linked open data (LOD) for named entities, and others processed the output with XSLT to preserve or enhance structured information before editing it further.

This short paper will present some of the interim results of developing these workflows for converting HTR-processed text into TEI P5 XML and its further enhancement. Our goal in doing so is to pass on some of the insights and lessons learned during the development phase of the project, as well as to provide a starting point for further discussions and future collaborations in the field of HTR-based workflows for TEI curation. Ultimately, the project hopes to contribute to the ongoing development of best practices for using HTR for the creation of TEI editions.

About the authors

James Cummings is the Reader for Digital Textual Studies and Late Medieval Literature for the School of English Literature, Language, and Linguistics at Newcastle University. He is the Newcastle PI for the Evolving Hands project and serves on the TEI Board of Directors.

Diane Jakacki is Digital Scholarship Coordinator and Associated Faculty in Comparative & Digital Humanities at Bucknell University. She is PI of the Mellon-funded LAB Cooperative, Bucknell PI of the Evolving Hands project, and the LEAF-VRE project. She is chair of the TEI-C and Chair-Elect of ADHO.

Ian Johnson is Head of Special Collections & Archives at Newcastle University Library. His role includes interdisciplinary digital scholarship and co-curation of the UNESCO Gertrude Bell Archive. He is co-I for the Evolving Hands project.

Carrie Pirmann is the social sciences librarian at Bucknell University who is optimising existing Transkribus models for the Bucknell Case Study and working with the digital libraries community.

Alexandra Healey is a Project Archivist in Newcastle University Special Collections & Archives. She is coordinating the use of HTR and TEI within the Newcastle team as part of the Evolving Hands project.

Valentina Flex is the Stillman Project Archivist working on Gertrude Bell Archive: Bell and the Kingdom of Iraq at 100.

Evie Jeffrey is the postgraduate assistant for the project and began working on it as a Robinson Bequest Bursary-holder.

Contribution Type