A (cautionary) tale of two texts

Authors: Thieberger, Nick / Turnbull, Robert / Russo-Batterham, Daniel / Lang, Birgit

Date: Friday, 8 September 2023, 11:15am to 12:45pm

Location: Main Campus, L 2.202


The Text Encoding Initiative (TEI) community is replete with wonderful encoded documents and images of manuscripts. Given the number of such projects and the effort that has gone into the TEI itself, there are many frameworks, tools, and workflows that can be used to meaningfully encode and digitally present manuscript sources. In this presentation, we reflect on our experiences working on two contrasting manuscripts in an institutional environment where TEI has little uptake. In particular, we explore some of the challenges and tradeoffs we encountered creating digital editions with only limited institutional support for sustainable Digital Humanities research software infrastructure and training.

We are humanities technologists (DR, RT) and humanities researchers (BL, NT) at the University of Melbourne in Australia. The first manuscript we worked with is a handwritten German text (BL) of some 100,000 words in 391 pages, to which we added a transcript, notes, facsimiles, and a translation. We used TEI to richly capture German and English versions of this manuscript, encoding people, places, bibliographical references, and fictional characters. This was published online using TEI Publisher Web Components and required the team (DR, RT) to create a virtual machine, build a Django interface, provision a IIIF server, provision storage for images, and maintain the site over time, all of which incur significant technical debt and require specialised skills.

The second manuscript (NT) comprises 8,000 words of ethnographic notes from Vanuatu in 1914. Multiple versions of the same text were combined to make a single diplomatic edition that allows a reader to follow the content in a way that was not possible with the original documents. An HTML version built by NT presents the text and images of the manuscript originals, sometimes up to eleven different page images corresponding to the same text, with decisions required to arrive at a consensus document. It is housed on a site controlled by NT, and is picked up by the Internet Archive, as it is a simple HTML page. It requires no maintenance and has no dependencies, and NT was able to build the site himself.

We will discuss some of the following questions based on our experience at the University of Melbourne. How can we scope projects, understand the workload implications, and manage the expectations of academics who become excited after seeing completed TEI projects and want to apply the technology in their work? How can non-technical Humanities scholars use these technologies, and how can they ensure longevity of their work beyond their interaction with technical colleagues? What kind of ongoing support is required to keep a site like this going? This includes basic necessities like a domain name and site, but also encompasses the dependencies of the selected tools.

While some institutions have TEI support services that can guarantee ongoing access to encoded texts, what is the best strategy for an academic who has no such local support?
