Networking Editions: TEI and the Semantic Web with TAG and RDF

Authors: Chestnova, Elena / Kuczera, Andreas / Armbruster, Stefan / Landolt, Balduin

Date: Friday, 8 September 2023, 9:15am to 10:45am

Location: Main Campus, L 1


Making the text and metadata of digital editions addressable and linkable is a task that several recent and current projects in the field of scholarly editing have focused on (e.g. Skaldic poetry projects, the LEAF writer and other LINCS projects, and Unlocking Digital Texts). These projects have explored ways to connect edition data with other resources on the web, along with associated issues: the reusability and sustainability of editions, the interoperability of text, collaboration across fields and disciplines, and the development of federated tools. Discussions around these topics have led to a range of approaches, and this panel will give insight into two of them: the implementation of TEI in Text-as-Graph (TAG) and the incorporation of TEI in an RDF triple store. The panelists will present experiences from real applications in the context of edition projects and of a national infrastructure for humanities data.

The panel has been assembled as part of the ongoing project “Semantic TEI”, which explores possible pathways towards an ontology bridging TEI and existing, widely used ontologies. This ontology is to be developed with a view to potential future applications and research methods, and on the basis of the needs of the research community. Understanding these needs is the main motivation of this panel: its purpose is to give members of the TEI community an opportunity to learn about recent work on TEI in the context of TAG and RDF, and to question and discuss this work.

1. Andreas Kuczera: TEI in Graph-based digital scholarly editions

TEI is a quasi-standard for expressing text and its meaning in digital scholarly editions. The main questions in my statement are:

  • Does TEI need the structure of XML to preserve its semantics, and if so, are there examples of this?

  • Are there ways to structure textual information beyond the approaches of Microsoft Word (which works with layout elements) or TEI?

  • What could this new structure look like?

One possible step in these directions could be to connect TEI semantics with graph-based modeling approaches for textual information.

2. Stefan Armbruster: Conversion of TEI-XML to LPG and TAG

Given my long-term background in graph data modeling, I am specifically interested in the graph data model, a labeled property graph (LPG), used for imported TEI data. Such a data model is driven by its usage patterns (i.e. the queries run against it) and by its extensibility for future use cases.

Some additional use cases I can think of:

  • adding and querying provenance information

  • a more data-driven, machine-accessible model

  • versioning of the LPG

  • long-term archival of the LPG, ensuring it is still accessible decades in the future

  • applying classic data science or graph data science to the LPG model
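As a rough illustration of what a first TEI-XML-to-LPG import might look like, here is a minimal sketch using only Python's standard library. The mapping is a hypothetical, deliberately naive one (not the model discussed by the panelists): one node per element labeled with its local name, attributes as node properties, text as separate `Text` nodes, and `HAS_CHILD` edges carrying an `order` property to preserve document order. The names `Node`, `Edge`, `tei_to_lpg` and `SAMPLE` are invented for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    node_id: int
    label: str                      # e.g. "persName" or "Text"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    rel_type: str                   # relationship type, here always "HAS_CHILD"
    props: dict = field(default_factory=dict)


def local_name(name: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags and attributes."""
    return name.split("}", 1)[-1]


def tei_to_lpg(xml_text: str):
    """Naive TEI-XML -> LPG mapping (hypothetical, for illustration only)."""
    root = ET.fromstring(xml_text)
    nodes, edges = [], []
    ids = count()

    def new_node(label, props):
        node = Node(next(ids), label, props)
        nodes.append(node)
        return node.node_id

    def link(parent, child, order):
        # The "order" property preserves document order among siblings.
        edges.append(Edge(parent, child, "HAS_CHILD", {"order": order}))

    def walk(elem):
        attrs = {local_name(k): v for k, v in elem.attrib.items()}
        elem_id = new_node(local_name(elem.tag), attrs)
        order = 0
        # Whitespace-only text (indentation) is dropped: a lossy simplification.
        if elem.text and elem.text.strip():
            link(elem_id, new_node("Text", {"value": elem.text}), order)
            order += 1
        for child in elem:
            link(elem_id, walk(child), order)
            order += 1
            # In ElementTree, text following a child element is stored as its .tail.
            if child.tail and child.tail.strip():
                link(elem_id, new_node("Text", {"value": child.tail}), order)
                order += 1
        return elem_id

    walk(root)
    return nodes, edges


SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>\n'
    '<p>Letter from <persName ref="#semper">Semper</persName> '
    'in <placeName>Dresden</placeName>.</p>\n'
    '</body></text></TEI>'
)

if __name__ == "__main__":
    nodes, edges = tei_to_lpg(SAMPLE)
    for n in nodes:
        print(n.node_id, n.label, n.props)
    for e in edges:
        print(e.source, "-[:%s]->" % e.rel_type, e.target, e.props)
```

This element-per-node mapping simply mirrors the XML tree; following the point above that the model should be driven by its queries, a real import might instead collapse inline elements into properties or add provenance and version nodes.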

3. Balduin Landolt: TEI as RDF: Opportunities, Limitations and Usability

TEI is normally thought of as an XML specification; however, it can also be understood as a general vocabulary, serving as a formal ontology independent of any serialization format.

Stand-off-based approaches to text encoding can address some of the issues arising from the hierarchical structure of XML, such as overlapping hierarchies, and lend themselves especially well to graph representations of text. Such graphs can be defined by a TEI ontology.
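To illustrate the stand-off idea, here is a minimal sketch, assuming a plain Python set of triples in place of a real RDF triple store and a hypothetical `https://example.org/tei-ontology#` namespace with `start`, `end` and `annotates` properties (this is neither an official TEI ontology nor DaSCH's data model). The base text is stored once, each TEI element becomes a resource carrying character offsets, and a `persName` that crosses a line break coexists with two `l` annotations, an overlap that a single XML hierarchy cannot express without workarounds such as splitting the element.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TEI = "https://example.org/tei-ontology#"  # hypothetical TEI ontology namespace
EX = "https://example.org/edition/"         # hypothetical edition namespace

# The base text is stored once; markup lives outside it as offset ranges.
BASE_TEXT = "Greetings to Gottfried\nSemper in Dresden"


def annotate(graph, ann_id, element, start, end):
    """Add one stand-off annotation: its TEI element type and [start, end) offsets."""
    subject = EX + ann_id
    graph.add((subject, RDF_TYPE, TEI + element))
    graph.add((subject, TEI + "annotates", EX + "text1"))
    graph.add((subject, TEI + "start", start))
    graph.add((subject, TEI + "end", end))
    return subject


def value(graph, subject, predicate):
    """Return the (single) object of a subject/predicate pair."""
    return next(o for s, p, o in graph if s == subject and p == predicate)


def annotations_of_type(graph, element):
    """All annotations of one TEI element type as (subject, start, end), sorted by IRI."""
    subjects = sorted(s for s, p, o in graph if p == RDF_TYPE and o == TEI + element)
    return [(s, value(graph, s, TEI + "start"), value(graph, s, TEI + "end"))
            for s in subjects]


def intersecting(graph, element, start, end):
    """Annotations of `element` whose span intersects the range [start, end)."""
    return [s for s, a, b in annotations_of_type(graph, element) if a < end and start < b]


def build_example_graph():
    graph = set()  # a set of (subject, predicate, object) triples
    annotate(graph, "line1", "l", 0, 22)            # "Greetings to Gottfried"
    annotate(graph, "line2", "l", 23, 40)           # "Semper in Dresden"
    annotate(graph, "pers1", "persName", 13, 29)    # crosses the line break
    annotate(graph, "place1", "placeName", 33, 40)  # "Dresden"
    return graph


if __name__ == "__main__":
    g = build_example_graph()
    for s, a, b in annotations_of_type(g, "persName"):
        print(s, repr(BASE_TEXT[a:b]))
    # Which lines does the person name touch? Both: an overlap, not a nesting.
    print(intersecting(g, "l", 13, 29))
```

Because markup here is ordinary graph data, a question such as "which lines does this name touch?" becomes a query over triples; in an actual triple store it would be written in SPARQL. A well-known cost of the approach is that every offset must be kept in sync whenever the base text changes.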

At the Swiss National Data and Service Center for the Humanities (DaSCH), we have gained experience in using RDF graphs to represent text with stand-off markup, and we plan to extend this technology to be fully TEI-compatible in the future. This approach has shown promise for the searchability of markup and for linking it with Semantic Web technologies. However, it has also revealed problems, particularly with regard to usability.

Based on this experience, I plan to outline how a “TEI as RDF” solution might be designed so that it reaps as many of the benefits of stand-off markup and Linked Open Data technologies as possible, while mitigating their inherent drawbacks.

About the authors

Elena Chestnova (Università della Svizzera italiana), chair of the session, is a researcher with the Institute for History and Theory of Art and Architecture, DH lead for the Semper Edition and project lead for the Semantic TEI project.

Andreas Kuczera is Professor of Applied Digital Methodology in the Humanities and Social Sciences at the University of Applied Sciences in Gießen. He works on graph technologies in the digital humanities and on graph-based digital scholarly editions.

Stefan Armbruster is a freelance consultant in the field of graph databases, drawing on more than ten years of experience.

Balduin Landolt is a software engineer at DaSCH. He is currently planning a PhD project in Scandinavian Studies/Digital Humanities at the University of Basel.
