ODD ODDities: some problems with the ODD language unearthed by an attempt to write a processor

Authors: Bauman, Syd / Bermúdez Sabel, Helena / Maus, David / Holmes, Martin

Date: Thursday, 7 September 2023, 11:15am to 12:45pm

Location: Main Campus, L 1.201

Abstract

Not surprisingly, in the course of attempting to write an ODD processor, the ATOP Task Force has come across several areas where the ODD language is confusingly underspecified. This panel is centered on the ODD language, or more precisely, some of its shortcomings.

Framing

The session starts with an introduction of about 10 minutes to the major background concepts that underpin the rest of the session. This is a high-level overview: mostly circles and arrows, not code. Topics covered include what ODD is, how the Guidelines are built, how customization works, and how customization chaining works.

Individual Issues

no duplicate definitions of the same attribute

The ODD language permits an ODD writer to define two different attributes with the same name on the same element (even though XML does not allow this). See https://github.com/TEIC/TEI/issues/2371
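
To make the problem concrete, here is a minimal sketch in Python of the check a processor might run. The element name "thing" and the attribute name "kind" are invented for illustration, and treating a namespace plus name pair as the identity of an attribute is an assumption, not something the Guidelines state.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical ODD fragment: "thing" and "kind" are invented names.
ODD = f"""
<elementSpec xmlns="{TEI_NS}" ident="thing" mode="add">
  <attList>
    <attDef ident="kind"><datatype><dataRef key="teidata.word"/></datatype></attDef>
    <attDef ident="kind"><datatype><dataRef key="teidata.count"/></datatype></attDef>
  </attList>
</elementSpec>
"""


def duplicate_attributes(element_spec):
    """Return sorted (namespace, name) pairs defined more than once."""
    counts = Counter(
        (att_def.get("ns", ""), att_def.get("ident"))
        for att_def in element_spec.iter(f"{{{TEI_NS}}}attDef")
    )
    return sorted(key for key, n in counts.items() if n > 1)


duplicates = duplicate_attributes(ET.fromstring(ODD))
print(duplicates)
```

Both definitions here are accepted by the ODD language as described above, yet they disagree about the datatype, so a processor has no basis for choosing between them.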

Schematron contexts

It is difficult, if not impossible, for an ODD processor to determine the intended context for Schematron assertions that do not have a context explicitly indicated. (In ODD, contexts may be specified using the @context attribute of <sch:rule>, but the <sch:rule> is optional.)
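
As an illustration, the following Python sketch flags constraints whose assertions sit outside any <sch:rule>, which are the ones a processor would have to guess a context for. The ODD fragment, the element name "thing", and the constraint identifiers are all invented for this example.

```python
import xml.etree.ElementTree as ET

NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "sch": "http://purl.oclc.org/dsdl/schematron",
}

# Hypothetical ODD fragment: one constraint with a rule, one without.
ODD = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="thing" mode="change">
  <constraintSpec ident="explicit" scheme="schematron">
    <constraint>
      <sch:rule context="tei:thing">
        <sch:assert test="@kind">A thing needs a kind.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
  <constraintSpec ident="implicit" scheme="schematron">
    <constraint>
      <sch:assert test="@kind">A thing needs a kind.</sch:assert>
    </constraint>
  </constraintSpec>
</elementSpec>
"""


def constraints_without_context(spec):
    """Return idents of constraintSpecs with an assert or report outside any sch:rule."""
    found = []
    for cs in spec.iter("{%s}constraintSpec" % NS["tei"]):
        # Collect every element that lives inside some sch:rule.
        in_rule = {id(e) for rule in cs.iterfind(".//sch:rule", NS) for e in rule.iter()}
        tests = cs.findall(".//sch:assert", NS) + cs.findall(".//sch:report", NS)
        if any(id(e) not in in_rule for e in tests):
            found.append(cs.get("ident"))
    return found


needs_guessing = constraints_without_context(ET.fromstring(ODD))
print(needs_guessing)
```

For the "implicit" constraint a processor might guess that the enclosing elementSpec supplies the context, but that is only a guess, and it is not available at all when the constraintSpec is not inside an elementSpec.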

tagdoc elements allowed most anywhere

Currently the model.oddDecl class is a member of the model.inter class, meaning that, e.g., a <classSpec> can go inside a <rdg>, a <head>, or an <orig>. Should a processor gather up all tagdoc elements from anywhere in the customization file and assemble them, or should it only process those inside a <schemaSpec>? See https://github.com/TEIC/TEI/issues/2306.
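
The two readings can be contrasted with a short Python sketch. The fragment below is a hypothetical and deliberately incomplete document (no teiHeader), the identifiers are invented, and the list of specification elements is only a subset chosen for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# Illustrative subset of the tagdoc specification elements.
SPEC_TAGS = {TEI + name for name in ("elementSpec", "classSpec", "macroSpec", "dataSpec")}

# Hypothetical customization: one spec inside <schemaSpec>, one stray in a <head>.
ODD = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div>
      <head>Notes <classSpec ident="att.stray" type="atts" mode="add"/></head>
      <schemaSpec ident="demo" start="TEI">
        <elementSpec ident="thing" mode="add"/>
      </schemaSpec>
    </div>
  </body></text>
</TEI>
"""


def specs_anywhere(doc):
    """Reading 1: gather every specification element, wherever it occurs."""
    return [e.get("ident") for e in doc.iter() if e.tag in SPEC_TAGS]


def specs_in_schema_spec(doc):
    """Reading 2: gather only specification elements inside a <schemaSpec>."""
    return [
        e.get("ident")
        for schema in doc.iter(TEI + "schemaSpec")
        for e in schema.iter()
        if e.tag in SPEC_TAGS
    ]


doc = ET.fromstring(ODD)
everywhere = specs_anywhere(doc)
scoped = specs_in_schema_spec(doc)
print(everywhere, scoped)
```

The two strategies yield different schemas from the same file, which is why the question needs an answer in the Guidelines rather than in each processor.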

Schematron query language binding

ODD provides no mechanism for an ODD writer to specify the query language binding. Currently the binding is generated by the software that processes the ODD. We think this is inappropriate: the ODD should specify the query language binding. See https://github.com/TEIC/TEI/issues/2330.

@generate and @expand of classes

The <classSpec> element has a @generate attribute which constrains how the members of the class may be combined; the <classRef> element has an @expand attribute which asserts how the members of the class should be combined for the given reference. The Guidelines do not:

  1. explain what either of these attributes means for an attribute class.
  2. define in what order elements should occur when a sequence is generated.
  3. explicitly require that the value of @expand refers to a pattern that was generated by a @generate.
  4. explain what should happen when a model class is a member of a model class.

See https://github.com/TEIC/TEI/issues/2369.
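
Points 2 and 3 can be illustrated with a small Python sketch. The member names are invented, the two ordering policies are merely two plausible candidates, and the rule that an absent @generate permits every pattern is an assumption made for this example.

```python
# The five values shared by @generate and @expand.
PATTERNS = {
    "alternation",
    "sequence",
    "sequenceOptional",
    "sequenceOptionalRepeatable",
    "sequenceRepeatable",
}


def sequence_pattern(members, order):
    """Build a RELAX NG compact-syntax sequence under one ordering policy."""
    if order == "alphabetical":
        members = sorted(members)
    elif order != "declaration":
        raise ValueError(f"unknown ordering policy: {order}")
    return ", ".join(members)


def expand_is_available(generate, expand):
    """Check that @expand names a pattern permitted by @generate.

    Assumption: an absent or empty @generate permits all five patterns.
    """
    allowed = set(generate.split()) if generate else PATTERNS
    return expand in allowed


# Hypothetical class members, listed in the order they were declared.
members = ["title", "author", "date"]
by_declaration = sequence_pattern(members, "declaration")
alphabetical = sequence_pattern(members, "alphabetical")
print(by_declaration)
print(alphabetical)
print(expand_is_available("alternation sequence", "sequenceOptional"))
```

Two processors that each pick a different, equally defensible ordering will produce schemas that accept different documents.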

<interleave> (or <bag> or <verschachteln>?)

To express “all of these elements should be present, but order is unimportant” in a content model, an ODD writer needs to use <sequence preserveOrder='false'>. This is, at best, counter-intuitive. See https://github.com/TEIC/TEI/issues/2154.
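
A minimal Python sketch of how a processor might translate the two forms follows. The mapping of preserveOrder='false' to a RELAX NG interleave is an assumption about processor behaviour, and the element names are invented.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical content model fragments that differ only in @preserveOrder.
UNORDERED = """
<sequence xmlns="http://www.tei-c.org/ns/1.0" preserveOrder="false">
  <elementRef key="title"/>
  <elementRef key="author"/>
</sequence>
"""
ORDERED = UNORDERED.replace('preserveOrder="false"', 'preserveOrder="true"')


def to_rnc(sequence):
    """Render a <sequence> of elementRefs in RELAX NG compact syntax."""
    keys = [ref.get("key") for ref in sequence.findall(TEI + "elementRef")]
    # Assumed mapping: unordered sequences become an interleave ("&").
    unordered = sequence.get("preserveOrder") == "false"
    return (" & " if unordered else ", ").join(keys)


unordered_rnc = to_rnc(ET.fromstring(UNORDERED))
ordered_rnc = to_rnc(ET.fromstring(ORDERED))
print(unordered_rnc)
print(ordered_rnc)
```

The element is called a sequence in both cases, but only one of the two results is a sequence in the ordinary sense of the word.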

@require vs @except

The <anyElement> element has two attributes for controlling which elements may be represented: @require and @except. Either or both may be used; either may have one or more namespace URIs in its value (@except also allows QNames of elements). But the Guidelines make no mention of what a processor is to do if the same namespace URI appears in both @except and @require. See https://github.com/TEIC/TEI/issues/2369.
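
The conflict is easy to detect, as this Python sketch shows; what to do about it is the open question. The attribute values are invented for illustration, and the sketch considers only values written explicitly in the ODD, not any default.

```python
import xml.etree.ElementTree as ET

# Hypothetical <anyElement>: the SVG namespace is both required and excepted.
ODD = """
<anyElement xmlns="http://www.tei-c.org/ns/1.0"
            require="http://www.w3.org/1998/Math/MathML http://www.w3.org/2000/svg"
            except="http://www.w3.org/2000/svg svg:script"/>
"""


def conflicting_namespaces(any_element):
    """Return tokens that appear in both @require and @except."""
    required = set(any_element.get("require", "").split())
    excepted = set(any_element.get("except", "").split())
    # @require holds only namespace URIs, so any shared token is a namespace URI.
    return sorted(required & excepted)


conflicts = conflicting_namespaces(ET.fromstring(ODD))
print(conflicts)
```

A processor could report an error, let @except win, or let @require win; the Guidelines currently give no reason to prefer any of these.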

About the authors

Syd Bauman is chair of the ATOP Task Force. He served as the N. American Editor of the TEI from 2001 to 2008, during which time the current ODD language was developed; thus he takes partial responsibility for some of the problems discussed herein. He has served as a member of the TEI Council since 2013. ORCID: 0000-0003-3288-443X Affiliation: Northeastern University Digital Scholarship Group

Helena Bermúdez Sabel is a Digital Humanities researcher and a software developer at JinnTec. She has been the technical lead of several corpus linguistics and digital editing projects. She has served as a member of the TEI Technical Council since 2021. ORCID: 0000-0002-8627-1367 Affiliation: JinnTec

David Maus works as head of research and development at the State and University Library Hamburg. He acts as liaison to digital humanities projects at Hamburg University and other higher education institutions. He is the author of SchXslt, a modern implementation of ISO Schematron. ORCID: 0000-0001-9292-5673 Affiliation: State and University Library Hamburg

Martin Holmes is a programmer/consultant at the University of Victoria Humanities Computing and Media Centre. He is the lead programmer on several large digital edition projects including the Map of Early Modern London, and has served on the TEI Council and as Managing Editor of jTEI. ORCID: 0000-0002-3944-111 Affiliation: University of Victoria
