ODD ODDities: some problems with the ODD language unearthed by an attempt to write a processor
Authors: Bauman, Syd / Bermúdez Sabel, Helena / Maus, David / Holmes, Martin
Date: Thursday, 7 September 2023, 11:15am to 12:45pm
Location: Main Campus, L 1.201
Abstract
Not surprisingly, in the course of attempting to write an ODD processor, the ATOP Task Force has come across several areas where the ODD language is confusingly underspecified. This panel is centered on the ODD language, or more precisely, some of its shortcomings.
Framing
The session starts with a roughly 10-minute introduction to the major background concepts that underpin the rest of the session. This is a high-level overview, i.e., mostly circles and arrows, not code. Topics covered include what ODD is, how the Guidelines are built, how customization works, and how customization chaining works.
Individual Issues
no duplicate definitions of the same attribute
The ODD language permits an ODD writer to define two different attributes with the same name on the same element (even though XML does not allow this). See https://github.com/TEIC/TEI/issues/2371
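As a rough illustration of the check a processor might want here, the following Python sketch counts attribute definitions per element. The ODD fragment, the element name `thing`, and the helper `duplicate_attributes` are all hypothetical, and the fragment is deliberately incomplete (it would not validate as a full specification). Keying on both `@ns` and `@ident` is an assumption, made because same-named attributes in different namespaces are legal in XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical ODD fragment: the attribute "type" is defined twice on <thing>.
ODD = f"""
<elementSpec xmlns="{TEI_NS}" ident="thing" mode="add">
  <attList>
    <attDef ident="type"><desc>first definition</desc></attDef>
    <attDef ident="type"><desc>second definition</desc></attDef>
    <attDef ident="subtype"><desc>defined only once</desc></attDef>
  </attList>
</elementSpec>
"""

def duplicate_attributes(element_spec):
    """Return the (namespace, ident) pairs defined more than once on one element."""
    counts = Counter(
        (att.get("ns", ""), att.get("ident"))
        for att in element_spec.iter(f"{{{TEI_NS}}}attDef")
    )
    return sorted(key for key, n in counts.items() if n > 1)

print(duplicate_attributes(ET.fromstring(ODD)))
```

A processor could refuse such input, or warn and pick one definition; the sketch only detects the situation and takes no position on which response is right.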
Schematron contexts
It is difficult to impossible for an ODD processor to determine the intended context for Schematron assertions that do not have a context explicitly indicated. (In ODD contexts may be specified using the @context of <sch:rule>, but the <sch:rule> is optional.)
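To make the problem concrete, here is a minimal Python sketch that finds constraints whose assertions sit directly inside <constraint> with no enclosing <sch:rule>. The ODD fragment, the idents, and the helper `constraints_without_context` are hypothetical. The sketch only looks at direct children of <constraint>, which is an assumption that keeps it short.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
SCH_NS = "http://purl.oclc.org/dsdl/schematron"

# Hypothetical fragment: one constraint states its context, the other does not.
ODD = f"""
<elementSpec xmlns="{TEI_NS}" xmlns:sch="{SCH_NS}" ident="thing" mode="change">
  <constraintSpec ident="with-rule" scheme="schematron">
    <constraint>
      <sch:rule context="tei:thing">
        <sch:assert test="@type">A thing needs a type.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
  <constraintSpec ident="bare-assert" scheme="schematron">
    <constraint>
      <sch:assert test="@type">A thing needs a type.</sch:assert>
    </constraint>
  </constraintSpec>
</elementSpec>
"""

def constraints_without_context(root):
    """Return idents of <constraintSpec>s with an assert/report outside any <sch:rule>."""
    checks = {f"{{{SCH_NS}}}assert", f"{{{SCH_NS}}}report"}
    found = []
    for spec in root.iter(f"{{{TEI_NS}}}constraintSpec"):
        for constraint in spec.iter(f"{{{TEI_NS}}}constraint"):
            # Direct children only: an assert here has no <sch:rule> parent.
            if any(child.tag in checks for child in constraint):
                found.append(spec.get("ident"))
                break
    return found

print(constraints_without_context(ET.fromstring(ODD)))
```

For the second constraint a processor has to guess the context, for example from the enclosing <elementSpec>; whether that guess matches the ODD writer's intent is exactly what cannot be known.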
tagdoc elements allowed most anywhere
Currently the model.oddDecl class is a member of the model.inter class, meaning that, e.g., a <classSpec> can go inside a <rdg>, a <head>, or an <orig>. Should a processor gather up all tagdoc elements from anywhere in the customization file and assemble them, or should it only process those inside a <schemaSpec>? See https://github.com/TEIC/TEI/issues/2306.
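The two candidate behaviours can be compared side by side. The Python sketch below uses a hypothetical customization fragment with a stray <classSpec> inside a <head>; the function names and idents are illustrative only, and the set of specification elements considered is an assumption.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment: one spec inside <schemaSpec>, one stray spec inside <head>.
ODD = f"""
<body xmlns="{TEI_NS}">
  <div>
    <head>Notes <classSpec ident="att.stray" type="atts" mode="add"/></head>
    <schemaSpec ident="demo" start="TEI">
      <elementSpec ident="thing" mode="add"/>
    </schemaSpec>
  </div>
</body>
"""

# Assumption: these four element types are the specifications of interest.
SPEC_TAGS = {
    f"{{{TEI_NS}}}{name}"
    for name in ("elementSpec", "classSpec", "macroSpec", "dataSpec")
}

def specs_anywhere(root):
    """Strategy 1: gather every specification element in the file."""
    return [el.get("ident") for el in root.iter() if el.tag in SPEC_TAGS]

def specs_in_schema_spec(root):
    """Strategy 2: gather only specifications nested inside a <schemaSpec>."""
    return [
        el.get("ident")
        for schema in root.iter(f"{{{TEI_NS}}}schemaSpec")
        for el in schema.iter()
        if el.tag in SPEC_TAGS
    ]

root = ET.fromstring(ODD)
print(specs_anywhere(root))
print(specs_in_schema_spec(root))
```

The two strategies yield different schemas from the same file, which is why the question needs an answer in the Guidelines rather than in each processor.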
Schematron query language binding
ODD provides no mechanism for an ODD writer to specify the query language binding. Currently the binding is generated by the software that processes the ODD. We think this is inappropriate: the ODD should specify the query language binding. See https://github.com/TEIC/TEI/issues/2330.
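In Schematron itself the binding is recorded in the @queryBinding attribute of <sch:schema>. The Python sketch below shows the step at which a processor has to supply that value when it wraps extracted rules; the helper `wrap_rules` and the rule context are hypothetical, and nothing here reflects how any existing processor is implemented.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"

def wrap_rules(rules, query_binding):
    """Wrap extracted <sch:rule> elements in an <sch:schema> with the given binding."""
    schema = ET.Element(f"{{{SCH_NS}}}schema", {"queryBinding": query_binding})
    pattern = ET.SubElement(schema, f"{{{SCH_NS}}}pattern")
    pattern.extend(rules)
    return schema

rule = ET.Element(f"{{{SCH_NS}}}rule", {"context": "tei:thing"})

# The ODD carries no binding, so the value below is the processor's own choice.
for binding in ("xslt2", "xslt3"):
    print(wrap_rules([rule], binding).get("queryBinding"))
```

Because `query_binding` has no source in the ODD, two processors can legitimately emit different bindings for the same rules.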
@generate and @expand of classes
The <classSpec> element has a @generate attribute which constrains how the members of the class may be combined; the <classRef> element has an @expand attribute which asserts how the members of the class should be combined for the given reference. The Guidelines do not:
- explain what either of these attributes means for an attribute class.
- define in what order elements should occur when a sequence is generated.
- explicitly require that the value of @expand refers to a pattern that was generated by a @generate.
- explain what should happen when a model class is a member of a model class.
See https://github.com/TEIC/TEI/issues/2369.
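The open questions in the list above become visible as soon as one tries to write the expansion down. The following Python sketch renders a model class reference as a RELAX NG compact-style pattern. The function `expand_class` is hypothetical, and two of its behaviours are assumptions rather than anything the Guidelines state: members are emitted in alphabetical order, and an @expand value not listed in @generate is treated as an error.

```python
# Mapping from @generate/@expand values to (occurrence suffix, separator).
GENERATE_VALUES = {
    "alternation": ("", " | "),
    "sequence": ("", ", "),
    "sequenceOptional": ("?", ", "),
    "sequenceOptionalRepeatable": ("*", ", "),
    "sequenceRepeatable": ("+", ", "),
}

def expand_class(members, expand, generated):
    """Render the members of a model class for one <classRef>.

    members:   element names belonging to the class
    expand:    value of @expand on the <classRef>
    generated: values listed in @generate on the <classSpec>
    """
    if expand not in generated:
        # Assumption: the Guidelines do not explicitly require this check.
        raise ValueError(f"@expand='{expand}' was not generated by the class")
    suffix, separator = GENERATE_VALUES[expand]
    # Assumption: alphabetical order; the Guidelines define no order.
    return separator.join(f"{name}{suffix}" for name in sorted(members))

print(expand_class(["p", "ab", "lg"], "alternation", {"alternation", "sequence"}))
print(expand_class(["p", "ab", "lg"], "sequenceOptional", {"sequenceOptional"}))
```

Order is irrelevant for an alternation but changes the resulting schema for every sequence variant, so a different ordering choice (for example, declaration order) would accept different documents. The sketch also ignores the cases of attribute classes and of classes nested in classes.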
<interleave> (or <bag> or <verschachteln>?)
To express “all of these elements should be present, but order is unimportant” in a content model, an ODD writer needs to use <sequence preserveOrder='false'>. This is, at best, a counter-intuitive way to do this. See https://github.com/TEIC/TEI/issues/2154.
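A small Python sketch shows how little separates the two meanings in the source. It renders a <sequence> in RELAX NG compact style, using `,` for an ordered sequence and `&` (interleave) when @preserveOrder is false. The element names, the helper `to_rnc`, and the treatment of a missing @preserveOrder as true are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical content model: both elements required, in any order.
ODD = f"""
<content xmlns="{TEI_NS}">
  <sequence preserveOrder="false">
    <elementRef key="forename"/>
    <elementRef key="surname"/>
  </sequence>
</content>
"""

def to_rnc(sequence):
    """Render a <sequence> of <elementRef>s as a RELAX NG compact-style pattern."""
    refs = [ref.get("key") for ref in sequence.findall(f"{{{TEI_NS}}}elementRef")]
    # Assumption: a missing @preserveOrder means the order is significant.
    ordered = sequence.get("preserveOrder", "true") not in ("false", "0")
    return (", " if ordered else " & ").join(refs)

seq = ET.fromstring(ODD).find(f"{{{TEI_NS}}}sequence")
print(to_rnc(seq))
```

An element named "sequence" thus produces a pattern that is not a sequence at all, depending on a single attribute value.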
@require vs @except
The <anyElement> element has two attributes for controlling which elements may be represented: @require and @except. Either or both may be used; either may have one or more namespace URIs in its value (@except also allows QNames of elements). But the Guidelines make no mention of what a processor is to do if the same namespace URI is on both @except and @require. See https://github.com/TEIC/TEI/issues/2369.
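Detecting the conflict is straightforward, as the Python sketch below suggests; deciding what to do about it is the unspecified part. The helper `conflicting_namespaces` and the attribute values are hypothetical. Since @require holds only namespace URIs, intersecting the two token lists cannot match a QName from @except.

```python
def conflicting_namespaces(require, excepted):
    """Return namespace URIs that appear in both @require and @except.

    Both arguments are the raw, whitespace-separated attribute values.
    """
    required = set(require.split())
    return sorted(required & set(excepted.split()))

# Hypothetical attribute values: the SVG namespace is both required and excepted.
require = "http://www.w3.org/1998/Math/MathML http://www.w3.org/2000/svg"
excepted = "http://www.w3.org/2000/svg tei:egXML"

print(conflicting_namespaces(require, excepted))
```

A processor could treat a non-empty result as an error, let @except win, or let @require win; the Guidelines currently give no basis for choosing.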
About the authors
Syd Bauman is chair of the ATOP Task Force. He served as the N. American Editor of the TEI from 2001 to 2008, during which time the current ODD language was developed; thus he takes partial responsibility for some of the problems discussed herein. He has served as a member of the TEI Council since 2013. ORCID: 0000-0003-3288-443X Affiliation: Northeastern University Digital Scholarship Group
Helena Bermúdez Sabel is a Digital Humanities researcher and a software developer at JinnTec. She has been the technical lead of several corpus linguistics and digital editing projects. She has served as a member of the TEI Technical Council since 2021. ORCID: 0000-0002-8627-1367 Affiliation: JinnTec
David Maus works as head of research and development at the State and University Library Hamburg. He acts as liaison to digital humanities projects at Hamburg University and other higher education institutions. He is the author of SchXslt, a modern implementation of ISO Schematron. ORCID: 0000-0001-9292-5673 Affiliation: State and University Library Hamburg
Martin Holmes is a programmer/consultant at the University of Victoria Humanities Computing and Media Centre. He is the lead programmer on several large digital edition projects including the Map of Early Modern London, and has served on the TEI Council and as Managing Editor of jTEI. ORCID: 0000-0002-3944-111 Affiliation: University of Victoria