Valid, but readable: writing D2d for constructing XML

Authors: Lepper, Markus / Trancón y Widemann, Baltasar

Date: Thursday, 7 September 2023, 2:15pm to 3:45pm

Location: Main Campus, L 1.202

Abstract

D2d is a text format definition that enables technical authors, journalists, scholars, poets, essayists, novelists, and other non-IT experts to notate structured text in the creative flow, without disruptive technical actions intervening, or even without any technical devices at all. Nevertheless, the notated text can represent an XML-encoded document that is valid w.r.t. a particular document type. (Lepper et al. 2001) (Lepper-Trancón 2016, 2019)

D2d employs only two redefinable escape characters, one for commands and one for comments. Therefore any d2d-encoded document is easily readable and writable by both computers and humans. It can be notated, with the same level of precision, using arbitrary text editors, pen and pencil, or chalk on a blackboard. It can be communicated by speech, sign language, and stenography. Its input format is inspired by m4 (Kernighan-Ritchie 1977), LaTeX (Lamport 1986), and Lout (Kingston 1992).

With its compiler implementation, an efficient and comfortable workflow has been established in very different areas of application, such as musicology (Lepper 2015), interactive HTML pages, accounting, relational database input, technical configuration files, etc. (Lepper-Trancón 2016)

Nearly-WYSIWYG editors like Oxygen (Oxygen 2023) are excellent tools when one particular document type is to be edited with one particular rendering, so that the XML as such can be made (almost) transparent to the domain expert. They perform less well when the XML structure is itself the subject of editing, as in most contexts of TEI. Naturally, they do not cover the other media listed above. XML tags, escapes, entities, etc. have been designed for data transmission. Typing them manually requires either frequent technical intervention by the editing software or many redundant keystrokes. Both can disturb or even destroy the creative flow of writing.

The d2d compiler is an alternative that applies advanced techniques to plain text files on two levels of parsing. The upper level uses explicit tagging, in which nearly all closing tags and some opening tags are unambiguously inferred. The lower level infers all tags from character-oriented parser definitions, suitable for small complex data like embedded mathematical or chemical formulas, calendric dates, sigla, abbreviations, etc. Both levels translate directly to valid XML.
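The idea behind inferring closing tags on the upper level can be illustrated with a small Python sketch. It is not the d2d algorithm and does not use d2d syntax: the content model `CONTENT_MODEL`, the pre-tokenized input, and the function `infer_closing_tags` are all hypothetical names chosen for illustration. The sketch assumes that an element is closed implicitly as soon as an opening tag arrives that it does not allow as a child.

```python
from xml.sax.saxutils import escape

# Hypothetical content model (an assumption for this sketch, not a d2d
# definition): element name -> set of element names allowed as children.
CONTENT_MODEL = {
    "list": {"item"},
    "item": {"em"},
    "em": set(),
}


def infer_closing_tags(root, tokens):
    """Turn ("open", name) / ("text", string) tokens into an XML string.

    Before an element is opened, every currently open element that does not
    allow it as a child is closed; at end of input all elements are closed.
    Text is accepted anywhere (the toy model does not check it).
    """
    stack = [root]
    out = [f"<{root}>"]
    for kind, value in tokens:
        if kind == "text":
            out.append(escape(value))
            continue
        # Close open elements until one accepts `value` as a child.
        while stack and value not in CONTENT_MODEL[stack[-1]]:
            out.append(f"</{stack.pop()}>")
        if not stack:
            raise ValueError(f"element {value!r} is not allowed here")
        stack.append(value)
        out.append(f"<{value}>")
    # End of input closes everything that is still open.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


# Only opening tags are given; no closing tag is written by the author.
tokens = [
    ("open", "item"), ("text", "first "), ("open", "em"), ("text", "entry"),
    ("open", "item"), ("text", "second & last"),
]
result = infer_closing_tags("list", tokens)
print(result)
```

In this toy input the second `item` closes both the open `em` and the first `item`, because neither allows `item` as a child. A real compiler would additionally have to deal with ambiguity, optional opening tags, and text constraints, which the sketch leaves out.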

In the discussion of Diachronic Markup, Barney (2018) gives as an example a list of change activities of the poet, in a sequential order proposed by the scholar. In canonical XML encoding this looks like …

Listing 1

An equivalent d2d file is …

Listing 2

The author’s “For each <change> element I then added a <desc> containing human-readable prose” is made obsolete by mark-up that is immediately readable by both humans and computers.

That article continues with a rendering of the poem with all corrections. In d2d this can look like …

Listing 3

The slimness of d2d tags makes it possible to arrange the mark-up in a layout geometrically similar to the manuscript, reflecting deeply nested editing:

Listing 4

Indentation is for humans only. The computer executes normal left-to-right parsing; corrections appearing “above” in the manuscripts are now “below”. Nevertheless, the readability is much better than that of the genuine XML encoding into which the data will be compiled for further processing:

Listing 5

(Both approaches currently lack a formal definition of their semantics, which must precede any comparison!)

In the first two d2d examples, the few lines at the beginning give the complete necessary document type definition: Both files are totally self-contained.

Both parsing levels employ advanced techniques for generating error messages to guide domain experts (non-computer-language-experts) through the intricacies of grammar definitions, and can accept partially invalid input for rapid prototyping, generating output with embedded error messages. (Lepper-Trancón 2011)
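The combination of character-oriented parsing and output with embedded error messages can be pictured with the following Python sketch. It is a simplified stand-in, not the d2d mechanism: the regular expression `DATE`, the function `date_to_xml`, and the element names `date`, `year`, `month`, `day`, and `error` are assumptions made for this example only.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical character-level definition of an ISO-style calendar date.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def date_to_xml(text):
    """Infer all tags of a date from its characters.

    Invalid input does not abort processing: the offending text is kept and
    wrapped in a (hypothetical) <error> element, so the rest of the document
    can still be compiled and inspected.
    """
    match = DATE.fullmatch(text.strip())
    if match is None:
        return f'<date><error expected="YYYY-MM-DD">{escape(text)}</error></date>'
    parts = "".join(
        f"<{name}>{match.group(name)}</{name}>" for name in ("year", "month", "day")
    )
    return f"<date>{parts}</date>"


good = date_to_xml("2023-09-07")
bad = date_to_xml("7 Sept 2023")
print(good)
print(bad)
```

The author types only the characters of the date; the structure is derived from the parser definition. For the second input the output stays well-formed and carries the diagnostic in place, which is one plausible way to support rapid prototyping with partially invalid text.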

The compiler delivers standard XML representations: file content or SAX event stream. It can run directly on a DTD, or can use its proprietary type definition format. This employs rewriting for parametrization, a new technique allowing light-weight derivations of new document types. (Lepper-Trancón 2018)
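What consuming a SAX event stream instead of file content means for a downstream tool can be sketched with Python's standard `xml.sax` module. The sketch does not call the d2d compiler, whose programming interface is not described here; the class `EventLogger` and the sample document are hypothetical, and the events come from parsing a serialized XML string.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Collects SAX events in the order in which a producer emits them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # A parser may deliver text in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("text", content))


handler = EventLogger()
xml.sax.parseString(
    b"<list><item>first</item><item>second</item></list>", handler
)
events = handler.events
for event in events:
    print(event)
```

A consumer written against the `ContentHandler` interface does not need to know whether the events originate from a file on disk or directly from a front-end, which is the practical benefit of offering both representations.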

A special mode supports XSLT program code for a given XML backend format, including approximate, light-weight type checking. (Lepper-Trancón 2015)

Bibliography

Barney, Brett (2018) TEI, the Walt Whitman Archive, and the Test of Time, in: Journal of the Text Encoding Initiative, 13, Selected Papers from the 2018 TEI Conference. https://doi.org/10.4000/jtei.3249

Kernighan, Brian W. and Ritchie, Dennis M. (1977) The M4 macro processor. Technical report, Bell Laboratories, Murray Hill (NJ).

Kingston, Jeffrey H. (1992) The Design and Implementation of the Lout Document Formatting Language, in: Software—Practice & Experience 23 (9). http://citeseer.ist.psu.edu/kingston93design.html

Lepper, Markus and Trancón y Widemann, Baltasar and Wieland, Jacob (2001) Minimize Mark-Up ! - Natural Writing Should Guide the Design of Textual Modeling Frontends, in: Conceptual Modeling – ER2001, LNCS, Vol. 2224, Springer, Berlin, November 2001.

Lepper, Markus and Trancón y Widemann, Baltasar (2011) D2d — a Robust Front-End for Prototyping, Authoring and Maintaining XML Encoded Documents by Domain Experts, in: Proceedings of KEOD 2011 International Conference on Knowledge Engineering and Ontology Design, SciTePress, Portugal, 2011, p. 449-456.

Lepper, Markus and Trancón y Widemann, Baltasar (2015) A Simple and Efficient Step Towards Type-Correct XSLT Transformations, in: Proceedings 26th International Conference on Rewriting Techniques and Applications (RTA 2015), LIPICS, Vol. 36, Dagstuhl Publishing, Saarbrücken/Wadern, 2015, p. 350-364.

Lepper, Markus (2015) Gustav Mahler, Dritte Sinfonie, Erste Abtheilung. Eine Annäherung http://senzatempo.de/mahler/gmahler_sinf3_satz1.html

Lepper, Markus and Trancón y Widemann, Baltasar (2016) D2d - Kreatives Schreiben von XML-codierten Texten, in: Informatik 2016 (GI Jahrestagung), ed. by C. Mayr, Martin Pinzger, p. 1935-1940.

Lepper, Markus and Trancón y Widemann, Baltasar (2018) Rewriting for Parametrization, in: Tagungsband des 35ten Jahrestreffens der GI-Fachgruppe “Programmiersprachen und Rechenkonzepte”, IFI Reports, University of Oslo, Oslo, 2018.

Lepper, Markus and Trancón y Widemann, Baltasar (2019) D2d – XML for Authors, in: Tagungsband des 36ten Jahrestreffens der GI-Fachgruppe “Programmiersprachen und Rechenkonzepte”, IFI Reports, University of Oslo, Oslo, 2019.

Lamport, Leslie (1986) LaTeX User’s Guide and Document Reference Manual, Addison-Wesley, Reading (MA).

Oxygen (2023) Official oXygen website: https://www.oxygenxml.com

About the authors

Markus Lepper is a composer, music theorist, and computer scientist. He holds PhDs in both computer science and musicology. He lives and works in Berlin and is co-founder of semantics gGmbH Berlin.

Baltasar Trancón y Widemann holds a PhD in computer science from TU Berlin and a Habilitation degree from the University of Bayreuth. He has worked as a researcher in academia and as an industrial software engineer. He is currently professor of programming at Nordakademie Elmshorn and co-founder of semantics gGmbH Berlin.
