Travelling Humboldt—Data on the Move

Authors: Dumont, Stefan / Kraft, Tobias / Seifert, Sabine / Thomas, Christian / Wierzoch, Jan

Date: Wednesday, 6 September 2023, 9:15am to 10:45am

Location: Main Campus, L 1.202

The long-term Academy project Travelling Humboldt–Science on the Move, based at the Berlin-Brandenburg Academy of Sciences and Humanities in Germany, publishes the American, Russian-Siberian and European Travel Journals of the Prussian naturalist and explorer Alexander von Humboldt (1769–1859)1. The journals are accompanied by thematically related letters from his world-spanning correspondence network as well as manuscripts from his vast legacy collection, many of which have not been published before.

Six years after the first presentation at the TEI Conference and Members’ Meeting 2016 in Vienna (Dumont et al. 2016), we have delivered eight subsequent versions of this digital documentary edition, the edition humboldt digital (ehd)2. The TEI/XML subset3 of the ehd was developed by adopting established TEI subsets, e.g. the German Text Archive’s Base Format for Manuscripts (BBAW 2022; Thomas / Haaf 2017) and the Correspondence Metadata Interchange Format (CMIF) (TEI Correspondence SIG 2018), to ensure the highest possible degree of standardisation, re-usability, and interoperability of the data. The comprehensive transcription and encoding guidelines document the specifications of the ehd’s TEI format.4 With the publication of the first volumes of its print component by the Springer Nature/J. B. Metzler publishing house5, the project’s hybrid strategy has been realised. In an effective single-source approach, both the digital and the print component (book, PDF and eBook derivatives) rely entirely on the same TEI/XML-encoded data. The publication strategy is ‘digital first’, and the text-critical documentary edition is fully available open access under a Creative Commons license.6

Following version 8 of the ehd (published in May 2022)7, we now fulfil the promise to deliver Humboldt’s complex, handwritten and hard-to-decipher texts as data by making available (1) the annotated text transcriptions of more than 500 documents (ca. 2,800 pages), (2) the comprehensive Alexander von Humboldt Chronology with about 1,600 individual, dated statements from Humboldt’s almost 90-year lifespan, and (3) about 18,000 index entries (e.g. persons, places, institutions, bibliographic items) (Ette et al. 2022). All datasets are in TEI/XML format. The transcriptions and indices have been enriched with authority file information wherever available. The index of persons in particular shows great potential for re-use, since several historical (and mythological) persons in Humboldt’s texts have not yet been documented in the most important authority file for the German-speaking research community, the GND (Gemeinsame Normdatei) of the German National Library8. Recently, we handed over a first batch of more than 50 person index entries to the library; these have already been integrated into the GND, with the ehd named as the authoritative source.9 This supplementary data can help enhance the community’s authority data within the GND, Wikidata and other international portals.
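To illustrate how persons still lacking an authority record could be located in a TEI/XML person index, the following is a minimal Python sketch. It assumes, hypothetically, that each index entry is a TEI `<person>` element that carries an `<idno type="GND">` child whenever a GND record exists; the actual ehd encoding may differ. The sample entries, the identifier value and the function name `persons_without_gnd` are invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"tei": TEI_NS}

# Invented sample index; the structure is an assumption, not the ehd schema.
SAMPLE_INDEX = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <listPerson>
      <person xml:id="p0001">
        <persName>Example Person A</persName>
        <idno type="GND">http://d-nb.info/gnd/000000000</idno>
      </person>
      <person xml:id="p0002">
        <persName>Example Person B</persName>
      </person>
    </listPerson>
  </body></text>
</TEI>
"""


def persons_without_gnd(tei_xml: str) -> list[tuple[str, str]]:
    """Return (xml:id, name) for every <person> without a non-empty GND idno."""
    root = ET.fromstring(tei_xml)
    missing = []
    for person in root.iterfind(".//tei:person", NS):
        # A person counts as documented if any direct <idno> child is typed
        # "GND" (case-insensitive) and has non-empty text content.
        has_gnd = any(
            (idno.get("type") or "").upper() == "GND" and (idno.text or "").strip()
            for idno in person.iterfind("tei:idno", NS)
        )
        if not has_gnd:
            name = person.findtext("tei:persName", default="", namespaces=NS).strip()
            missing.append((person.get(f"{{{XML_NS}}}id", ""), name))
    return missing


if __name__ == "__main__":
    for xml_id, name in persons_without_gnd(SAMPLE_INDEX):
        print(xml_id, name)
```

A list produced this way would only be a starting point: each candidate still needs editorial review before it is proposed as a new authority record.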

The presentation focuses on this digital data publication and illustrates how our approach balances re-using existing data and best practices on the one hand with giving back to the wider (TEI) community on the other. For example, the adopted formats and specifications, which we extended to meet the document-specific encoding needs of Humboldt’s writings, can now be adopted by other projects. The ehd has also been active in discussions of the TEI recommendations, proposing specific interpretations or alterations of the Guidelines. We demonstrate some of these developments with an emphasis on the most generally relevant aspects.10
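One such alteration concerns the `@calendar` attribute, which may now hold more than one value (TEI issue #2028, see note 10). The following minimal Python sketch shows how a consumer of the data might read such an attribute as a whitespace-separated list of pointers. The sample dates, the pointer names `#gregorian` and `#julian` and the function name `calendars_per_date` are assumptions made for illustration and are not taken from the ehd data.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: the first date is given in two calendars at once,
# the second in a single calendar.
SAMPLE = """
<p xmlns="http://www.tei-c.org/ns/1.0">
  <date when="1829-05-20" calendar="#gregorian #julian">8/20 May 1829</date>
  <date when="1799-07-16" calendar="#gregorian">16 July 1799</date>
</p>
"""


def calendars_per_date(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (@when, [calendar names]) for each <date> element."""
    root = ET.fromstring(xml_text)
    result = []
    for date in root.iterfind(".//tei:date", NS):
        # Treat @calendar as a whitespace-separated list of pointers and
        # strip the leading '#' of each local reference.
        pointers = (date.get("calendar") or "").split()
        result.append((date.get("when"), [p.lstrip("#") for p in pointers]))
    return result


if __name__ == "__main__":
    for when, calendars in calendars_per_date(SAMPLE):
        print(when, calendars)
```

Code that expects exactly one calendar per date would need a similar adjustment to handle double-dated entries.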


Annisius, Marie, Barbara Fischer, and Alexander Steckel. 2022. ‘Ein Baum in der GND - Gemeinsame Normdatei’.

Berlin-Brandenburgische Akademie der Wissenschaften (BBAW) (Ed.). 2022. DTABf. Deutsches Textarchiv – Basisformat.

Kraft, Tobias; Dumont, Stefan. 2020. “The Humboldt Code. On creating a hybrid digital scholarly edition of a 19th century globetrotter.” In: Wiener Digitale Revue 1 (2020). (All URLs in this abstract last accessed 2022-06-20.)

Dumont, Stefan, with Susanne Haaf, Tobias Kraft, Alexander Czmiel, Christian Thomas, Matthias Boenig. 2016. “Applying Standard Formats and Tools: ‘Alexander von Humboldt auf Reisen’ as an Example for the Collective Subsequent Use of DTABf and ediarum”. Presented at the TEI Conference and Members’ Meeting 2016, Vienna, 26–30 September 2016. Abstract (PDF), pp. 69–70.

Ette, Ottmar, Stefan Dumont, Annika Geiser, Carmen Götz, Tobias Kraft, Ulrike Leitner, Ulrich Päßler, Florian Schnee, and Christian Thomas. 2022. ‘TEI-XML-Datenset der Tagebücher, Briefe, Dokumente, Forschungsbeiträge, Chronologieeinträge und Register der edition humboldt digital’. [Datensatz] GitHub.

Fischer, Barbara. 2022. ‘And the beat goes on - Die Zusammenarbeit von GND und Text+’. GND Confluence (blog). 6 September 2022.

Götz, Carmen, ed. 2022. Alexander von Humboldt: Tagebücher der Amerikanischen Reise: Von Spanien nach Cumaná. Vol. 1. edition humboldt print, I. Berlin, Heidelberg: Springer.

Päßler, Ulrich, and Ottmar Ette, eds. 2020. Alexander von Humboldt: Geographie der Pflanzen: Unveröffentlichte Schriften aus dem Nachlass. Vol. 1. edition humboldt print, III. Stuttgart: J.B. Metzler.

Pierazzo, Elena. 2011. “A Rationale of Digital Documentary Editions”. In: Literary and Linguistic Computing 26 (4), pp. 463–477.

Sahle, Patrick. 2016. “What is a Scholarly Digital Edition?” In: Digital Scholarly Editing: Theories and Practices. Edited by Matthew James Driscoll and Elena Pierazzo. Cambridge, UK: Open Book Publishers.

TEI Correspondence SIG. 2018. Correspondence Metadata Interchange Format (CMIF).

Thomas, Christian; Haaf, Susanne. 2017. “Enabling the Encoding of Manuscripts within the DTABf: Extension and Modularization of the Format”. In: Journal of the Text Encoding Initiative [Online], Issue 10.

About the authors

Stefan Dumont has been a researcher at TELOTA/BBAW since 2011. He is also coordinator of the DFG project correspSearch, co-leader of the DFG project The German Letter in 18th Century and co-convener of the TEI SIG Correspondence. His research focuses on digital editions of correspondence.

Tobias Kraft works as a research coordinator at the BBAW. Together with his team, he creates the edition humboldt, a digital and printed edition of Alexander von Humboldt’s travel manuscripts and personal papers. Since 2019, he has also been the director of the Proyecto Humboldt Digital (ProHD).

Sabine Seifert works as a research associate in the project A. v. Humboldt–Science on the Move (BBAW) and at the Theodor Fontane Archive. She is co-convener of the TEI SIG Correspondence and a TEI Council member. Her main interests are digital editions, manuscripts, and the history of the humanities and sciences.

Christian Thomas has worked at the German Text Archive and in the infrastructure project CLARIN. He is currently focussing on hybrid scholarly editions of A. v. Humboldt’s manuscripts and travel journals (BBAW) as well as Johann Wolfgang von Goethe’s letters and diaries (Klassik Stiftung Weimar).

Jan Wierzoch has been a researcher at TELOTA/BBAW since 2020. He studied History as well as Library and Information Science. At TELOTA, he is involved in the development of several digital editions, with a focus on visualisations.


  1. Project description:; digital edition:; cf. Kraft/Dumont 2020. 

  2. On Digital Scholarly Editions see Sahle 2016; esp. on the concept of ‘documentary editions’ cf. Pierazzo 2011. 

  3. The ODD of the ehd (“chained” from DTABf and others) will be published in summer 2023. 


  5. Book series edition humboldt print, Päßler / Ette 2020; Götz 2022. 

  6. CC-BY-SA 4.0 for the TEI/XML; CC-0 for indices and metadata. 

  7. Cf. the overview at; API:; TEI/XML (current version of the ehd)


  9. Cf., for instance, the entry on William Thomson. In the future, we plan to work together with the GND even more efficiently via the Text+ GND Agency of the National Research Data Infrastructure/Text+, cf. Annisius, Fischer, Steckel 2022; Fischer 2022. 

  10. E.g. TEIC/TEI Issue #2028 “@calendar should allow multiple values”, following the discussion on the TEI mailing list in August 2020, leading to a corresponding implementation in the TEI P5 Guidelines v. 4.3.0 (2021-08-31). 
