Towards correspSearch v3.0

Authors: Dumont, Stefan / Grabsch, Sascha / Müller-Laackman, Jonas / Sander, Ruth / Sobkowski, Steven

Date: Thursday, 7 September 2023, 2:15pm to 3:45pm

Location: Main Campus, L 1.201

The web service correspSearch aggregates correspondence metadata from digital and printed editions (or scholarly catalogues of correspondence) and offers them for central search and retrieval (Dumont 2016). The web service is based mainly on the Correspondence Metadata Interchange Format (CMIF), which is being developed within the TEI Correspondence SIG (Stadler 2014; TEI Correspondence SIG 2018). The CMIF is built on the TEI element correspDesc (Stadler, Illetschko, Seifert 2016), which is used in a highly reduced and restrictive way. URIs from authority files are extensively used for the identification of persons and places (cf. Stadler 2012).
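To make the role of correspDesc and authority-file URIs more concrete, the following is a minimal sketch, not the correspSearch implementation. It reads a single correspDesc element in isolation and collects the URIs that identify sender, addressee and place. The sample record, all example.org URIs, the use of a ref attribute on correspDesc for the letter's address, and the function name parse_corresp_desc are assumptions made for illustration. A real CMIF file wraps such elements in a complete TEI document and imposes further constraints not shown here.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by correspDesc and its children.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample record: names, URIs and date are placeholders, not real data.
SAMPLE = """
<correspDesc xmlns="http://www.tei-c.org/ns/1.0" ref="https://example.org/letters/1">
  <correspAction type="sent">
    <persName ref="https://example.org/authority/person/1">Sender Name</persName>
    <placeName ref="https://example.org/authority/place/1">Berlin</placeName>
    <date when="1800-01-01"/>
  </correspAction>
  <correspAction type="received">
    <persName ref="https://example.org/authority/person/2">Addressee Name</persName>
  </correspAction>
</correspDesc>
"""


def _attr(element, name):
    """Return an attribute value, or None if the element is missing."""
    return element.get(name) if element is not None else None


def parse_corresp_desc(xml_text):
    """Collect the identifying URIs and date of each correspAction."""
    root = ET.fromstring(xml_text.strip())
    record = {"letter": root.get("ref")}
    for action in root.findall("tei:correspAction", TEI_NS):
        record[action.get("type")] = {
            "person": _attr(action.find("tei:persName", TEI_NS), "ref"),
            "place": _attr(action.find("tei:placeName", TEI_NS), "ref"),
            "date": _attr(action.find("tei:date", TEI_NS), "when"),
        }
    return record


if __name__ == "__main__":
    print(parse_corresp_desc(SAMPLE))
```

Because persons and places are reduced to URIs rather than name strings, records from different editions that point to the same authority entry can be matched without any name normalization, which is what makes aggregation across editions feasible.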

The web service correspSearch was made available as a prototype in 2014. Since 2017, a new software architecture, funded by the Deutsche Forschungsgemeinschaft (DFG), has been implemented. Version 2.0 (beta) of correspSearch was presented at the TEI conference in Graz in 2019 and went into live production in summer 2021. The dataset has grown continuously in recent years, with most contributions coming from the community of scholarly editions: there are now over 200,000 edited letters from over 360 publications indexed in correspSearch. This paper outlines the developments of recent years, gives some insight into current work, and offers an outlook on the upcoming version 3 of the web service.

Developments of the last few years include new search functions based on information from authority files. The search options have been expanded by using the URIs from authority files not only for identification, but also to enrich the aggregated CMIF data with the information associated with those URIs. Of particular relevance in this regard is the German National Library’s Integrated Authority File (Gemeinsame Normdatei), which in turn is supplemented with information from Wikidata. This opens up new approaches to research, for example searching for correspondence based on the gender or occupation of the correspondents. In the late summer of 2023, new search functionalities based on extensions of CMIF v2 (Dumont et al. 2019), such as searching for people or places mentioned in a letter, will be released in a public beta phase.

Not only the search functionalities, but also the capture tools and workflows provided by correspSearch have been developed further. For example, the CMIF Creator (Müller-Laackman 2022; Müller-Laackman, Dumont, Grabsch 2019) offers new functions that make it easier to capture correspondence between just two correspondents. The tools CMIF Check and CMIF Preview provide new ways to check a CMIF file for technical consistency and, in part, for consistency of content. In addition, correspSearch has produced videos that provide an initial overview of the web service, as well as more detailed information on its methodological background. Another video tutorial guides users step by step through the CMIF Creator to support the community in providing correspondence metadata for their edited letters.

Finally, we would like to look ahead to developments that will (probably) still be underway in late summer or fall. These include full-text search with snippet view, as well as visualizations that support the exploration of the dataset or of individual search requests.


About the authors

Stefan Dumont has been a researcher at TELOTA/BBAW since 2011. He is also coordinator of the DFG project correspSearch, co-leader of the DFG project The German Letter in 18th Century and co-convener of the TEI SIG Correspondence. His research focuses on digital editions of correspondence.

Sascha Grabsch studied Literary Studies, Philosophy and Media Studies in Potsdam. Since 2012, he has been working at the BBAW in DFG projects and in the TELOTA initiative in the field of DH. The focus of his work and research is on the digital indexing of research data and scholarly editions of modern texts.

Jonas Müller-Laackman studied Arabic Studies in Berlin and Leiden. He was a researcher in the correspSearch team between 2018 and 2021. Since 2022, he has been working as a Referent for Digital Research Services at the State and University Library Hamburg. His work focuses on multilingual DH, DH methodologies and Arabic literature.

Ruth Sander studied English Philology, Scandinavian Studies, and Digital Humanities in Göttingen and Stuttgart. She joined TELOTA in 2021 where her primary focus is the digital indexing of research data and scholarly editions of modern texts.

Steven Sobkowski studied Media Informatics in Berlin. Since 2023, he has been working as a Research Software Engineer at TELOTA for the projects correspSearch and Redaktions-/Onlinesystem für Online-Editionen des Bundesarchivs.


