An annotation model for software mentions and citations

Authors: Alvares Freire, Fernanda / Ferger, Anne / Henny-Krahmer, Ulrike / Jettka, Daniel

Date: Wednesday, 6 September 2023, 9:15am to 10:45am

Location: Main Campus, L 2.202


Appropriate software citation in academic publications plays an important role in making research results reproducible and reusable. There are several recommendations and guidelines on how to deal with research software (Anzt et al. 2021; Smith et al. 2016; Lamprecht et al. 2020) and how to cite software that is used in the research process (Jackson n.d.; Chue Hong et al. 2019a, 2019b; Druskat 2021a, 2021b).

To find out whether these recommendations (e.g. consistent versioning, persistent identification, appropriate credit to developers) are actually reflected in practice, we examined conference abstracts of the DHd (Henny-Krahmer/Jettka 2022) and ADHO conferences (Jettka et al., to appear). Our findings suggest that there is great potential (and need) for improving current practice. For these software citation studies, we formulated an annotation model as a TEI taxonomy. Initially, we pursued a document-centered approach: software mentions were semi-automatically identified and directly annotated with citation information that could be located elsewhere in the document, for instance in the bibliographies of the abstracts.

We now propose a revised TEI annotation model that aims at more precise annotation: it differentiates between pure software mentions (names) and other parts of citation information (such as URLs, developers, or bibliography entries) and links these parts together. In our new approach, we use pointers (<ptr>) to an externally defined list of software entities and reference these pointers from the annotation instances of the corresponding citation information. We thus continue to examine the current state of software citation (this time in articles of the Journal of the Text Encoding Initiative). At the same time, we provide and apply a model and create a dataset of software mentions and citations that could be used to train automatic methods for identifying software mentions and the corresponding citation information in academic texts.
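The pointer-based linking can be illustrated with a minimal sketch. It is not the model presented in the talk: the use of <rs> with a `corresp` attribute, the type values `soft.name`, `soft.url`, and `soft.dev`, the file name `software-list.xml`, and the tool "ExampleTool" with its URL and developer are all assumptions made for illustration. Only the idea of a <ptr> to an external software list, referenced from the citation-information annotations, comes from the description above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical annotated paragraph: element choices and type values are
# illustrative assumptions, not the taxonomy used by the authors.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  We encoded the texts with
  <rs type="soft.name"><ptr xml:id="p1" target="software-list.xml#exampletool"/>ExampleTool</rs>
  (<rs type="soft.url" corresp="#p1">https://example.org/exampletool</rs>),
  developed by <rs type="soft.dev" corresp="#p1">Jane Doe</rs>.
</p>"""


def collect_citation_info(xml_text):
    """Group citation information by the software entity it is linked to."""
    root = ET.fromstring(xml_text)

    # Map each pointer's xml:id to the entity id in the external software list.
    pointer_targets = {}
    for ptr in root.iterfind(".//tei:ptr", TEI_NS):
        entity_id = ptr.get("target").split("#", 1)[1]
        pointer_targets[ptr.get(XML_ID)] = entity_id

    # Follow each citation annotation back to its pointer, then to the entity.
    info = defaultdict(dict)
    for rs in root.iterfind(".//tei:rs[@corresp]", TEI_NS):
        pointer_id = rs.get("corresp").lstrip("#")
        entity_id = pointer_targets[pointer_id]
        info[entity_id][rs.get("type")] = "".join(rs.itertext()).strip()
    return dict(info)


if __name__ == "__main__":
    print(collect_citation_info(SAMPLE))
```

Because mentions and citation details resolve to the same entity id, data of this shape could serve as labeled input when training automatic methods for identifying software mentions.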


About the authors

Fernanda Alvares Freire (0000-0002-6414-5212) is a research associate at the University of Rostock. Her work currently focuses on the development of study programs in the field of Digital Humanities. Additionally, she works on digital research methods such as social network analysis, data mining, and general data management and exploration approaches.

Anne Ferger (0000-0002-1382-2658) is a research associate at the University of Paderborn and the University of Duisburg-Essen. She currently works mainly on sustainable research tools and software development in the context of NFDI4Culture and on FAIR research data management in the MuMoCorp project.

Ulrike Henny-Krahmer (0000-0003-2852-065X) is a junior professor for Digital Humanities at the University of Rostock. She is co-convenor of the DHd working group on Research Software Engineering (DH-RSE). Her research focuses on digital scholarly editions, digital text analysis, and the evaluation and sustainability of DH research outputs.

Daniel Jettka (0000-0002-2375-2227) is a research associate at the University of Paderborn and works on sustainable software development and research infrastructure implementation in NFDI4Culture. Previously, he worked on repository management, interfaces for spoken language corpora, and digital infrastructure for the documentation of endangered languages.

