Encoding Previously Transcribed Oral Literature: The Forager Folklore Database (FFDB)

Authors: Jokisch, Jan Niklas

Date: Thursday, 7 September 2023, 4:15pm to 5:45pm

Location: Main Campus, L 1


The Forager Folklore Database (FFDB) assembles a large corpus of hunter-gatherer oral literature in English translation. It is designed as an empirical basis for the systematic study of narrative universals and the evolutionary origin of storytelling, as well as for (comparative) folklore studies.

The folktales in question were originally recorded by anthropologists and others (clergy, explorers, etc.) during the past 150 years. In addition to generating rich metadata for each narrative, indicating its provenance as well as selected textual features, the FFDB will provide a corpus of narratives as digital texts encoded in TEI XML. The encoding preserves the spelling and page beginnings of the original source, thus making the often hard-to-access material available in a citable format. Furthermore, semantic annotation is set up for animals and plants (semi-automatically through WordNet synsets), colors, and person and place names, as well as for narrative categories such as ana- and epimythia. This will offer researchers a more fine-grained method of text retrieval, e.g., finding all texts containing birds, and allow for richer computerized approaches in the growing field of computational folktale and narrative studies.
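To illustrate the kind of retrieval this annotation is meant to enable, the following is a minimal sketch in Python. It rests on several assumptions that are not taken from the FFDB itself: that annotated entities are marked with the TEI element `rs` carrying a `key` attribute, that the two sample texts and their identifiers are invented placeholders, and that a small hand-written `HYPERNYMS` table stands in for the hypernym chains WordNet would supply. The page-beginning element `pb` is included only to show where preserved page breaks would sit.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Invented placeholder texts; the element and attribute choices (rs, key)
# are assumptions, not the project's documented encoding.
SAMPLE_TEXTS = {
    "tale-001": f"""<text xmlns="{TEI_NS}"><body>
        <pb n="12"/>
        <p>The <rs type="animal" key="raven">raven</rs> stole the fire.</p>
    </body></text>""",
    "tale-002": f"""<text xmlns="{TEI_NS}"><body>
        <pb n="40"/>
        <p>The <rs type="animal" key="jackal">jackal</rs> hid near the
           <rs type="plant" key="acacia">acacia</rs>.</p>
    </body></text>""",
}

# Hand-written stand-in for WordNet hypernym chains (child -> parent).
HYPERNYMS = {
    "raven": "bird",
    "bird": "animal",
    "jackal": "canine",
    "canine": "animal",
}


def is_a(key, category):
    """Return True if `key` equals `category` or has it as an ancestor."""
    seen = set()
    while key is not None and key not in seen:
        if key == category:
            return True
        seen.add(key)
        key = HYPERNYMS.get(key)
    return False


def texts_containing(category, texts):
    """Return the sorted ids of all texts with an entity under `category`."""
    hits = []
    for text_id, xml in texts.items():
        root = ET.fromstring(xml)
        keys = [rs.get("key") for rs in root.iterfind(".//tei:rs", NS)]
        if any(is_a(k, category) for k in keys):
            hits.append(text_id)
    return sorted(hits)


if __name__ == "__main__":
    print(texts_containing("bird", SAMPLE_TEXTS))
```

A query for "bird" matches the raven tale because the lookup walks up the hypernym chain, which is the benefit of annotating against a lexical hierarchy rather than against surface strings.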

The form of previously transcribed orality that the folktales take comes with its own set of challenges. Though they are transmitted to us in writing, many of the relevant features of the narratives derive from their origin in oral storytelling. Conversational framings, explanatory comments, personal asides, remarks and questions from the audience, the use of gestures, onomatopoeia, and songs go beyond the generally more bookish approaches of the TEI. They require new tools, new tags, and a new understanding of textuality and authorship.

The general project infrastructure is set up dynamically in the form of a work-in-progress relational database. We use Python to generate the TEI header automatically from the data stored in the database, before combining the headers with the pre-annotated texts. Unless more restrictive copyright prohibits this, the files will be published under a CC BY-NC-SA license. The metadata for the encoded narratives are further enriched by the motif assignments in Stith Thompson’s Motif-Index of Folk-Literature (1950-58), together with additional motif assignments, e.g., by Johannes Wilbert and Karin Simoneau (Folk Literature of the South American Indians, 1970-1992) and Sigrid Schmidt (Catalogue of the Khoisan Folktales of Southern Africa, 2013). A small selection of roughly 70 narratives has already been encoded to showcase the direction and possible scope of the FFDB.
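The header-generation step described above might look roughly like the following sketch. It is not the project's actual code: the table name `narrative`, its columns, the placeholder row, and the helper names `make_demo_db`, `build_header` and `tei_for` are all hypothetical, and an in-memory SQLite database stands in for whatever relational database the FFDB uses. The output is a minimal `teiHeader` and is not validated against a TEI schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def make_demo_db():
    """Create an in-memory database with a hypothetical schema and one row."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE narrative "
        "(id TEXT PRIMARY KEY, title TEXT, source TEXT, licence TEXT)"
    )
    conn.execute(
        "INSERT INTO narrative VALUES (?, ?, ?, ?)",
        ("tale-001", "Placeholder title", "Placeholder source, p. 12",
         "CC BY-NC-SA"),
    )
    return conn


def build_header(row):
    """Build a minimal teiHeader element from one database row."""
    header = ET.Element(q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = row["title"]
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = f"Licence: {row['licence']}"
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("bibl")).text = row["source"]
    return header


def tei_for(conn, narrative_id, body_xml):
    """Combine a generated header with a pre-annotated text element."""
    row = conn.execute(
        "SELECT * FROM narrative WHERE id = ?", (narrative_id,)
    ).fetchone()
    if row is None:
        raise KeyError(narrative_id)
    tei = ET.Element(q("TEI"))
    tei.append(build_header(row))
    tei.append(ET.fromstring(body_xml))
    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    body = f'<text xmlns="{TEI_NS}"><body><p>Placeholder.</p></body></text>'
    print(tei_for(make_demo_db(), "tale-001", body))
```

Keeping the header out of the hand-annotated files means that corrections to provenance data or motif assignments only need to be made in the database, after which the headers can be regenerated.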
