A multi-media dictionary of endangered languages with TEI Lex-0: a case study of Hatoma, Yaeyama Ryukyuan

Authors: Nakagawa, Natsuko / Miyagawa, So

Date: Thursday, 7 September 2023, 4:15pm to 5:45pm

Location: Main Campus, L 1


The authors and their colleagues have recently published several digital and printed dictionaries of Ryukyuan languages, which are spoken in Okinawa and Kagoshima Prefectures, Japan. Ryukyuan languages are endangered: only the grandparent generation speaks them in everyday conversation. Younger generations mainly use Japanese; some understand the languages but do not use them, while others neither understand nor use them (Hanmine 2020). Searchable digital dictionaries with audio help potential Ryukyuan speakers learn their local languages.

Currently, we are building a digital dictionary of Hatoma, a Yaeyama Ryukyuan language, in the TEI Lex-0 format (Romary and Tasovac 2018). The original dictionary was written by Shinichi Kajiku, a native speaker of Hatoma Ryukyuan and a linguist. He spent more than 50 years compiling the dictionary, which was structured in a spreadsheet format by one of the authors in 2019 (Kajiku and Nakagawa 2020). The structured dictionary with audio files for each entry is now available online.

The dictionary contains some information that TEI Lex-0 does not yet cover. First, the guidelines do not specify how to encode audio files; we decided to use the <media> element to point to them. Second, the guidelines provide the <stress> element but no elements for tone or accent; we decided to use the <pron> element with the attribute @notation="accent" to record accent information in Hatoma Ryukyuan. Finally, the original dictionary gives the pronunciation of example sentences as well as of entries. Both are written in kana, a Japanese writing system that is reader-friendly but does not fully indicate how each expression is pronounced. We therefore decided to use the <pron> element with @notation="ipa" for example sentences as well, although it is currently intended only for entries.
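The encoding decisions described above can be illustrated with a minimal sketch. It is not the project's actual encoding or conversion code: the function name build_entry, the nesting (the <media> element placed inside <form>, the example in <cit type="example"> with a <quote>), the audio MIME type, the language tag and all placeholder values are assumptions made for illustration. Only the use of <media> for audio, <pron notation="accent"> for accent, and <pron notation="ipa"> on example sentences comes from the description above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TEI elements in the default namespace (no prefix).
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_entry(entry_id, lang, kana, ipa, accent, audio_url,
                example_kana, example_ipa):
    """Build one dictionary entry (hypothetical helper; layout is assumed)."""
    entry = ET.Element(tei("entry"), {
        f"{{{XML_NS}}}id": entry_id,
        f"{{{XML_NS}}}lang": lang,
    })

    # Headword: kana spelling, IPA, accent information and audio pointer.
    form = ET.SubElement(entry, tei("form"), {"type": "lemma"})
    ET.SubElement(form, tei("orth")).text = kana
    ET.SubElement(form, tei("pron"), {"notation": "ipa"}).text = ipa
    ET.SubElement(form, tei("pron"), {"notation": "accent"}).text = accent
    # The MIME type is an assumption; adjust it to the real audio format.
    ET.SubElement(form, tei("media"),
                  {"mimeType": "audio/wav", "url": audio_url})

    # Example sentence: kana text plus its own IPA pronunciation.
    sense = ET.SubElement(entry, tei("sense"))
    cit = ET.SubElement(sense, tei("cit"), {"type": "example"})
    ET.SubElement(cit, tei("quote")).text = example_kana
    ET.SubElement(cit, tei("pron"), {"notation": "ipa"}).text = example_ipa
    return entry


# Placeholder values only; no real Hatoma data is reproduced here.
entry = build_entry(
    entry_id="entry-0001",
    lang="rys",  # assumed tag: ISO 639-3 code for Yaeyama
    kana="KANA_HEADWORD",
    ipa="IPA_HEADWORD",
    accent="ACCENT_INFO",
    audio_url="audio/entry-0001.wav",
    example_kana="KANA_EXAMPLE",
    example_ipa="IPA_EXAMPLE",
)
print(ET.tostring(entry, encoding="unicode"))
```

Keeping both kinds of pronunciation in <pron> and distinguishing them only by @notation means a consumer can retrieve IPA or accent information with a single attribute filter, whether the <pron> belongs to a headword or to an example sentence.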


Kajiku, Shinichi, and Natsuko Nakagawa. (2020). Hatoma-Japanese Dictionary. Tachikawa: NINJAL Language Variation Division.

Hanmine, Madoka. (2020). Speaking my Language and Being Beautiful – Decolonizing Indigenous Language Education in the Ryukyus with a Special Reference to Sámi Language Revitalization. Rovaniemi: Lapin Yliopisto.

Romary, Laurent, and Toma Tasovac. (2018). “TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources.” TEI 2018 Conference.

About the authors

Natsuko Nakagawa is an associate professor at the National Institute for Japanese Language and Linguistics. She is a linguist doing fieldwork on Yaeyama Ryukyuan and the Nambu dialect in Aomori. Her research interest is how people within and across the research areas can communicate with each other using digital data.

So Miyagawa is an assistant professor at the National Institute for Japanese Language and Linguistics, working on digitization projects of language resources (text, audio, video, image) of endangered Ryukyuan languages. His research interests are in cultural heritage preservation and revitalization.
