LODinG aims to explore the potential of the Linked Open Data (LOD) paradigm at the intersection of qualitative and quantitative studies in the humanities. The project consists of the following work areas:
TP1 (Digital Lexicography & German Studies): Pandemic Vocabulary and LOD – (Infectious) Diseases in Networked Digital Dictionaries
In subproject 1, we are developing the "Pandemictionary", a historical dictionary created within the LODinG project at the Trier Centre for Digital Humanities that focuses on the vocabulary of historical pandemics such as cholera and the Spanish flu. It analyses how people spoke about these pandemics at the time by searching historical corpora and collecting pandemic vocabulary as linked open data. The data will be prepared for publication and retrieval in a Wikibase on the one hand and, on the other, as a dictionary in a wiki instance modeled on the structure of the Wiktionary. The dictionary provides information on the meaning, pronunciation and grammar of its headwords, as well as authentic examples that illustrate the use and nuances of the words in different contexts.
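As a rough illustration of what collecting pandemic vocabulary as linked open data can mean, the following Python sketch flattens a dictionary entry into subject–predicate–object triples. The entry, the property names and the `ex:` prefix are invented for illustration and do not reflect the project's actual Wikibase schema.

```python
# Minimal sketch: a dictionary entry as linked open data. All property
# names and values below are hypothetical, not the real project schema.

def entry_to_triples(entry_id: str, entry: dict) -> list[tuple[str, str, str]]:
    """Flatten a dictionary entry into subject-predicate-object triples."""
    triples = []
    for prop, value in entry.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((entry_id, prop, v))
    return triples

# Hypothetical entry for the German headword "Cholera".
cholera = {
    "lemma": "Cholera",
    "partOfSpeech": "noun",
    "pronunciation": "ˈkoːlera",
    "sense": "acute infectious disease of the intestine",
    "attestation": "19th-century newspaper corpora",
}

triples = entry_to_triples("ex:Cholera", cholera)
print(len(triples))   # 5
print(triples[0])     # ('ex:Cholera', 'lemma', 'Cholera')
```

Once flattened this way, the entry can be loaded into any triple-based store and queried together with data from the other subprojects.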
TP2 (Digital Scholarly Editions, German & Romance Studies): LOD for the Editorial Analysis of Early Modern Literature with a Focus on Botany and Medicine
In this subproject, we are particularly interested in early modern medical prose, not only in medical terminology but also in botany, which posed a lexicographical challenge both then and now. The subproject explores early modern text ensembles from the Romance- and Germanic-speaking areas that are significant in a European context. On the Romance side, one of the central Dioscorides translations in Spain, a sixteenth-century bestseller that has not yet received a modern edition, inserts amusing anecdotes into the original text at various points, which make the work interesting for both literary and linguistic questions.
TP3 (Digital Humanities & Computer Science): Extraction of Semantic Statements on the Content of Humanities Literature Using OpenAlex
The amount of specialist literature published each year exceeds the reading capacity of researchers. Traditionally, abstracts and keywords (for articles) and reviews (for books) serve as a remedy. However, these cannot represent the content of a specialist article or book comprehensively, and above all not in a semantic, machine-readable form that would allow rich analysis in the sense of 'semantic publishing'. The aim of the subproject is to develop strategies for meeting this challenge, which is highly relevant for classic research tasks such as determining the state of research or describing the history of a research question.
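To make the idea of extracting semantic statements concrete, here is a deliberately tiny Python sketch that pulls a (subject, predicate, object) statement out of an abstract sentence with a single regular expression. The pattern, the predicate list and the sentence are illustrative assumptions; the subproject would work with far richer NLP over the OpenAlex corpus.

```python
import re

# Toy extraction of one semantic statement from an abstract sentence.
# The predicate vocabulary is a hypothetical, hand-picked list.
PATTERN = re.compile(
    r"(?P<subj>[\w\s]+?) (?P<pred>analyzes|examines|compares) (?P<obj>[\w\s]+)"
)

def extract_statement(sentence: str):
    """Return (subject, predicate, object) or None if no pattern matches."""
    m = PATTERN.search(sentence)
    if not m:
        return None
    return (m["subj"].strip(), m["pred"], m["obj"].strip())

stmt = extract_statement("This study examines pandemic vocabulary in newspapers.")
print(stmt)  # ('This study', 'examines', 'pandemic vocabulary in newspapers')
```

A rule-based baseline like this fails on most real abstracts; its point is only to show the target data shape, a machine-readable triple rather than free text.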
TP4 (Sinology & Computer Science): Extraction of Semantic Statements on the Content of Chinese-Language Specialist Literature
China continues to be underestimated in the West as a powerful, innovative and productive research player, not least because a large part of its research output cannot be received due to language barriers. One way to break down these barriers is to make the content of Chinese specialist literature available in a machine-readable, semantic and thus also language-independent form. With the help of experts in the relevant domains of Chinese-language research and in the automatic processing of Chinese, this subproject therefore aims to transfer the experience gained in TP3 with the extraction and modeling of semantic statements from specialist literature to a suitable body of Chinese-language specialist literature and thus make it accessible in a language-independent manner.
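The following Python sketch shows what "language-independent" can mean in practice: statements are stored over abstract concept identifiers, and per-language labels are kept separately, so a statement extracted from a Chinese article and one from an English article can coincide. All identifiers and labels here are hypothetical illustrations, not real project data.

```python
# Hypothetical concept inventory with labels in several languages.
labels = {
    "C1": {"en": "cholera", "zh": "霍乱", "de": "Cholera"},
    "C2": {"en": "epidemic", "zh": "流行病", "de": "Epidemie"},
}

# A language-independent statement: stored once, over concept IDs only.
statement = ("C1", "instance_of", "C2")

def render(stmt: tuple, lang: str) -> str:
    """Render a concept-level statement in a given label language."""
    s, p, o = stmt
    return f"{labels[s][lang]} {p} {labels[o][lang]}"

print(render(statement, "en"))  # cholera instance_of epidemic
print(render(statement, "zh"))  # 霍乱 instance_of 流行病
```

Because the statement itself carries no natural language, querying it in English, German or Chinese is purely a matter of which label set is applied at display time.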
TP5 (Digital Cultural Studies & Computational Linguistics): Indexing Text Content and Image Elements Using Multimodal Knowledge Representations, Exemplified by Extensive Holdings of Wine Labels
Today, powerful knowledge graphs are increasingly being trained not only on text but also on multimodal data, in particular combinations of text and images. This opens up corresponding possibilities for supporting multimodal indexing processes in which, for example, text recognition with OCR benefits from image objects that are also present in the document and, conversely, the automatic recognition of image objects is supported by accompanying text. With suitable knowledge representations, this can even be implemented in a single, joint processing step. The aim of the subproject is to apply this paradigm to the indexing of an extensive collection of wine labels compiled by web scraping.
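A toy Python sketch of the underlying intuition: candidate labels for a scraped wine label are scored by combining an (assumed) OCR confidence with an (assumed) image-classifier confidence, so that each modality corrects the other. All scores, label names and the weighting are invented for illustration; the subproject's joint multimodal processing is far more integrated than this late fusion.

```python
def fuse(ocr_scores: dict, image_scores: dict, alpha: float = 0.5) -> dict:
    """Weighted combination of per-candidate confidences from two modalities."""
    candidates = set(ocr_scores) | set(image_scores)
    return {
        c: alpha * ocr_scores.get(c, 0.0) + (1 - alpha) * image_scores.get(c, 0.0)
        for c in candidates
    }

# Hypothetical situation: OCR is unsure between a correct reading and a
# misreading, while the image classifier strongly supports the correct one.
fused = fuse({"Riesling": 0.6, "Rizling": 0.4}, {"Riesling": 0.9})
best = max(fused, key=fused.get)
print(best)  # Riesling
```

Even this crude combination shows why image evidence can disambiguate OCR output, and vice versa.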
TP6 (Law & Digital Humanities): Legal LOD – Concept-Based Analysis of Multilingual European Legal Texts
The subject of the legal subproject is the development of a multilingual corpus of European legal texts relating to digitization, such as the Digital Services Act of 19 October 2022 (Regulation (EU) 2022/2065). Texts of this type are available in the 24 official languages of the EU; all language versions are equally binding and are therefore in principle considered to have the same content, which is how the uniform application of EU law is to be realized in all Member States. However, due to the complex drafting, coordination and translation processes, there are always differences in detail that cannot be found and clarified using the simple synoptic display on the EUR-Lex platform. The subproject aims to solve this problem by (a) automatically aligning the legal texts sentence by sentence; (b) identifying key legal terms and other concepts and making them available as an LOD-enabled ontology; (c) annotating these concepts across the translations (first manually, then automatically); and finally (d) enabling a concept-guided search for term usages and definitions of relevant terms across the different language versions, so that passages that differ in detail can be identified and their significance for Europe-wide national case law can be assessed. The free availability of the texts and their publication in semi-structured HTML format are conducive to this undertaking.
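Step (a), sentence-by-sentence alignment, can be sketched in a few lines of Python. The sketch simply pairs sentences one-to-one by position, which only works when both language versions have the same sentence count; real alignment would use a method such as Gale–Church or sentence embeddings. The sample sentences are invented paraphrases, not quotations from the Regulation.

```python
def align(sentences_a: list[str], sentences_b: list[str]) -> list[tuple[str, str]]:
    """Naive positional alignment of two language versions of the same text."""
    if len(sentences_a) != len(sentences_b):
        raise ValueError("versions differ in sentence count; a real aligner is needed")
    return list(zip(sentences_a, sentences_b))

# Invented example sentences standing in for an EN and a DE version.
en = [
    "This Regulation applies to intermediary services.",
    "Providers shall act expeditiously.",
]
de = [
    "Diese Verordnung gilt für Vermittlungsdienste.",
    "Anbieter handeln unverzüglich.",
]

pairs = align(en, de)
print(len(pairs))  # 2
```

The resulting sentence pairs are the unit on which the concept annotation of steps (b) and (c) would then operate.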
TP7 (Cross-Sectional Project): Integration and Federation of LOD Resources – Cross-Domain Modeling of Specialist Knowledge in the Humanities
This subproject provides methodological and conceptual support by developing one or more data models for specialist knowledge in the humanities in consultation with the other TPs in the network and by facilitating their integration into the common knowledge graph. For the formal description of knowledge, the techniques commonly used to organize and represent knowledge, such as catalogs, glossaries, taxonomies, classifications, thesauri, semantic networks, ontologies and frames, are tested for their applicability in the respective project situations. The approach is modular, i.e. a distinction is made between cross-domain and domain-specific entities and predicates.
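The modular distinction between cross-domain and domain-specific predicates can be sketched as a simple validation rule in Python. The namespace prefixes, predicate names and domain keys below are hypothetical illustrations, not the project's actual vocabulary.

```python
# Hypothetical cross-domain core shared by all TPs.
CORE = {"core:hasLabel", "core:sameAs", "core:partOf"}

# Hypothetical domain-specific extensions per subproject.
DOMAIN = {
    "lex": {"lex:hasSense", "lex:hasPronunciation"},  # e.g. TP1 lexicography
    "law": {"law:definesTerm", "law:translates"},     # e.g. TP6 legal texts
}

def is_valid_predicate(pred: str, domain: str) -> bool:
    """A predicate is valid if it belongs to the core or to the TP's own domain."""
    return pred in CORE or pred in DOMAIN.get(domain, set())

print(is_valid_predicate("core:sameAs", "lex"))      # True
print(is_valid_predicate("law:definesTerm", "lex"))  # False
```

The design point is that every TP can always use the core, while domain extensions stay local, which keeps the shared knowledge graph queryable across domains without flattening domain-specific nuance.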
INF (TCDH and University Library): Infrastructure for LOD and Support (Burch, Schirra, Röpke)
As a cross-sectional project, the INF subproject implements central technical requirements and methods for the other subprojects, thereby laying the foundations for the interoperability of the data modeled in the TPs and supporting professional research data management (RDM). The software basis for this is, on the one hand, the general Wikidata platform (www.wikidata.org) and, on the other, a separate Wikibase instance that can accommodate project-specific data which initially has no general Wikidata identifiers but is nevertheless stored in a Wikidata-compatible framework and can be interoperably linked with the other parts of the knowledge graph created in LODinG. Technically, Wikidata, like all projects operated by the Wikimedia Foundation, is based on MediaWiki and uses Wikibase, which consists of a repository for storing structured data. The INF project is developing interfaces for this framework in order to synchronize data.
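As a small illustration of how such synchronization interfaces typically address a Wikibase, the Python sketch below builds a request URL for the MediaWiki Action API's `wbgetentities` module, which both wikidata.org and self-hosted Wikibase instances expose. The module and endpoint path are standard Wikibase; the specific base URLs and item ID are only examples, and how the INF project actually synchronizes data is not specified here.

```python
from urllib.parse import urlencode

def entity_request_url(base: str, entity_id: str) -> str:
    """Build a wbgetentities request for one entity, JSON-formatted."""
    params = {"action": "wbgetentities", "ids": entity_id, "format": "json"}
    return f"{base}/w/api.php?{urlencode(params)}"

# Example against the public Wikidata platform; a project-specific Wikibase
# instance would substitute its own base URL.
url = entity_request_url("https://www.wikidata.org", "Q42")
print(url)
# https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&format=json
```

Fetching the same entity ID from both the public platform and the project instance and comparing the JSON is one plausible building block of a synchronization routine.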