Extraction of semantic statements on the content of humanities literature using OpenAlex (Schenkel, Schöch)
The volume of specialist literature published each year exceeds what researchers can read. Traditionally, abstracts and keywords (for articles) and reviews (for books) have served as a remedy. However, they cannot make the content of an article or book available comprehensively, and above all not in a semantic, machine-readable form that can be richly analyzed in the sense of 'semantic publishing'. The aim of the sub-project is to develop strategies for meeting this challenge, which is highly relevant to classic research tasks such as determining the state of research or describing the history of research on a question.
TP 3 (Digital Humanities & Informatics): Extraction of semantic statements on the content of specialist literature in the humanities using OpenAlex
In the humanities, too, the volume of specialist literature published each year exceeds what researchers can read many times over. Traditionally, abstracts and keywords (for articles) and reviews (for books) have served as a remedy. However, they cannot make the content of an article or book available comprehensively, and above all not in a semantic, machine-readable form that can be richly analyzed in the sense of 'semantic publishing' (Shotton 2009, see also Schöch 2020).
The aim of the sub-project is to develop strategies for meeting this challenge, which is highly relevant to classic research tasks such as determining the state of research or describing the history of research on a question (see Kreutz and Schenkel 2022). To this end, the sub-project pursues two steps. First, parts of extensive collections of specialist literature from several humanities domains are to be semantically annotated by hand. Second, on the basis of these annotations, the transformation of a publication's abstract and keywords or, if available, its full text into a manageable number of meaningful LOD statements is to be learned.
This also requires modeling the domain, at least in outline: aspects such as the nature of the object of investigation, the period covered, the tools and methods used, and the specialist literature drawn upon must be taken into account. In this respect, the sub-project is a cross-sectional project that can make an important contribution to integrating the results of the other sub-projects. The project could build on data from the OpenAlex platform, where extensive metadata is already available as LOD via an API or as a dump. However, the indexing depth of OpenAlex is comparatively low and follows the logic of a folksonomy rather than that of a structured domain model. Both limitations are to be addressed through the use of full texts (more information about the article content) and a data model of the domains examined (a more structured vocabulary).
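As a rough illustration of the starting point described above, the following Python sketch turns an OpenAlex-style work record into a few subject-predicate-object statements. It is a sketch under stated assumptions, not the sub-project's method. The field names (`abstract_inverted_index`, `concepts`, `score`, `wikidata`) follow the OpenAlex work format as publicly documented and should be checked against the current API. The sample record is made up, and the predicates `ex:about` and `ex:publicationYear` as well as the `min_score` threshold are hypothetical placeholders.

```python
from __future__ import annotations

from typing import Any

# Hypothetical predicates; the sub-project's actual vocabulary is not specified here.
ABOUT = "ex:about"
PUBLICATION_YEAR = "ex:publicationYear"


def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Reassemble plain text from a word -> positions mapping."""
    word_at: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for position in positions:
            word_at[position] = word
    return " ".join(word_at[i] for i in sorted(word_at))


def work_to_statements(
    work: dict[str, Any], min_score: float = 0.5
) -> list[tuple[str, str, str]]:
    """Derive simple triples from a work record, keeping only confident concepts."""
    subject = work["id"]
    statements = [(subject, PUBLICATION_YEAR, str(work["publication_year"]))]
    for concept in work.get("concepts", []):
        if concept.get("score", 0.0) >= min_score:
            # Prefer an external identifier when one is present.
            target = concept.get("wikidata") or concept["id"]
            statements.append((subject, ABOUT, target))
    return statements


# Made-up record for illustration only; identifiers are placeholders, not real entries.
sample_work = {
    "id": "https://openalex.org/W-PLACEHOLDER",
    "publication_year": 2020,
    "abstract_inverted_index": {
        "Semantic": [0],
        "publishing": [1, 4],
        "for": [2],
        "machine-readable": [3],
    },
    "concepts": [
        {"id": "https://openalex.org/C-PLACEHOLDER-1",
         "wikidata": "https://example.org/entity/semantic-publishing",
         "score": 0.8},
        {"id": "https://openalex.org/C-PLACEHOLDER-2",
         "wikidata": None,
         "score": 0.2},
    ],
}

abstract = rebuild_abstract(sample_work["abstract_inverted_index"])
statements = work_to_statements(sample_work)
print(abstract)
for statement in statements:
    print(statement)
```

The score threshold stands in for the coarse, folksonomy-like tagging mentioned above. A structured domain model would replace the single generic `ex:about` predicate with more specific relations such as object of investigation, period, or method.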
Team
- Prof. Dr. Ralf Schenkel
- Prof. Dr. Christof Schöch
- Jens Bruchertseifer
References
- Kreutz, Christin Katharina, and Ralf Schenkel. 2022. “Scientific Paper Recommendation Systems: A Literature Review of Recent Publications.” International Journal on Digital Libraries 23 (4): 335–69. https://doi.org/10.1007/s00799-022-00339-w.
- Schöch, Christof. 2020. “Open Access für die Maschinen.” In Die Zukunft des kunsthistorischen Publizierens, edited by Maria Effinger and Hubertus Kohle. Heidelberg: ART-Books. https://doi.org/10.11588/arthistoricum.663.c9210.
- Shotton, David. 2009. “Semantic Publishing: The Coming Revolution in Scientific Journal Publishing.” Learned Publishing 22 (2): 85–94. https://doi.org/10.1087/2009202.
Project Activities
Conference presentations
- Jens Bruchertseifer, Patrick Neises, Maria Hinzmann, Ralf Schenkel, Christof Schöch: “Investigating Zero-shot Topic Labelling of Scientific Papers Using LLMs”. Workshop on Big (and Small) Data in Science and Humanities (BigDS 2025), held as part of the 1st Conference on Database Systems for Business, Technology and Web (BTW 2025), Bamberg University, March 3–7, 2025. – URL: https://btw2025.gi.de/program/workshops/bigds.
- Johanna Konstanciak, Tinghui Duan, Matthias Bremm, Anne Klee, Joëlle Weis, Maria Hinzmann, Julia Röttgermann, Christof Schöch: “Federated Queries for Literary Studies: Querying Wikidata via the MiMoTextBase and the Other Way Around”. International Conference Linked Open Data and Literary Studies, org. Frank Fischer. Freie Universität Berlin, Germany, 19-20 Nov 2024. – Slides: https://mimotext.github.io/lod-lithist/federated-queries.html#/ – URL: https://www.temporal-communities.de/events/2024/conference-linked-open-data.html.
- Christof Schöch: “Artificial Intelligence / Large Language Models and the Digital Humanities”. Third International Conference on Digital Humanities (CODH-24): The Next Stick and Stone of Civilization. Binus University, Semarang, Central Java, Indonesia, 30 Oct 2024. – Keynote, delivered remotely. – Website: https://digitalhumanities.website/speakers-codh-2024/
- Matthias Bremm, Maria Hinzmann, Julia Röttgermann, Christof Schöch: “Linked Open Data for the Humanities: Lessons Learned in MiMoText & further TCDH projects”. Online workshop of the STAGE project and the MiMoText / TCDH projects, org. Clarisse Bardiot and Christof Schöch, 27 Feb 2025. – Slides: https://mimotext.github.io/lod-lithist/wikiverse.html#/
- Maria Hinzmann, Julia Röttgermann: “Bidirectional Federated Queries on MiMoTextBase and Wikidata”. WikiMUC/Federated Queries Workshop, Munich, 5-6 Dec 2024.
- Christof Schöch: “MiMoText – Mining and Modeling Text”. Workshop Databases on 18th Century France: Cooperation and Exchanges, org. Simon Dagenais and Damien Tricoire. 5 and 23 Sept 2024, Trier University, Germany. – URL: https://papa.uni-trier.de/2024/08/21/databases18thcenturyfrance/.
Publications
- Jens Bruchertseifer, Patrick Neises, Maria Hinzmann, Ralf Schenkel and Christof Schöch (2025). “Investigating Zero-shot Topic Labelling of Scientific Papers Using LLMs”. In: Workshop on Big (and Small) Data in Science and Humanities (BigDS 2025), 1st Conference on Database Systems for Business, Technology and Web (BTW 2025). Bamberg University, March 3–7, 2025. DOI: 10.18420/BTW2025-122.
- Maria Hinzmann, Matthias Bremm, Tinghui Duan, Anne Klee, Johanna Konstanciak, Julia Röttgermann, Moritz Steffes, Christof Schöch, Joëlle Weis (2025 / in press). “Patterns in modeling and querying a knowledge graph for literary history”. In: Pattern Theory in Language and Communication, ed. Sabine Arndt-Lappe, Milena Belosevic, Peter Maurer, Claudine Moulin, Achim Rettinger & Sören Stumpf. Trier: TCLC. – URL (preprint): https://doi.org/10.5281/zenodo.12080340.
Digital resources
- TP3-STTCL (Wikibase instance), Wikibase.cloud, 2025. URL: https://tp3-sttcl.wikibase.cloud/.
Other
- Further recent talks, workshops and publications that are relevant to the sub-project and involve members of its staff are listed on the Activities page of the predecessor project Mining and Modeling Text.