Extraction of semantic statements on the content of humanities literature using OpenAlex (Schenkel, Schöch)
The amount of specialist literature published each year exceeds the reading capacity of researchers. Traditionally, abstracts and keywords (for articles) and reviews (for books) have served as a remedy. However, they cannot represent the content of an article or book comprehensively and, above all, they do not make it available in a semantic, machine-readable form that could be richly analyzed in the sense of 'semantic publishing'. The aim of the sub-project is to develop strategies for meeting this challenge, which is highly relevant to classic research tasks such as determining the state of research or describing the research history.
Sub-project 3 (TP 3, Digital Humanities & Informatics): Extraction of semantic statements on the content of specialist literature in the humanities using OpenAlex
In the humanities, too, the amount of specialist literature published each year exceeds the reading capacity of researchers many times over. Traditionally, abstracts and keywords (for articles) and reviews (for books) have served as a remedy. However, they cannot represent the content of an article or book comprehensively and, above all, they do not make it available in a semantic, machine-readable form that could be richly analyzed in the sense of 'semantic publishing' (Shotton 2009, see also Schöch 2020).
The aim of the sub-project is to develop strategies for meeting this challenge, which is highly relevant to classic research tasks such as determining the state of research or describing the research history of a research question (see Kreutz and Schenkel 2022). For this purpose, extensive collections of specialist literature from several humanities domains will in part be semantically annotated by hand. On this basis, the transformation of the abstract and keywords or, where available, the full text of a publication into a manageable number of meaningful LOD (Linked Open Data) statements is to be learned.
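To illustrate what a "manageable number of meaningful LOD statements" might look like, the following minimal Python sketch turns a manual annotation of one publication into subject-predicate-object statements and serializes them as N-Triples. The namespace `https://example.org/semantic-publishing/`, the property names `coversEpoch` and `usesMethod`, and the helper `statements_from_annotation` are hypothetical and chosen only for illustration; they are not the project's actual vocabulary or method.

```python
from dataclasses import dataclass

# Hypothetical namespace and property names, invented for this illustration.
EX = "https://example.org/semantic-publishing/"


@dataclass(frozen=True)
class Statement:
    """One subject-predicate-object statement about a publication."""
    subject: str    # IRI of the publication
    predicate: str  # IRI of the property
    obj: str        # plain literal value (kept simple for this sketch)

    def to_ntriples(self) -> str:
        # Escape the characters that must not appear raw in an N-Triples literal.
        escaped = (
            self.obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'<{self.subject}> <{self.predicate}> "{escaped}" .'


def statements_from_annotation(work_iri: str, annotation: dict) -> list:
    """Turn a manual annotation (aspect -> list of values) into statements."""
    return [
        Statement(work_iri, EX + aspect, value)
        for aspect, values in annotation.items()
        for value in values
    ]


# Invented example annotation for a single, fictitious publication.
annotation = {
    "coversEpoch": ["18th century"],
    "usesMethod": ["topic modeling"],
}
statements = statements_from_annotation(EX + "work/1", annotation)
lines = [s.to_ntriples() for s in statements]
print("\n".join(lines))
```

In a learned setting, the hand-written `annotation` dictionary would be replaced by the output of a model trained on the manually annotated collections; the statement format itself could stay the same.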
This also requires modeling the domain, at least in outline: aspects such as the nature of the object of investigation, the epoch covered, the tools and methods used, and the specialist literature drawn on must be taken into account. In this respect, the sub-project is to be understood as a cross-sectional project that can make an important contribution to integrating the results of the other sub-projects. The project could be based on data from the OpenAlex platform, where extensive metadata is already available in the form of LOD via an API or as a dump. However, the indexing depth of OpenAlex is comparatively low and follows the logic of a folksonomy rather than that of a structured domain model. Both limitations are to be addressed: the use of full texts provides more information about the content of an article, and a data model of the domains examined provides a more structured vocabulary.
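As a small illustration of working with OpenAlex metadata, the sketch below reconstructs a readable abstract from the `abstract_inverted_index` field, in which OpenAlex work records store abstracts as a mapping from each word to its positions in the text. The sample record, its placeholder ID, and the function name `abstract_from_inverted_index` are invented for this example; no request to the API is made.

```python
from typing import Optional


def abstract_from_inverted_index(inverted_index: Optional[dict]) -> str:
    """Rebuild plain text from a word -> list-of-positions mapping.

    Returns an empty string when no abstract is available (None).
    """
    if not inverted_index:
        return ""
    # Invert the mapping: position -> word.
    words_by_position = {}
    for word, positions in inverted_index.items():
        for position in positions:
            words_by_position[position] = word
    # Join the words in ascending order of position.
    return " ".join(words_by_position[i] for i in sorted(words_by_position))


# Invented sample record in the shape of an OpenAlex work (placeholder ID).
sample_work = {
    "id": "https://openalex.org/W0000000000",
    "abstract_inverted_index": {
        "A": [0],
        "study": [1],
        "of": [2, 4],
        "novels": [3],
        "the": [5],
        "Enlightenment": [6],
    },
}

abstract = abstract_from_inverted_index(sample_work["abstract_inverted_index"])
print(abstract)
```

The reconstructed text could then serve as input for the extraction of statements, alongside keywords and, where available, the full text.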
Team
- Prof. Dr. Ralf Schenkel
- Prof. Dr. Christof Schöch
- Jens Bruchertseifer
References
Kreutz, Christin Katharina, and Ralf Schenkel. 2022. “Scientific Paper Recommendation Systems: A Literature Review of Recent Publications.” International Journal on Digital Libraries 23 (4): 335–69. https://doi.org/10.1007/s00799-022-00339-w.
Schöch, Christof. 2020. “Open Access für die Maschinen.” In Die Zukunft des kunsthistorischen Publizierens, edited by Maria Effinger and Hubertus Kohle. Heidelberg: ART-Books. https://doi.org/10.11588/arthistoricum.663.c9210.
Shotton, David. 2009. “Semantic Publishing: The Coming Revolution in Scientific Journal Publishing.” Learned Publishing 22 (2): 85–94. https://doi.org/10.1087/2009202.