TP5 (Digital Cultural Studies & Computational Linguistics): Indexing text content and image elements with multimodal knowledge representations, using the example of extensive collections of wine labels
Today, powerful knowledge graphs are increasingly trained not only on text but also on multimodal data, in particular combinations of text and images. This opens up new possibilities for multimodal indexing: text recognition with OCR can benefit from image objects present in the same document and, conversely, the automatic recognition of image objects can be supported by the accompanying text. With suitable knowledge representations, both can be done in a single, joint processing step. The aim of the sub-project is to apply this paradigm to the indexing of an extensive collection of wine labels gathered through web scraping. Wine labels are often very complex text-image media and therefore make a good starting point. The pipeline to be developed will also assign Wikidata identifiers and other authority data to the recognized text and image elements, so that they can be linked to other parts of the knowledge graph being created within the network as well as to external knowledge graphs such as Wikidata. The sub-project is also considering how to generalize the process beyond wine labels to other text-image media such as postcards, geographical maps or book illustrations.
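The idea that OCR can benefit from image objects found on the same label can be illustrated with a toy example. The following is a minimal sketch and not the project's actual pipeline: the `RELATED_OBJECTS` table, the `OcrCandidate` class, the `rerank` function, the bonus value and the sample readings are all hypothetical. Only the Wikidata `wbsearchentities` API action is real; the sketch builds the search URL for a recognized term but does not send a request.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Hypothetical stand-in for a multimodal knowledge representation:
# maps a text term to image-object classes it is associated with.
RELATED_OBJECTS = {
    "Riesling": {"grape", "vine"},
    "Schloss": {"castle"},
}


@dataclass
class OcrCandidate:
    text: str
    confidence: float  # OCR engine confidence in [0, 1] (assumed scale)


def rerank(candidates, detected_objects, bonus=0.2):
    """Sort OCR candidates, favouring those linked to a detected image object."""
    def score(candidate):
        related = RELATED_OBJECTS.get(candidate.text, set())
        # Add a fixed bonus if any related object class was detected in the image.
        return candidate.confidence + (bonus if related & detected_objects else 0.0)
    return sorted(candidates, key=score, reverse=True)


def wikidata_search_url(term, language="de"):
    """Build (but do not send) a Wikidata entity search request for a term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


# Two competing OCR readings of the same word; the image shows grapes and a castle.
candidates = [OcrCandidate("Kiesling", 0.55), OcrCandidate("Riesling", 0.50)]
best = rerank(candidates, {"grape", "castle"})[0]
print(best.text)
print(wikidata_search_url(best.text))
```

In this sketch the slightly less confident reading "Riesling" wins because the detected grape supports it. A real system would replace the lookup table and the fixed bonus with scores derived from a learned multimodal representation, and would resolve the search results to a specific Wikidata identifier.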
Team
- Veronika Wassermayr, M.Sc.
- Dr. Joëlle Weis
- Prof. Dr. Achim Rettinger
- Prof. Dr. Christof Schöch
Activities
- Chapter: Joëlle Weis, Christof Schöch (2024): “Vom Perler Hasenberg zur Lehmener Würzlay – Weinetiketten digital erschließen”. In: Digital ist besser? Sammlungsforschung im digitalen Zeitalter, edited by Katharina Günther and Stefan Alschner. Proceedings of the end-term conference of the Forschungsverbund Marbach Weimar Wolfenbüttel (MWW), Klassik Stiftung Weimar, 16–17 Feb 2023. Göttingen: Wallstein. – URL: https://www.wallstein-verlag.de/9783835356153-002.html (Open Access).
- Talk: Christof Schöch, Claudine Moulin, Joëlle Weis: “Historical wine labels as pointers to places and spaces of wine cultivation, production and distribution: A case study from the German Mosel region”. Wine, place and space – Global geographies of wine cultivation, production and consumption, org. Daniela Ana, Marc Daferner, Tatiana López, Gerhard Rainer, Susann Schäfer, Christian Steiner, Anika Zorn. Eichstätt: KU Eichstätt, Feb 21-23, 2024. – URL: www.ku.de/en/the-ku/faculties/mgf/geographie/aktuelles/termine/wine-place-space. – Presentation: https://doi.org/10.5281/zenodo.14000744.
- Talk: Christof Schöch: “Weinetiketten erzählen Geschichte(n)”. KuLaDig Netzwerktreffen Rheinland-Pfalz, org. Christine Brehm. Bendorf-Sayn: Sayner Hütte, 5 Sept. 2023. – Presentation: https://dhtrier.quarto.pub/weinetiketten/.
- Resource: Weinetiketten der Mosel, coord. Christof Schöch. – URL: https://mosel.wikibase.cloud/ (experimental / work in progress).
- Resource: Wine Label Vocabulary (WLV), coord. Christof Schöch. – URL: https://github.com/dh-trier/wlv/blob/master/resources/wlv-label-docs.md (work in progress).
- Press: Markus Naumann, “Die Etikettenretter”, DWZ - Die Winzer-Zeitschrift, August 2024, p. 23 (PDF).