Master's Theses in Data Science
Theses in Data Science are assigned twice a year by the Examination Board in a central process. Outside of this process, we can only assign topics in rare exceptional cases.
We only supervise external theses in exceptional cases if the task fits in well with the research topics of the professorship. Please ask Prof. Schenkel specifically if you have a suggestion for a Master's thesis topic that you would like to work on outside the university.
Examples of recently completed Master's theses
Abstract: This thesis presents an approach to detecting duplicate bookings by computing sentence similarity, as an application of Natural Language Processing. The bookings are exports from an accounting software package. Among much other information, each booking has a booking note, a short text written by the person who created the booking in the accounting software. The approach is part of a larger project in which all booking information is analyzed; in this thesis, however, only the textual information of the notes is used to determine the similarity of two bookings. Several models are used to calculate the similarity of booking pairs, and their results are compared. One important research objective is the comparison of TF-IDF, as an application of the vector space model, with language models such as BERT and Sentence-BERT, which use word and sentence embedding vectors. The best models achieve an F1-score of 0.6004 and an AUC-score of 0.555. A thorough analysis of true positives, false positives and false negatives shows that embedding vectors offer advantages but also introduce new challenges when short texts are analyzed.
Keywords: Natural Language Processing - Duplicate Detection - Accounting - Short Texts
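The TF-IDF side of the comparison described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the booking notes, the whitespace tokenizer, and the smoothed IDF variant are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: token -> weight) per text."""
    # Assumption: simple lowercase whitespace tokenization.
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts does each token occur?
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            token: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0)
            for token, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical booking notes, invented for this example.
notes = [
    "invoice 4711 office supplies",
    "office supplies invoice 4711",
    "travel expenses berlin",
]
vectors = tfidf_vectors(notes)
sim_duplicate = cosine_similarity(vectors[0], vectors[1])
sim_distinct = cosine_similarity(vectors[0], vectors[2])
```

Because TF-IDF ignores word order, the first two notes receive a similarity close to 1, while notes without shared tokens receive 0. Embedding models such as BERT or Sentence-BERT can additionally assign a non-zero similarity to notes that use different words for the same thing.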
- no abstract available -
- no abstract available -
- no abstract available -
- no abstract available -
Abstract: Argumentation Mining aims at automatically extracting structured arguments from unstructured textual documents. This work addresses a cross-lingual argumentation mining task, the detection of argumentative discourse units (ADUs). Our contribution is two-fold: first, we extract a German and French ADU-annotated parallel corpus for further research; second, we use it to compare five state-of-the-art language models (LMs). Following the CRISP-DM framework for data mining, we prepare the data from the popular Europarl corpus by applying topic modeling to reduce the corpus size on a semantic basis. The French and German subcorpora are annotated with the labels “non-argumentative”, “claim” and “premise”. Given the human baseline, the modeling phase compares the five LMs German BERT, German DistilBERT, CamemBERT, mBERT and mDistilBERT on the sentence classification task. The LMs perform the task with moderate success. There is a performance difference between the German and French models, which suggests that it is crucial to treat the input language as a feature and not only as a parameter. In addition, the beneficial influence of multilingual pretraining is discussed, which calls for further research.
Abstract: Data integration of RDF knowledge bases is a task of growing importance. By using many different data sources, it is possible to expand the data stock of a knowledge base or, if necessary, to correct erroneous information in it. For this purpose, alignment systems are increasingly used, which relate the schema of one data source to that of another in such a way that data can then be transferred between them. One such system is FiLiPo (Finding Linkage Points). It automatically finds mappings between the schema of a local RDF knowledge base and the schema of a web API. One of the current challenges with such systems is to involve users more closely in the process, especially when it comes to explaining how and why the system made certain decisions. This bachelor thesis therefore presents a user interface for the FiLiPo alignment system that graphically presents FiLiPo's data to users. The user interface should enable users to understand, analyse and, if necessary, change or remove the alignments generated by FiLiPo.
Abstract: Within the framework of the Semantic Web, information (knowledge) can be recorded in so-called knowledge graphs. However, these can quickly grow to an unmanageable size, so that both the content and the structure of the graph are difficult for people to comprehend. Therefore, it is necessary to find ways to create a basic understanding of the properties of knowledge graphs.
The aim of this work is to determine "knowledge about knowledge graphs" automatically by means of the mathematical model of Formal Concept Analysis (FCA) and to present it to the user. To this end, an interactive tool was developed with which a user can perform and control the exploration of knowledge graphs.
To confirm the effectiveness of the tool, it was tested by a number of people and then evaluated. The participants rated the user experience and usability of the tool as predominantly positive. The aspects that were rated less favourably point to future improvements and optimisations that could make the tool even more attractive to use.
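The core idea of Formal Concept Analysis mentioned in this abstract can be sketched in a few lines. This is a sketch under stated assumptions and not the tool from the thesis: the toy context (entities as objects, the properties they use as attributes) and all names in the code are hypothetical, and the enumeration is deliberately naive.

```python
from itertools import combinations

# Hypothetical formal context: entity -> set of properties it uses.
context = {
    "Alice": {"name", "worksAt"},
    "Bob": {"name", "worksAt", "knows"},
    "ACME": {"name"},
}


def common_attributes(objects):
    """Attributes shared by all given objects (derivation of an object set)."""
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return result


def common_objects(attributes):
    """Objects that have all given attributes (derivation of an attribute set)."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts():
    """Enumerate all formal concepts by closing every subset of objects.

    Naive and exponential in the number of objects; suitable for toy data only.
    """
    found = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = common_objects(intent)
            found.add((frozenset(extent), frozenset(intent)))
    return found


concepts = formal_concepts()
```

Each resulting pair groups a maximal set of entities with the maximal set of properties they all share, for example Alice and Bob with "name" and "worksAt". Such groupings are the kind of structural summary that can help a user get an overview of a large graph.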
Abstract: This final thesis presents a user interface for the FiLiPo system. Developing such a user interface requires a study of problems and risks, the drafting of a concept, and its implementation. One of the main goals was to develop an intuitive user interface that gives access to all the functionality of the FiLiPo system. The thesis provides a short introduction to schema alignment of RDF-based knowledge bases and Web APIs. It also gives brief information about the Angular framework, which was used for the implementation. After describing the main requirements that have to be taken into consideration and answering the question of how to implement an intuitive user interface, the main concept is presented. It is based on known solutions and examples, but still requires some creativity for the visualization of the alignment results. Then the implementation is documented. Using Angular allows different components to be integrated quickly and manipulated easily. The results of the user evaluation are presented, showing whether the concept and implementation were successful. Finally, possible further improvements are discussed.