Master's Theses in Data Science

Theses in Data Science are assigned twice a year by the Examination Board in a central process. Outside of this process, we can only assign topics in rare exceptional cases.

We supervise external theses only in exceptional cases, and only if the task fits well with the research topics of the professorship. Please contact Prof. Schenkel directly if you have a suggestion for a Master's thesis topic that you would like to work on outside the university.

Examples of recently completed Master's theses

[MT] Automatic selection of thematically suitable publications for indexing in a subject-specific bibliographic database

 - no abstract available -

[MT] A Web-Interface for Exploration and Visualization of Bibliographic Metadata

Abstract: There are many systems for the exploration of bibliographic metadata. However, retrieving and filtering information that is actually relevant often requires complicated search interfaces and long search paths, especially for complex information needs. In this work, a web interface for the exploration and visualization of bibliographic metadata is proposed. The core idea is based on a domain-specific query language (DSQL) called SchenQL, which aims to be easy to learn and intuitive for domain experts as well as casual users, so that both can efficiently retrieve information from bibliographic metadata. This is achieved by using natural-sounding keywords and functions designed specifically for this domain. In addition, the web interface implements visualizations of citations and references as well as of co-author relationships. The interface also offers keyword suggestions and an auto-completion feature that make it easy to create SchenQL queries without having to learn all the keywords of the language beforehand. A three-part user study with 10 students and employees from the field of computer science was conducted, in which the effectiveness and usability of the SchenQL web interface were evaluated.

[BT] Comparison of contextualised embedding methods for similarity calculation of statements

 - no abstract available -

[BT] Feature Evaluation of Citation Distance Networks: Exploring new ways of measuring Scientific Impact

Abstract: This thesis introduces improvements to current approaches for classifying scientific work by observing the semantic similarity of publications in the same citation neighborhood. Available patterns in the neighborhood structures are used to generate an initial set of features. Different text representations, similarity measures and feature modes are implemented and studied to explore new approaches for generating meaningful features that improve classification procedures. Features are evaluated in terms of their predictive power when learning a model that distinguishes between seminal and survey publications. How well a model learned from these features distinguishes between the two publication types serves as a proxy for the effectiveness of the features in evaluating research impact. The state of the art in this area achieved a prediction accuracy of 68.97%, whereas the approaches presented in this thesis achieve up to 86.98%, beating the latest results by a large margin. A thorough evaluation of the feature sets reveals which relationships in a neighborhood structure provide information that can help improve current research evaluation metrics by identifying high-impact scientific work.

Keywords: Semantometrics - Feature Engineering - Natural Language Processing
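
As a rough illustration of the idea in the abstract above, the following sketch turns the textual similarity between a publication and its citation neighborhood into two features. It is not the method used in the thesis: the bag-of-words representation, the cosine measure, and the names bow, cosine and neighborhood_features are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words representation (an assumed, deliberately simple choice)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighborhood_features(pub: str, references: list, citations: list) -> dict:
    """Hypothetical features: mean similarity of a publication's text to the
    texts it cites (references) and to the texts that cite it (citations)."""
    pub_vec = bow(pub)

    def mean_sim(others):
        if not others:
            return 0.0
        return sum(cosine(pub_vec, bow(o)) for o in others) / len(others)

    return {
        "mean_sim_to_references": mean_sim(references),
        "mean_sim_to_citations": mean_sim(citations),
    }
```

Features of this kind could then be passed to any standard classifier that separates seminal from survey publications; which representation and similarity measure work best is exactly what the thesis investigates.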

[BT] Learning the Interface of Web APIs

Abstract: All kinds of information can be retrieved from web APIs, for example metadata of publications. However, it is not always obvious what kind of data must be sent to a web API in order to receive a meaningful response. To address this problem, a programme was developed that learns the appropriate request parameters of web APIs with the help of a source database. Each type of data from the source database is sent to the web API, and the programme checks whether the response of the API is related to the data that was sent. Various parameters configure how closely the responses of the web API must match the data of the source database in order to be considered meaningful. Several string similarity metrics are used to find the matches between the two data sets, and the user can choose which metric is used to compare two values. For example, it is possible to specify that some data, such as ISBNs or other IDs, must match exactly. An evaluation showed that, with good configuration parameters, all matches are found. With the right configuration parameters, and with knowledge of which metric is best suited to which type of data, almost any pair of values that a human would consider a match can be recognised as one.
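
The matching step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system developed in the thesis: the field names, the thresholds, the CONFIG table and the function names are hypothetical, and Python's difflib.SequenceMatcher stands in for whatever string similarity metrics the thesis actually used.

```python
from difflib import SequenceMatcher


def exact(a: str, b: str) -> float:
    """Exact-match metric, e.g. for ISBNs or other IDs."""
    return 1.0 if a == b else 0.0


def ratio(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical configuration: one (metric, threshold) pair per type of data.
CONFIG = {
    "isbn": (exact, 1.0),
    "title": (ratio, 0.8),
}


def is_match(field: str, source_value: str, response_value: str,
             config: dict = CONFIG) -> bool:
    """Decide whether a value in the API response counts as a match for the
    value taken from the source database, using the configured metric."""
    metric, threshold = config[field]
    return metric(source_value, response_value) >= threshold
```

For instance, is_match("title", "Learning the Interface of Web APIs", "learning the interface of web apis") accepts the pair despite the difference in capitalisation, while an ISBN that differs in a single digit is rejected because the exact metric is configured for that field.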

[BT] Meter reading app

 - no abstract available -