Bachelor's and Master's Theses
Bachelor's and Master's theses can be written in German or, by arrangement, in English. In the Master Data Science, the thesis has to be written in English.
Valid only for the Master Data Science: Please see this page for information.
Topics
In general, we offer topics from the fields of databases, information retrieval and semantic information systems. More precisely, our topics mainly belong to one or more of the areas of searching on semistructured data, integration of heterogeneous information sources, efficiency of large-scale search engines, conversational information retrieval, natural language processing, human-computer interaction, data integration, query processing, semantic web, computational argumentation (ranking, clustering, validating and extracting arguments from natural language texts), scholarly recommendation systems, domain-specific query languages and scientometrics.
The topic of a thesis determines which person supervises the thesis. The thematic focus of the advisors can be found on their personal page under Team.
If you are interested in a topic suggested by the chair or if you have your own topic suggestion for a Bachelor's or Master's thesis, please contact Prof. Dr. Ralf Schenkel. If you have already spoken with staff of the chair about a possible topic, please also include this in your email.
Requirements
Please send us a list of your successfully completed modules with your request for a thesis. This overview helps us to assess which possible topic might fit your skills.
For a Bachelor's thesis, we expect that you have already successfully completed the following modules (if included in your module plan as a compulsory module) before you apply for a topic with us, as the content is very helpful for the successful completion of a Bachelor's thesis in our topics: Database Systems (Datenbanksysteme), Non-Relational Information Systems (Nichtrelationale Informationssysteme), CS-Project (Informatik-Projekt or Großes Studienprojekt), Advanced Programming (Fortgeschrittene Programmierung or Programmierung II).
For a Master's thesis, we expect you to have attended relevant Master's lectures offered by the group in the field of database systems or information retrieval. Ideally, you should also have completed your research project with the group.
Completed Bachelor's theses
- no abstract available -
- no abstract available -
- no abstract available -
Politics and Linguistics have an inextricable affinity. A wide array of evidence suggests that latent ideological nuances are ingrained within the language of political discourse. Over the last decade, uncovering and leveraging patterns in language data has become one of the most outstanding achievements of modern Data Science, which raises some noteworthy questions regarding its prospects within the political landscape.
This paper will examine how the relationship between Politics and Linguistics can be approached in Data Science. I will explore the abilities and limitations of contemporary concepts and state-of-the-art instruments in Natural Language Processing, Machine Learning, and Information Retrieval to address questions inspired by political linguistics, and, more specifically, to classify political claims in terms of their ideology with the help of political party programs in the context of an election process. The connections between Linguistics, Ideology and Data Science are interesting in their own right, but may also be of paramount importance for practical applications. Leveraging political linguistics could have profound implications for research on political behavior, and enable a more accessible way of understanding political agendas, revealing antagonistic lexical structures that arise from a set of political parties competing for attention and support in the context of an election.
Abstract: Data integration of RDF knowledge bases is an increasingly important task. By using many different data sources, it is possible to expand the data stock of a knowledge base or, if necessary, to correct erroneous information in the knowledge base. For this purpose, alignment systems are increasingly used, which relate the schema of one data source to that of another data source in such a way that the data can then be transferred between the data sources. One such system is FiLiPo (Finding Linkage Points). It automatically finds mappings between the schema of a local RDF knowledge base and the schema of a web API. One of the current challenges with such systems is to involve users more closely in the process, especially when it comes to explaining to them how and why the system made certain decisions. This bachelor thesis therefore presents a user interface for the FiLiPo alignment system that graphically presents FiLiPo's data to users. The user interface should enable users to understand, analyse and, if necessary, change or remove the alignments generated by FiLiPo.
Abstract: Within the framework of the Semantic Web, information (knowledge) can be recorded in so-called knowledge graphs. However, these can quickly grow to an unmanageable size, so that both the content and the structure of the graph are difficult for people to comprehend. Therefore, it is necessary to find ways to create a basic understanding of the properties of knowledge graphs.
The aim of this work is to determine "knowledge about knowledge graphs" automatically by means of the mathematical model of Formal Concept Analysis (FCA) and to present it to the user. Therefore, an interactive tool was developed with which a user can perform and control the exploration of knowledge graphs.
To confirm the effectiveness of the tool, it was tested by a number of people and then evaluated. The test persons assessed the user experience and usability of the tool as predominantly positive. The aspects rated as less good offer clues for future improvements and optimisations to make the use of the tool even more attractive.
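To illustrate the idea behind Formal Concept Analysis mentioned in this abstract, here is a minimal sketch. It is not the tool developed in the thesis. It assumes a tiny hand-made formal context (objects mapped to their attributes) and uses a naive enumeration over all object subsets, which is only feasible for very small inputs. The function name `formal_concepts` and the example data are hypothetical.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all formal concepts of a small formal context.

    `context` maps each object to the set of attributes it has.
    A formal concept is a pair (extent, intent) where the extent is
    exactly the set of objects sharing all attributes of the intent,
    and the intent is exactly the set of attributes shared by all
    objects of the extent.
    """
    objects = set(context)
    all_attrs = set().union(*context.values()) if context else set()

    def intent(objs):
        # Attributes common to every object in objs
        # (all attributes if objs is empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return frozenset(shared)

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    concepts = set()
    # Naive approach: close every subset of objects. Exponential in the
    # number of objects, so only suitable for illustration.
    for size in range(len(objects) + 1):
        for subset in combinations(sorted(objects), size):
            shared = intent(subset)
            concepts.add((extent(shared), shared))
    return concepts


# Hypothetical toy context, e.g. entities of a knowledge graph and their types.
example_context = {
    "Berlin": {"City", "Capital"},
    "Trier": {"City"},
    "Germany": {"Country"},
}
example_concepts = formal_concepts(example_context)
```

In this toy context, one of the resulting concepts groups Berlin and Trier under the shared attribute City. Practical FCA implementations use more efficient algorithms than this subset enumeration.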
Abstract: This thesis presents a user interface for the FiLiPo system. Developing such a user interface requires a study of problems and risks, the drafting of a concept and its implementation. One of the main goals was to develop an intuitive user interface that gives access to all the functionality of the FiLiPo system. The thesis provides a short introduction to schema alignment of RDF-based knowledge bases and Web APIs. It also gives a brief overview of the Angular framework, which was used for the implementation. After describing the main requirements that have to be taken into consideration and answering the question of how to implement an intuitive user interface, the main concept is presented. It is based on already known solutions and examples, but still requires some creativity for the visualization of the alignment results. Then the implementation is documented. Using Angular allows quick integration of different components and their easy manipulation. The results of the user evaluation are presented, showing whether the concept and implementation were successful. Finally, possible further improvements are discussed.
- no abstract available -
Abstract: Databases are used to store information and it is therefore essential that they are complete. In reality, however, databases have gaps and therefore methods must be used to supplement this missing information. Existing Linked Data systems use interfaces (SPARQL endpoints) for this purpose, which are not provided by all data providers. The common solution in practice is to provide a web API to still be able to request information. In order to be able to supplement missing information via Web APIs, a programme is implemented in this thesis that enables the connection of Linked Data systems and Web APIs. Thus, the programme ExtendedSPARQL developed in this thesis can completely answer a query to the local knowledge base by filling in missing information on-the-fly with the help of external web APIs. In doing so, the programme decides which external Web APIs are relevant for missing information and how to request the external Web APIs. It also decides how to extract the information it is looking for from Web API responses and how to add it to the results of the query. Furthermore, ExtendedSPARQL executes as few Web API requests as possible so that missing information is added with the least effort and redundant information is avoided. It is also easy to use, so that even users with only basic SPARQL knowledge can successfully perform ExtendedSPARQL queries. ExtendedSPARQL also provides a graphical user interface, which makes it even easier to use. In a subsequent evaluation, the programme proved that missing information can be successfully added using external web APIs and that redundant results rarely occur.
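The idea of filling gaps on the fly while issuing as few Web API requests as possible can be sketched as follows. This is not the ExtendedSPARQL implementation. It assumes query results are given as a list of dictionaries in which missing values are `None`, and that a caller-supplied function `fetch` (hypothetical) returns the external record for an entity. The sketch only shows one way to avoid redundant requests, namely calling the API at most once per entity and only for rows that actually have gaps.

```python
def complete_rows(rows, fetch, key="id"):
    """Fill None values in query result rows using an external source.

    rows:  list of dicts, e.g. the bindings of a local query result
    fetch: callable taking an entity key and returning a dict with
           external values (hypothetical stand-in for a Web API call)
    key:   name of the field that identifies the entity
    """
    cache = {}  # entity key -> external record, so each entity is requested once
    completed = []
    for row in rows:
        has_gap = any(value is None for value in row.values())
        if has_gap:
            entity = row[key]
            if entity not in cache:
                cache[entity] = fetch(entity)
            external = cache[entity]
            # Keep local values, take external ones only where data is missing.
            row = {
                field: (external.get(field) if value is None else value)
                for field, value in row.items()
            }
        completed.append(row)
    return completed
```

Rows without gaps never trigger a request, and repeated rows for the same entity reuse the cached response.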
Abstract: Researchers are normally not familiar with the thematic orientation of all journals and conferences in their field of research. As soon as researchers want to publish their work, they face the problem of finding a suitable journal or conference to which they can submit the paper. The aim of this thesis is the development of a recommender system which can find suitable venues for a given publication title. The system is based on data from dblp and Semantic Scholar, which contain titles of publications as well as their abstracts and keywords. Different methods for determining the similarity and relevance of papers were investigated. These include TF-IDF, BM25 and cosine similarity in conjunction with Doc2Vec. Various techniques were analysed in order to find and rank the journals and conferences associated with the corresponding papers. In addition, methods were developed to improve the results of the recommender system, such as looking at the number of citations from journals and conferences. The methods were evaluated automatically and manually. It turned out that cosine similarity with Doc2Vec did not achieve good results in contrast to the other two methods. To improve the usability of the recommender system, a visualisation in the form of a web service was implemented.
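As an illustration of how one of the methods named in this abstract, BM25, could be used to rank venues for a given title, here is a minimal sketch. It is not the system from the thesis. The whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the aggregation step (summing paper scores per venue) are assumptions made for this example, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` (non-empty list of strings) against `query`."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    query_terms = tokenize(query)

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


def recommend_venues(title, papers, top_k=3):
    """Rank venues for `title`; `papers` is a list of (paper_title, venue) pairs."""
    scores = bm25_scores(title, [paper_title for paper_title, _ in papers])
    per_venue = defaultdict(float)
    for (_, venue), score in zip(papers, scores):
        per_venue[venue] += score  # assumption: venue score = sum of paper scores
    ranked = sorted(per_venue.items(), key=lambda item: item[1], reverse=True)
    return [venue for venue, score in ranked[:top_k] if score > 0]
```

Summing scores favours venues with many matching papers. Other aggregations, such as taking the maximum or averaging the top matches per venue, would be equally plausible choices.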
Since the development of the Semantic Web by Tim Berners-Lee, more and more information is being published on the internet as Linked Open Data. These are specially designed to be analysed by machines. All elements are given unique identifiers. The elements can then be linked to each other via relations and form ever larger networks. The result is a "Giant Global Graph" in which all things of interest can be referenced.
But while the amount of data in the Semantic Web is constantly growing, only a few can use it. Searching for information is difficult because users need some prior knowledge. On the one hand, they need to know how the data in the graph are connected and how they are labelled. On the other hand, they need knowledge of the query language SPARQL, which is used to query data sources in the Semantic Web. The visual query language developed in this work makes it easier for users to get started and thus enables even non-experts to search the Semantic Web for information. Instead of writing a query, the user constructs it graphically from prefabricated elements. For this purpose, the Visual Query Builder programme was developed in this work, which implements such a visual query language. By specifying a schema for the respective data endpoint, the user is shown the elements that can be used. Thus, the user can see which elements exist at all and which attributes they have. The programme and the underlying visual query language were then evaluated by a group of test persons. The evaluation showed that Visual Query Builder enables both beginners and advanced users to successfully search a data source in the Semantic Web for desired information. Particular attention was paid to the usability of the application, which achieved good results in both test procedures used.
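To give an impression of what assembling a query from prefabricated elements can mean in practice, here is a minimal sketch that turns a list of triple patterns into the text of a SPARQL SELECT query. It is not the Visual Query Builder. The function name `build_select` and the prefixed names used in the example (such as `ex:Publication`) are hypothetical, and PREFIX declarations are omitted for brevity.

```python
def build_select(variables, patterns, limit=None):
    """Build a SPARQL SELECT query string.

    variables: variable names without the leading '?', e.g. ["title"]
    patterns:  list of (subject, predicate, object) strings, already
               written in SPARQL syntax, e.g. ("?paper", "dc:title", "?title")
    limit:     optional maximum number of results
    """
    head = "SELECT " + " ".join(f"?{name}" for name in variables)
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in patterns)
    query = f"{head}\nWHERE {{\n{body}\n}}"
    if limit is not None:
        query += f"\nLIMIT {limit}"
    return query


# Hypothetical example: titles of all publications, at most ten results.
example_query = build_select(
    ["title"],
    [("?paper", "rdf:type", "ex:Publication"), ("?paper", "dc:title", "?title")],
    limit=10,
)
```

A graphical front end could generate the `patterns` list from the boxes and connections a user draws, so that the user never has to write SPARQL syntax by hand.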
Digital libraries, such as dblp or the German National Library (DNB), aim to bring knowledge together online and make it available via the internet. Unfortunately, incomplete data sets are part of the everyday life of a digital library. Missing information, such as titles or author names, could be added using external web APIs. The main problem here is the integration of the external data into the local database, since a common schema, which serves to describe the structure of the data, must first be found. This is the main task of schema integration, which is a subfield of information integration and data migration. The ActiveSPARQL programme designed in this thesis exploits schema integration to use data from Web APIs to answer queries on-the-fly. When a user makes a query to the application, both the data from the local database and the external Web APIs should be used to answer it satisfactorily. Using both sources is called a hybrid request. The design is based on the already existing framework ANGIE. In contrast to this, no wrapper is generated to answer the query, but an extended SPARQL query. In addition, ANGIE requires that the access methods of the web APIs must be declared manually. This step can be automated by the AID4SPARQL programme. This is able to find linkage points between the local and external data and thus ensure that external information is compatible with the local data. The results from AID4SPARQL are prepared in such a way that they can be used as a configuration for communication with web APIs. In addition to ActiveSPARQL, a Web interface was designed to enable non-experts to create and execute hybrid queries without prior knowledge. Finally, a concept for evaluating the framework is presented, which can be used to compare ANGIE and ActiveSPARQL.
- no abstract available -
Abstract: This thesis introduces improvements to current approaches of classifying scientific work by observing the semantic similarity of publications in the same citation neighborhood. Available patterns in the neighborhood structures are used to generate an initial set of features. Different text representations, similarity measures and feature modes are implemented and studied to explore new approaches of generating meaningful features that improve classification procedures. Features are evaluated in terms of their predictive power when learning a model that distinguishes between seminal and survey publications. Learning patterns from features to better distinguish between the publications will be a proxy of the effectiveness of these features in evaluating research impact. The state-of-the-art research in this area achieved a result of 68.97% prediction accuracy whereas the approaches presented in this thesis achieved a prediction accuracy of up to 86.98% and therefore beat the latest results by a large margin. Thorough evaluation of the feature sets reveals which relationships in a neighborhood structure provide information that can help improve current research evaluation metrics by identifying high impact scientific work.
Keywords: Semantometrics - Feature Engineering - Natural Language Processing
All kinds of information can be retrieved from web APIs, for example metadata of publications. However, it is not always obvious what kind of data must be sent to the web API in order to receive a meaningful response. For this problem, a programme was developed that learns the appropriate transfer parameters of web APIs with the help of a source database. For this purpose, each type of data from the source database is sent to the web API and it is checked whether the response of the API is related to the sent data. Various parameters can be used to configure how closely the responses of the web API must match the data of the source database in order to be considered meaningful. For this purpose, several metrics for calculating string similarities were used to find the matches of both data sets. Through a suitable evaluation, it could be shown that with good configuration parameters all matches are found. In the presented system, a user also has the possibility to choose different metrics to compare the similarity of two values. For example, it is possible to specify that there must be an exact match between some data, such as ISBNs or other IDs. With the right configuration parameters, as well as knowing and specifying which metric is best for which type of data, almost any data can be recognised as a match that a human would also consider a match.
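The matching step described in this abstract, comparing values from the source database with values in a Web API response using configurable string similarity, can be sketched as follows. This is not the programme from the thesis. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity metric, and the threshold of 0.85, the function names and the example field names are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(record, response, threshold=0.85, exact_fields=()):
    """Find pairs of fields whose values are considered a match.

    record:       dict of field -> value from the source database
    response:     dict of field -> value from the Web API response
    threshold:    minimum similarity for a fuzzy match (assumed default)
    exact_fields: source fields that must match exactly, e.g. identifiers
    """
    matches = []
    for local_field, local_value in record.items():
        for api_field, api_value in response.items():
            if local_field in exact_fields:
                matched = local_value.strip() == api_value.strip()
            else:
                matched = similarity(local_value, api_value) >= threshold
            if matched:
                matches.append((local_field, api_field))
    return matches
```

For example, a source record with a `title` and an `isbn` could be compared against a response using different field names. With `exact_fields=("isbn",)`, the ISBN only matches an identical value, while the title may still match despite small differences such as casing or trailing punctuation.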
- no abstract available -
- no abstract available -
- no abstract available -
- no abstract available -
- no abstract available -
- no abstract available -
- no abstract available -
- no abstract available -
- no abstract available -
Completed Master's theses
- no abstract available -
Abstract: There are many systems for the exploration of bibliographic metadata. However, retrieving and filtering information that is actually relevant often requires complicated search interfaces and long search paths, especially for complex information needs. In this work a web interface for the exploration and visualization of bibliographic metadata is proposed. The core idea is based on a Domain Specific Query Language (DSQL) called SchenQL which aims to be easy to learn and intuitive for domain experts as well as casual users for efficiently retrieving information on bibliographic metadata. This is achieved by using natural sounding keywords and specially designed functions for this particular domain. In addition, the web interface implements useful visualizations of citations and references or co-author relationships. The interface also offers keyword suggestions and an auto completion feature that allows for easily creating SchenQL queries, without having to learn all the keywords of the language beforehand. A three-part user study with 10 students and employees from the field of computer science was conducted where the effectiveness and usability of the SchenQL web interface was evaluated.
- no abstract available -
Abstract: This thesis introduces improvements to current approaches of classifying scientific work by observing the semantic similarity of publications in the same citation neighborhood. Available patterns in the neighborhood structures are used to generate an initial set of features. Different text representations, similarity measures and feature modes are implemented and studied to explore new approaches of generating meaningful features that improve classification procedures. Features are evaluated in terms of their predictive power when learning a model that distinguishes between seminal and survey publications. Learning patterns from features to better distinguish between the publications will be a proxy of the effectiveness of these features in evaluating research impact. The state-of-the-art research in this area achieved a result of 68.97% prediction accuracy whereas the approaches presented in this thesis achieved a prediction accuracy of up to 86.98% and therefore beat the latest results by a large margin. Thorough evaluation of the feature sets reveals which relationships in a neighborhood structure provide information that can help improve current research evaluation metrics by identifying high impact scientific work.
Keywords: Semantometrics - Feature Engineering - Natural Language Processing
All kinds of information can be retrieved from web APIs, for example metadata of publications. However, it is not always obvious what kind of data must be sent to the web API in order to receive a meaningful response. For this problem, a programme was developed that learns the appropriate transfer parameters of web APIs with the help of a source database. For this purpose, each type of data from the source database is sent to the web API and it is checked whether the response of the API is related to the sent data. Various parameters can be used to configure how closely the responses of the web API must match the data of the source database in order to be considered meaningful. For this purpose, several metrics for calculating string similarities were used to find the matches of both data sets. Through a suitable evaluation, it could be shown that with good configuration parameters all matches are found. In the presented system, a user also has the possibility to choose different metrics to compare the similarity of two values. For example, it is possible to specify that there must be an exact match between some data, such as ISBNs or other IDs. With the right configuration parameters, as well as knowing and specifying which metric is best for which type of data, almost any data can be recognised as a match that a human would also consider a match.
- no abstract available -