Multivariate Analysis

As in all scientific disciplines that work with empirical data, few linguistic phenomena can be adequately described by investigating the relation between just two variables. Far more often, a multitude of variables affect a dependent variable of interest, and, to make matters worse, these variables may also simultaneously affect each other.

To analyze and describe the often complex interplay between several stochastic variables, experimental linguists and corpus linguists have come to rely on multivariate methods as an almost indispensable tool. These methods can also be applied as automatic classifiers of linguistic data.

In this course, we will use the statistical programming language R to explore several multivariate methods. We will use R's excellent graphical capabilities to visualize the sometimes complicated relations within data sets that contain several variables of interest. We will discuss in detail several types of multiple regression models, which are versatile tools for analyzing and predicting continuous or categorical dependent variables. In addition, we will look at tree-based multivariate methods, which are often a very interesting alternative to regression models.

This course is primarily directed at students who have a basic knowledge of R but little or no experience with multivariate statistics. Students without such basic knowledge who are willing to learn are also welcome. To practice the statistical analyses, students are invited to install the R environment RStudio (freely available under …) on their laptops.


Dr. Gero Kunter is a research associate at the Institute of English and American Studies at Düsseldorf University.