The Digital Humanities (or e-Humanities) are a comparatively young field of study at the interface between Computer Science and the Humanities. The field is not limited to the philologies (Linguistics and Literary Studies) but covers the Humanities and Cultural Sciences at large. The Digital Humanities have gained considerable significance in recent years due to the increasing digitisation of data in all academic fields, be it via retro-digitisation or the growing amount of born-digital data. Important tasks within the Digital Humanities are:
- Digitisation: textual data can be digitised and made machine-readable with the help of methods such as Optical Character Recognition (OCR). Yet this only works with high-quality printed texts in modern languages. Low print quality (e.g. yellowed or incomplete pages) and old, difficult-to-read scripts (such as manuscripts, narrow typefaces, or non-Latin letters) inhibit the recognition of letters and punctuation marks. Statistical models of likely (or unlikely) letter sequences can no longer be applied to older language stages or to texts based on different orthographical norms. Handwritten texts can barely be digitised with OCR methods and need to be transcribed manually (double-keying) or processed with newly developed Handwritten Text Recognition methods. With multimodal data such as archaeological artefacts, paintings, or older sound and video data, digitisation becomes even more complicated.
Examples of projects focusing on digitisation are the project "Media-historical, methodological, and media-technological Principles of the Digitisation of Works in the Historical Art of Projection" by the Department of Media Studies and the Trier Center for Digital Humanities (TCDH), as well as the Virtual Scriptorium St. Matthias with its follow-up project eCodicology.
- Archiving: whilst non-digital data in the humanities such as stone tablets and papyri are comparatively easy to archive and, under ideal conditions, can be stored for a long time, the long-term storage of digital data is still an unsolved problem. One issue is the material fatigue of most digital media; another is the continuous development of both hardware and software. Today, only a few computers have a floppy drive, and probably even fewer still run software able to read a document written 20 years ago with the then latest word-processing program. Thus, the questions of long-term archiving and long-term availability are discussed intensely. One solution proposed by the TCDH is the Virtual Data Repository.
- Representation: after the digitisation of humanities data, the question arises how they are best presented and made available to both experts and laymen. This involves technical aspects (e.g. character encoding, the choice of an appropriate markup language, and a performant and reliable database) as well as aspects of functional aesthetics. Typical examples include the creation of digital editions of literary works, the setup of a cultural-historical multimedia database, and the development of digital research databases. Projects (supported by the TCDH) dealing with these points are the portal European History Online and the Heinrich Heine Portal.
- Visualisation: this field is positioned between the representation of data and their analysis. It includes, for example, the question of how complex data in the humanities can be visualised in such a way that they are easily accessible to both researchers and laymen. Connected to this is the possibility of adding an exploratory component based on the visual processing of the data. An example is the Digital Peters, a visual reworking of Arno Peters’s Synchronoptical World History. The digital medium allows for a graphical representation and interlinking of historical events and their interconnections. Another example is the project Epistolary Networks, which visualises social, spatial, temporal, and topical networks in correspondence corpora.
- Analysis: digital data bring with them two major advantages. On the one hand, they are (in theory) much more easily accessible than non-digital data; on the other hand, they allow for (partly) automatic analysis and evaluation. This enables the retrieval of information that would traditionally be accessible only with significantly more effort. Text mining methods are particularly significant in this regard, as they enable the evaluation of the data and the recognition of trends and interdependencies. TCDH examples include the project SeNeReKo, which focuses on the automated semantic analysis of ancient Egyptian and ancient Indian texts in order to gain new insights into religious contacts between the two cultures. Another example is the project Asymmetrical Encounters, which aims to use text mining methods on historical newspaper corpora in order to gain insight into how different national cultures influence each other. Automated analysis also includes the deciphering of unknown scripts or encrypted historical documents, as well as the automated evaluation of Twitter data, e.g. to analyse the spread of linguistic neologisms or topical trends.
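
To make the idea of tracing the spread of a neologism more concrete, the following is a minimal sketch in Python, not a description of how any of the projects named above work. It assumes a corpus given as (year, text) pairs and measures how often a term occurs relative to all tokens of that year. The function name `term_frequency_by_year`, the simple regular-expression tokenizer, and the toy corpus are all invented for illustration.

```python
import re
from collections import Counter


def term_frequency_by_year(documents, term):
    """Return {year: relative frequency of `term`} for (year, text) pairs.

    Hypothetical helper: tokenization is a plain regex split, which is
    far cruder than what a real text mining pipeline would use.
    """
    hits = Counter()    # occurrences of the term per year
    totals = Counter()  # all tokens per year
    wanted = term.lower()
    for year, text in documents:
        tokens = re.findall(r"\w+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(wanted)
    # Skip years without any tokens to avoid division by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Invented toy corpus, for illustration only.
corpus = [
    (2010, "the selfie is new"),
    (2010, "nobody says that word"),
    (2013, "selfie selfie everywhere today"),
]

trend = term_frequency_by_year(corpus, "selfie")
print(trend)  # {2010: 0.125, 2013: 0.5}
```

Normalising by the number of tokens per year, rather than reporting raw counts, keeps years with more collected text from looking like years with more usage.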
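
The Digitisation item above notes that statistical models of likely letter sequences break down for older language stages. As a rough illustration, the sketch below trains a character bigram model with add-one smoothing on a short modern English sentence and compares the average log-probability of a modern spelling with that of a historical one. This is a simplified stand-in, assumed for illustration, and not the model used by any particular OCR system; the function names, the training sentence, and the word pair are invented for the example.

```python
import math
from collections import Counter


def train_bigram_model(text):
    """Count character bigrams and their left-hand context characters."""
    pairs = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    alphabet = set(text)
    return pairs, contexts, alphabet


def avg_log_prob(word, model):
    """Average add-one-smoothed log-probability per bigram of `word`."""
    pairs, contexts, alphabet = model
    vocab = len(alphabet) + 1  # one extra slot for unseen characters
    total = 0.0
    for a, b in zip(word, word[1:]):
        total += math.log((pairs[(a, b)] + 1) / (contexts[a] + vocab))
    return total / max(len(word) - 1, 1)


# Tiny modern-spelling training text (illustrative assumption).
model = train_bigram_model(
    "the king and the queen went to the house in the town"
)

modern = avg_log_prob("king", model)      # spelling seen in training
historical = avg_log_prob("kyng", model)  # older spelling, unseen bigrams
print(modern > historical)  # True
```

Because the bigrams "ky" and "yn" never occur in the training text, the historical spelling receives a lower score, so a recogniser relying on such a model would tend to "correct" it towards the modern form.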