Empirical Semantics and Computational Linguistics¹

BURGHARD B. RIEGER
Technical University of Aachen

INTRODUCTION

The history of literary and linguistic text processing by computers can be traced back to the late-50s in West Germany. After a short review of the development of computational linguistics in West Germany, this paper will be devoted mainly to those present traits in the field which appear to combine promising components likely to become seminal in the future for a wide range of disciplines in the human sciences and beyond.

Most prominent in this respect are recent achievements which have occurred somewhat simultaneously within the intersection of cognitive psychology, artificial intelligence and empirical linguistics. In these fields scholars are engaged in researching different aspects of the problems and processes of natural language understanding. Although these differ by discipline, their aspects apparently share some common interests as well as a commonality of approach. Both might be keyworded as the procedural notion of knowledge, memory, and meaning conveyed by natural language processing through a cognitive activity of human or artificial systems. Thus, the representation of knowledge, the understanding of meanings, and the analysis of texts, have become focal areas of mutual interest whose computational (preferably dynamic) modeling obviously serves to unify descriptive, explicative, simulative purposes at stake.

With regard both to the prospects of new technological achievements and to the potential benefits or detriments that these achievements could imply, cognitive theory and cognitive science will consequently play an increasingly important role in the information society of the future. From the linguistic viewpoint natural language texts, whether stored electronically or written conventionally, will in the foreseeable future still provide the major source of scientifically, historically, and socially relevant information. Due to the new technologies, the amount of such textual information continues to grow beyond manageable quantity. Available data, therefore, no longer serve to fill an assumed knowledge gap, solving the problem of lack of information in a given instance, but will instead create a new problem which arises from the abundance of information that confronts the user.

Therefore, there is a pressing need to employ computers more effectively than hitherto in the analysis of such natural language materials, in order to select relevant information reliably, given a specification of the aspect under which a subject domain of knowledge and information is to be searched. Advances have only recently been made [ RIEGER 1984e] towards an artificial system capable of understanding the meanings implied in natural language texts for inference purposes in restricted subject domains, and towards an algorithm that automatically generates even a basic but somewhat formal representation of the knowledge in input texts, which a human user could consult in order to avoid reading irrelevant texts. It is true that the results obtained from some existing systems or simulative models appear promising, and that these advances have already produced significant effects in some related areas as well as in rather remote branches of science and society; but our understanding of the complex intellectual activities subsumed under the notion of cognition is still very limited.

COMPUTATIONAL LINGUISTICS

During the early-50s the availability of the newly developed computer gave rise to the first approach to the analysis of natural languages and texts. At a scale that nobody had previously imagined, the calculating capacities of these new machines seemed to allow for the first time for the paradigm of the natural sciences to be extended to a new realm of phenomena which apparently had been formerly out of reach: to describe numerically and to explain mathematically the regularities and laws that governed languages and their structuring entities [ FUCKS 1952, 1955, 1968; HERDAN 1956, 1960]. Advanced by mathematicians, statisticians, and physicists rather than linguists, quantitative analyses of language material, however, did not become part of computer assisted linguistics until the comparably fast and massive processing of non-numerical data by machines made another linguistic goal of automatic information processing appear feasible: that of language translation.

Among the different approaches, at least two complementary trends could be noticed within the new activities of linguistic computing: a more application-oriented interest and a primarily theoretical one. These trends were reflected in the foundation of heavily funded special research groups (SFB 99; SFB 100) by the German Research Foundation (DFG) at the universities of Heidelberg, Konstanz and Saarbrücken, as well as in the three consecutive programs for the reinforcement of research in the information and documentation sciences (IuD) put forward by the Federal Ministry of Science and Research (BMFT) since 1969. But apart from these special areas of concentrated and official funding, quite a number of comparable activities went on at various other places. Working on a smaller scale both in terms of financial resources and scientific claims, these smaller groups nevertheless contributed to an even greater degree to the present status of computational linguistics in Germany.

At the Universities and Technical Universities of Aachen, Berlin, Bielefeld, Bochum, Bonn, Erlangen, Göttingen, Hamburg, Karlsruhe, Köln, Mannheim, München, Regensburg and Stuttgart, research began on a wide range of problems. It gradually evolved from the empirical analysis of language and discourse, to the development of algorithms for morpho-phonetic, syntactic, and semantic description, to the more grammar-theoretical development and testing of syntactic or semantic parsing strategies, to the logico-semantic representation of knowledge and information, and to the simulative goals of natural language understanding and inference in dialogue systems of artificial intelligence (sometimes even with a competitive edge towards industrial systems development for commercial application and use). In numerous state-of-the-art reports [ UNGEHEUER 1971; BATORI 1977; STRASZNER 1977; FAUSER and RÖSNER 1979; LENDERS 1980; FAUSER and RATHKE 1981; WAHLSTER 1981; KRALLMANN 1982; KRAUSE 1982; HAUENSCHILD and PAUSE 1983] German research has continuously been compared with developments abroad and accompanied by critical reviews which focussed on domestic needs and advances.

Meanwhile, computational linguistics is taught on an academic level at more than 10 German universities. Their different curricula concentrate on varying aspects and directions, depending on the affiliated disciplines (e.g., information science with computer science, psychology with cognitive science, artificial intelligence with software science, and semiotics with linguistics and phonetics) [ LUTZ-HENSEL 1981].

These advances were certainly achieved by the unprecedented extension of computational linguistics: the issue of a single but encompassing objective like machine translation had not only been intimately associated with computational linguistics but has even proved since to be in some respects constituent of very large parts of it [ HERZOG 1981]. By now, German experts widely agree that even after the ALPAC-Report [ALPAC 1966] and its devastating consequences in the United States, the impact made by at first rather promising, later controversial results of both data- and theory-oriented researches in machine translation has decisively influenced the field's development. Thus, quantitative and algebraic linguistics, computer-assisted literary and linguistic studies, language data processing, knowledge based automatic inference, machine simulation of natural language understanding, dialogue systems of man-machine communication, expert systems in artificial intelligence, etc. owe much, and in some cases even their very existence, to the seminal controversies revolving around machine translation [ BAR-HILLEL 1965] to become what is now recognized world-wide as the specific discipline of computational linguistics. Japanese and European work seems to have taken the American scientific community by surprise [ KAY 1984] in giving a new start to machine translation. Its recent renaissance was to be witnessed at the last international gathering of computational linguists, in Stanford (COLING 84), where Japanese and European workers reported on new and very large scale projects underway (ATHENE, EUROTRA), the somewhat moderate goals of which, in the light of recent achievements, might give rise to more substantial hopes for future success than expectations of past disillusionment.

LANGUAGE COMPARISON

Here I will examine some of the experimental results achieved in lexico-statistics or, rather, empirical word-semantics, based on the processing of special linguistic data. I am referring to the material collected for the Language Comparison Project on contemporary East and West German newspapers, supported by the German Research Foundation (DFG) in Bonn, West Germany.

This project was originally to investigate language variations and language changes that might have developed since Germany was divided into two states nearly forty years ago. The investigation was intended to focus on whether a comparative dictionary of diverging word meanings in East and West German language usage could be compiled and, if so, what such a dictionary would have to look like in order to be useful for specialized linguists as well as journalists and politicians.

However, from the very start of the project, in 1976, these lexico-logical issues soon became associated with diverging political and ideological expectations, which both tended to hamper and promote the project's financial support. Based on preceding works since 1964 and initiated during the early-70s [ HELLMANN 1984], the project was interrupted several times and finally dropped before the full-scale processing of the data had been finished. As a consultant affiliated with the project during its last two years, I have been concerned with the semantic analysis and description of word meanings by way of quantitative approaches to the language material available. This material will briefly be examined below, as it has been worked on apart from the original comparative objective of the project, within more recent research projects at the Aachen MESY-Group.

As is well known, Germany is divided into two separate states, the Federal Republic of Germany (BRD) in the West and the German Democratic Republic (DDR) in the East. Communication among people living in the two Germanies has been progressively reduced to a minimum since 1945. The development of the respective publicly used languages in the media became more or less regionalized, and it was to be expected that this tendency would reinforce and stabilize language variations in line with the increasingly different living conditions in the East and the West. As there are reasons to assume that these variations affect the semantics of the language material more than its syntax, word meaning analysis appeared to be the most important, and presumably the most revealing, stratum of linguistically based comparative research into the (possibly conceptual and semantic) changes or stabilities.

When the Language Comparison Project was initiated, the Ost-Politik had not yet gained momentum, so no collaboration from Eastern officials could be expected. Thus the only accessible samples of public language from East Germany were recordings of radio and TV broadcasts and newspaper texts, which could be made available for research purposes in the West. For this reason two widely circulated newspapers, NEUES DEUTSCHLAND (ND), from the East, and DIE WELT (DW), from the West, both of which are representative examples of officially used language, were selected as data sources from which to analyze samples. By about 1972 a core corpus of three samples (1959, 1964, 1969) was available in machine-readable form. It comprised approximately 2 million tokens, with about 60 percent of the texts from DW and about 40 percent from ND. From the 1964 sample a subset of texts (175 DW-articles with 7,000 tokens and 57 ND-texts with 2,000 tokens, taken from the front and second pages of the respective newspapers) was then manually categorized according to a catalog of the 365 most frequent types of lexical entries (frequency greater than or equal to 5) used in the texts concerned. These texts were then automatically rewritten, suppressing all function words, so that each text became a string of the remaining lexical entries in the order of their occurrence in the original; these strings provided the data tapes for the following analysis.

EMPIRICAL SEMANTICS

Current semantic theories of word meanings and world knowledge representation regard memory in human or artificial systems of cognition and understanding as a highly complex structure of interrelated concepts. But the cognitive principles underlying these structures are still poorly understood. As the problem of their mutual and complex relatedness has been increasingly recognized, different methods and formats have been proposed, with varying success, to model these interdependencies. However, the work of psychologists, AI (Artificial Intelligence) researchers, and linguists active in that field still appears to be determined by their respective disciplines' general lines of approach rather than by the consequences of the intersections of these approaches.

In linguistic semantics, cognitive psychology, and knowledge representation, most of the necessary data concerning lexical, semantic and external world information are still provided introspectively. Researchers explore (or make test-persons explore) their own linguistic or cognitive capacities and memory structures, and depict their findings (or let hypotheses about them be tested) in various representational formats (lists, arrays, trees, nets, active networks, and the like). It is widely accepted that model structures resulting from these analyses have a more or less ad hoc character and tend to be confined to their limited theoretical or operational performance within a specified subject domain or implemented system. Thus, these approaches, by definition, can map only what is already known to the analysts and not what might be conveyed in texts unknown to them. Being basically interpretative and lacking operational control, such knowledge representations will quite naturally be restricted to undisputed informational structures, which consequently can be mapped in accepted and well-established (concept-hierarchical, logically deductive) formats. They will also lack the flexibility and dynamics of more constructive model structures. These are needed for automatic meaning analysis and representation from input texts, so that a component can build up or modify a system's own knowledge, however shallow and vague that may appear compared to human understanding.

Unlike these more orthodox lines of introspective data acquisition in meaning and knowledge representation research, the present approach is based on the algorithmic analysis of discourse that real speakers or writers produce in actual situations of performed or intended communication on a certain subject domain. The approach makes essential use of procedural means to map fuzzy word meanings and their connotative interrelations in the format of conceptual stereotypes. Their varying dependencies constitute dynamic dispositions that make accessible only those concepts which may, differently in different contexts, be considered relevant under a specified perspective or aspect². Thus, under the notion of lexical relevance and semantic disposition, a new meaning relation may operationally be defined between elements in a conceptual representation system, which in itself may be reconstructed empirically from natural language discourse. Such dispositional dependency structures would seem to be an operational prerequisite and a promising candidate for the simulation of contents-driven (analogically-associative) instead of formal (logically-deductive) inferences in semantic processing.

In view of an introductory illustration rather than a detailed and qualifying discussion, some of the standard concept and word-meaning representational formats in memory models and knowledge systems will be compared, in order to motivate our rather strict departure from them in developing and using some statistical means for the analysis of texts and for the representation of the data obtained, which will briefly be introduced as the semantic space model. Starting from the notion of priming and spreading activation in memory as a cognitive model for comprehension processes, we will deal with our procedural method of representing semantic dispositions by way of inducing a relation of lexical relevance among labeled concept representations in semantic space. In conclusion, two or three problem areas connected with word meaning and concept processing will be touched on which might be tackled anew and perhaps be brought to a more adequate though still tentative solution under an empirically founded approach in procedural semantics.

PROCEDURAL MODELS

Lexical structures in linguistic semantics, memory models in cognitive psychology, and semantic networks in AI-research share some structure of directed graphs as the basic format of their models. A directed graph of the kind shown in Fig. 1 is probably one of the most familiar forms of concept representation which experimental psychologists have set up and tested in the course of developing their memory models [e.g., COLLINS and QUILLIAN 1969; KLIX 1976].

Figure 1. Concept Representation by Directed Graph

Here we have a hierarchy of labeled concept nodes with predicates and properties linked to them, which are inherited by directly descendent nodes. The hypotheses formulated and tested in experiments predict that test-persons will take more time to identify and decide given propositions with an increasing number of node- and level-transitions to be processed in the course of interpretation. Evaluating a sentence like "A canary can sing" will take less time than deciding whether the sentence "A robin can breathe" is true or not. Thus, reaction-time serves as an indicator showing whether the proposed model structure is correct or in need of modification.
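The prediction can be made concrete with a minimal sketch. The hierarchy, node names and property strings below are illustrative assumptions rather than the content of Fig. 1, and the function name `transitions_to_property` is hypothetical; the count it returns stands in for the number of level-transitions that the reaction-time hypothesis refers to.

```python
# Toy concept hierarchy; nodes and properties are illustrative assumptions.
PARENT = {"canary": "bird", "robin": "bird", "bird": "animal", "animal": None}
PROPERTIES = {
    "canary": {"can sing", "is yellow"},
    "robin": {"has a red breast"},
    "bird": {"can fly", "has wings"},
    "animal": {"can breathe", "has skin"},
}


def transitions_to_property(concept, prop):
    """Count upward level-transitions until prop is found; None if never found."""
    steps = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES[node]:
            return steps          # property stored at this level (or inherited from it)
        node = PARENT[node]       # move one level up the hierarchy
        steps += 1
    return None


print(transitions_to_property("canary", "can sing"))    # stored at the node itself
print(transitions_to_property("robin", "can breathe"))  # inherited from two levels up
```

Under the hypothesis described above, a larger count would correspond to a longer expected decision time.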

In early artificial intelligence research a different type of knowledge representation was developed for question-answering systems. A fragment of the most common schema of the semantic network type, e.g. [ WINOGRAD 1975], is shown in Fig. 2. Here again we have labeled concept nodes linked to one another by pointers representing labeled relations which form a network instead of a tree structure. This enables the system to answer correctly questions like "Is Susy a cat?" by identifying the SUSY-node, its ISA-relation pointer and the CAT-node. Moreover, the pointer structure allows for the processing of paths laid through the network, initiated by questions like "Susy, cat?" which will prompt the answer "Susy is a cat. Cat eats fish. Cat is an animal. Fish is an animal."

Figure 2. Concept Representation by Semantic Network
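A network of this type can be sketched as a list of labeled triples. The four triples below are taken from the example answer quoted in the text; the relation label EATS and the function names `isa` and `trace` are assumptions made for illustration, not the notation of any particular system.

```python
# Labeled, directed relations mirroring the example sentences in the text.
TRIPLES = [
    ("SUSY", "ISA", "CAT"),
    ("CAT", "EATS", "FISH"),
    ("CAT", "ISA", "ANIMAL"),
    ("FISH", "ISA", "ANIMAL"),
]


def isa(subject, category):
    """Answer 'Is subject a category?' by following ISA pointers transitively."""
    frontier, seen = [subject], set()
    while frontier:
        node = frontier.pop()
        if node == category:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(o for s, r, o in TRIPLES if s == node and r == "ISA")
    return False


def trace(start):
    """List every statement reachable from start, breadth-first."""
    statements, queue, seen = [], [start], {start}
    while queue:
        node = queue.pop(0)
        for s, r, o in TRIPLES:
            if s == node:
                statements.append((s, r, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return statements


print(isa("SUSY", "CAT"))
for statement in trace("SUSY"):
    print(*statement)
```

Because the pointers are directed, the path from SUSY can be followed outward but not back, which is what makes such structures directly searchable.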

A schematic representation of concept relatedness envisaged by cognitive theorists working along more procedural lines of memory models [ COLLINS and LOFTUS 1975] is shown in Fig. 3. Their distance-relational conception lends itself readily to the notion of stereotype representation for concepts that do not have intersubjectively identifiable sharp boundaries [ ROSCH 1975]. Instead of binary decision of category, stereotypical concepts or prototypes are determined by way of their adjacency to other prototypes. Taken as a memory model, stimulation of a concept will initiate spreading activation to prime the nearer concepts more intensely than those farther away in the network structure, thus determining a scope of concepts related by their primed semantic affinity. In the example provided, the stimulation of the concept-node MANAGEMENT will activate that of BUSINESS first, then INDUSTRY and ORGANIZATION, with about the same intensities, then ADMINISTRATION and so on, with the intensities decreasing as a function of the activated nodes' distances.

Figure 3. Distance-Relational Representation
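The decrease of activation with distance may be illustrated by the following sketch. Fig. 3 gives no numerical distances, so the edge lengths, the exponential decay rule and the parameter `decay` are all assumptions chosen only to reproduce the ordering described in the text (BUSINESS first, then INDUSTRY and ORGANIZATION at about equal intensity, then ADMINISTRATION).

```python
import heapq

# Hypothetical distances between neighbouring concept nodes.
EDGES = {
    ("MANAGEMENT", "BUSINESS"): 1.0,
    ("BUSINESS", "INDUSTRY"): 1.0,
    ("BUSINESS", "ORGANIZATION"): 1.0,
    ("ORGANIZATION", "ADMINISTRATION"): 1.0,
}


def neighbours(node):
    """Yield (other_node, distance) for every undirected edge touching node."""
    for (a, b), d in EDGES.items():
        if a == node:
            yield b, d
        elif b == node:
            yield a, d


def spread(source, decay=0.5):
    """Return activation per node as decay ** (shortest distance from source)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue              # stale heap entry
        for other, w in neighbours(node):
            nd = d + w
            if nd < dist.get(other, float("inf")):
                dist[other] = nd
                heapq.heappush(heap, (nd, other))
    return {node: decay ** d for node, d in dist.items()}


for node, level in sorted(spread("MANAGEMENT").items(), key=lambda p: -p[1]):
    print(f"{node:15s} {level:.3f}")
```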

These three schemata of model structures, although obviously concerned with the simulation of symbol understanding processes, are designed to deal primarily with static aspects of meaning and knowledge. Thus in interpreting input symbols or strings, predefined or stored meaning relations and constructions can be identified and their representations retrieved. Unless the respective grounding is made explicit and represented in that structure, however, possibly distorted or modified exemplifications of such relations can hardly be recognized, nor can relevant supplementary semantic information be provided, within such representational systems. As the necessary data are not taken from natural language discourse in communicative environments but elicited in experimental settings by exploring either one's own or the test-persons' linguistically relevant cognitive or semantic capacities, usage similarities among contextual variations of identical items can be ascertained only with difficulty. This is rather unsatisfactory from the viewpoint of a linguist who considers his discipline an empirical one and who hence holds that descriptive semantics ought to be based on linguistic data produced by real speakers or listeners in actual acts of communicative performance, so that new meaning representations (or fragments of them) can replace (or improve) older ones and thereby change or update a static memory structure.

FUZZY LEXICAL STRUCTURES

It has been shown elsewhere [ RIEGER 1980]³ that in a sufficiently large sample of pragmatically homogeneous texts, called a corpus, only a restricted vocabulary, i.e., a limited number of lexical items, will be used by the interlocutors, however comprehensive their general personal vocabularies might be. Consequently, the lexical items employed to convey information on the subject domain under consideration in the discourse concerned will be distributed according to their conventional communicative properties, constituting semantic regularities which may be detected empirically from the texts.

The empirical analysis of discourse and the formal representation of vague word meanings in natural language texts are based on the Wittgensteinian notion of language games and his assumption that a great number of texts analyzed for the terms' usage regularities will reveal essential parts of the concepts and hence the meanings conveyed.

A meaning of a word is a kind of employment of it. For it is what we learn when the word is incorporated into our language. That is why there exists a correspondence between the concept rule and meaning... Compare the meaning of a word with the function of an official. And different meanings with different functions. When language games change, then there is a change in concepts, and with the concepts the meanings of words change. [ WITTGENSTEIN 1969; 10e].

The statistics used so far for the systematic analysis not of propositional strings but of their elements, namely words in natural language texts, are basically descriptive. Developed from and centered around a correlational measure that specifies intensities of co-occurrence of lexical items in natural language discourse, these analysis algorithms allow for the systematic modeling of a fragment of the lexical structure constituted by the vocabulary employed in the texts, as part of the concomitantly conveyed world knowledge.

A correlation coefficient is appropriately modified to be used as a mapping function. It serves to compute the relational interdependence of any two lexical items from their textual frequencies. Items which frequently co-occur in a number of texts will be positively correlated and hence called affined, whereas items each of which tends to occur without the other will be negatively correlated and hence called repugnant. Degrees of word-repugnancy and word-affinity, indicated by numerical values ranging from -1 to +1, can thus be determined without consulting an investigator's or his test-persons' word or world knowledge (semantic competence). Instead, they can be based solely on the usage regularities of lexical items observed in a corpus of pragmatically homogeneous texts, spoken or written by real speakers or listeners in actual or intended communication (semantic performance).
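As a rough illustration of affinity and repugnancy, the sketch below computes a plain Pearson correlation over the per-text frequencies of two items. This is a stand-in only: the specific modification of the coefficient is not given in this section, and the function name `correlation` and the frequency lists are hypothetical.

```python
from math import sqrt


def correlation(freq_a, freq_b):
    """Pearson correlation of two items' per-text frequencies (stand-in measure)."""
    if len(freq_a) != len(freq_b):
        raise ValueError("both items need one frequency per text")
    n = len(freq_a)
    mean_a = sum(freq_a) / n
    mean_b = sum(freq_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(freq_a, freq_b))
    var_a = sum((a - mean_a) ** 2 for a in freq_a)
    var_b = sum((b - mean_b) ** 2 for b in freq_b)
    if var_a == 0 or var_b == 0:
        return 0.0                # an item with constant frequency carries no signal
    return cov / sqrt(var_a * var_b)


# Hypothetical frequencies of three items in four texts.
x_i = [3, 0, 2, 0]
x_j = [2, 0, 3, 1]   # tends to occur in the same texts as x_i: affined
x_k = [0, 4, 0, 3]   # tends to occur where x_i does not: repugnant

print(round(correlation(x_i, x_j), 3))
print(round(correlation(x_i, x_k), 3))
```

Values near +1 indicate affinity, values near -1 repugnancy, in the sense introduced above.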

Let T be such a corpus that consists of t texts belonging to a specific language game, i.e., satisfying the condition of pragmatic homogeneity. For the sake of illustrating the analysis algorithm's performance, let us consider a simplified case where the vocabulary V used in the texts is limited to only three word types, namely x_i, x_j and x_k, each of which has a certain overall token-frequency. Then the modified correlation coefficient a will measure the regularities of usage by the affinities and repugnancies that may hold between any lexical item and all the others in the discourse analyzed. That will yield for any item an n-tuple of correlation-values a, where n is the total number of items. In the case of lexical item x_i with n=3, its correlation-values form the triple a_ii, a_ij, a_ik. These values are now interpreted as coordinates that allocate the lexical items x_i, x_j and x_k to the points y_i, y_j and y_k respectively in a three-dimensional space spanned by the three axes i, j, and k, as illustrated in Fig. 4. As the positions of these points obviously depend on the regularities with which the lexical items are used within the texts, these y-points are called corpus-points of i, j and k in the a- or corpus-space.

Figure 4. Corpus Space and Corpus Point

Consequently, the less the usages of any two items differ, the shorter the distance between the two corresponding y-points in this space. These differences may be calculated by a distance measure d between any two y-points, as illustrated by dotted lines in Fig. 4. The distance-values are real, non-negative numbers representing a new characteristic. For each of the points y_i, y_j, and y_k, an n-tuple of d-values is obtained, i.e., for y_i the triple d_ii, d_ij, d_ik, which may be interpreted as new coordinates. These will again allocate each item x_i, x_j and x_k to new points z(d_i), z(d_j) and z(d_k) in a new n-dimensional space, called semantic space, as illustrated in Fig. 5.

Figure 5. Semantic Space
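The second mapping, from corpus points to points in semantic space, can be sketched as follows. The a-values are invented for illustration, and the Euclidean metric is an assumption, since the text does not name the distance measure d; the function name `semantic_space` is likewise hypothetical.

```python
from math import dist

# Hypothetical corpus points: each triple holds the a-values (a_ii, a_ij, a_ik)
# of one word type.
CORPUS_POINTS = {
    "x_i": (1.0, 0.8, -0.5),
    "x_j": (0.8, 1.0, -0.3),
    "x_k": (-0.5, -0.3, 1.0),
}


def semantic_space(corpus_points):
    """Map each item to the tuple of its distances to all corpus points.

    Euclidean distance is assumed here; any metric on the corpus space
    could be substituted.
    """
    names = list(corpus_points)
    return {
        a: tuple(dist(corpus_points[a], corpus_points[b]) for b in names)
        for a in names
    }


Z = semantic_space(CORPUS_POINTS)
for name, coords in Z.items():
    print(name, [round(c, 3) for c in coords])
```

In this toy case x_i and x_j, whose usage regularities are similar, end up close together, while x_k lies far from both.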

The positions of such points in the semantic space will clearly depend on all the differences (d- or distance-values) in all the regularities of usage (a- or correlation-values) which any lexical item shows in the texts. Thus, each lexical item is mapped onto a fuzzy subset of the vocabulary according to the numerically specified regularities with which these items have been used in the discourse analyzed. Measuring the differences between the usage regularities of lexical items allows the above interpretation and the consecutive mappings of items onto theoretical constructs. These new abstract entities represent what meanings may be composed of, that is to say, a number of operationally defined elements whose varying contributions are derived directly from the different usage regularities that the corresponding lexical items show in the texts analyzed. As theoretical constructs, these entities constitute meaning from a more holistic approach to lexical system description. Translating the Wittgensteinian notion of meaning into a mathematically operational form of empirical feasibility, these new meaning-components can be procedurally characterized as a function of all the differences of all the regularities with which any one of the vocabulary's items is used compared to any other item in the same corpus of discourse.

The resulting system, sets of fuzzy subsets of the vocabulary, represents a structured lexicon. It is a relational data structure which may be interpreted topologically as a hyperspace with a natural metric, called semantic space. Its linguistically labeled elements represent meaning points, and their mutual distances represent meaning differences. The position of a meaning point may be described by its semantic environment. This is determined by those other points located within a given diameter from the meaning point concerned in the semantic hyperspace.

Fig. 6 shows the topological environment E⟨GESCHÄFT⟩, i.e., those points situated within the hypersphere of a certain diameter from the meaning point GESCHÄFT/business, computed from the corpus of German newspaper texts, comprising about 7,000 tokens of 365 types in 175 texts from the 1964 editions of the daily West German DIE WELT.

GESCHÄFT/business	0.000
WERB/advertisement	2.837	KENNTNIS/knowledge	3.028
BITTE/request	3.284	TECHNIK/technic	3.527
PERSON/person	3.930	BUCH/book	4.232
FÄHIG/capable	4.471	ORGANISAT/organization	4.526
INFORMAT/information	4.568	ERFAHR/experience	4.708
ALLGEMEIN/general	4.816	BRITAIN/Britain	4.838
KONTAKT/contact	4.902	UNTERRICHT/instruction	4.919
ANGEBOT/offer	5.047	AUSGABE/expense	5.064
RAUM/space	5.098	DIPLOM/diploma	5.155
VERBAND/association	5.183	COMPUTER/computer	5.212
STADT/city	5.216	ELEKTRON/electron	5.311
LEHR/teach	5.321	LEIT/lead	5.404
WEG/way	5.464	STELLE/position	5.498
WIRTSCHAFT/economy	5.503	MODE/fashion	5.537
JOURNAL/journal	5.621	BILDUNG/education	5.657
GEBIET/area	5.697	SUCH/search	5.733
SYSTEM/system	5.752	EINSATZ/activity	5.813
ARBEIT/labor	5.834	AUFTRAG/order	5.872
WUNSCH/wish	5.880	PROGRAMM/program	5.880
AUSLAND/abroad	5.881	INDUSTRIE/industry	5.909

Figure 6. Meaning Differences from GESCHÄFT in DIE WELT
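Selecting such an environment amounts to a simple threshold on the distance-values. The sketch below uses the first six distances from Fig. 6; the cut-off value 3.5 and the names `environment` and `max_distance` are assumptions for illustration only.

```python
# Distances from GESCHÄFT/business, copied from the first rows of Fig. 6.
DISTANCES = {
    "WERB": 2.837,
    "KENNTNIS": 3.028,
    "BITTE": 3.284,
    "TECHNIK": 3.527,
    "PERSON": 3.930,
    "BUCH": 4.232,
}


def environment(distances, max_distance):
    """Return (label, distance) pairs within max_distance, nearest first."""
    inside = [(label, d) for label, d in distances.items() if d <= max_distance]
    return sorted(inside, key=lambda pair: pair[1])


# A hypothetical cut-off of 3.5 keeps only the three nearest meaning points.
print(environment(DISTANCES, 3.5))
```

Enlarging the cut-off widens the hypersphere and admits progressively less closely related meaning points.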

Once it had been seen that topological environments of that sort do, in fact, assemble meaning points of a certain semantic affinity solely by means of the text-analyzing algorithms and without any interference by a competent language user, a number of questions arose concerning point density, structuredness, internal relations, and semantic relatedness.

Further investigation revealed that there are regions of higher point density in the semantic space, forming clouds and clusters. These were detected by multivariate and cluster analysis methods, which showed, however, that those items related both paradigmatically and syntagmatically formed what can be called connotative clouds, rather than what is known as semantic fields. Although it seemed difficult to specify internal relations in terms of any logically deductive or concept hierarchical system, their elements' positions showed a high degree of stable structures, which suggested a regular form of contents-dependent associative connectedness [ RIEGER 1981b; 1982; 1983].

SEMANTIC SPACE OPERATIONS

Following a more semiotic understanding of meaning constitution, the present semantic space model may be considered the core structure of a word meaning or world knowledge representation system, which separates the format of a basic stereotypical meaning representation from its latent organization of interdependent relations. Whereas the former is a rather static, topologically structured associative memory representing the data produced by text analysis algorithms, the latter can be characterized as a collection of dynamic and flexible structuring processes that reorganize these data according to various principles [ RIEGER 1981b]. Unlike declarative knowledge, which can be represented in predefined semantic network structures, meaning relations of lexical relevance and semantic disposition, which depend heavily on the context and domain of knowledge concerned, are more adequately defined procedurally, i.e., by generative algorithms that induce them on changing data only when, and whenever, necessary. This is achieved by a recursive procedure that produces hierarchies of meaning points, structured under given aspects according to and depending on their meanings' relevancy [ RIEGER 1984b].

Taking up the heuristics provided by Spreading Activation Theory in semantic memory, cognitive structures, and concept representation as advanced by [QUILLIAN 1968; OLSON 1970; COLLINS and LOFTUS 1975], the notion of spreading activation can be employed not only to denote the activation of related concepts in the priming process studied in subsequent publications, e.g., [LORCH 1982; FLORES D'ARCAIS and JARVELLA 1983], but also, generically somewhat prior to that, to signify the very procedure which induces these relations between concepts. Originally developed as a procedural model to cope with observed latencies of activated concepts in comprehension processes, priming and spreading activation are based on network-type models of world-knowledge structures, as illustrated briefly before. Essentially defined by nodes, representing concepts, meanings, or objects, and by pointers which relate them conceptually, semantically, or logically to one another, these formats have a considerable advantage over the semantic space structure outlined above. One of the problems of distance-like data structures in semantic processing is that distance is a symmetric relation. Well-known search strategies for retrieval, matching, and inference cannot be applied to it, because they are based on non-symmetric relations realized by the pointer structures of well-known representations for word meaning or world knowledge.

In order to make such procedures operate on the semantic space data, its structure has to be transformed into some hierarchical organization of its elements. For this purpose the semantic space model has to be reinterpreted as a sort of conceptual raw data and associative base structure. What appeared disadvantageous at first now turns out to be an advantage over more traditional formats of representation. Unlike those approaches, which presuppose the structural format of the semantic memory models that are to be tested in word recall or concept recognition experiments, the semantic space provides the data necessary for the procedural definition of dynamic rather than static model structures, which allow variable stereotypes instead of fixed categorical concept representations. Thus, the concept nodes, as abstract mappings of the meanings of lexical items, are not just linked to one another according to the way cognitive scientists suppose conceptual information to be organized in memory, but should be based on this varying structure of dynamically organized stereotype concepts. Defined as procedures that operate on the semantic space data, this is equivalent to a dynamic restructuring of meaning points and, depending on the controlling parameters, to the generation of paths between them along which activation might spread whenever a meaning point is stimulated in the case of priming.

In contrast to the ready-made and fixed relations among nodes in such network models, an algorithm has been devised which operates on the semantic space data structure to induce dependencies between its elements, i.e., among subsets of the meaning points. Starting from a meaning point, the recursive procedure detects fragments of the semantic space according to the semantic similarities to other points, i.e., the distance relations which we named semantic relevance. Stop conditions may be deliberately formulated, either qualitatively, by specifying a target point, or quantitatively, by specifying the number of points to be processed.

Given one meaning point as a start, the algorithm will first list all its neighbor points by increasing distances, second provide similar lists for each of these neighbors, and third prime the starting point as root node of the search tree. Then the algorithm's generic procedure will take the first entry from the first list, determine its nearest neighbor among those points already primed from the appropriate second list, in order to identify it as the ancestor (mother node) to which the new descendant (daughter node) is linked, whose label is then deleted from the first list. Repeated successively for each of the meaning points listed, and in turn primed in accordance with this procedure, the algorithm will select a particular fragment of the relational structure latently inherent in the semantic space under a certain perspective, i.e., the aspect or initially primed meaning point that the algorithm started from.

Carrying on this process and consuming all the labeled points in the space, unless stopped under conditions of given target points, the number of points to be processed, or a threshold of maximal distance, the algorithm transforms prevailing similarities of meanings into a binary, non-symmetric, and transitive relation between them. This relation allows hierarchical reorganization of meaning points into an n-ary DDS-tree with the primed point as its root [RIEGER 1984a]. If we introduce a numerical measure, weighted by a function of a node's distance values and the level of its tree position, it may either express a concept's dependencies given by the root's descendants in that tree, or, inversely, evaluate the nature of their criteria for that concept specified and determined by that tree's root.
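The least-distance procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation reported in [RIEGER 1984a]: the function name build_dds, its parameters, the Euclidean metric, the alphabetical tie-breaking, and the four sample coordinates are all hypothetical.

```python
import math

def build_dds(points, root, target=None, max_nodes=None, max_dist=None):
    """Return {label: mother_label} for a dependency tree rooted at `root`.

    `points` maps labels to coordinates (hypothetical data); Euclidean
    distance stands in for the semantic space metric.
    """
    def dist(a, b):
        return math.dist(points[a], points[b])

    # First list: all other points by increasing distance from the root.
    queue = sorted((p for p in points if p != root),
                   key=lambda p: (dist(root, p), p))
    mother = {root: None}  # primed points; the root has no ancestor
    for label in queue:
        # Quantitative stop conditions: node count and maximal distance.
        if max_nodes is not None and len(mother) >= max_nodes:
            break
        if max_dist is not None and dist(root, label) > max_dist:
            break
        # Mother node: the already primed point nearest to the new point.
        mother[label] = min(mother, key=lambda q: (dist(q, label), q))
        # Qualitative stop condition: the target becomes the last node.
        if label == target:
            break
    return mother

# Hypothetical two-dimensional configuration (not the data of Fig. 7).
POINTS = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (0, 3)}
print(build_dds(POINTS, "a"))
print(build_dds(POINTS, "c"))
```

Starting from a, point c is attached to b and point d to a; starting from c, point a is attached to b instead. The same symmetric distances thus yield different non-symmetric dependencies, depending on the point initially primed.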

Without introducing the algorithms formally, some of their operative characteristics can well be illustrated by a few simplified examples.

Beginning with the schema of a distance-like two-dimensional data structure with 11 points labeled a to k, as shown in Fig. 7, the stimulation of three different start points, a, b and c results in the dependency structures shown in Fig. 8, where the working process of the least distance algorithm is illustrated as distance detection (first row), as a step-list representing the selecting process of points activated (second row), then as their n-ary tree representations of points' relation as to the priming (third row), and finally as their transformations to binary tree structures (fourth row).

Figure 7. Simple Example of Two-Dimensional Semantic Space

Figure 8. Process of Generating Dispositional Dependency Structure (DDS)

It is apparent that stimulation of other points within the same configuration will result in similar but nevertheless different trees, depending on the aspect under which the structure is accessed, i.e., the point initially stimulated by the algorithm.

Applied to the semantic space data of 365 defined meaning points calculated from the newspaper corpora of the 1964 editions of both the West German DIE WELT (DW) and the East German NEUES DEUTSCHLAND (ND), the procedure generates the Dispositional Dependency Structures (DDS) of DEUTSCH/German and EUROP/Europe, as shown in Figs. 9, 10, 11, and 12.

Different stop conditions are used for the generation of the DDS-graphs: the target node NENN/name/call for ND: DDS⟨DEUTSCH/German⟩, the target node WELT/world for ND: DDS⟨EUROP/Europe⟩, and the quantitative stop condition of the total number of nodes to be processed (=50) for DW: DDS⟨DEUTSCH/German⟩ and DDS⟨EUROP/Europe⟩. In the DW: DDS⟨DEUTSCH⟩, given in Fig. 9, we find two descendants, ERKLäR/declare and MINISTER/minister, on level 1, which characterize the connotative alternatives to follow as descendants on deeper levels of the dependency structure. In the DW: DDS⟨EUROP/Europe⟩, given in Fig. 10, there are five alternatives, i.e., TEILNAHME/participation, POLITIK/politics, ERKLäR/declare, VERHäLTNIS/relation, and CHEF/head, on the first level, that diversify even further downwards with one deepest branch from TEILNAHME to MINISTER, on the 8th level. In the ND: DDS⟨DEUTSCH/German⟩, given in Fig. 11, there are two descendant connotative alternatives, FRIED/peace and TREFFEN/meeting, on level 1, each of which dominates two main branches of descendants on level 2. The ND: DDS⟨EUROP/Europe⟩, given in Fig. 12, shows two descendant alternatives, SPALT/split and IMPERIALIST/imperialist, on the first level, both dominating the main connotative dependencies which unfold from the fourth level downwards.

Figure 9. DDS of DEUTSCH in DIE WELT (DW)

Figure 10. DDS of EUROP in DIE WELT (DW)

Figure 11. DDS of DEUTSCH in NEUES DEUTSCHLAND (ND)

Figure 12. DDS of EUROP in NEUES DEUTSCHLAND (ND)

Attention is drawn to the dependencies of direct descendants in Figs. 9 to 12, e.g., DOKTRIN/doctrine → MöGLICH/possible → WIRKLICH/real in ND. This dependency is found in exactly the same order in both DDS⟨DEUTSCH⟩ and DDS⟨EUROP⟩, but at slightly different positions: in the former it starts from the second level of the tree, in the latter from the seventh. Similar parallelisms of direct dependencies may also be found in the DW trees.

To calculate such differences, a numerical measure of criteriality, ranging from 1.0 to 0, has been defined recursively to express the connotative load that any descendant node may contribute to the semantic dispositions concerned as a function of the distances involved and the aspect, i.e., the root node from which the generation of a DDS-tree is started [RIEGER 1984a]. Thus each node in Figs. 9 to 12 has two numerical values: its criteriality and its distance in terms of the semantic space's metric.
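The recursive character of such a measure may be illustrated by a small sketch. The formula used here is purely an assumption made for illustration and is not the criteriality defined in [RIEGER 1984a]: the root receives 1.0, and every descendant receives its mother's value divided by one plus the distance between mother and daughter, so that values decrease towards 0 with growing depth and distance. The names criteriality, mother, and dist are hypothetical.

```python
def criteriality(mother, dist):
    """Assign each node of a dependency tree a value in (0, 1].

    `mother` maps each node label to its mother node (None for the root);
    `dist` is a distance function on labels. The damping rule below is an
    illustrative stand-in, not the published definition.
    """
    crit = {}

    def value(node):
        if node not in crit:
            parent = mother[node]
            if parent is None:
                crit[node] = 1.0  # the root, i.e. the aspect itself
            else:
                # Inherit the mother's value, damped by the distance to her.
                crit[node] = value(parent) / (1.0 + dist(parent, node))
        return crit[node]

    for node in mother:
        value(node)
    return crit

# Hypothetical chain a -> b -> c with unit distances.
tree = {"a": None, "b": "a", "c": "b"}
print(criteriality(tree, lambda x, y: 1.0))  # {'a': 1.0, 'b': 0.5, 'c': 0.25}
```

Whatever the concrete weighting, the point of the recursion is that a node's value depends on the whole path leading to it from the root, and therefore on the aspect under which the tree was generated.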

For a wide range of purposes in processing DDS-trees, the different criterialities of nodes can be used to estimate which paths are more likely to be taken than others under priming of certain meaning points.

SEMANTIC DISPOSITIONS

Generation of DDS-trees is not only a prerequisite to source-oriented, contents-driven search and retrieval procedures, which may thus be performed effectively on the semantic space structure. By way of its particular procedural definition, it also permits the detection of varying dependencies of identical concepts under different aspects that might change dynamically.

Let the meaning point DEUTSCH/German be stimulated with EUROP/Europe given as the target point in the semantic space structures of both DW and ND. In both cases, the DDS⟨DEUTSCH⟩ can then be generated as illustrated above (Figs. 9 and 11), providing a variety of semantic dispositions inherent in the semantic space of DW and ND under the aspect of DEUTSCH/German. The tree generation process, however, will be terminated when the given target is encountered and incorporated into the tree as its last node. Tracing back its ancestor nodes to the root node activates its dependency path, constituted of those intermediate nodes which determine the associative transitions of any target node under any specifiable aspect. Looking up EUROP/Europe as the target node under the aspect of DEUTSCH/German and, vice versa, DEUTSCH as the target under the aspect of EUROP yields, unsurprisingly, approximately the same dependency paths in inverted order, with ANTI and SPALT under the aspect of DEUTSCH being replaced by IMPERIALIST under the aspect of EUROP, as shown in Figs. 11 and 12, and separately in Fig. 13.

Figure 13. Dependency Path of EUROP and DEUTSCH in ND

Comparing nodes with identical labels under the same aspect in both semantic space structures reveals essential connotative differences between the East German ND and the West German DW. For DW the dependency path consists of EUROP → ERKLäR → DEUTSCH, whereas for ND it is EUROP → SPALT/split → ANTI/anti → NATION/nation → GANZ/unity/entire → HAND/act → FRIED/peace → DEUTSCH.
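Tracing back a dependency path amounts to following mother pointers from the target node up to the root. The sketch below encodes only the two paths quoted above, read as fragments of DDS⟨DEUTSCH⟩ with EUROP as target; that reading, the function name dependency_path, and the dictionary layout are assumptions made for illustration.

```python
def dependency_path(mother, target):
    """Follow mother pointers from `target` back to the root of the tree."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = mother[node]
    return path

# Mother pointers reconstructed from the two paths quoted in the text
# (assumed root: DEUTSCH; assumed target: EUROP).
DW_MOTHER = {"DEUTSCH": None, "ERKLäR": "DEUTSCH", "EUROP": "ERKLäR"}
ND_MOTHER = {
    "DEUTSCH": None, "FRIED": "DEUTSCH", "HAND": "FRIED", "GANZ": "HAND",
    "NATION": "GANZ", "ANTI": "NATION", "SPALT": "ANTI", "EUROP": "SPALT",
}

print(" -> ".join(dependency_path(DW_MOTHER, "EUROP")))
print(" -> ".join(dependency_path(ND_MOTHER, "EUROP")))
```

The DW path has a single intermediate node, whereas the ND path passes through six, which is one simple way of quantifying the connotative difference between the two sources.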

Source-oriented search and retrieval processes, operating as described on procedurally defined dynamic structures like the dispositional dependencies, may also be employed as a relational hierarchy for the simulation of an analogical, contents-driven inference, as opposed to logical deduction. The basic idea behind it was to define some operation that would simultaneously work its way through two or more DDS-trees by parallel processing. For this purpose the algorithm is started from the two or more meaning points considered to represent the premises, e.g., DEUTSCH/German and EUROP/Europe. After their DDS-trees are generated the actual inference procedure begins to work its way through every tree, tagging each encountered node according to one of the three tagging modes of Breadth-First, Depth-First or Highest-Criteriality. When in either tree the search procedure encounters the node that has already been tagged by another priming process, it stops to activate the dependency paths from this concluding common node; in our example this is ERKLäR/declare in the DW semantic space structure for all three modes of tagging (Fig. 14), and in the ND semantic space it is FüHR/lead in the BF-mode, ANTI/anti in the DF-mode, and GANZ/unity/entire in the HC-mode (Fig. 15).
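The Breadth-First mode of this inference procedure can be sketched as follows. The two trees contain only the roots and the level-1 descendants named in the text for the DW structures; the order of descendants within a level, the alternation between the trees step by step, and the names bf_order and concluding_node are assumptions, so the sketch illustrates the principle rather than the original program.

```python
from collections import deque

def bf_order(children, root):
    """Return the node labels of a tree in breadth-first order."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    return order

def concluding_node(orders):
    """Tag nodes of several trees in turn; return the first node one
    tagging process meets that another process has already tagged."""
    tagged = {}  # node label -> index of the tree that tagged it first
    for step in range(max(len(order) for order in orders)):
        for i, order in enumerate(orders):
            if step < len(order):
                node = order[step]
                if tagged.setdefault(node, i) != i:
                    return node
    return None  # the trees share no node

# Fragments of the DW trees: roots and level-1 descendants only.
DEUTSCH = {"DEUTSCH": ["ERKLäR", "MINISTER"]}
EUROP = {"EUROP": ["TEILNAHME", "POLITIK", "ERKLäR", "VERHäLTNIS", "CHEF"]}

orders = [bf_order(DEUTSCH, "DEUTSCH"), bf_order(EUROP, "EUROP")]
print(concluding_node(orders))  # ERKLäR
```

A Depth-First or Highest-Criteriality mode would differ only in the order in which each tree's nodes are visited, i.e., in the function that replaces bf_order.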

Figure 14. Result of Inference and Concluding Node in DW Semantic Space

Figure 15. Result of Inference and Concluding Node in ND Semantic Space

CONCLUSION

Thus it appears that the DDS-procedure provides a flexible, source-oriented, contents-driven method for the multi-perspective induction of a relevance relation among stereotypically represented concepts linguistically conveyed by natural language discourse on specified subject domains.

References

[1]: ALPAC, 1966: Languages and Machines, Computers in Translation and Linguistics. ALPAC-Report (Automatic Language Processing Advisory Committee Report), Washington, D.C.
[2]: BAR-HILLEL, Y., 1965: The Outlook of Computational Semantics. Proc. of the Conference on Computer Related Semantic Analysis, Detroit.
[3]: BATORI, I.S., 1977: Linguistische Datenverarbeitung, Computergestützte Sprachforschung, und EDV für Philologen. Sprache und Datenverarbeitung, No. 1 (1977), pp. 2-11.
[4]: BATORI, I./KRAUSE, J./LUTZ, H.D. (eds.), 1982: Linguistische Datenverarbeitung: Versuch einer Standortbestimmung. Tübingen: Niemeyer.
[5]: COLLINS, A.M./QUILLIAN, M.R., 1969: Retrieval Time from Semantic Memory. Journal of Verbal Learning and Verbal Behavior, No. 8 (1969), pp. 240-247.
[6]: COLLINS, A.M./LOFTUS, E.F., 1975: A Spreading Activation Theory of Semantic Processing. Psychological Review, No. 6 (1975), pp. 407-428.
[7]: FAUSER, A./RöSNER, D., 1979: Computational Linguistics in Western Germany: A Selected Bibliography. Institut für Informatik, Universität Stuttgart.
[8]: FAUSER, A./RATHKE, C., 1981: Studie zum Stand der Forschung über Natürlich-sprachliche Frage/Antwort-Systeme. Universität Stuttgart (BMFT-ID-81-006).
[9]: FLORES D'ARCAIS, G.B./JARVELLA, R.J. (eds.), 1983: The Process of Language Understanding. New York/Sydney/Toronto: Wiley.
[10]: FUCKS, W., 1952: On the Mathematical Analysis of Style. Biometrika, No. 39, pp. 122-129.
[11]: FUCKS, W., 1955: Mathematische Analyse von Sprachelementen, Sprachstil und Sprachen. Arbeitsgemeinschaft für Forschung des Landes Nordrhein-Westfalen, No. 34a, Köln/Opladen: Westdeutscher Verlag.
[12]: FUCKS, W., 1968: Nach Allen Regeln der Kunst. Stuttgart: Deutsche Verlags-Anstalt.
[13]: HAUENSCHILD, C./PAUSE, P.E. (eds.), 1983: Linguistik und Künstliche Intelligenz: Aktivitäten in der BRD. Linguistische Berichte, No. 88.
[14]: HELLMANN, M.W. (ed.), 1984: Ost-West-Wortschatzvergleiche: Maschinell Gestützte Untersuchungen zum Vokabular von Zeitungstexten aus der BRD und der DDR. Forschungsberichte des Instituts für Deutsche Sprache, IDS-Mannheim, No. 48, Tübingen: Gunter Narr.
[15]: HERDAN, G., 1956: Language as Choice and Chance. Groningen: Noordhoff.
[16]: HERDAN, G., 1960: Type-Token Mathematics: A Textbook of Mathematical Linguistics. Den Haag: Mouton.
[17]: HERZOG, R. (ed.), 1981: Computer in der Übersetzungswissenschaft. Frankfurt: P. Lang.
[18]: KAY, M., 1984: FUG: A Formalism of Machine Translation. Proc. of the 10th International Conference on Computational Linguistics (COLING 84), Stanford, pp. 75-78.
[19]: KITAGAWA, T., 1980: The Role of Computerized Information Systems in Knowledge Societies. In R. Pfab, F.V. Stachelsky and J. Tonnemacher (eds.), Technische Kommunikation und Gesellschaftlicher Wandel, Berlin: Spiess, pp. 24-73.
[20]: KLIX, F., 1976: Strukturelle und Funktionelle Komponenten des Menschlichen Gedächtnisses. In F. Klix (ed.), Psychologische Beiträge zur Analyse Kognitiver Prozesse, Berlin: Akademie Verlag, pp. 57-98.
[21]: KRALLMANN, D., 1982: Linguistische Datenverarbeitung: Gestern, Heute und Morgen. In I. Batori, J. Krause, and H.D. Lutz, (eds.), Linguistische Datenverarbeitung: Versuch einer Standortbestimmung, Tübingen: Niemeyer, pp. 3-12.
[22]: KRAUSE, J., 1982: Mensch-Maschine-Interaktion in Natürlicher Sprache: Evaluierungsstudien zu Praxisorientierten Frage-Antwort-Systemen und Ihrer Methodik. Sprache und Information, No. 1, Tübingen: Niemeyer.
[23]: LENDERS, W., 1980: Linguistische Datenverarbeitung: Stand der Forschung. Deutsche Sprache, No. 3 (1980), pp. 213-264.
[24]: LORCH, R.F., 1982: Priming and Search Processes in Semantic Memory: A Test of Three Models of Spreading Activation. Journal of Verbal Learning and Verbal Behavior, No. 21, pp. 468-492.
[25]: LUTZ-HENSEL, M., 1981: Planung der LDV-Ausbildung an Wissenschaftlichen Hochschulen. In D. Krallmann and J. Krause (eds.), Linguistische Datenverarbeitung und Informationswissenschaft in der BRD, Essen/Regensburg: LDV-Fittings, pp. 1-31.
[26]: OLSON, D.R., 1970: Language and Thought: Aspects of a Cognitive Theory of Semantics. Psychological Review, Vol. 77, No. 4, pp. 257-273.
[27]: QUILLIAN, M.R., 1966: Semantic Memory. Doctoral dissertation, Carnegie Institute of Technology.
[28]: QUILLIAN, M.R., 1968: Semantic Memory. In M. Minsky (ed.), Semantic Information Processing, Cambridge, Mass.: MIT Press, pp. 216-270.
[29]: RIEGER, B.B., 1977: Bedeutungskonstitution: Bemerkungen zur Semiotischen Problematik eines Linguistischen Problems. Zeitschrift für Linguistik und Literaturwissenschaft, No. 27/28, pp. 55-68.
[30]: RIEGER, B.B., 1980: Fuzzy Word Meaning Analysis and Representation in Linguistic Semantics. Proc. of the 8th International Conference on Computational Linguistics (COLING 80), Tokyo: ICCL Com., pp. 76-84.
[31]: RIEGER, B.B., 1981a: Feasible Fuzzy Semantics. In H.J. Eikmeyer and H. Rieser (eds.), Words, Worlds, and Contexts: New Approaches in Word Semantics, Berlin/New York: de Gruyter, pp. 193-209.
[32]: RIEGER, B.B., 1981b: Connotative Dependency Structures in Semantic Space. In B.B. Rieger (ed.), Empirical Semantics: A Collection of New Approaches in the Field, Vol. II, Bochum: Brockmeyer, pp. 622-711.
[33]: RIEGER, B.B., 1982: Procedural Meaning Representation. In J. Horecky (ed.), Proc. of the 9th International Conference on Computational Linguistics (COLING 82), Amsterdam/New York: North-Holland, pp. 319-324.
[34]: RIEGER, B.B., 1983: Clusters in Semantic Space. In L. Delatte (ed.), Actes du Congrès International Informatique et Sciences Humaines, Liège: LASLA, pp. 805-814.
[35]: RIEGER, B.B., 1984a: Inducing a Relevance Relation in a Distance-like Data Structure of Fuzzy Word Meaning Representation. In R.F. Allen (ed.), Proc. of the 4th International Conference on Databases in the Humanities and Social Sciences (ICDBHSS 83), Osprey, Fla.: Paradigm Press, pp. 374-386.
[36]: RIEGER, B.B., 1984b: Lexical Relevance and Semantic Disposition: On Stereotype Word Meaning Representation in Procedural Semantics. In Hoppenbrouwers, Seuren and Weijters (eds.), Meaning and the Lexicon, Proc. of the 2nd International Colloquium on the Interdisciplinary Study of the Semantics of Natural Language, Nijmegen: N.I.S. Press, pp. 387-400.
[37]: RIEGER, B.B., 1984c: Semantic Relevance and Aspect Dependency in a Given Subject Domain. In D.E. Walker (ed.), Proc. of the 10th International Conference on Computational Linguistics (COLING 84), Stanford University Press, pp. 298-301.
[38]: RIEGER, B.B., 1984d: Semantische Dispositionen: Prozedurale Wissensstrukturen mit Stereotypisch Repräsentierten Wortbedeutungen. In B.B. Rieger (ed.), Dynamik in der Bedeutungskonstitution, Hamburg: Buske Verlag, pp. 163-228.
[39]: RIEGER, B.B., 1984e: The Baseline Understanding Model: A Fuzzy Word Meaning Analysis and Representation System for Machine Comprehension of Natural Language. In T. O'Shea (ed.), Proc. of the 6th European Conference on Artificial Intelligence (ECAI 84), New York/Amsterdam: Elsevier, pp. 748-749.
[40]: ROSCH, E., 1975: Cognitive Representations of Semantic Categories. Journal of Experimental Psychology: General, No. 104, pp. 192-233.
[41]: STRASZNER, E., 1977: Linguistische Datenverarbeitung (LDV): Anwendungsbereiche und Forschungsstand. Sprachwissenschaft, No. 2 (1977), pp. 433-470.
[42]: SWINNEY, D.A., 1979: Lexical Processing during Sentence Comprehension. Journal of Verbal Learning and Verbal Behaviour, No. 18, pp. 733-743.
[43]: UNGEHEUER, G., 1971: Linguistische Datenverarbeitung: Die Realität und eine Konzeption. IBM-Nachrichten, No. 206, pp. 688-694.
[44]: WAHLSTER, W., 1981: Natürlich-sprachliche KI-Systeme: Entwicklungsstand und Forschungsperspektiven. In J. Siekmann (ed.), German Workshop on Artificial Intelligence (GWAI 81), Berlin: Springer, pp. 50-68.
[45]: WAHLSTER, W., 1982: Aufgaben, Standards und Perspektiven Sprachorientierter KI-Forschung. In I. Batori, J. Krause und H.D. Lutz (eds.), Linguistische Datenverarbeitung: Versuch einer Standortbestimmung, Tübingen: Niemeyer, pp. 13-24.
[46]: WINOGRAD, T., 1975: Frame Representation and the Declarative/Procedural Controversy. In D.G. Bobrow and A. Collins (eds.), Representation and Understanding: Studies in Cognitive Science, New York/San Francisco/London: Academic Press, pp. 185-210.
[47]: WITTGENSTEIN, L.J.J., 1969: Über Gewißheit: On Certainty. New York/San Francisco/London: Harper & Row.
[48]: ZADEH, L.A., 1965: Fuzzy Sets. Information and Control, No. 8 (1965), pp. 338-353.

Footnotes:

¹ This paper presents some results of a project in Computational Semantics conducted by the Mathematic-Empirical Systems research group (MESY), at the German Institute of the Technical University of Aachen, West Germany, and supported by the NRW Ministry of Science and Research, under grant IV A 2-FA 8600. The project is concerned with the development of means for the automatic construction of fuzzy semantic and associative knowledge representation systems from natural language discourse input. As central aspects of this project have already been reported in papers presented at the 3rd International Conference on Databases in the Humanities and Social Sciences (ICDBHSS 83), at Rutgers University [RIEGER 1984a], at the 2nd International N.I.S. Colloquium on the Interdisciplinary Study of the Semantics of Natural Language (Meaning and the Lexicon), at Nijmegen University [RIEGER 1984b], and at the 10th International Conference on Computational Linguistics (COLING 84), at Stanford University [RIEGER 1984c], these aspects are only partially taken up here again. Published in: Raben, J./Sugita, S./Kubo, M. (eds.): Toward a Computer Ethnology. Proceedings of the 8th International Symposium at the Japan National Museum of Ethnology. (Senri Ethnological Studies No. 20), Osaka (National Museum of Ethnology) 1987, pp. 97-120.

²Instead of formally introducing any of the algorithms developed and tested so far, some ideas of their performance and application will be given with figures and examples. For more detailed introductions see the general bibliography on the MESY-project. For the procedural approach see the author's recent publications.

³See also [ RIEGER 1977] where the principle of semantization is introduced as a procedural means to constitute meanings by the process of consecutive choice restrictions from the level of pragmatics, via semantics and syntactics, down to morpho-phonetics. Ranges of possible choice can be established on each of these semiotic levels and may be reconstructed from the morpho-phonetic level upwards by an equivalently generative procedure. On any of the semiotic levels it will select from their constitutive sets of elements and symbols those combinatorial strings of elements which, being not exhaustible considering the number of formally possible combinations, represent recurrent realizations of factually established redundancies. These redundancies of recurrent elementary combinations on one level allow resolution of their identifications into constitutive elements on the next semiotic level, where formally an even wider range of combinations is possible which again is not exhaustible factually, and so forth, from phonemes to syllables, syllables to words, words to phrases, phrases to discourses, etc. This increase of systematic combinatorial possibilities among elements from level to level corresponds to a decreasing determinateness of the rules which govern the structural realizations of any of these combinations. Thus, the notion of semantization extends that of meaning implying choice to a procedural continuum, according to which the semiotic level of morpho-phonetics will convey meaning under more specific restrictions on less choice than the level of pragma-semantics, which will purport meaning under less specifiable restrictions on more possibilities of choice.
