Generating Dependency Structures of Fuzzy Word Meanings in Semantic Space [1]

Burghard B. Rieger
Technical University of Aachen, Germany

This paper will report on one of the central objectives of a project in computational semantics which is supported by the Northrhine-Westphalia Ministry of Science and Research under grant IV A 2-FA 8600.

1. Modelling system structures of word meanings and/or world knowledge means facing the problem of their mutual and complex relatedness. Under the notions of semantic relevance and knowledge disposition, this interdependency may be empirically reconstructable from natural language discourse, although most approaches in linguistic semantics and artificial intelligence do not address these issues. Instead, linguists as well as experts engaged in word meaning and/or world knowledge representation still provide the necessary semantic or external world data introspectively, exploring their own competence and memory capacities and depicting their findings in some semantic or conceptual structures (lists, arrays, networks, etc.). They do so with the understanding that their models may have a more or less ad hoc character and tend to lack - beyond their limited operational performance - intersubjective control. Unlike these introspective explorations, the present approach strives to derive some basic data directly via automatic analysis of natural language discourse; the relational structure of these data is not declared but defined procedurally by the algorithms which induce it.

2. Based upon statistical means for the empirical analysis of discourse and for the formal representation of vague word meanings in natural language texts, procedures have been devised which allow for the systematic modelling of a fragment of the lexical structure. This structure is constituted by the vocabulary employed in the texts, as part of the world knowledge these texts concomitantly convey. The coefficients applied map lexical items onto fuzzy subsets of the vocabulary according to the numerically specified regularities with which these items have been used in the discourse analysed. The resulting system of sets of fuzzy subsets is a data structure which may be interpreted topologically as a hyperspace with a natural metric. Its linguistically labeled elements (representing meaning points) and their mutual distances (representing meaning differences) form discernible clouds and clusters which determine the labels' associative meaning relations. Thus, the analysing algorithm takes natural language texts from a certain subject domain as input and produces as output the distance-like data structure (semantic space) of linguistically labeled elements (meaning points) whose positions represent essential properties of the conceptual prototypes according to which their labels have been employed in the texts analysed. Their varying dependencies, which constitute a (latent) associative relational structure, may be defined procedurally and modelled on the semantic space data. This allows not only search and retrieval operations to be executed but also inferential processes to be performed on that data structure under different aspects of semantic content and relevance.
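The pipeline described above (texts in, labeled meaning points with mutual distances out) can be illustrated by a minimal sketch. The coefficients actually applied are not specified in this paper, so the sketch substitutes two stand-ins that are purely assumptions: a Pearson-type correlation of word frequencies across texts as the usage coefficient, and the Euclidean distance between the resulting profiles as the metric. The function names `usage_profiles` and `meaning_distance` and the toy texts are hypothetical. Note that correlations range over [-1, 1], so these profiles are not themselves fuzzy membership values.

```python
import math
from typing import Dict, List, Sequence


def _correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson-type correlation; returns 0.0 if either series is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def usage_profiles(texts: List[List[str]], vocab: List[str]) -> Dict[str, List[float]]:
    """Map each word to a vector of coefficients, one per vocabulary item.

    ASSUMPTION: the coefficient is a frequency correlation across texts,
    standing in for the coefficients of the original procedure.
    """
    freqs = {w: [text.count(w) for text in texts] for w in vocab}
    return {w: [_correlation(freqs[w], freqs[v]) for v in vocab] for w in vocab}


def meaning_distance(profiles: Dict[str, List[float]], u: str, v: str) -> float:
    """ASSUMPTION: Euclidean distance between usage profiles as the metric."""
    return math.dist(profiles[u], profiles[v])


# Hypothetical toy corpus of tokenised texts (not data from the study).
toy_texts = [
    ["arbeit", "industrie", "arbeit"],
    ["industrie", "markt"],
    ["arbeit", "markt", "markt"],
    ["industrie"],
]
toy_vocab = ["arbeit", "industrie", "markt"]
toy_profiles = usage_profiles(toy_texts, toy_vocab)
```

Each profile in `toy_profiles` plays the role of one labeled meaning point, and `meaning_distance` yields the meaning differences between any two labels.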

Figure 1, Figure 2, Figure 3

3. Taking up ideas from the theory of semantic memory and spreading activation in cognitive psychology, a new algorithm is presented which operates on the semantic space data to generate - unlike the CDS-procedure - associative dependency structures (ADS) in the format of general (n-ary) trees. Given the position of one meaning point as primed, the algorithm first lists all other points by increasing distance from it. Then, the algorithm's generic procedure takes the first point on the list, determines its most adjacent point among those already primed, and identifies that point as its mother-node before deleting the new daughter-node's label from the list. Repeated successively for each of the meaning points listed, which are primed in turn in accordance with this procedure, the algorithm of least distances selects a particular fragment of the relational structure latently inherent in the semantic space, depending on the aspect, i.e. the primed meaning point with which the algorithm is initially started. Working its way through and consuming all labeled points in the space system, the ADS-algorithm transforms prevailing similarities of meanings, as represented by adjacent points, to establish a binary, non-symmetric, and transitive relation between them. This relation allows for the hierarchical reorganisation of meaning points as nodes under a primed head in an n-ary ADS-tree.
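The steps of the least-distance procedure can be sketched as follows. This is a minimal reading of the description above, not the original implementation: it assumes that meaning points are given as coordinate tuples, that the metric is Euclidean, and that ties are broken alphabetically by label. The function name `ads_tree` and the five toy points are hypothetical and do not reproduce the configuration of Fig. 1.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, ...]


def ads_tree(points: Dict[str, Point], head: str) -> Dict[str, str]:
    """Return a daughter -> mother mapping for the tree rooted at `head`.

    ASSUMPTIONS: Euclidean distance; ties broken alphabetically by label.
    """
    def dist(u: str, v: str) -> float:
        return math.dist(points[u], points[v])

    # Step 1: list all other points by increasing distance from the primed head.
    pending = sorted((p for p in points if p != head),
                     key=lambda p: (dist(head, p), p))

    primed = [head]
    mother: Dict[str, str] = {}
    # Step 2: take each listed point in turn, attach it to its nearest
    # already-primed point, then treat it as primed itself.
    for label in pending:
        mother[label] = min(primed, key=lambda q: (dist(label, q), q))
        primed.append(label)
    return mother


# Hypothetical two-dimensional configuration (not the points of Fig. 1).
toy_points = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (3.0, 0.0),
    "d": (1.0, 1.0),
    "e": (3.0, 1.0),
}

tree_from_a = ads_tree(toy_points, "a")  # {'b': 'a', 'd': 'b', 'c': 'b', 'e': 'c'}
tree_from_e = ads_tree(toy_points, "e")  # {'c': 'e', 'd': 'e', 'b': 'd', 'a': 'b'}
```

Priming `a` and priming `e` yield different trees over the same five points, which illustrates how the selected fragment of the relational structure depends on the aspect under which the space is explored.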

The process of detection and identification which the algorithm performs may be illustrated by a two-dimensional space configuration of 11 points {a, b, c, d, e, f, g, h, i, j, k} (Fig. 1).

Submitted to the search procedure of least distances under initial priming of the point a, the algorithm will identify the distances concerned as in Fig. 2 and produce the equivalent tree representations shown in Fig. 3. For effective use in procedural meaning representation and semantic processing, the ADS-trees may additionally be evaluated by associative criterialities, which are not given here. The criteriality is a numerical expression of the degree or intensity by which any ADS-node depends on its mother-node. It is calculated as a function of both the topology of the meaning points involved and their relative distances along the path leading to the initially primed point in the semantic space.

Figure 4, Figure 5

Examples of associative dependency trees are given in Figs. 4 and 5, which show the upper fragments of the ADSs of ARBEIT/labour (Fig. 4) and INDUSTRIE/industry (Fig. 5) as computed from the semantic space structure derived from a sample of German newspaper texts taken from the 1964 daily editions of 'Die Welt'.

4. The ADS-trees' properties permit different though related model-bound interpretations which can only be indicated here:


Footnotes:

[1] Published in: Hattori, S./Inoue, K. (Eds.): Proceedings of the XIIIth International Congress of Linguists 1982, Tokyo (CIPL) 1983, pp. 543-548.