In a rather sharp departure from more orthodox lines of introspective data acquisition in meaning and knowledge representation research, the present approach (1) is based on the algorithmic analysis of discourse that real speakers/writers produce in actual situations of performed or intended communication on a certain subject domain, and (2) makes essential use of the word-usage/entity-relationship paradigm in combination with procedural means to map fuzzy word meanings and their connotative interrelations in a format of stereotypes. Their dynamic dependencies (3) constitute semantic dispositions that render accessible to automatic processing only those conceptual interrelations which can - under differing aspects differently - be considered relevant. Such dispositional dependency structures (DDS) would seem to be an operational prerequisite to, and a promising candidate for, the simulation of contents-driven (analogically-associative), instead of formal (logically-deductive), inferences in semantic processing.
It has been shown elsewhere (RIEGER 1980) that in a sufficiently large sample of pragmatically homogeneous texts, called a corpus, only a restricted vocabulary, i.e. a limited number of lexical items, will be used by the interlocutors, however comprehensive their personal vocabularies in general might be. Consequently, the lexical items employed to convey information on the subject domain under consideration in the discourse concerned will be distributed according to their conventionalized communicative properties, constituting semantic regularities which may be detected empirically in the texts.
For the quantitative analysis not of propositional strings but of their elements, namely the words in natural language texts, rather simple statistics serve the basically descriptive purpose. Developed from and centred around a correlational measure that specifies the intensities with which lexical items co-occur in natural language discourse, these analysing algorithms allow for the systematic modelling of a fragment of the lexical structure constituted by the vocabulary employed in the texts, as part of the concomitantly conveyed world knowledge.
A correlation coefficient appropriately modified for the purpose has been used as a mapping function (RIEGER 1981a). It allows the relational interdependency of any two lexical items to be computed from their textual frequencies. Items which frequently co-occur in a number of texts will be positively correlated and hence called affined; those of which only one (and not the other) frequently occurs in a number of texts will be negatively correlated and hence called repugnant. Different degrees of word-affinity and word-repugnancy may thus be ascertained without recourse to an investigator's or his test-persons' word and/or world knowledge (semantic competence), based solely upon the usage regularities of lexical items observed in a corpus of pragmatically homogeneous texts, spoken or written by real speakers/hearers in actual or intended acts of communication (communicative performance).
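The idea of correlating per-text frequency profiles can be sketched as follows; a plain Pearson coefficient stands in here for the modified measure of RIEGER (1981a), whose exact form is not reproduced in this paper, so the function below is an illustrative assumption rather than the published coefficient.

```python
from math import sqrt

def affinity(freq_x, freq_y):
    """Correlate two lexical items' per-text frequency profiles.

    freq_x, freq_y: occurrence counts of items x and y in each text of
    the corpus.  Positive values mark affined items (frequent
    co-occurrence), negative values repugnant ones (one item tends to
    occur where the other does not).  A plain Pearson coefficient is
    used as a stand-in for the modified measure.
    """
    n = len(freq_x)
    mx, my = sum(freq_x) / n, sum(freq_y) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(freq_x, freq_y))
    sx = sqrt(sum((x - mx) ** 2 for x in freq_x))
    sy = sqrt(sum((y - my) ** 2 for y in freq_y))
    return cov / (sx * sy) if sx and sy else 0.0
```

Two items occurring together across texts thus score positive (affined), while complementary distributions score negative (repugnant).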
The resulting system of sets of fuzzy subsets constitutes the semantic space. As a distance-relational data structure of stereotypically formatted meaning representations, it may be interpreted topologically as a hyperspace with a natural metric. Its linguistically labelled elements represent meaning points, and their mutual distances represent meaning differences.
The position of a meaning point may be described by its semantic environment. Tab. 1 shows the topological environment E⟨UNTERNEHM⟩, i.e. those adjacent points situated within a hypersphere of a certain diameter around its centre, the meaning point UNTERNEHM/enterprise, as computed from a corpus of German newspaper texts comprising some 9000 tokens of 360 types in 175 texts from the 1964 editions of the daily DIE WELT.
Having checked a great number of environments, it was ascertained that they do in fact assemble meaning points of a certain semantic affinity. Further investigation (RIEGER 1983) revealed that there are regions of higher point density in the semantic space, forming clouds and clusters. These were detected by multivariate and cluster-analyzing methods, which showed, however, that the items related both paradigmatically and syntagmatically formed what may be named connotative clouds rather than what is known as semantic fields. Although their internal relations appeared unspecifiable in terms of any logically deductive or concept-hierarchical system, the positions of their elements showed highly stable structures which suggested a regular form of contents-dependent associative connectedness (RIEGER 1981b).
Corroborating ideas expressed in the theories of spreading activation and the process of priming studied in cognitive psychology (LORCH 1982), a new algorithm has been developed which operates on the semantic space data and generates - other than in RIEGER (1982) - dispositional dependency structures (DDS) in the format of n-ary trees. Given one meaning point's position as a start, the algorithm of least distances (LD) will first list all its neighbouring points and stack them by increasing distances, and second prime the starting point as head node or root of the DDS-tree to be generated, before, third, the algorithm's generic procedure takes over. It takes the first entry from the stack, generates a list of its neighbours, determines from it the least distant one that has already been primed, and identifies that as the ancestor-node to which the new point is linked as descendant-node to be primed next. Repeated successively for each of the meaning points stacked and in turn primed in accordance with this procedure, the algorithm will select a particular fragment of the relational structure latently inherent in the semantic space data, depending on the aspect, i.e. the initially primed meaning point the algorithm is started with. Working its way through and consuming all labelled points in the space structure - unless stopped under conditions of given target nodes, number of nodes to be processed, or a threshold of maximum distance - the algorithm transforms prevailing similarities of meanings, as represented by adjacent points, into a binary, non-symmetric, and transitive relation of semantic relevance between them. This relation allows for the hierarchical re-organization of meaning points as nodes under a primed head in an n-ary DDS-tree (RIEGER 1984a).
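The generic LD procedure described above can be sketched as follows; the distance-table representation and the omission of the stopping conditions (target nodes, node count, maximum-distance threshold) are simplifying assumptions of this sketch.

```python
def dds_tree(dist, start):
    """Sketch of the least-distances (LD) procedure: from a symmetric
    table of distances between meaning points, grow a dependency tree
    rooted in the initially primed point `start`.

    dist: dict mapping each point to a dict of distances to all others.
    Returns a dict mapping each point to its ancestor-node (the root
    maps to None), i.e. the parent relation of the n-ary DDS-tree.
    """
    # stack all remaining points by increasing distance from the start
    stack = sorted((p for p in dist if p != start),
                   key=lambda p: dist[start][p])
    parent = {start: None}   # the starting point is primed as root
    primed = [start]
    for p in stack:
        # link p to the least distant point that has already been primed
        ancestor = min(primed, key=lambda q: dist[p][q])
        parent[p] = ancestor
        primed.append(p)
    return parent
```

Starting the procedure from a different point yields a different parent relation over the same distance data, which is precisely the aspect-dependence of the resulting DDS-trees.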
Without introducing the algorithms formally, some of their operative characteristics can be illustrated by a few simplified examples. Beginning with the schema of a distance-like data structure as shown in the two-dimensional configuration of 11 points, labelled a to k (Fig. 1.1), the stimulation of e.g. points a or c will start the procedure and produce two specific selections of distances activated among these 11 points (Fig. 1.2). The order in which these particular distances are selected can be represented either by steplists (Fig. 1.3), by n-ary tree-structures (Fig. 1.4), or by their binary transformations (Fig. 1.5). It is apparent that stimulation of other points within the same configuration of basic data points will result in similar but nevertheless differing trees, depending on the aspect under which the structure is accessed, i.e. the point initially stimulated to start the algorithm with.
Applied to the semantic space data of 360 defined meaning points calculated from the text corpus of the 1964 editions of the German newspaper DIE WELT, the dispositional dependency structure (DDS) of UNTERNEHM/enterprise is given in Fig. 2 as generated by the procedure described.
Besides giving distances between nodes in the DDS-tree, a numerical measure has been devised which describes any node's degree of relevance according to that tree structure. This criteriality of a node is calculated with respect to its root or aspect and has been defined as a function of both its distance values and its level in the tree concerned. For a wide range of purposes in processing DDS-trees, the different criterialities of nodes can be used to estimate which paths are more likely to be taken, as against others less likely to be followed, under priming of certain meaning points. Source-oriented, contents-driven search and retrieval procedures may thus be performed effectively on the semantic space structure, allowing for the activation of dependency paths. These trace the intermediate nodes which determine the associative transitions of any target node under any specifiable aspect.
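One plausible reading of such a measure - a hypothetical decay rule, not a reconstruction of the formula published in RIEGER (1984a) - assigns the root criteriality 1.0 and lets each descendant's value fall off with its distance to its ancestor, so that both distance and tree level lower a node's score. The sketch assumes the parent-map representation of a DDS-tree (each node mapped to its ancestor, root mapped to None).

```python
def criterialities(parent, dist, root):
    """Hypothetical criteriality measure for DDS-tree nodes: the root
    scores 1.0, and each descendant's score decays with its distance
    to its ancestor-node, so deeper and more distant nodes are less
    criterial with respect to the priming aspect (the root).
    """
    # group descendant-nodes under their ancestor-nodes
    children = {}
    for node, anc in parent.items():
        if anc is not None:
            children.setdefault(anc, []).append(node)
    crit = {root: 1.0}
    queue = [root]                  # walk the tree top-down
    while queue:
        node = queue.pop(0)
        for child in children.get(node, []):
            # decay: larger distances and deeper levels lower criteriality
            crit[child] = crit[node] / (1.0 + dist[node][child])
            queue.append(child)
    return crit
```

Along any dependency path the scores are monotonically decreasing, which is what makes them usable for estimating which paths are more likely to be taken under priming.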
Using these tracing capabilities within DDS-trees has proved particularly promising for an analogical, contents-driven form of automatic inferencing which - as opposed to logical deduction - has been described operationally in RIEGER (1984c) and simulated by way of parallel processing of two (or more) dependency-trees.
1Published in: COLING 84 - Proceedings of the 10th International Conference on Computational Linguistics, Stanford (Stanford UP) 1984, pp. 298-301.