A Systems Theoretical View on Computational Semiotics.
Modeling text understanding as meaning constitution by SCIPS.

Burghard B. Rieger
FB II: Department of Computational Linguistics,
University of Trier, Germany

Extended Abstract

In a rather sharp departure from CL and AI approaches, modeling in Computational Semiotics (CS) neither presupposes rule-based or symbolic formats for linguistic knowledge representations, nor does it subscribe to the notion of symbolically represented world knowledge as static structures that may be abstracted from and formatted independently of the way they are processed. Consequently, knowledge structures and the processes operating on them are to be modeled procedurally and ought to be implemented as algorithms. They determine Semiotic Cognitive Information Processing (SCIP) systems as collections of cognitive information processing devices whose semiotic character consists in their multi-level representational system of (working) structures emerging from and being modified by such processing. According to the different types of cognitive modeling distinguished in the past, computational semiotics can be characterized as aiming at the dynamics of emergent meaning constituted by processes which may be simulated as multi-resolutional representations within the frame of an ecological information processing paradigm.

1  Introduction

Natural language texts are (still) the most flexible and, as such, a highly efficient means to represent knowledge for and convey learning to others. We do so by language means, employing words, forming sentences, and producing texts whose meanings are understood to convey, stand for, designate, refer to or deal with topics and subjects, entities and domains, structures and processes in the real world. What appears to be conditional for this kind of text understanding is humans' language faculty, i.e. the (performative) ability to identify, recognize, produce, and structure some fragments of real world stimuli according to some internal (though externally conditioned) principles (competence). Unlike traditional approaches in linguistics proper (LP), computational linguistics (CL) and artificial intelligence research (AI), computational semiotics (CS) neither depends on rule-based or symbolic formats for (linguistic) knowledge representations, nor does it subscribe to the notion of (world) knowledge as static structures that may be abstracted from and represented symbolically independent of the way they are processed. Instead, knowledge structures and the processes operating on them are modeled as procedures that can be implemented as algorithms. Semiotic Cognitive Information Processing (SCIP) systems allow the emergence of sign structures to be studied as a self-organizing process on the basis of combinatorial and selective constraints universal to all natural languages. Their regularities are exploited by text analyzing algorithms operating on different levels, which may be interpreted as intermediate (internal) representations of the semiotic system's states of recursive, self-similar adaptation to the (external) structures of its environment as signaled and mediated by the natural language discourse processed.

2  Computational Semiotics

In terms of the theory of information systems, life may be understood as the ability to survive by adapting to changing requirements in the real world. Thus, system faculties like perception, identification, and interpretation of structures (external or internal to a system) may be conceived as a form of dynamic information processing which (natural or artificial) systems, due to their own structuredness, are able to perform. In addition to the vertical transmission of system-specific (intraneous) experience through (biogenetically successive) generations, mankind has developed complementary horizontal means of mediating specific and foreign (extraneous) experience to (biogenetically unrelated) fellow systems within their own or any later generation. This is made possible by a semiotic move that not only distinguishes processes from results of experience but also converts the latter to knowledge, so that it can be re-used, modified and improved in learning. Vehicle and medium of this move are representations, i.e. complex sign systems which constitute languages and form structures, like words, phrases, and texts, which may be realized in communicative processes (actualization).

2.1  Modes of Processing

The basic idea of model construction in terms of semiotic cognitive information systems is that their processing is an adequate correlate which couples its structures to those of their surroundings, determining a system's environment as a collection of structures which that particular system is able to process in order to survive. Accepting the cognitive point of view (implying that information processing is knowledge based), human beings have to be considered very particular cognitive systems whose outstanding plasticity and capability to adapt to changing environmental conditions are essentially tied to their sign and symbol generation, manipulation, and understanding capabilities, which render them semiotic. The use and understanding of natural languages in communicative discourse expands their learning potential well beyond experimental experience into realms of thought experiments or reasoning, whose virtuality may be characterized by the fact that it dispenses with the identity of space-time coordinates for systems and their environment which normally prevails for this relation when qualified to be indexed real. It appears that this dispensation of space-time identity is not only conditional for the possible distinction of systems (mutually and relatively independent) from their environments, but also establishes a notion of representation which may be specified as exactly that part of a time-scaled process that can be separated and identified as its outcome or result in being (or becoming) part of another time scale (see footnote 1).
Accordingly, immediate or space-time-identical system-environments without representational form may well be distinguished from mediate or space-time-dispensed system-environments whose particular representational import (texts) corresponds to their particular bivalent temporal status both as longer-term material (composed of language signs and structures functioning and having virtual meaning) and as shorter-term structure (in need of being (re)cognized in order to be identifiable). This double identity calls for a particular mode of actualization (understanding) that may be characterized as follows:
For systems appropriately adapted and tuned to such environments, actualization consists essentially in a twofold embedding to realize
  • the spatio-temporal identity of pairs of immediate system-environment coordinates which will let the system experience the material properties of texts as signs (i.e. by functions of physical access and mutually homomorphic appearance). These properties apply to the percepts of language structures presented to a system in a particular discourse situation, and
  • the representational identity of pairs of mediate system-environment parameters which will let the system experience the semantic properties of texts as meanings (i.e. by functions of identification, organization, emergence, activation of structures). These virtual properties apply to the comprehension of language structures recognized by a system to form the described situation.
Hence, according to the theory of information systems, functions like interpreting signs and understanding meanings translate to processes which extend the fragments of reality accessible to a living (natural and possibly artificial) information processing system. This extension applies to both the immediate and the mediate relations a system may establish according to its own evolved adaptedness or dispositions (i.e. innate and acquired structuredness, processing capabilities, represented knowledge).

    2.2  Semiotic Enactment

    Semiotic systems' ability to actualize environmental representations does not merely add to the amount of experiential results available, but also constitutes a significant change in adaptive mode. Splitting up experience into experiential processes and experiential results (the latter being representational and in need of procedural actualization by the former) is tantamount to the emergence of a new kind of experience which can be tried and tested, very much like hypotheses in experimental settings. The results of such tentative experiencing may, as in immediate system-environments, become part of a system's adaptive knowledge; unlike in immediate system-environments, however, they may also be neglected or selected, accepted or dismissed, varied and repeatedly actualized and re-used without any risk to the system's own survival, stability or adaptedness.

    For this kind of experiencing, the concept of representation has to be considered fundamental. It is also fundamental to the computational semiotic approach to cognition, as it allows the distinction of processes of cognition from their results to be modeled instead of presupposed; these results may emerge, due to the traces the processes leave behind, in some structuredness (knowledge) of some representation. Different representational modes of such structures not only comply with the distinction of internal or tacit knowledge (as e.g. in memory) on the one hand and of external or declarative knowledge (as e.g. in discourse) on the other (see footnote 2); these modes also relate to different types of formats (distributional vs. symbolic), modeling (connectionist vs. rule-based) and processing (stochastic vs. deterministic). It is this range of correspondences that Fuzzy Linguistics is based upon and tries to exploit to come up with a unifying framework for most of the different approaches followed so far.

    Thus, (textual) representations increase the potentials of adaptive information processing beyond a system's life span but can do so only by simultaneously constraining this potential by dynamic structures corresponding to knowledge. The build-up, employment, and modification of these structural constraints are controlled by procedures whose processes determine cognition and whose results constitute adaptation. Systems properly attuned to textual system-environments have acquired these structural constraints (language learning) and can perform certain operations efficiently on them (language understanding). These are prerequisites to (re)cognize mediate (textual) environments, to respond to their needs for actualization, and to enact the systems' own abilities of actualization. Systems capable of and tuned to such knowledge-based processes will in the sequel be referred to as semiotic cognitive information processing systems (SCIPS).

    3  Modeling Cognition

    The alliance of logic and linguistics, mediated mainly by (language) philosophy in the past and by (discrete) mathematics since the first half of this century, has long dominated (and partly still dominates) the terms in which natural language expressions should be explicated and how their processing could be modeled. It may well be suspected that some of the problems encountered by these model constructions are due to the representational formats they employ in depicting and manipulating entities (elements, structures, processes, and procedures) considered to be of interest or even essential to the understanding of the communicative use of natural languages by humans.

    3.1  Semiotic Attunement

    For SCIP systems' ability to adapt efficiently to changing environmental conditions, learning how to anticipate possible changes in their environment is tied to structure which, consequently, has not only to be acquired but also to be represented. Processes which do not presuppose such representations (symbolic or otherwise) to operate on, but which, by their being operational, make such representational structures emerge, are called semiotic.

    In a systems theoretic approach, attunement characterizes a property or function of the system-environment relation which may be regarded as the procedural equivalent of the static understanding of knowledge structures as realized in cognitive information processing models so far. Dynamic conceptions of structuredness make it possible to define knowledge as an open, modifiable, and adaptive system whose organization can be conceived as a function of the system's own processing results (knowledge acquisition). The apparent ambiguity of system here is an immediate consequence of the cognitive process and its result being indistinguishable in semiotic enactment, which the modeling may resolve by introducing different levels and/or perspectives. Multi-level resolution in semiotic modeling allows for these entities' own (yet misconstrued) ontology, which is not (or not fully) accounted for by predicative and propositional representations or rule-based and truth-functional formats. Semiotic models, instead, are to find and employ representational formats and processing algorithms which do not prematurely decide and delimit the range of semiotically relevant entities, their representational formats and procedural modes of processing. One of their advantages would be that the entities considered relevant would not need to be defined prior to model construction but should emerge from the very processing which the model simulates or is able to enact. It appears that this property of semiotic models, if any, accounts for the intrinsic (co- and contextual) constraining of the meaning potential characteristic of natural language discourse, which renders them semiotic in a meaning (or function) constituting sense which is the core of understanding.

    Representing a system's environment (or fragments thereof) in such a way that the representations not only take part in a system's direct (immediate) environment (via language texts) but may moreover be understood as virtual, in the sense that new (mediate) environments (via textual meanings) can also be processed, has been introduced explicitly elsewhere. This way of representing again depends on a system's attunement to these kinds of discourse situations, which have to be modeled accordingly.

    3.2  Discourse Situations

    These situations (comprising system, environment, and processing) are considered cognitive inasmuch as the system's internal (formal and procedural) knowledge has to be applied to identify and recognize structures external to the system (meaning interpretation). These situations become semiotic whenever the internal knowledge applied to identify and interpret environmental structures is derived from former processes of external structure identification and interpretation and applied as the result of self-organizing feedback through different levels of (inter-)mediate representation and organization. This process (of meaning constitution or structure understanding) is the multiple enactment of the threefold relation which, following Peirce, is called semiosis (see footnote 3). This triadic relation allows for the different ontological abstractions of language as a
  • component (sign) in a system's external environment, i.e. material discourse as a physical space-time location;
  • constituent of virtuality which systems properly attuned experience as their environment (object), i.e. structured text as an interpretable potential of meanings, and
  • process of actualization (interpretant) in a particular system-environment situation, i.e. understanding as cognitive constitution of meaning.
    Under these preliminary abstractions, the distinction between (the formats of) the representation and (the properties of) the represented is not a prerequisite but an outcome of semiosis, i.e. the semiotic process of sign constitution and understanding. Hence, it should not be a presupposition of or input to, but a result or output of, the processes which are to be modeled procedurally and called semiotic.

    4  Constructive Representations

    As more abstract (theoretical) levels of representation for these processes, other than their procedural modeling, are not (yet) available, and as any (formal) means of deriving their possible results, other than by their (operational) enactment, are (still) lacking, it has to be postulated that these processes, independent of all other explanatory paradigms, will not only relate to but produce different representational levels of entity formation. They do so in a way which Marr characterized as being formally controlled or computable, which can be modeled procedurally or algorithmized, and which may empirically be tested or implemented. Procedural models of this kind are understood to denote a class of (re)presentational, i.e. modeled (re)constructions of entities whose interpretation is not (yet) tied to an underlying theory which would provide the semantics for the entities (or expressions) that this type of model presents. Instantiating their defining procedures as implemented algorithms will result in processes which produce some (abstract) structures whose visualizations can only then be compared to those structures originally observed to hold for and be characteristic of the modeled object.

    4.1  Natural Language Structures

    Structural linguistics has contributed substantially to our understanding of how language items come to be employed in communicative discourse the way they are. It has identified the fundamental constraints that control the multi-level combinability and formation of language entities by distinguishing the restrictions on linear aggregation of elements (syntagmatics) from restrictions on their selective replacement (paradigmatics). These regularities can be described by computational procedures whose varying degrees of combinatorial determinacy will not only detect different patterns of elements' linear distributions but may also be identified with the constraints being applied to constitute the syntagmata and paradigmata observed. Defining structures of that sort procedurally by an algorithmic or computational operation, whose enactment will instantiate a process in space-time to select the elements concerned according to their structural, i.e. their syntagmatic and paradigmatic, relatedness, is to provide for the semioticity of entities whose vagueness and re-constructive openness can be accounted for more satisfactorily by the dynamism of distributive as opposed to symbolic representational formats. Such operations will map structured input data according to its immanent regularities to yield new structural representations emerging from that computation (as hypothesized by performative linguistics and realized in procedural models of computational semiotics). Components of these new structures are value distributions or vectors of input entities that depict properties of their structural relatedness, constituting multi-dimensional (metric) space structures (semiotic spaces). Their elements may also be interpreted as (labeled) fuzzy sets, allowing set theoretical operations to be exercised on these representations that do not require categorial-type (crisp) definitions of concept formations. The computation of letter (morphic) vectors in word space, derived from n-grams of letters (graphemes), as well as of word (semic) vectors in semantic space, derived from word type correlations of word token distributions in discourse, may serve to illustrate the operational flexibility and granular variability of these representational formats.
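
The idea of word vectors derived from word type correlations can be illustrated by a minimal sketch. It is not the procedure used in the experiments reported here: it assumes a plain Pearson correlation of per-text token frequencies, a Euclidean distance between the resulting correlation profiles, and a linear rescaling of correlations to fuzzy membership grades. The toy texts and all function names are hypothetical.

```python
import math
from collections import Counter


def word_vectors(texts):
    """Return (vocab, vectors): each word type mapped to its correlation profile."""
    # Token frequencies of every word type in every text of the corpus.
    counts = [Counter(text.lower().split()) for text in texts]
    vocab = sorted(set().union(*counts))
    freq = {w: [c[w] for c in counts] for w in vocab}

    def pearson(x, y):
        # Assumed correlation measure; returns 0.0 for constant profiles.
        mx, my = sum(x) / len(x), sum(y) / len(y)
        dx = [a - mx for a in x]
        dy = [b - my for b in y]
        denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
        return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

    # A word's vector lists its correlation with every word type in vocab.
    vectors = {w: [pearson(freq[w], freq[v]) for v in vocab] for w in vocab}
    return vocab, vectors


def distance(u, v):
    """Euclidean distance between two correlation profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def as_fuzzy_set(vocab, vector):
    """Read a vector as a labeled fuzzy set: correlations in [-1, 1] become grades in [0, 1]."""
    return {w: (x + 1) / 2 for w, x in zip(vocab, vector)}


def fuzzy_intersection(a, b):
    """Standard min-based intersection of two fuzzy sets over the same labels."""
    return {w: min(a[w], b[w]) for w in a}


# Hypothetical toy discourse: four short texts.
TEXTS = [
    "triangle left near",
    "triangle left far",
    "square right near",
    "square right far",
]
VOCAB, VECTORS = word_vectors(TEXTS)
TRIANGLE = as_fuzzy_set(VOCAB, VECTORS["triangle"])
SQUARE = as_fuzzy_set(VOCAB, VECTORS["square"])
OVERLAP = fuzzy_intersection(TRIANGLE, SQUARE)
```

In this toy corpus, words with identical usage profiles (here "triangle" and "left") coincide in the space, while words with complementary usage lie far apart; no crisp category definitions are needed at any step.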

    Figure 1: Situational setting of SCIP system and environment allowing for Endo-Reality to differ from Exo-Reality. The system's (non-propositional) faculties of language processing are kept strictly apart from the (propositional) way textual descriptions of its environment are generated to constitute the setting's structural coupling.

    4.2  Semiotic Experimental Design

    As we have separated cognitive processes from their resultant structures above, so may we distinguish here between the long-term structure as an addressable representation of knowledge (stereotype or concept) and its short-term process in a situational embedding (employment or activation), with the semiotic implication that the structures depend on the processes and vice versa to let addressable representations emerge and cognitive processes be enacted. Thus, the duality of the inner-outer distinction or the system-environment opposition may be mediated by processes operating on some supposedly common, basal representational structures (see footnote 4) whose efficient reorganization can be modeled procedurally to result in a more or less subjective internal (or endo-)view the system develops, and in a more or less objective external (or exo-)view of the surrounding environment that constitutes reality.

    To find out (and preferably be able to test) what of the structural information inherent in natural language discourse, defined and structured by the text analytical processes, might be involved in mediating or constituting that duality, an experimental setting has been designed whose system-environment components (Fig. 1) are meant to allow for the system's own view of its environment (Fig. right: endo-reality) to differ from our external view of that environment (Fig. left: exo-reality). It is based on the assumption that some deeper representational level or core structure, like the semantic space concept, might be identified which could be considered a common base for different notions of representations corresponding to different formats of meaning developed by theories of referential and situational semantics as well as some structural or stereotype semantics. Therefore, the propositional form of natural language predication, undoubtedly the common basis of traditional meaning theories, has only been used here to control the format of the natural language training material which described the exo-reality, not, however, to determine the way these descriptions were processed by the SCIP system in order to arrive at its endo-reality view of it.

    4.3  System-Environment Setting

    The experimental setting consists of a directionally mobile system in a two-dimensional environment with some objects at certain places, and a corpus of natural language texts which correctly describe these objects' locations relative to the system's position and serve as the structural coupling between system and environment. Natural language understanding would have to be considered successfully enacted whenever some representation of the objects' locations can be derived as a result of the computational processing of these textual descriptions and is at least vaguely similar to the original (see Fig.). What makes such an artificially abstracted system (see footnote 5) a semiotic one is that, whatever the system might gather from the as yet uninterpreted textual structures, the organization of emerging entities will not be the result of some decoding processes which would necessarily call for that code being made known to the system. Instead, the system's (co- and contextually restricted) perceptual and processing capabilities should suffice to (re-)organize the environmental data and to (re)present the results in some dynamic structure which determines the system's knowledge (susceptibility), learning (change) and understanding (representation).

    To enable inter-subjective scrutiny, it was assumed here that the (unknown) results of an abstract system's (well known) acquisition process are compared against the (well known) traditional interpretations of the (unknown) processes of natural language meaning constitution (see footnote 6).

    Figure 2: External view of reference plane with location of objects △ and □ (Exo-Reality) propositionally described by texts in the training corpus (structural coupling), and 2-dim image of SCIP system's view of its environment (Endo-Reality) showing regions of potential object locations by profile lines of common likelihood (isoreferentials).

    4.4  Situational Restriction

    For the purpose of testing semiotic processes, their situational complexity has to be reduced by abstracting away irrelevant constituents, hopefully without oversimplifying the issue and trivializing the problem. In order to achieve this, the parameters constituting the SCIP situation have to be specified, according to which
  • the three main components of the experimental setting (the system, the environment, and the discourse) are specified by sets of conditioning properties. These define the SCIP system by way of a set of procedural entities like orientation, mobility, perception, processing; the SCIP environment as a set of formal entities like reference plane, objects, grid, direction, location; and the SCIP discourse material, mediating as structural coupling between system and environment, as structured first by a number of part-whole related (granular) entities like word, sentence, text, corpus, of which sentence and text require further defining restrictions in order to be specified by a formal syntax and referential semantics;
  • the system's environmental data is provided by a corpus of (natural language) texts comprising correct expressions of true propositions denoting relations of system-position and object-location (SP-OL relations for short) described according to the formally specified syntax and semantics (representing the exo-view or described situations), and
  • the system's internal picture of its surroundings (representing the endo-view or discourse situations) is to be derived from this language environment other than by way of propositional reconstruction, i.e. without syntactic parsing and semantic interpretation of sentence and text structures.
    Consequently, the exo-knowledge allowing the designers of the experimental setting to control the propositional encoding and decoding of environmental information in texts which the system in its specified environment would process has to be kept strictly apart from, and was essentially not to be included in, the SCIP system's endo-capacities. Thus, the system's own non-propositional processing will have to allow for some results which, as the system's internal representation, would not be interpretable as mere repetitious reproductions or applications of knowledge structures made available to it externally, but which would instead have the chance to be different from (however comparable to) the exo-view of its environment.
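
How the exo-view side of such a setting might be specified can be sketched as follows. This is an illustration under assumptions only: the grid size, the object positions, the four headings, the near/far threshold and the sentence template are all hypothetical and are not taken from the formal syntax and semantics referred to above. The sketch covers only the designers' propositional generation of SP-OL descriptions; the system's own non-propositional processing is deliberately not part of it.

```python
from dataclasses import dataclass
from itertools import product

GRID = 5                         # hypothetical side length of the reference plane
NEAR_RADIUS = 2                  # hypothetical cut-off between "near" and "far"
HEADINGS = ("N", "E", "S", "W")  # hypothetical set of system orientations


@dataclass(frozen=True)
class Obj:
    """An object at a fixed grid location (environment side)."""
    name: str
    x: int
    y: int


@dataclass(frozen=True)
class SystemState:
    """System position and orientation (system side)."""
    x: int
    y: int
    heading: str


def relative_offset(state, obj):
    """Return the object's (forward, right) offset in the system's own frame."""
    dx, dy = obj.x - state.x, obj.y - state.y
    return {
        "N": (dy, dx),
        "E": (dx, -dy),
        "S": (-dy, -dx),
        "W": (-dx, dy),
    }[state.heading]


def describe(state, obj):
    """Produce one true SP-OL sentence from a fixed, hypothetical template."""
    forward, right = relative_offset(state, obj)
    if abs(forward) >= abs(right):
        direction = "in front" if forward > 0 else "behind"
    else:
        direction = "to the right" if right > 0 else "to the left"
    reach = "near" if max(abs(forward), abs(right)) <= NEAR_RADIUS else "far"
    return f"a {obj.name} lies {reach} {direction}"


def build_corpus(objects):
    """One text (a list of sentences) per free system position and heading."""
    occupied = {(o.x, o.y) for o in objects}
    corpus = []
    for x, y, heading in product(range(GRID), range(GRID), HEADINGS):
        if (x, y) in occupied:
            continue  # the system never stands on an object's cell
        state = SystemState(x, y, heading)
        corpus.append([describe(state, o) for o in objects])
    return corpus


OBJECTS = [Obj("triangle", 1, 3), Obj("square", 3, 1)]
CORPUS = build_corpus(OBJECTS)
```

Keeping `describe` and `build_corpus` on the designers' side mirrors the separation required above: a system under test would receive only the word, sentence and text structure of `CORPUS`, never the coordinates or the template that generated it.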

    5  Conclusion

    The experimental setting developed to allow for semiotic testing hinges on the idea that cognitive information processing will both operate on and produce structures as a condition for and/or a result of such processing. Semiotic structures have to have some space-time extension, i.e. they are in principle observable apart from and independent of being processed cognitively. The processes operating on and modifying such structures can in principle be dealt with independent of their temporal duration by procedures, which can be defined as processes abstracted from their temporality. Procedures can be represented formally, their notational format can be parsed and checked for correctness, and their expressions can be interpreted or compiled for execution and, provided a suitable automaton is available, become the starting point for the enactment of processes in time again, having not only a certain duration but also the effect of operating on and modifying structures which are in fact (not only in principle) observable. This two-sided independence enables procedural cognitive models to relate structured language expressions, which can be analyzed (or observed) without being understood, to language understanding processes, which can be conceived (as procedures) abstracted from their temporal duration. It appears that by this move procedures and algorithms found to model some aspects of cognitive information processing for language comprehension can be tested against (not on the grounds of) an accepted model of cognitive (language) understanding.

    References

    [1] D. Marr: Vision. Freeman, San Francisco 1982.

    [2] A. Meystel: Semiotic Modeling and Situation Analysis. AdRem, Bala Cynwyd 1995.

    [3] B. B. Rieger: Meaning Acquisition by SCIPS. In: B. M. Ayyub (ed): IEEE-Transactions ISUMA-NAFIPS-95, Los Alamitos 1995, 390-395.

    [4] B. B. Rieger: Situations, Language Games, and SCIPS. In: A. Meystel/ N. Nerode (eds): Architectures for Semiotic Modeling and Situation Analysis, Bala Cynwyd 1995, 130-138.

    [5] B. B. Rieger: Situation Semantics and computational linguistics: towards Informational Ecology. In: Kornwachs/ Jacoby (eds): Information. New Questions to a Multidisciplinary Concept, Berlin 1996, 285-315.

    [6] B. B. Rieger: Computational Semiotics and Fuzzy Linguistics. On meaning constitution and soft categories. In: A. Meystel (ed): A Learning Perspective: ISAS-97, NIST, Washington 1997, 541-551.

    [7] B. B. Rieger: Computing Granular Word Meanings. A fuzzy linguistic approach in Computational Semiotics. In: P. P. Wang (ed): Computing with Words, New York 1998, [in print].

    Footnotes:

    1 Different linear time scales extended to those of differently scaled time cycles can be conceived, particularly in view of the resolutional power of representations and their semiotic processing in computational models.

    2 Whereas tacit knowledge cannot be represented other than by the immediate system-environments' corresponding states, explicit knowledge is bound to acquire some formal properties in order to become externally presented and thereby part of mediate system-environments. Natural languages obviously provide these formal properties, as partly identified by research in linguistic competence (principles, knowledge and acquisition of language), whose enactment, as investigated in studies on natural language performance (production and understanding of texts), draws cognitively on both bases of (explicit and tacit) knowledge.

    3 By semiosis I mean [...] an action, or influence, which is, or involves, a cooperation of three subjects, such as sign, its object, and its interpretant, this tri-relative influence not being in any way resolvable into actions between pairs. (Peirce 1906, p. 282)

    4 Representational formats will be called basal if they can provide a frame for the formal unification of categorial-type, concept-hierarchical, truth-functional, propositional, phrasal, or whatever other representations.

    5 The system's channels of perception to form its own or endo-view of its surroundings are extremely limited, and its ability to act (and react) is heavily restricted compared to natural or living information processing systems.

    6 The concept of knowledge underlying this use here may be understood to refer to known as having well established (scientific, however controversial, but at least inter-subjective) models to deal with, whereas unknown refers to the lack of such models.