<?xml version="1.0" standalone="yes"?> <Paper uid="E85-1016"> <Title>PARAMETRIZED ABSTRACT OBJECTS FOR LINGUISTIC INFORMATION PROCESSING</Title> <Section position="4" start_page="0" end_page="108" type="metho"> <SectionTitle> PARAMETRIZED ABSTRACT OBJECTS </SectionTitle> <Paragraph position="0"> One of the basic difficulties in natural language processing arises from the fact that modularity is both a desirable and a hardly attainable property of the systems. At first it seems quite reasonable to break the programs into manageable, modular sub-programs, especially as the linguistic data, at least at first approximation, lend themselves to a clear-cut classification in terms of morphology, syntax, semantics and pragmatics. Moreover, comparatively sophisticated techniques and methods are already available in each subfield.</Paragraph> <Paragraph position="1"> Unfortunately, it has now become commonplace knowledge that this strategy of developing separate modules for each sub-problem and integrating them smoothly is not satisfactory: the operations of parsing sentences, producing internal representations, reasoning about them, answering questions, generating text, and so on, are strongly interdependent. The degree, order and location of the interactions between different parts may vary significantly, according to individual situations.</Paragraph> <Paragraph position="2"> A consequence of the realization of this fact has been the development of strongly integrated, usually procedural systems, where the individual subprograms operate simultaneously on several deeply intertwined linguistic levels. 
The price one has to pay for the relative success of this approach is in terms of understandability and generalization: many systems are strongly dependent on the particular type of problem they have been programmed to solve, and possible extensions or transpositions would require fundamental modifications.</Paragraph> <Paragraph position="3"> What kind of software tools would allow at the same time modularity and multi-faceted, polymorphic and concurrent interactions between processes? Modularity and complex interactions are characteristic features of the object-oriented paradigm.</Paragraph> <Paragraph position="4"> Modularity is provided by the structuring in terms of objects.</Paragraph> <Paragraph position="5"> Complex interaction is a consequence of the distribution of control between the different objects and of the (possibly multiple) inheritance facilities between hierarchically dependent objects. The possibility of using different points of view for the same objects is a consequence of this structuring.</Paragraph> <Paragraph position="6"> This approach leads to focusing on the basic process of abstraction. In this context, abstraction is a process which, starting from a description of the data, yields an abstract specification. 
This involves three steps:</Paragraph> <Paragraph position="7"> - define the relevant objects for the problem under study; - define the possible functions and relations on the objects; - give explicitly the constraints between the functions and relations.</Paragraph> <Paragraph position="8"> The object-oriented approach is usually equated with the Smalltalk &quot;vision&quot; (Goldberg and Robson, 1983), while other views of objects are rejected as being irrelevant or even self-contradictory.</Paragraph> <Paragraph position="9"> Smalltalk objects can be characterized by the following properties: - Each object is an instance of a class (a generic object).</Paragraph> <Paragraph position="10"> - Each object has a local memory which can only be updated by functions (or procedures) local to the object.</Paragraph> <Paragraph position="11"> - Objects are organized into a tree-like hierarchy implying tree-like inheritance. - Communication between objects is organized through message passing.</Paragraph> <Paragraph position="12"> However, it has been argued that the object-oriented approach can be fruitfully carried over into applicative language contexts (Steels, 1982) or into particular systems based on logic where the fundamental mechanism is data abstraction (Goguen et al., 1983).</Paragraph> <Paragraph position="13"> As regards Smalltalk, it does not provide systematic facilities for defining abstract data types; all computations are highly dependent on side effects (assignment is systematically used in local operations) and there is no explicit typing.</Paragraph> <Paragraph position="14"> We favour the approach exemplified by such languages as OBJ and their possible extensions (Goguen and Meseguer, 1984), as we think that they will allow, in the long run, a more efficient programming style and make the systematic proof of programs possible.</Paragraph> <Paragraph position="15"> The OBJ language is based upon data abstraction: an object is a type (i.e. 
a domain of values with functions accessing those values); objects are organized into a hierarchy (an acyclic graph) representing the dependencies among types. Computations are performed by using equational axioms as oriented rewrite rules.</Paragraph> <Paragraph position="16"> Therefore, granted the availability of a theorem prover, the consistency of the specifications given in the abstract data type can be formally checked. Moreover, since the objects have a clear mathematical definition, all the techniques of abstract algebra are also available.</Paragraph> <Paragraph position="17"> More generally, axioms could be given not only as equations, but also as formulas in a logical theory (such as first-order predicate calculus or temporal logics), assuming those theories satisfy some given restrictions (Goguen and Burstall, 1984).</Paragraph> <Paragraph position="18"> As we do not yet have access to any version of OBJ, we have decided, in a first stage, to restrict the object structure to its free component, i.e.</Paragraph> <Paragraph position="19"> only the signatures, not the axioms, are defined. Therefore, the computations have to be explicitly coded. As a consequence, no checking of consistency is possible for the time being.</Paragraph> <Paragraph position="20"> The actual implementation is done in the ML language (Gordon et al., 1979). ML is a functional language which is fully higher-order. &quot;It has a polymorphic type discipline which combines the flexibility of programming in a typeless language with the security of compile-time type checking&quot;. Moreover, one can define one's own types, which may be abstract and/or recursive.</Paragraph> <Paragraph position="21"> To give a flavour of the ML programming style, consider a possible definition of the abstract recursive type of binary trees, with tip values of an arbitrary type (denoted by *) and non-tip nodes of some other arbitrary type (denoted by **). 
(This example is taken from Gordon et al., 1979):</Paragraph> <Paragraph position="23"> and istip t = isl(rep_tree t) and tipof t = outl(rep_tree t) and labelof t = fst(outr(rep_tree t)) and sonof t = snd(outr(rep_tree t)) This type is defined as recursive and abstract. The symbols &quot;+&quot; and &quot;#&quot; denote, respectively, the two type constructors &quot;disjoint sum&quot; and &quot;cartesian product&quot;. The functions &quot;abs_tree&quot; and &quot;rep_tree&quot;, both of them of type (ty -> ty), are only available inside the definition of the abstract type &quot;tree&quot;: abs_tree maps the concrete representation of a tree onto its abstraction; rep_tree has the converse effect. Finally, isl, inl, inr, outl, outr are functions or predicates on the disjoint sum. They are defined as follows: isl: (* + **) -> bool tests membership of left summand; inl: * -> (* + **) injects into left summand; inr: * -> (** + *) injects into right summand; outl: (* + **) -> * projects out left summand; outr: (* + **) -> ** projects out right summand.</Paragraph> <Paragraph position="24"> The signature of this type is the set of operators: tiptree: * -> (*,**)tree; comptree: ** # (*,**)tree # (*,**)tree -> (*,**)tree; istip: (*,**)tree -> bool; tipof: (*,**)tree -> *; labelof: (*,**)tree -> **; sonof: (*,**)tree -> ((*,**)tree # (*,**)tree). The version of ML we use (INRIA, 1984) is written in Lisp with access to the Lisp system. So our object environment is constructed as a collection of abstract data types. The hierarchy between types results from the combination and enrichment of more basic types. This hierarchy creates multiple inheritance relations between types. Some examples will be given in the context of temporal objects.</Paragraph> <Paragraph position="25"> Clearly, the management of the object level must be done on top of ML. The explicit coding mixes Lisp and ML. 
As we work in a functional environment there is no &quot;local memory&quot;. However, this is, from our viewpoint, a minor drawback compared to the advantage of the abstraction facilities.</Paragraph> <Paragraph position="26"> In a next stage, we intend to introduce the necessary axioms and perform the computations in a deductive style.</Paragraph> <Paragraph position="27"> This approach can be used for the formal representation of natural language, or as a grammar formalism. In particular, the syntactic and semantic analysis can be done in terms of objects (De Boissieu and Forest, 1985).</Paragraph> </Section> <Section position="5" start_page="108" end_page="110" type="metho"> <SectionTitle> PROCESSING TEMPORAL INFORMATION </SectionTitle> <Paragraph position="0"> Tense and time representation in natural languages is generally studied under one of three main disciplines: logic, linguistics, and artificial intelligence. A brief overview of these different viewpoints is given in (Bestougeff and Ligozat, 1984).</Paragraph> <Paragraph position="1"> The main problem is to choose the relevant objects in order to get an adequate abstraction. It must be strongly emphasized that we deny ourselves the right to assume any particular physical representation of time from the outset.</Paragraph> <Paragraph position="2"> The concrete properties result from the specifications.</Paragraph> <Paragraph position="3"> The choice of the basic objects is somewhat arbitrary, but it should nevertheless comply with the following rules: the objects must be - close to linguistic intuition;</Paragraph> <Paragraph position="4"> - general enough to be reusable as such in different contexts, or give rise to new objects by enrichment or inheritance.</Paragraph> <Paragraph position="5"> The second point is required to avoid ad hoc and independent specifications. 
To achieve these goals it may be necessary to define primitive objects, which do not have any linguistic interpretation but are merely building blocks whose use enhances modularity. In this case, the lower-level objects can be hidden from the user. Keeping this in mind, we can now proceed to the description of the linguistic motivations which are behind the construction of temporal objects. The idea is to give a systematic way of representing temporal information by defining abstract structures based upon the concepts and the hypotheses of a particular linguistic theory.</Paragraph> <Paragraph position="6"> The linguistic theory we rely on is that of A. Culioli (Culioli, 1980), suitably adapted to computational purposes.</Paragraph> <Paragraph position="7"> Temporal information can be informally characterized as information pertaining to the location and &quot;shape&quot; of the states and events described by natural language. In particular, this includes what is commonly referred to as aspect.</Paragraph> <Paragraph position="8"> Furthermore, temporal information in natural language has both a descriptive and an operative structure: it describes and allows the users to make systematic inferences. Among these inferences are those concerned with the ordering of events, but such inferences are only part of a whole set of inferences on the factuality, the degree of completion, and the type of occurrence of the situations considered. 
In fact, it can be argued that the ordering relations are not necessarily of a primary nature.</Paragraph> <Paragraph position="9"> Some examples will illustrate the kind of data and inferences we have in mind.</Paragraph> <Paragraph position="10"> Consider the following simple sentences: (1) John is ill.</Paragraph> <Paragraph position="11"> (2) John repairs cars.</Paragraph> <Paragraph position="12"> (3) John is repairing my car.</Paragraph> <Paragraph position="13"> (4) John repaired my car.</Paragraph> <Paragraph position="14"> (5) John has repaired my car.</Paragraph> <Paragraph position="15"> (6) John was repairing my car.</Paragraph> <Paragraph position="16"> (7) My car has been repaired.</Paragraph> <Paragraph position="17"> (8) My car is repaired now.</Paragraph> <Paragraph position="18"> (9) John was singing.</Paragraph> <Paragraph position="19"> (10) John sang.</Paragraph> <Paragraph position="20"> (11) John has been singing.</Paragraph> <Paragraph position="21"> (12) Cats are smart.</Paragraph> <Paragraph position="22"> We wish to account for some basic information imparted by the use of such sentences. For example: The different uses of the simple present tense in (1) and (2) are related to a difference between the semantic types of the verbs to be ill and to repair. We will account for this difference by adapting a classification (essentially due to Vendler (1967)) into four semantic types (state, activity, accomplishment, achievement). 
The usefulness of such a classification is further illustrated by comparing the behaviour of the verb to repair in sentences (6, 7, 8) with that of the verb to sing in (9, 10, 11).</Paragraph> <Paragraph position="23"> The comparison between (4) and (6) in relation with (7) shows the necessity of suitably representing the difference between the simple and the progressive past, at least for verbs of the type to repair a car, which are classified as accomplishments.</Paragraph> <Paragraph position="24"> To represent the difference between (4) (simple past) and (5) (present perfect), we have to express what makes (8) derivable from (5), but not from (4). Reichenbach's system of temporal indexes (point of speech, point of event, point of reference) can be used to handle this phenomenon (Reichenbach, 1957). It provides a way of describing the notion of &quot;present relevance&quot;, which is present in (5), but not in (4).</Paragraph> <Paragraph position="25"> The contrast between (1, 2) and (12) points to another kind of distinction one has to make: (1) expresses a state, (2) a habit, both of which hold at the moment of speech. By contrast, (12) states a general fact which is basically undetermined with respect to the moment of speech. Dependence on the time of speech is a fact of temporal deixis. We shall refer to it as enunciativity (following A. Culioli). By opposition, situations such as (12) will be termed aoristic.</Paragraph> <Paragraph position="26"> The preceding examples give some idea of the type of information that has to be represented. We have deliberately played down the purely sequential type of information, which is the only type of temporal information most systems are concerned with.</Paragraph> <Paragraph position="27"> Moreover, the purely sequential type of information is mostly incomplete (this is stressed in particular by Smith (1978)). 
Consider the following examples: (13) John saw his doctor this morning: he is ill.</Paragraph> <Paragraph position="28"> (14) John saw his doctor this morning: now he is ill.</Paragraph> <Paragraph position="29"> Contrasting (13) and (14) shows a potential indeterminacy in the relation between the two sentences. Smith (1978) claims that sentences like (1), where no explicit &quot;reference time&quot; is provided (e.g. by a time adverbial such as now), are temporally incomplete. We will be content at this point of our discussion with noting the need for a convenient notation for such a phenomenon.</Paragraph> </Section> </Paper>