File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/90/c90-3026_abstr.xml
Size: 15,855 bytes
Last Modified: 2025-10-06 13:46:58
<?xml version="1.0" standalone="yes"?> <Paper uid="C90-3026"> <Title>M.I.T Artificial Intelligence Laboratory</Title> <Section position="1" start_page="0" end_page="145" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The hierarchy of salience of the items of knowledge that the speaker assumes to be shared by him and the hearer constitutes one aspect of a dynamic account of discourse (Sect. 1). It is claimed that a representation of this hierarchy is a good support for discourse analysis (reference assignment, Sect. 2) and for discourse production (pronominalization, definite description, Sect. 3).</Paragraph> <Section position="1" start_page="0" end_page="144" type="sub_section"> <SectionTitle> 1.1 In studying communication, one must </SectionTitle> <Paragraph position="0"> distinguish between the speaker's own image of the world, the hearer's image and the assumptions the speaker has made about the hearer's image of the world. In the very process of discourse, these images of the world undergo changes of different kinds: new objects, relations, etc. are added to the repertoires on the basis of the content of what has just been said; the universe of discourse may be restricted in that a certain set of phenomena of a particular kind is marked as relevant for further discourse whereas the other elements are disregarded; or the salience (activation, foregrounding) of the items is changed in the sense of their being easily accessible in memory (see Sgall, Hajičová and Panevová, 1986, p. 54f). It has been shown (Hajičová and Vrbová, 1982; Hajičová, 1987) that the changes of salience depend to a great extent on the topic/focus articulation of the utterance. 
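A minimal sketch of how such salience changes might be tracked, following the relationships (i)-(iv) summarized later in this section. The numeric scale (MAX_DEGREE = 10), the two fading steps and the names update_salience, FOCUS, TOPIC_NP and TOPIC_PRONOUN are assumptions made for illustration only; the text does not fix any concrete values.

```python
# Illustrative sketch only: the scale and the fading steps are assumptions.
MAX_DEGREE = 10          # assumed value of the highest degree of salience
STEEP_FADE = 2           # assumed fading step for the most activated item
GENTLE_FADE = 1          # assumed fading step for all other items

FOCUS = "focus"                  # NP or stressed pronoun in the focus
TOPIC_NP = "topic_np"            # noun phrase in the topic
TOPIC_PRONOUN = "topic_pronoun"  # pronominal reference in the topic


def update_salience(state, mentions):
    """Return the activation table after one utterance.

    state    maps each item to its current degree (higher = more salient);
    mentions maps each item mentioned in the utterance to FOCUS,
             TOPIC_NP or TOPIC_PRONOUN.
    """
    new_state = {}
    for item, degree in state.items():
        if item in mentions:
            continue
        # (iv) unmentioned items fade; the most activated item fades faster
        step = STEEP_FADE if degree == MAX_DEGREE else GENTLE_FADE
        new_state[item] = max(0, degree - step)
    for item, kind in mentions.items():
        if kind == FOCUS:
            new_state[item] = MAX_DEGREE            # (i)
        elif kind == TOPIC_NP:
            new_state[item] = MAX_DEGREE - 1        # (ii)
        elif kind == TOPIC_PRONOUN:
            # (iii) unchanged; an item not seen before is treated like a
            # topic NP (an assumption, the text does not cover this case)
            new_state[item] = state.get(item, MAX_DEGREE - 1)
        else:
            raise ValueError(f"unknown mention kind: {kind!r}")
    return new_state


# "The system calls a recursive program. You can use it, ..."
state = update_salience({}, {"system": TOPIC_NP, "program": FOCUS})
state = update_salience(state, {"system": TOPIC_PRONOUN})
print(state)
```

In this run the weak pronoun keeps "system" at its previous degree, while "program", which was the most activated item after the first sentence and is not mentioned in the second, fades by the steeper step.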
As a matter of fact, most algorithms for anaphora resolution work with the notion of salience (cf., e.g., Hobbs, 1976; Sidner, 1979; Brennan, Friedman and Pollard, 1989); however, while in most of these approaches the degrees of salience are given only syntactically, the hierarchy of activation in our model is also determined by the topic/focus articulation of the sentence.</Paragraph> <Paragraph position="1"> Leaving aside the distinction between the contextually bound and non-bound elements of the utterance within its topic and focus parts (for the relevance of contextual boundness in this respect, see Hajičová, Hoskovec and Sgall, in press), these relationships can be summarized as follows: (i) the items referred to in the focus of the utterance, be it by a noun phrase or by a stressed pronoun, receive the highest degree of salience; (ii) the items referred to by a noun phrase in the topic part of the utterance are activated one degree less than the items referred to in the focus part; (iii) a pronominal reference to an item in the topic part of the utterance keeps the activation unchanged; (iv) the activation of the items not mentioned in the given utterance fades away; the fading is steeper if the given item was the most activated item after the preceding utterance and less steep if the given item preserved high salience for some of the previous utterances, being mentioned in their topics.</Paragraph> <Paragraph position="2"> We do not attempt to cover VP-anaphora, since in the model of the stock of knowledge assumed by the speaker to be shared by him and the hearer (SSK) we work - for the time being - only with mental images of objects, rather than with those of events.</Paragraph> </Section> <Section position="2" start_page="144" end_page="144" type="sub_section"> <SectionTitle> 1.2 Several thresholds can be </SectionTitle> <Paragraph position="0"> established on the hierarchical structure of the activated part of SSK; at least two of them are 
important in the context of the present paper. One threshold characterizes those items of the SSK that are activated to such an extent that they can be referred to in the topic part of the following utterance; this is to say that the salience of these items is large enough for the hearer to identify their referents easily. The second (higher) threshold delimits that part of SSK the items of which can be referred to by pronouns; their salience is assumed by the speaker to be large enough for the hearer to assign the reference in a straightforward way.</Paragraph> </Section> <Section position="3" start_page="144" end_page="144" type="sub_section"> <SectionTitle> 1.3 The representation of the discourse </SectionTitle> <Paragraph position="0"> in terms of the hierarchy of activation of the elements of SSK lends itself to splitting the discourse into segments; the segments correspond to those parts of the discourse for which there is a characteristic grouping of the most activated items. These most activated items in each segment can then be regarded as the &quot;topic&quot; of the given segment; items which may be understood as the &quot;topic(s)&quot; of the discourse can then be computed on the basis of the &quot;topic(s)&quot; of the segments.</Paragraph> <Paragraph position="1"> The ideas outlined in Sect. 2 and 3 will be illustrated by an analysis of a variety of examples; the results of those sections will serve as a theoretical base for further practical applications in various systems.</Paragraph> <Paragraph position="2"> There are two competitors in the first sentence, both NP's. The antecedent of the ellipsis in the second sentence is the subject of the first sentence.</Paragraph> <Paragraph position="3"> The antecedents of relative pronouns are easy to compute, too. 
By a thorough investigation of a large number of technical texts we found that relative pronouns almost certainly (in about 90-95 % of cases) refer to the head of the closest preceding NP which has the appropriate morphological categories (gender, number, etc.): &quot;Používáme disk z polykarbonátu, který jsme předem očistili.&quot; &quot;We use a disk of polycarbonate which we have cleaned beforehand.&quot; An important role is also played by a tendency to keep the syntactic dependency hierarchy in referring - the antecedent of a pronoun in the subordinate clause is to be found on a higher or equal layer of the hierarchy: &quot;Program, užívaný systémem s rutinou, která mu pomáhá, může...&quot; &quot;The program used by the system with the routine which helps it, can...&quot; Here, the pronoun &quot;it&quot; refers to &quot;program&quot; rather than to &quot;system&quot;.</Paragraph> <Paragraph position="4"> 2.1 SSK can help to solve the referent assignment in discourse analysis. To show the most suitable way of applying SSK in the context of other usual methods of solving this problem, we should first of all recall the assignment based on syntactic relations: in Czech coordinated clauses the subject of the second (or third, fourth etc.) clause is usually deleted. It is then (more or less) unambiguously understood to be the same as the subject of the first clause. The same holds for two successive sentences,</Paragraph> </Section> <Section position="4" start_page="144" end_page="145" type="sub_section"> <SectionTitle> 2.2 When working within the framework </SectionTitle> <Paragraph position="0"> of the functional generative description (see Sgall, Hajičová and Panevová, 1986), the solution of anaphora can be supported by the topic-focus articulation and the hierarchy of activation of the items of the SSK. A good help for finding the pronoun's antecedent is the form of the pronoun used in the text. The strong form of a pronoun (ten, tento = this; sebe = himself;...) 
refers - in technical texts almost unambiguously - to the focus of the preceding sentence, whereas the weak (unstressed) form preferably refers to the topic: &quot;Nejslabším článkem v celém řetězu je vstup. Ten je příčinou mnoha problémů.&quot; &quot;The weakest link in the whole chain is the input. This causes a lot of problems.&quot; /the strong form &quot;this (ten)&quot; refers to &quot;input&quot;/; &quot;Systém vyvolává rekursivní program. Můžete ho užít,...&quot; &quot;The system calls a recursive program. You can use it,...&quot; /&quot;it&quot; refers to &quot;system&quot; in the primary case/; The antecedent is not &quot;DAT players&quot;, which can be computed only on the basis of factual knowledge - if you know DAT's are newer than CD's.</Paragraph> <Paragraph position="1"> These strategies are relatively reliable (80-85%) and can be used in discourse analysis. 2.3 When we also take into account other aspects of the role of SSK in discourse analysis, we can base the algorithm of reference assignment on the following strategies: (1) if the subject of the sentence has a null form, the subject of the preceding clause is referred to, as long as grammatical agreement is preserved; (2) in the case of a relative pronoun, we try to find the head of the closest preceding noun phrase as the antecedent; (3) if the referring expression is a weak pronoun, we look for the antecedent in the topic of the preceding clause; in the case of a strong pronoun (or an &quot;adjective pronoun&quot; in a noun phrase such as &quot;this man&quot;) we investigate the focus; (4) if there are several competitors after step (3), or if none of the steps (1) through (3) can be used, we apply SSK in the form of a list of NP's from the preceding text (from the beginning of the current paragraph) with their respective degrees of activation, and choose the most activated item with congruent morphological categories. 
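Strategies (1) through (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our analysis: the classes NP and Clause, the function resolve, the labels "null_subject", "relative", "weak" and "strong", and the integer activation scale (higher = more activated) are all hypothetical, and step (4) is read here as choosing among the competitors left by step (3) when there are any, and among all NP's of the SSK list otherwise.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class NP:
    head: str
    gender: str
    number: str
    activation: int = 0  # higher = more activated (assumed scale)


@dataclass
class Clause:
    """Hypothetical summary of the preceding clause."""
    subject: Optional[NP] = None
    topic: list = field(default_factory=list)  # NPs in the topic part
    focus: list = field(default_factory=list)  # NPs in the focus part
    nps: list = field(default_factory=list)    # all NPs in surface order


def agrees(np, gender, number):
    """Check congruence of the morphological categories."""
    return (np.gender, np.number) == (gender, number)


def resolve(kind, gender, number, prev, ssk):
    """Return the chosen antecedent NP, or None if nothing fits.

    kind is "null_subject", "relative", "weak" or "strong"; prev is the
    preceding Clause; ssk lists the NPs of the current paragraph.
    """
    competitors = []
    if kind == "null_subject":                       # strategy (1)
        if prev.subject is not None and agrees(prev.subject, gender, number):
            return prev.subject
    elif kind == "relative":                         # strategy (2)
        for np in reversed(prev.nps):                # closest NP first
            if agrees(np, gender, number):
                return np
    elif kind in ("weak", "strong"):                 # strategy (3)
        part = prev.topic if kind == "weak" else prev.focus
        competitors = [np for np in part if agrees(np, gender, number)]
        if len(competitors) == 1:
            return competitors[0]
    else:
        raise ValueError(f"unknown kind of referring expression: {kind!r}")
    # strategy (4): fall back on the activation degrees stored in SSK
    pool = competitors or [np for np in ssk if agrees(np, gender, number)]
    return max(pool, key=lambda np: np.activation, default=None)


# "Systém vyvolává rekursivní program. Můžete ho užít, ..."
system = NP("systém", "m", "sg", activation=9)
program = NP("program", "m", "sg", activation=10)
prev = Clause(subject=system, topic=[system], focus=[program],
              nps=[system, program])
print(resolve("weak", "m", "sg", prev, [system, program]).head)
print(resolve("strong", "m", "sg", prev, [system, program]).head)
```

With the second example sentence of this section, the weak pronoun is resolved to the topic item and a strong pronoun in the same position would be resolved to the focus item.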
If we cannot find an item that is &quot;activated enough&quot; (the concrete value, or the difference of values, is to be determined depending on the way the activation is evaluated), we prefer leaving the anaphora unresolved in order to prevent wrong solutions of the &quot;global&quot; references (to a preceding clause, sentence, an action identified by a verb, a coordination of items etc.) or of references which cannot be solved without the use of semantics, e.g.: 3. Discourse production has more freedom than analysis, because the speaker can choose the means of expressing his ideas. Of course, he has to take care to enable the hearer to interpret the text easily and, if possible, unambiguously; at the same time, he should not repeat definite NP's unnecessarily. The main criterion in the speaker's choice between the use of a pronoun and a definite NP may be the current state of SSK. We deal with technical texts only, but we believe the basic ideas hold for other types of texts as well.</Paragraph> </Section> <Section position="5" start_page="145" end_page="145" type="sub_section"> <SectionTitle> 3.1 When producing a sentence of a </SectionTitle> <Paragraph position="0"> continuous text, the speaker can use three types of referring expressions - weak pronouns, strong pronouns (including the demonstrative and relative ones) and more or less complex definite expressions (compare &quot;John&quot; with &quot;the boy who played with a ball yesterday as I have told you...&quot;). Depending on the current state of SSK, he chooses the relatively &quot;weakest&quot; means (from a weak pronoun to a complex description) the use of which enables the hearer to find the referent correctly. 
Two aspects of SSK are important in this choice: (a) the degree of activation da(O) of the object (referent) O in SSK - an important role is played by the minimal degree of activation (MIN), i.e., the threshold below which it is not possible to refer to objects by pronouns (see 1.2); (b) the existence of &quot;competitors&quot;, i.e. objects differing in activation only by degree ~ (see Hajičová and Vrbová, 1982) and having the same morphological categories.</Paragraph> <Paragraph position="1"> &quot;Přehrávače DAT jsou mnohem dražší než CD přehrávače. Toto nové zařízení ještě výrobci nebylo přijato.&quot; &quot;The DAT players are much more expensive than CD players. This new device has not yet been accepted by producers.&quot;</Paragraph> </Section> <Section position="6" start_page="145" end_page="145" type="sub_section"> <SectionTitle> 3.2 We claim that on the background of </SectionTitle> <Paragraph position="0"> these two aspects we can find the following four cases involved in discourse production: (i) da(O) &lt; MIN (as a special case this holds for &quot;new objects&quot;): In a technical text, the speaker prefers to use a definite NP. The degree of its complexity depends on the presence of possible competitors.</Paragraph> <Paragraph position="1"> &quot;Vstupní data (O) se mění pomocí programu D-TYPE. Zpočátku vyvolává subrutinu D-START, která ukládá data (O) do paměti.&quot; &quot;The input data (O) are changed by the D-TYPE program. In the beginning it calls the D-START subroutine which loads the data (O) into the memory.&quot;; (1) &quot;... ~ (O2) Nemají stejnou souborovou strukturu.&quot; &quot;... They (O2) haven't the same file structure.&quot; (2) &quot;... ~ (O2) Používají je (O1) k ...&quot; &quot;... 
They (O2) use them (O1) for...&quot;; (b) the expression referring to O1 has the position of subject in C: When referring to O1, a strong pronoun has to be used; in the case of O2 a weak pronoun will do.</Paragraph> <Paragraph position="2"> (ii) da(O)>MIN and the object O has no competitor or the competitor is &quot;far enough&quot;: A weak pronoun can be used in this case. &quot;Vstupní data (O) se mění pomocí programu D-TYPE. Mění je (O) na speciální typ.&quot; &quot;The input data (O) are changed by the D-TYPE program. It transforms them (O) into a special type.&quot;; (iii) da(O1)>MIN, the object O1 has a competitor O2, neither of them having the maximum degree of activation (MAX): In this case a pronoun does not help. A definite NP (at least for one object) has to be used.</Paragraph> <Paragraph position="3"> &quot;Oba systémy (O2) sdílejí některé soubory (O1). Ty (O1) jim (O2) pomáhají k ...&quot; &quot;Both systems (O2) share some files (O1). Those (O1) help them (O2) to ...&quot;; (c) cases (a), (b) do not hold: In this situation the competition cannot be &quot;solved&quot; by syntactic means. The solution of the problem is the same as in (iii).</Paragraph> <Paragraph position="4"> &quot;The difference will become apparent if we try to start the systems (O1)/utilities (O2).&quot;; (iv) da(O1)=MAX, da(O2)=MAX-1 and O1 competes with O2: This is the most difficult situation. We can divide it into three subcases according to the way referring expressions are used in the following clause (sentence) C:</Paragraph> </Section> <Section position="7" start_page="145" end_page="145" type="sub_section"> <SectionTitle> 3.3 As we have already stated, our </SectionTitle> <Paragraph position="0"> study is the first step on the way to a comprehensive account of the impact of SSK in discourse production. 
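The four cases of Sect. 3.2 can be sketched as a small decision function. This is only an illustration under stated assumptions: the integer values of MAX_DEGREE and MIN_DEGREE, the names Means and choose_means, and the labels "top" and "second" (telling whether the subject of the following clause C refers to the object with degree MAX or to the one with degree MAX-1) are hypothetical. A degree equal to MIN is treated here like a degree below the threshold, the decision whether two objects compete is left to the caller, and any competition not covered by case (iv) is handled as in case (iii).

```python
from enum import Enum

MAX_DEGREE = 10   # assumed value of the maximum degree of activation
MIN_DEGREE = 6    # assumed value of the pronominalization threshold MIN
PRONOMINALIZABLE = range(MIN_DEGREE + 1, MAX_DEGREE + 1)


class Means(Enum):
    WEAK_PRONOUN = "weak pronoun"
    STRONG_PRONOUN = "strong pronoun"
    DEFINITE_NP = "definite NP"


def choose_means(da, competitor_da=None, subject_in_c=None):
    """Choose the weakest referring means for an object with degree da.

    competitor_da is the degree of a competing object (None if there is
    no competitor or it is far enough); subject_in_c is "top", "second"
    or None, depending on which object is the subject of clause C.
    """
    if da not in PRONOMINALIZABLE:                   # case (i)
        return Means.DEFINITE_NP
    if competitor_da is None:                        # case (ii)
        return Means.WEAK_PRONOUN
    if {da, competitor_da} == {MAX_DEGREE, MAX_DEGREE - 1}:   # case (iv)
        if subject_in_c == "second":                 # subcase (a)
            return Means.WEAK_PRONOUN
        if subject_in_c == "top":                    # subcase (b)
            if da == MAX_DEGREE:
                return Means.STRONG_PRONOUN
            return Means.WEAK_PRONOUN
        return Means.DEFINITE_NP                     # subcase (c)
    return Means.DEFINITE_NP                         # case (iii)


# Subcase (b): the object with degree MAX is the subject of clause C.
print(choose_means(MAX_DEGREE, MAX_DEGREE - 1, "top").value)
print(choose_means(MAX_DEGREE - 1, MAX_DEGREE, "top").value)
```

The two calls correspond to the pair of objects in subcase (b): the more activated object requires a strong pronoun as subject of C, while a weak pronoun suffices for its competitor.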
To handle the interplay between pronouns and definite NP's in full detail, one has to state the relevant differences in the activation of competitors (in various types of sentences), to consider the possibility of the marked use of strong pronouns and definite NP's, and to address many other problems.</Paragraph> <Paragraph position="1"> (a) the expression referring to O2 has the position of subject in C: Here we face the &quot;subject-preserving tendency&quot;, which is very common in continuous texts. This helps to avoid the possible ambiguity between competitors so that weak pronouns can refer to both objects (O1, O2).</Paragraph> <Paragraph position="2"> &quot;Oba systémy (O2) se liší v utilitách (O1).&quot; &quot;Both systems (O2) differ in utilities (O1).&quot;</Paragraph> </Section> </Section> </Paper>