<?xml version="1.0" standalone="yes"?> <Paper uid="H86-1012"> <Title>Sentence Direct Object Subject Objects of Prepositional Phrases Subsequent Sentences (update FocusList):</Title> <Section position="2" start_page="115" end_page="115" type="metho"> <SectionTitle> 2. Uses of Focusing </SectionTitle> <Paragraph position="0"> Focusing is used in four places in PUNDIT: for definite pronouns, for elided noun phrases, for one-anaphora, and for implicit associates.</Paragraph> <Paragraph position="1"> As stated above, reference resolution is called by the semantic interpreter when it is in the process of filling a thematic role. Reference resolution proposes a referent for the constituent associated with that role. For example, if the verb is replace and the semantic interpreter is filling the role of agent, reference resolution would be called for the surface syntactic subject. After a proposed referent is chosen for the subject, any specific selectional restrictions on the agent of replace (such as the constraint that the agent has to be a human being) are checked. If the proposed referent fails selection, backtracking into reference resolution occurs and another referent is selected.</Paragraph> <Paragraph position="2"> Cooperation between reference resolution and the semantic interpreter is discussed in detail in \[Palmer1986\]. The semantic interpreter itself is discussed in \[Palmer1985\].</Paragraph> <Paragraph position="3"> 2.1. Pronouns and Elided Noun Phrases Pronoun resolution is done by instantiating the referent of the pronoun to the first member of the FocusList unless the instantiation would violate syntactic constraints on coreferentiality. 3 (As noted above, if the proposed referent fails selection,</Paragraph> </Section> <Section position="3" start_page="115" end_page="122" type="metho"> <SectionTitle> 3 At the moment, the syntactic constraints on coreferentiality used by the system are very simple. 
If the direct object is </SectionTitle> <Paragraph position="0"> reflexive it must be instantiated to the same referent as the subject. Otherwise it must be a different referent. Obviously, as the system is extended to cover sentences with more complex structures, a more sophisticated treatment of syntactic constraints on Focusing and Reference Resolution in PUNDIT backtracking occurs, and another referent is chosen.) The reference resolution situation in the maintenance texts, however, is complicated by the fact that there are very few overt pronouns. Rather, in contexts where a noun phrase would be expected, there is often elision, or a zero-np, as in "Won't power up" and "Has not failed since Hill's arrival". Zeroes are handled exactly as if they were pronouns. The hypothesis that elided noun phrases can be treated in the same way as pronouns is consistent with previous claims in \[Gundel1980\] and \[Kameyama1985\] that in languages such as Russian and Japanese, which regularly allow zero-np's, the zero corresponds to the focus. If these claims are correct, it is not surprising that in a sub-language like that found in the maintenance texts, which also allows zero-np's, the zero should correspond to the focus.</Paragraph> <Paragraph position="1"> Another kind of pronoun (or zero) also occurs in the maintenance texts, which is not associated with the local focus, but is concerned with global aspects of the text.</Paragraph> <Paragraph position="2"> For example, the field engineer is a default agent in the maintenance domain, as in "Thinks problem is in head select area". This is handled by defining default elided referents for the domain. The referent is instantiated to one of these if no suitable candidate can be found in the FocusList.</Paragraph> <Section position="1" start_page="116" end_page="120" type="sub_section"> <SectionTitle> 2.2. 
Implicit Associates </SectionTitle> <Paragraph position="0"> Focusing is also used in the processing of certain full noun phrases, both definite and indefinite, which involve implicit associates. The term implicit associates refers to the relationship between a disk drive and the motor in examples like "The field engineer installed a disk drive. The motor failed." It is natural for a human reader to infer that the motor is part of the disk drive. In order to capture this intuition, it is necessary for the system to relate the motor to the disk drive of which it is part. Relationships of this kind have been extensively discussed in the literature on definite reference. For example, implicit associates correspond to inferrable entities described by \[Prince1981\], the associated use definites of \[Hawkins1978\], and the associated type of implicit backwards specification discussed by \[Sidner1979\]. Sidner suggests that implicit associates should be found among the entities in focus. Thus, when the system encounters a definite noun phrase mentioned for the first time, it sequentially examines each member of the FocusList to determine if it is a possible associate of the current noun phrase. The specific association relationships (such as part-whole, object-property, and so on) are defined in the knowledge base.</Paragraph> <Paragraph position="1"> This mechanism is also used in the processing of certain indefinite noun phrases.</Paragraph> <Paragraph position="2"> In every domain, it is claimed, there are certain types of entities which can be classified as dependent. By this is meant an entity which is not typically mentioned on its own, but which is referred to in connection with another entity, on which it is dependent. 
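The sequential search for an implicit associate described above can be sketched roughly as follows. This is an illustrative Python sketch only, not PUNDIT's actual code: the names ASSOCIATIONS, Entity, and find_associate, the category labels, and the contents of the table are all hypothetical stand-ins for whatever the knowledge base really contains.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical knowledge base of association relationships:
# (category of the new noun phrase, category of a candidate) -> relation name.
ASSOCIATIONS = {
    ("motor", "disk_drive"): "part-whole",
    ("printed_circuit_board", "system"): "part-whole",
}


@dataclass(frozen=True)
class Entity:
    ident: str     # unique discourse identifier, e.g. "drive1"
    category: str  # classification of the entity, e.g. "disk_drive"


def find_associate(category: str,
                   focus_list: List[Entity]) -> Optional[Tuple[Entity, str]]:
    """Return the first FocusList member that can be an associate of an
    entity of the given category, with the relation, or None if there is none."""
    for candidate in focus_list:  # members are examined in focus order
        relation = ASSOCIATIONS.get((category, candidate.category))
        if relation is not None:
            return candidate, relation
    return None


# "The field engineer installed a disk drive. The motor failed."
focus_list = [
    Entity("event1", "install"),
    Entity("drive1", "disk_drive"),
    Entity("engineer1", "field_engineer"),
]
result = find_associate("motor", focus_list)
```

Here result pairs the entity drive1 with the relation part-whole, which corresponds to relating the motor to the disk drive of which it is part. A category with no entry in the table, such as a keyboard in this toy knowledge base, yields None.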
In the maintenance domain, for example, parts such as keyboards, motors, and printed circuit boards are dependent, since when they are mentioned, they are normally mentioned as being part of something else, such as a console, disk drive, or printer. 4 (Footnote 3, continued: coindexing using some of the insights of \[Reinhart1976\] and \[Chomsky1981\] will be required.) In an example like "The system is down. The field engineer replaced a bad printed circuit board", it seems clear that a relationship between the printed circuit board and the system should be represented. Upon encountering a reference to a dependent entity like the printed circuit board, the system looks through the FocusList to determine if any previously mentioned entities can be associated with a printed circuit board, and if so, the relationship is made explicit. If no associate has been mentioned, the entity will be associated with a default defined in the knowledge base. For example, in the maintenance domain, parts are defined as dependent entities, and in the absence of an explicitly mentioned associate, they are represented as associated with the system.</Paragraph> <Paragraph position="3"> 2.3. One-Anaphora PUNDIT extends focusing to the analysis of one-anaphora following \[Dahl1984\], which claims that focus is central to the interpretation of one-anaphora. Specifically, the referent of a one-anaphoric noun phrase (e.g., the blue one, some large ones) is claimed to be a member or members of a set which is the focus of the current clause.</Paragraph> <Paragraph position="4"> For example, in "Installed two disk drives. One failed", the set of two disk drives is assumed to be the focus of "One failed", and the disk drive that failed is a member of that set. 
This analysis can be contrasted with that of \[Halliday1976\], which treats one-anaphora as a surface syntactic phenomenon, completely distinct from reference.</Paragraph> <Paragraph position="5"> It is more consistent with the theoretical discussions of \[1976\], and \[Webber1983\]. 5 These analyses advocate a discourse-pragmatic treatment for both one-anaphora and definite pronouns. The main computational advantage of treating one-anaphora as a discourse problem is that, since definite pronouns are treated this way, little modification is needed to the basic anaphora mechanism to allow it to handle one-anaphora. In contrast, an implementation following the account of Halliday and Hasan would be much more complex and specific to one-anaphora.</Paragraph> <Paragraph position="6"> The process of reference resolution for one-anaphora occurs in two stages. The first stage is resolution of the anaphor, one, and this is the stage that involves focusing. When the system processes the head noun one, it instantiates it with the category of the first set in the FocusList (disk drive in this example). 6 In other words, the referent of the noun phrase must be a member of the previously mentioned set of disk drives. The second stage of reference resolution for one-anaphora assigns a specific disk drive as the referent of the entire noun phrase, using the same procedures that would be used for a full noun phrase, a disk drive.</Paragraph> <Paragraph position="7"> The extension of the system to one-anaphora provides the clearest motivation for the choice of a syntactic focus in PUNDIT. Before I discuss the kinds of examples 4 There are exceptions to this generalization. For example, in a sentence like "field engineer ordered motor", the motor on order is not part of anything else (yet). In PUNDIT, these cases are assumed to depend on the verb meaning. In this example, the object of ordered is categorized as non-specific, and reference resolution is not called. 
See \[Palmer1986\] for details. 5 Although not Webber's analysis in \[Webber1978\], which advocates an approach similar to Halliday and Hasan's. 6 Currently the only sets in the FocusList are those which were explicitly mentioned in the text. However, as pointed out by \[Dahl1982\], and \[Webber1983, Dahl1984\], other sets besides those explicitly mentioned are available for anaphoric reference. These have not yet been added to the system.</Paragraph> <Paragraph position="8"> which support this approach, I will briefly describe the relevant part of the focusing algorithm based on thematic roles which is proposed by \[Sidner1979\]. After each sentence, the focusing algorithms order the elements in the sentence in the order in which they are to be considered as potential foci in the next sentence. Sidner's ordering and that of PUNDIT are compared in Figure 1.</Paragraph> <Paragraph position="9"> The idea that surface syntax is important in focusing comes from a suggestion by \[Erteschik-Shir1979\], that every sentence has a dominant syntactic constituent, which provides a default topic for the following utterance. 7 Intuitively, the dominant constituent can be thought of as the one to which the hearer's attention is primarily drawn. Operationally the dominance of a constituent is tested by seeing if a referent with that constituent as the antecedent can be cooperatively referred to with an unstressed pronoun in the following sentence.</Paragraph> <Paragraph position="10"> The feature of one-anaphora which motivates the syntactic algorithm is that the availability of certain noun phrases as antecedents for one-anaphora is strongly affected by surface word order variations which change syntactic relations, but which do not affect thematic roles. 
If thematic roles were crucial for focusing, this pattern would not be observed.</Paragraph> <Paragraph position="11"> Consider the following examples: (1) A: I'd like to plug in this lamp, but the bookcases are blocking the electrical outlets.</Paragraph> <Paragraph position="12"> B: Well, can we move one? (2) A: I'd like to plug in this lamp, but the electrical outlets are blocked by the bookcases. B: Well, can we move one? In (1), most informants report an initial impression that B is talking about moving the electrical outlets. This does not happen for (2). This indicates that the expected focus following (1) A is the outlets, while it is the bookcases following (2) A. However, in each case, the thematic roles are the same, so an algorithm based on thematic roles would predict no difference between (1) and (2).</Paragraph> <Paragraph position="13"> Similar examples using definite pronouns do not seem to exhibit the same effect.</Paragraph> <Paragraph position="14"> In (3) and (4), "them" seems to be ambiguous until world knowledge is brought in. Thus, in order to handle definite pronouns alone, either algorithm would be adequate.</Paragraph> <Paragraph position="15"> (3) A: I'd like to plug in this lamp, but bookcases are blocking the electrical outlets. B: Well, can we move them? (4) A: I'd like to plug in this lamp, but the electrical outlets are blocked by the bookcases. B: Well, can we move them? (5) and (6) illustrate another example with one-anaphora. In (5) but not in (6), the initial interpretation seems to be that a bug has lost its leaves. As in (1) and (2), however, the thematic roles are the same, so a thematic-role-based algorithm would predict no difference between the sentences.</Paragraph> <Paragraph position="16"> (5) The plants are swarming with the bugs. One's already lost all its leaves. (6) The bugs are swarming over the plants. One's already lost all its leaves. 
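The argument from examples (1) and (2) can be made concrete with a small sketch. The Python below is purely illustrative and is not part of PUNDIT: the role labels "blocker" and "blocked" are hypothetical placeholders for whatever thematic roles the verb block would receive, and the grammatical-relation labels are likewise assumed. It shows only that the two clauses are identical at the thematic level and distinct at the syntactic level, so any ordering computed from thematic roles alone must treat them alike.

```python
# Hypothetical analyses of speaker A's second clause in examples (1) and (2).
ACTIVE = {  # (1) "the bookcases are blocking the electrical outlets"
    "thematic": {"blocker": "bookcases", "blocked": "outlets"},
    "syntactic": {"subject": "bookcases", "direct_object": "outlets"},
}
PASSIVE = {  # (2) "the electrical outlets are blocked by the bookcases"
    "thematic": {"blocker": "bookcases", "blocked": "outlets"},
    "syntactic": {"subject": "outlets", "pp_object": "bookcases"},
}


def distinguishable(level: str) -> bool:
    """Return True if the two clauses differ at the given level of analysis."""
    return ACTIVE[level] != PASSIVE[level]


print(distinguishable("thematic"))   # False
print(distinguishable("syntactic"))  # True
```

Because the thematic analyses are equal, a focusing function that takes only thematic roles as input cannot return different orderings for (1) and (2); a function that takes grammatical relations as input at least has the information needed to do so.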
In addition to theoretical considerations, there are a number of obvious practical advantages to defining focus on constituents rather than on thematic roles. For example, constituents can often be found more reliably than thematic roles. In addition, thematic roles have to be defined individually for each verb. 8 Since thematic roles for verbs can vary across domains, defining focus on syntax makes it less domain dependent, and hence more portable. While in principle focus based on thematic roles does not have to be domain-dependent, a general algorithm based on thematic roles would have to rely on a general, domain-neutral specification of all possible thematic roles and their behavior in focusing. Until such a specification exists, a thematic-role based focusing algorithm must be redefined for each new domain as the domain requires the definition of new thematic roles, and because of this, will continue to be less portable than an approach based on syntax.</Paragraph> <Paragraph position="17"> 8 Of course, some generalizations can be made about how arguments map to thematic roles. For example, the basic definition of the thematic role theme is that, for a verb of motion, the theme is the argument that moves. More generally, the theme is the argument that is most affected by the action of the verb, and its typical syntactic manifestation is as a direct object of a transitive verb, or the subject of an intransitive verb. However, even if these generalizations are accurate, they are no more than guidelines for finding the themes of verbs. The verbs still have to be classified individually. 3. Implementation 3.1. The FocusList and CurrentContext The data structures that retain information from sentence to sentence in the PUNDIT system are the FocusList and the CurrentContext. 
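As a rough orientation before the details that follow in this section, the two structures might be pictured as below. This is a minimal Python sketch under stated assumptions, not the system's own representation: the class name CurrentContext, its field names, the identifier "system1", and the category label "disk^drive" are hypothetical; only the identifiers event1, drive1, drive2, and engineer1, the id field^engineer, and the role names are taken from the description of the example sentence "The field engineer replaced the disk drive".

```python
from dataclasses import dataclass, field


@dataclass
class CurrentContext:
    """Information conveyed by the discourse so far (illustrative layout)."""
    ids: list = field(default_factory=list)       # (category, identifier) pairs
    hasparts: list = field(default_factory=list)  # (whole, part) pairs
    events: dict = field(default_factory=dict)    # identifier -> representation


# FocusList: identifiers in the order in which they are considered as foci.
focus_list = ["event1", "drive1", "engineer1"]

context = CurrentContext()
context.ids.append(("field^engineer", "engineer1"))
context.ids.append(("disk^drive", "drive1"))     # category label assumed
context.hasparts.append(("system1", "drive1"))   # "system1" is an assumed id
context.hasparts.append(("system1", "drive2"))
context.events["event1"] = {
    "verb": "replace",
    "agent": "engineer1",
    "object1": "drive1",  # the replaced disk drive
    "object2": "drive2",  # the replacement disk drive
    "time": None,         # uninstantiated
    "instrument": None,   # uninstantiated
}

# The first member of the FocusList is the first candidate considered.
first_candidate = focus_list[0] if focus_list else None
```

Keeping the FocusList as a plain ordered list of identifiers, separate from the facts stored about those identifiers, mirrors the division of labor described in the text: ordering lives in one structure and content in the other.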
The FocusList is a list of all the discourse entities which are eligible to be considered as foci, listed in the order in which they are to be considered. For example, after a sentence like "The field engineer replaced the disk drive", the following FocusList would be created. \[\[event1\],\[drive1\],\[engineer1\]\] The members of the FocusList are unique identifiers that have been assigned to the three discourse entities: the disk drive, the field engineer, and the event. The CurrentContext contains the information that has been conveyed by the discourse so far. After the example above, the CurrentContext would contain three types of information: (1) Discourse id's, which represent classifications of entities. For example, id(field^engineer,\[engineer1\]) means that \[engineer1\] is a field engineer. 9 (2) Facts about part-whole relationships (hasparts). In the example in Figure 2, notice that the lack of a representation of time results in both drives being part of the system, which they are, but not at the same time. Work to remedy this problem is in progress.</Paragraph> <Paragraph position="18"> (3) Representations of the events in the discourse. For example, if the event is that of a disk drive having been replaced, the representation consists of a unique identifier (\[event1\]), the surface verb (replace(time(_))), and the decomposition of the verb with its (known) arguments instantiated. 10 The thematic roles involved are object1, the replaced disk drive, object2, the replacement disk drive, time and instrument, which are uninstantiated, and agent, the field engineer. (See \[Palmer1986\] for details of this representation.) Figure 2 illustrates how the CurrentContext looks after the discourse-initial sentence, "The field engineer replaced the disk drive".</Paragraph> </Section> <Section position="2" start_page="120" end_page="122" type="sub_section"> <SectionTitle> 3.2. 
The Focusing Algorithm </SectionTitle> <Paragraph position="0"> The focusing algorithm used in this system resembles that of \[Sidner1979\], although it does not use the actor focus and uses surface syntax rather than thematic roles, as discussed above. The focusing algorithm is illustrated in Figure 3. Removing candidates from the FocusList when they are no longer eligible to be the referents of pronouns is not currently done in this system. The conditions determining this have not been fully investigated, and since the texts involved are short, few problems are created in practice. This problem will be addressed by future research.</Paragraph> <Paragraph position="1"> 9 field^engineer is an example of the representation used in PUNDIT for an idiom.</Paragraph> <Paragraph position="2"> 10 _8176 is an uninstantiated variable representing the time of the replacement. It appears in several places, such as included(object2(\[drive2\]),time(_8176)), and missing(object1(\[drive1\]),time(_8176)).</Paragraph> <Paragraph position="3"> If there is a pronoun in the current sentence, move the focus to the referent of the pronoun. If there is no pronoun, retain the focus from the previous sentence. Order the other elements in the sentence as in (1)-</Paragraph> </Section> </Section> </Paper>