<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0106"> <Title>Discourse Structure and Co-Reference: An Empirical Study</Title> <Section position="4" start_page="0" end_page="46" type="metho"> <SectionTitle> 1. A COLLECT module determines a list of </SectionTitle> <Paragraph position="0"> potential antecedents (LPA) for each anaphor (pronoun, definite noun, proper name, etc.) that have the potential to resolve it. 2. A FILTER module eliminates referees incompatible with the anaphor from the LPA. 3. A PREFERENCE module determines the most likely antecedent on the basis of an ordering policy.</Paragraph> <Paragraph position="1"> In most cases, the COLLECT module determines an LPA by enumerating all antecedents in a window of text that precedes the anaphor under scrutiny (Hobbs, 1978; Lappin and Leass, 1994; Mitkov, 1997; Kameyama, 1997; Ge et al., 1998). This window can be as small as two or three sentences or as large as the entire preceding text. The FILTER module usually imposes semantic constraints by requiring that the anaphor and potential antecedents have the same number and gender, that selectional restrictions are obeyed, etc. The PREFERENCE module imposes preferences on potential antecedents on the basis of their grammatical roles, parallelism, frequency, proximity, etc. In some cases, anaphora resolution systems implement these modules explicitly (Hobbs, 1978; Lappin and Leass, 1994; Mitkov, 1997; Kameyama, 1997). 
In other cases, these modules are integrated by means of statistical (Ge et al., 1998) or uncertainty reasoning techniques (Mitkov, 1997).</Paragraph> <Paragraph position="2"> The fact that current anaphora resolution systems rely exclusively on the linear nature of texts in order to determine the LPA of an anaphor seems odd, given that several studies have claimed that there is a strong relation between discourse structure and reference (Sidner, 1981; Grosz and Sidner, 1986; Grosz et al., 1995; Fox, 1987; Vonk et al., 1992; Azzam et al., 1998; Hitzeman and Poesio, 1998). These studies claim, on the one hand, that the use of referents in naturally occurring texts imposes constraints on the interpretation of discourse; and, on the other, that the structure of discourse constrains the LPAs to which anaphors can be resolved. The oddness of the situation can be explained by the fact that both groups seem prima facie to be right. Empirical studies that employ linear techniques for determining the LPAs of anaphors report recall and precision results for anaphora resolution in the range of 80% (Lappin and Leass, 1994; Ge et al., 1998). Empirical experiments that investigated the relation between discourse structure and reference also claim that by exploiting the structure of discourse one has the potential of determining correct co-referential links for more than 80% of the referential expressions (Fox, 1987; Cristea et al., 1998), although to date no discourse-based anaphora resolution system has been implemented. Since no direct comparison of these two classes of approaches has been made, it is difficult to determine which group is right and which method is best.</Paragraph> <Paragraph position="3"> In this paper, we attempt to fill this gap by empirically comparing the potential of linear and hierarchical models of discourse to correctly establish co-referential links in texts, and hence their potential to correctly resolve anaphors. 
Since it is likely that both linear- and discourse-based anaphora resolution systems can implement similar FILTER and PREFERENCE strategies, we focus here only on the strategies that can be used to COLLECT lists of potential antecedents. Specifically, we focus on determining whether discourse theories can help an anaphora resolution system determine LPAs that are &quot;better&quot; than the LPAs that can be computed from a linear interpretation of texts. Section 2 outlines the theoretical assumptions of our empirical investigation. Section 3 describes our experiment. We conclude with a discussion of the results.</Paragraph> </Section> <Section position="5" start_page="46" end_page="48" type="metho"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="46" end_page="46" type="sub_section"> <SectionTitle> 2.1 Assumptions </SectionTitle> <Paragraph position="0"> Our approach is based on the following assumptions: 1. For each anaphor in a text, an anaphora resolution system must produce an LPA that contains a referent to which the anaphor can be resolved. The size of this LPA varies from system to system, depending on the theory a system implements.</Paragraph> <Paragraph position="1"> 2. The smaller the LPA (while retaining a correct antecedent), the less likely that errors in the FILTER and PREFERENCE modules will affect the ability of a system to select the appropriate referent.</Paragraph> <Paragraph position="2"> 3. Theory A is better than theory B for the task of reference resolution if theory A produces LPAs that contain more antecedents to which anaphors can be correctly resolved than theory B, and if the LPAs produced by theory A are smaller than those produced by theory B. For example, if for a given anaphor, theory A produces an LPA that contains a referee to which the anaphor can be resolved, while theory B produces an LPA that does not contain such a referee, theory A is better than theory B. 
Moreover, if for a given anaphor, theory A produces an LPA with two referees and theory B produces an LPA with seven referees (each LPA containing a referee to which the anaphor can be resolved), theory A is considered better than theory B because it has a higher probability of resolving that anaphor correctly.</Paragraph> <Paragraph position="3"> We consider two classes of models for determining the LPAs of anaphors in a text: Linear-k models. This is a class of linear models in which the LPAs include all the referential expressions found in the discourse unit under scrutiny and the k discourse units that immediately precede it. Linear-0 models an approach that assumes that all anaphors can be resolved intra-unit; Linear-1 models an approach that corresponds roughly to centering (Grosz et al., 1995). Linear-k is consistent with the assumptions that underlie most current anaphora resolution systems, which look back k units in order to resolve an anaphor.</Paragraph> <Paragraph position="4"> Discourse-VT-k models. In this class of models, LPAs include all the referential expressions found in the discourse unit under scrutiny and the k discourse units that hierarchically precede it. The units that hierarchically precede a given unit are determined according to Veins Theory (VT) (Cristea et al., 1998), which is described briefly below.</Paragraph> </Section> <Section position="2" start_page="46" end_page="47" type="sub_section"> <SectionTitle> 2.2 Veins Theory </SectionTitle> <Paragraph position="0"> VT extends and formalizes the relation between discourse structure and reference proposed by Fox (1987). 
It identifies &quot;veins&quot;, i.e., chains of elementary discourse units, over discourse structure trees that are built according to the requirements put forth in Rhetorical Structure Theory (RST) (Mann and Thompson, 1988).</Paragraph> <Paragraph position="1"> One of the conjectures of VT is that the vein expression of an elementary discourse unit provides a coherent &quot;abstract&quot; of the discourse fragment that contains that unit. Because this abstract is an internally coherent discourse fragment, all anaphors and referential expressions (REs) in a unit must be resolved to referees that occur in the text subsumed by the units in the vein. This conjecture is consistent with Fox's view (1987) that the units that contain referees to which anaphors can be resolved are determined by the nuclearity of the discourse units that precede the anaphors and the overall structure of discourse. According to VT, REs of both satellites and nuclei can access referees of immediately preceding nucleus nodes. REs of nuclei can only access referees of preceding nucleus nodes and of directly subordinated satellite nodes. And the interposition of a nucleus after a satellite blocks the accessibility of the satellite for all nodes that are lower in the corresponding discourse structure (see Cristea et al. (1998) for a full definition).</Paragraph> <Paragraph position="2"> Hence, the fundamental intuition underlying VT is that the RST-specific distinction between nuclei and satellites constrains the range of referents to which anaphors can be resolved; in other words, the nucleus-satellite distinction induces for each anaphor (and each referential expression) a Domain of Referential Accessibility (DRA). For each anaphor a in a discourse unit u, VT hypothesizes that a can be resolved by examining referential expressions that were used in a subset of the discourse units that precede u; this subset is called the DRA of u. For any elementary unit u in a text, 
the corresponding DRA is computed automatically from the rhetorical representation of that text in two steps: 1. Heads for each node are computed bottom-up over the rhetorical representation tree. Heads of elementary discourse units are the units themselves. Heads of internal nodes, i.e., discourse spans, are computed by taking the union of the heads of the immediate child nodes that are nuclei. For example, for the text in Figure 1, whose rhetorical structure is shown in Figure 2, the head of span [5,7] is unit 5 because the head of the immediate nucleus, the elementary unit 5, is 5. However, the head of span [6,7] is the list (6,7) because both immediate children are nuclei of a multinuclear relation. 2. Using the results of step 1, vein expressions are computed top-down for each node in the tree. The vein of the root is its head. Veins of child nodes are computed recursively according to the rules described by Cristea et al. (1998). The DRA of a unit u is given by the units in the vein that precede u.</Paragraph> <Paragraph position="3"> For example, for the text and RST tree in Figures 1 and 2, the vein expression of unit 3, which contains units 1 and 3, suggests that anaphors from unit 3 should be resolved only to referential expressions in units 1 and 3. Because unit 2 is a satellite to unit 1, it is considered to be &quot;blocked&quot; to referential links from unit 3.</Paragraph> <Paragraph position="4"> 1. Michael D. Casey, a top Johnson &amp; Johnson manager, moved to Genetic Therapy Inc., a small biotechnology concern here, 2. to become its president and chief operating officer. 3. Mr. Casey, 46 years old, was president of J&amp;J's McNeil Pharmaceutical subsidiary, 4. which was merged with another J&amp;J unit, Ortho Pharmaceutical Corp., this year in a cost-cutting move.</Paragraph> <Paragraph position="5"> 5. Mr. Casey succeeds M. James Barrett, 50, as president of Genetic Therapy. 6. Mr. Barrett remains chief executive officer 7. and becomes chairman. 8. Mr. Casey said 9. he made the move to the smaller company 10. because he saw health care moving toward technologies like the company's gene therapy products.</Paragraph> <Paragraph position="6"> 11. &quot;I believe that the field is emerging and is prepared to break loose,&quot; 12. he said.</Paragraph> <Paragraph position="7"> Figure 1: An example of text and its elementary units. The referential expressions surrounded by boxes and ellipses correspond to two distinct co-referential equivalence classes. Referential expressions surrounded by boxes refer to Mr. Casey; those surrounded by ellipses refer to Genetic Therapy Inc.</Paragraph> <Paragraph position="9"> In contrast, the DRA of unit 9, consisting of units 1, 8, and 9, reflects the intuition that anaphors from unit 9 can be resolved only to referential expressions from unit 1, which is the most important unit in span [1,7], and to unit 8, a satellite that immediately precedes unit 9. Figure 2 shows the heads and veins of all internal nodes in the rhetorical representation.</Paragraph> </Section> <Section position="3" start_page="47" end_page="48" type="sub_section"> <SectionTitle> 2.3 Comparing models </SectionTitle> <Paragraph position="0"> The premise underlying our experiment is that there are potentially significant differences in the size of the search space required to resolve referential expressions when using Linear models vs. Discourse-VT models. For example, for the text and the RST tree in Figures 1 and 2, the Discourse-VT model narrows the search space required to resolve the anaphor the smaller company in unit 9. According to VT, we look for potential antecedents for the smaller company in the DRA of unit 9, which lists units 1, 8, and 9.</Paragraph> <Paragraph position="2"> The antecedent Genetic Therapy Inc. 
appears in unit 1; therefore, using VT we search back 2 units (units 8 and 1) to find a correct antecedent. In contrast, to resolve the same reference using a linear model, four units (units 8, 7, 6, and 5) must be examined before Genetic Therapy is found. Assuming that referential links are established as the text is processed, Genetic Therapy would be linked back to the pronoun its in unit 2, which would in turn be linked to the first occurrence of the antecedent, Genetic Therapy Inc., in unit 1, the antecedent determined directly by using VT.</Paragraph> <Paragraph position="3"> In general, when hierarchical adjacency is considered, an anaphor may be resolved to a referent that is not the closest in a linear interpretation of a text. Similarly, a referential expression can be linked to a referee that is not the closest in a linear interpretation of a text. However, this does not create problems because we are focusing here only on co-referential relations of identity (see section 3). Since these relations induce equivalence classes over the set of referential expressions in a text, it is sufficient that an anaphor or referential expression is resolved to any of the members of the relevant equivalence class. For example, according to VT, the referential expression Mr. Casey in unit 5 in Figure 1 can be linked directly only to the referee Mr. Casey in unit 1, because the DRA of unit 5 is {1,5}. By considering the co-referential links of the REs in the other units, the full equivalence class can be determined. 
This is consistent with the distinction between &quot;direct&quot; and &quot;indirect&quot; references discussed by Cristea et al. (1998).</Paragraph> </Section> </Section> <Section position="6" start_page="48" end_page="48" type="metho"> <SectionTitle> 3 The Experiment </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="48" end_page="48" type="sub_section"> <SectionTitle> 3.1 Materials </SectionTitle> <Paragraph position="0"> We used thirty newspaper texts whose lengths varied widely; the mean length is 408 words and the standard deviation is 376. The texts were annotated manually for co-reference relations of identity (Hirschman and Chinchor, 1997). The co-reference relations define equivalence classes on the set of all marked referents in a text. The texts were also manually annotated with discourse structures built in the style of Mann and Thompson (1988).</Paragraph> <Paragraph position="1"> On average, each analysis yielded 52 elementary discourse units. Details of the discourse annotation process are given in (Marcu et al., 1999).</Paragraph> <Paragraph position="2"> 3.2 Comparing potential to establish co-referential links</Paragraph> </Section> </Section> <Section position="7" start_page="48" end_page="49" type="metho"> <SectionTitle> 3.2.1 Method </SectionTitle> <Paragraph position="0"> The annotations for co-reference relations and rhetorical structure trees for the thirty texts were fused, yielding representations that reflect not only the discourse structure, but also the co-reference equivalence classes specific to each text. 
Based on this information, we evaluated the potential of each of the two classes of models discussed in section</Paragraph> </Section> <Section position="8" start_page="49" end_page="49" type="metho"> <SectionTitle> 2 (Linear-k and Discourse-VT-k) to correctly establish </SectionTitle> <Paragraph position="0"> co-referential links as follows: For each model, each k, and each marked referential expression a, we determined whether or not the corresponding LPA (defined over k elementary units) contained a referee from the same equivalence class. For example, for the Linear-2 model and referential expression the smaller company in unit 9, we estimated whether a co-referential link could be established between the smaller company and another referential expression in units 7, 8, or 9. For the Discourse-VT-2 model and the same referential expression, we estimated whether a co-referential link could be established between the smaller company and another referential expression in units 1, 8, or 9, which correspond to the DRA of unit 9.</Paragraph> <Paragraph position="1"> To enable a fair comparison of the two models, when k is larger than the size of the DRA of a given unit, we extend that DRA using the closest units that precede the unit under scrutiny and are not already in the DRA. Hence, for the Linear-3 model and the referential expression the smaller company in unit 9, we estimate whether a co-referential link can be established between the smaller company and another referential expression in units 6, 7, 8, or 9. 
For the Discourse-VT-3 model and the same referential expression, we estimate whether a co-referential link can be established between the smaller company and another referential expression in units 1, 8, 9, or 7, which correspond to the DRA of unit 9 (units 1, 8, and 9) and to unit 7, the closest unit preceding unit 9 that is not in its DRA.</Paragraph> <Paragraph position="2"> For the Discourse-VT-k models, we assume that the Extended DRA (EDRA) of size k of a unit u (EDRA_k(u)) is given by the first l ≤ k units of a sequence that lists, in reverse order, the units of the DRA of u plus the k - l units that precede u but are not in its DRA. For example, for the text in Figure 1, the following relations hold: EDRA_0(9) = {9}; EDRA_1(9) = {9, 8}; EDRA_2(9) = {9, 8, 1}; EDRA_3(9) = {9, 8, 1, 7}.</Paragraph> <Paragraph position="4"> For Linear-k models, EDRA_k(u) is given by u and the k units that immediately precede u.</Paragraph> <Paragraph position="5"> The potential p(M, a, EDRA_k) of a model M to determine correct co-referential links with respect to a referential expression a in unit u, given a corresponding EDRA of size k (EDRA_k(u)), is assigned the value 1 if the EDRA contains a co-referent from the same equivalence class as a. Otherwise, p(M, a, EDRA_k) is assigned the value 0. The potential p(M, C, k) of a model M to determine correct co-referential links for all referential expressions in a corpus of texts C, using EDRAs of size k, is computed as the sum of the potentials p(M, a, EDRA_k) of all referential expressions a in C. This potential is normalized to a value between 0 and 1 by dividing p(M, C, k) by the number of referential expressions in the corpus that have an antecedent.</Paragraph> <Paragraph position="6"> By examining the potential of each model to correctly determine co-referential expressions for each k, it is possible to determine the degree to which an implementation of a given approach can contribute to the overall efficiency of anaphora resolution systems. 
That is, if a given model has the potential to correctly determine a significant percentage of co-referential expressions with small DRAs, an anaphora resolution system implementing that model will have to consider fewer options overall.</Paragraph> <Paragraph position="7"> Hence, the probability of error is reduced.</Paragraph> </Section> </Paper>