<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0201">
  <Title>An Interlingual-based Approach to Reference Resolution</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 DOD contract N 66001-99-1-8915
</SectionTitle>
    <Paragraph position="0"> They do not generally consider implicit references or, in the case of Spanish (e.g., Ferrández et al., 1998), references to contextually clear possessors using determiners rather than possessive adjectives. All such approaches are supplemented, if not entirely determined, by heuristics which, more recently, have been induced statistically from corpora (e.g., Hirschman et al., 1998; Popescu-Belis, 1998). The only approaches of which the authors are aware that attempted to account for implicit referents or implicit references are those developed within AI over two decades ago (e.g., Hobbs, 1979, or DeJong, 1979).</Paragraph>
    <Paragraph position="1"> The approach suggested here differs radically in that reference resolution is triggered by elements of interlingual (IL) representation rather than surface text expressions. The referents in the domain of discourse consist of elements of IL representation as well. Thus, implicit references and implicit referents are accounted for and, at the same time, empty references are ignored.</Paragraph>
    <Paragraph position="2"> Below, in Section 2, we outline a proposal for practically implementing an IL-based approach, beginning with a description of a target procedure which resolves the reference of each new IL element as it is being produced, clause by clause. We then present a series of form-based procedures, which are to be gradually replaced (or, in some cases, transmogrified) as the IL analysis system and supporting knowledge bases are extended. In Section 3, we briefly describe the relevant aspects of the sample Spanish used as a basis for the presentation. In Section 4, we examine the operation of the procedures in greater detail and demonstrate how the IL-based approach can resolve those problematic references that are beyond the scope of the form-based approaches.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="2" type="metho">
    <SectionTitle>
2 Resolution Procedures
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
2.1 IL-based Resolution Procedure
</SectionTitle>
      <Paragraph position="0"> The proposed approach relies on the capability of a system to provide a reasonably adequate interlingual representation for a text. Here the interlingua we rely on is a variant of Text Meaning Representation (TMR) (see http://crl.nmsu.edu/Research/Projects/mikro/index.html) and we focus on a sample Spanish text and its interlingual analysis, which is known to be reproducible automatically.</Paragraph>
      <Paragraph position="1"> The first step of the analysis process is to produce a functional (syntactic) structure for the text. As part of the process of establishing the f-structure, various structurally governed, clause- and sentence-internal anaphoric relations will be resolved, the relevant anaphors being coindexed (or assigned differing indexes) as determined by the syntax.</Paragraph>
      <Paragraph position="2"> Thus some syntactic co-reference relationships such as those related to clitics and relative pronouns will be identified before the IL procedure begins.</Paragraph>
      <Paragraph position="3"> The second step is to map from the f-structure to the TMR. A TMR includes, among other representational objects, instantiations of object types, relation types and property types. These are constructed from ontological concepts which are associated with the lexical items in f-structure and which are filled out on the basis of the surrounding f-structure.</Paragraph>
      <Paragraph position="4"> For instance, the Spanish verb comprar (to buy) might be associated with the ontological concept named PURCHASE which is a generic frame structure corresponding to purchasing events. It might in part look like:  etc.</Paragraph>
      <Paragraph position="5"> where TIME, LOCATION, AGENT, THEME, human, organization, object, etc. are all ontological concepts. On some particular use of comprar in a text (or more specifically in the f-structure representation of a text), the PURCHASE frame is called forth and instantiated, i.e., indexed and filled in with instantiated representational objects derived from other ontological concepts associated with other lexical items of the surrounding f-structure. For instance, if the f-structure in question is for:</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
Roche compra Docteur Andreu
</SectionTitle>
      <Paragraph position="0"> Roche and Docteur Andreu will each be associated with and instantiate an ontological concept for COMPANY. Such a representational element might look like:  etc.</Paragraph>
      <Paragraph position="1"> where NAME, HEADQUARTERS, BRANCH, text, office, etc. are all ontological concepts. This frame is instantiated according to the f-structure context for each of the two proper names and then the instantiated frames are inserted (actually indexed) appropriately in the PURCHASE frame. The result is:  etc.</Paragraph>
      <Paragraph position="2"> These instantiated representational objects are, in turn, referents in the discourse context when the next sentence is processed. As part of this mapping, the reference of the various f-structure elements is resolved, resulting in the addition of information to certain existing IL objects (coreference) or in the creation of new IL objects which are added to the domain of discourse (initial reference). Which of the two occurs depends on whether a connection can be inferred between the current IL object and an already-existing IL object on the basis of ontological or epistemic information.</Paragraph>
      <Paragraph position="3"> Note that reference resolution is driven by the process of instantiating TMR objects rather than by linguistic forms. At the same time, various aspects of reference resolution are being done on the basis of form. Certain anaphors are coindexed or differently indexed on the basis of morphosyntax as the f-structure is being constructed. Ontological concepts of similar or related type may be called forth during the mapping of similar or related lexical items onto TMR. Articles and other determiners affect the form of the instantiated TMR objects corresponding to the NPs containing them. Finally, aspects of the instantiated TMR objects refer to literals such as the NAME attribute of a COMPANY.</Paragraph>
      <Paragraph position="4"> These can be used to implement string-matching algorithms.</Paragraph>
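      <Paragraph position="5"> As a rough illustration of the frame instantiation and reference resolution described in this section, the following Python sketch models TMR objects as simple records. The class name Frame, the functions resolve and analyse_purchase, the slot values and the name-matching test are all assumptions made for illustration; they are not taken from the TMR specification, and the matching criterion merely stands in for the ontological or epistemic inference described above. <![CDATA[

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical source of instance indexes


@dataclass
class Frame:
    """An instantiated representational object (illustrative, not real TMR)."""
    concept: str                                # e.g. "COMPANY", "PURCHASE"
    slots: dict = field(default_factory=dict)
    index: int = field(default_factory=lambda: next(_ids))


def resolve(candidate, discourse):
    """Return the existing referent for candidate, or add it as a new one.

    Assumption: two objects corefer when they share a concept and a NAME
    literal. A real system would infer the link from ontological or
    epistemic knowledge instead.
    """
    for referent in discourse:
        same_concept = referent.concept == candidate.concept
        same_name = referent.slots.get("NAME") == candidate.slots.get("NAME")
        if same_concept and same_name and "NAME" in candidate.slots:
            referent.slots.update(candidate.slots)   # coreference: add information
            return referent
    discourse.append(candidate)                      # initial reference: new object
    return candidate


def analyse_purchase(agent_name, theme_name, discourse):
    """Instantiate a PURCHASE frame for a clause such as 'Roche compra Docteur Andreu'."""
    agent = resolve(Frame("COMPANY", {"NAME": agent_name}), discourse)
    theme = resolve(Frame("COMPANY", {"NAME": theme_name}), discourse)
    # Unexpressed slots stay None; they correspond to implicit references.
    event = Frame("PURCHASE", {"AGENT": agent, "THEME": theme,
                               "TIME": None, "LOCATION": None})
    discourse.append(event)
    return event


discourse = []
first = analyse_purchase("Roche", "Docteur Andreu", discourse)
second = analyse_purchase("Roche", "Docteur Andreu", discourse)
```
]]> In this sketch the second clause creates a new PURCHASE event but reuses the two COMPANY objects already in the domain of discourse, so resolution is driven by the objects being instantiated rather than by the surface strings.</Paragraph>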
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
2.2 Approach to Implementation
</SectionTitle>
      <Paragraph position="0"> Because the target procedure relies on a highly sophisticated, and as yet incomplete, language analysis system, the approach to implementing it begins by assuming that none of the IL apparatus is available for processing. Instead, we implement an initial set of standard, form-based algorithms which vary according to the syntactic category of the referring expression.</Paragraph>
      <Paragraph position="1"> Generally, if the referring expression is a proper noun phrase (PN), a full or partial string match with each prior PN is used to establish a coreference link. Otherwise, the PN is assumed to refer to a new referent. For standard pronominals, a recency algorithm is used which checks the morphosyntactic constraints (gender, number, etc.) and, if possible, the semantic class to filter potential coreferents.</Paragraph>
      <Paragraph position="2"> When a match is found, the coreference link is established. Common noun phrases follow a bifurcated algorithm. If the noun phrase is indefinite or has no article, then it is assumed to refer to a new referent. If it is a definite noun phrase, then the head noun string is matched against that of previous NPs. If the heads of two NPs match and the complement strings do not mismatch, the two NPs are assumed to corefer.</Paragraph>
      <Paragraph position="3"> As the IL analysis system and supporting knowledge bases grow and the ability to produce appropriate f-structures and TMRs is extended, the target procedures will bear increasingly more of the resolution task.</Paragraph>
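      <Paragraph position="4"> The form-based procedures described in this section can be sketched in Python as follows. The Mention record, its field names and the reading of a partial match as any shared token are assumptions made for illustration; the complement-string check for definite noun phrases and the semantic-class filter for pronouns are omitted for brevity. <![CDATA[

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    """A referring expression (hypothetical record; fields are assumptions)."""
    text: str
    category: str            # "PN", "PRON" or "NP"
    head: str = ""           # head noun, used for common NPs
    definite: bool = False
    gender: str = ""
    number: str = ""
    referent: Optional[int] = None


def agrees(pronoun, other):
    """Morphosyntactic filter: gender and number must match."""
    return pronoun.gender == other.gender and pronoun.number == other.number


def partial_match(a, b):
    """Assumed reading of 'full or partial string match': any shared token."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


def find_antecedent(mention, prior):
    """Return the most recent prior mention that the form-based rules accept."""
    for other in reversed(prior):                      # most recent first
        if mention.category == "PN":
            if other.category == "PN" and partial_match(mention.text, other.text):
                return other
        elif mention.category == "PRON":
            if agrees(mention, other):
                return other
        elif mention.category == "NP" and mention.definite:
            if other.category == "NP" and mention.head.lower() == other.head.lower():
                return other
    return None                                        # indefinite or bare NP, or no match


def resolve_all(mentions: List[Mention]) -> int:
    """Assign referent ids in text order; return the number of referents."""
    next_id = 0
    for i, mention in enumerate(mentions):
        antecedent = find_antecedent(mention, mentions[:i])
        if antecedent is not None:
            mention.referent = antecedent.referent     # coreference link
        else:
            mention.referent = next_id                 # new referent
            next_id += 1
    return next_id
```
]]> Each branch corresponds to one syntactic category of referring expression, which is what would allow individual branches to be replaced one at a time as the IL-based procedure takes over.</Paragraph>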
    </Section>
  </Section>
  <Section position="5" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3 The Data
</SectionTitle>
    <Paragraph position="0"> For this presentation, we focus the discussion on a single Spanish text, a newswire article concerning a corporate buyout (see Appendix I for the original text and its translation into English). It is taken from the ARPA Machine Translation Evaluation corpus (White et al., 1994) and contains 347 words, 17 sentences and 11 paragraphs.</Paragraph>
    <Paragraph position="1"> There are 144 referring expressions altogether, including 14 proper noun phrases (9.7%), 76 common noun phrases (52.8%), 20 pronominal-like expressions (13.9%), 31 verbal expressions (21.5%) and 3 prepositional phrases (2.1%). Of the common noun phrases (NP), 43 were definite NPs, 8 were indefinite NPs and 25 NPs had no explicit determiner. Of the pronominal expressions, 12 were pronouns (Pron) or deictics such as hoy (today), aquí (here) or ahora (now), 3 were ellipted (subject) pronouns (PRO) and 5 were definite articles which function as possessive adjectives (Det [= Pron]) as in: El beneficio neto ... se elevó ....</Paragraph>
    <Paragraph position="2"> The [= Its] net profits ... increased ....</Paragraph>
    <Paragraph position="3"> However, there are in addition 129 implicit references made which need to be resolved as well. Implicit references are those that are implied by the slots of TMR objects. They may be of unexpressed participants of an event (say, a seller in a PURCHASE event), or unexpressed times or locations of events.</Paragraph>
    <Paragraph position="4"> Altogether, then, there are 273 references, of which 52.75% are explicit and an almost equal share, 47.25%, are implicit. Taking the implicit references into account, the proper noun phrases (PN) represent 5.1% of the referring expressions, the common noun phrases (NP) about 27.8%, pronominal-like expressions 7.3%, verbal expressions 11.4% and prepositional phrases about 1.1%. These results are summarized in Tables 1 and 2.</Paragraph>
    <Paragraph position="5"> As for referents, there are 138 altogether. Of these, 108 (78.25%) are referred to explicitly on at least one occasion while 30 (21.75%) are referred to implicitly only. There are 40 (29%) referents that are referred to more than once, of which 31 (77.5%) are explicitly referred to at some point and 9 (22.5%) are implicitly referred to only.</Paragraph>
    <Paragraph position="6"> Thus, there were 135 coreferences altogether (273 total references - 138 referents), 41 explicit coreferences and 94 implicit coreferences. Of the explicit coreferences, 9 were made by PNs, 10 by NPs, 20 by pronominal-like expressions (12 by Prons, 3 by PROs, 5 by Dets [= Pron]) and 2 by verbal expressions.</Paragraph>
    <Paragraph position="7"> It is perhaps of some interest that very few of these references were figurative, that is, metonymic or metaphorical. There was one clearly metonymic reference, and one or two other possible metonymic references.</Paragraph>
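    <Paragraph position="8"> The counts reported in this section can be cross-checked with a few lines of Python. The figures are taken from the text above; the variable names and the helper pct are illustrative only. <![CDATA[

```python
explicit = {"PN": 14, "NP": 76, "Pron-like": 20, "Verbal": 31, "PP": 3}
n_explicit = sum(explicit.values())          # 144 explicit references
n_implicit = 129                             # implicit references
n_total = n_explicit + n_implicit            # 273 references altogether
n_referents = 138


def pct(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of each category among explicit references and among all references.
for name, n in explicit.items():
    print(f"{name:10s} {n:4d} {pct(n, n_explicit):5.1f}% {pct(n, n_total):5.1f}%")

# Every reference beyond the first to a given referent counts as a coreference.
n_coreferences = n_total - n_referents       # 273 - 138
explicit_coreferences = 9 + 10 + 20 + 2      # PN, NP, pronominal-like, verbal
implicit_coreferences = n_coreferences - explicit_coreferences
print(n_total, n_coreferences, explicit_coreferences, implicit_coreferences)
```
]]> The subtraction in the last step follows the reasoning given in the text: the number of coreferences is the number of references minus the number of distinct referents.</Paragraph>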
  </Section>
</Paper>