<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0201">
  <Title>An Interlingual-based Approach to Reference Resolution</Title>
  <Section position="6" start_page="2" end_page="9" type="evalu">
    <SectionTitle>
4 Reference Resolution
</SectionTitle>
    <Paragraph position="0"> What follows here is a discussion of what is needed to resolve the 273 references made in the example Spanish text. In each section, the cases that could be handled by a form-based method are discussed, then cases that would require an IL-based approach for resolution.</Paragraph>
    <Section position="1" start_page="2" end_page="5" type="sub_section">
      <SectionTitle>
4.1 Resolving Proper Noun Phrases
</SectionTitle>
      <Paragraph position="0"> As mentioned, there are 14 PNs in the sample text, 9.7% of the explicit referring expressions or 5.1% of all references. In addition, there were 4 cases of common noun phrases which were in apposition with proper noun phrases and which will be considered here as well.</Paragraph>
      <Paragraph position="1"> For PNs, the basic resolution strategy is to match the form of the expression with that of each of the PNs used previously. If there is a match, assume the current PN is being used to corefer to the referent of the matching PN.</Paragraph>
      <Paragraph position="2"> This applies in 4 of 18 cases. If no match is found, attempt a partial positive form match.</Paragraph>
      <Paragraph position="3"> That is, if any part of either form for which there is a partial match does not match a corresponding substring in the other form, then the match fails. Thus, El grupo Roche matches Roche, and Productos Roche SA also matches Roche, because there is no substring of the phrase Roche that does not match with El grupo or Productos. On the other hand, Productos Roche SA does NOT match El grupo Roche because Productos does not match El grupo. This handles 3 further cases of the 18 (assuming that, having matched El grupo Roche with Roche, Productos Roche SA will not match because it does not match with El grupo Roche).</Paragraph>
      <Paragraph position="4"> If no positive partial match can be found, it is assumed that the PN is not being used to corefer to an existing referent and introduces a new referent. This takes care of 9 additional cases.</Paragraph>
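      <Paragraph> The three steps above (exact match, partial positive match, new referent) can be sketched as follows. This is a minimal illustration, not the procedure actually implemented for the paper: the function names are hypothetical, forms are compared as lower-cased whitespace tokens, and a partial match is assumed to succeed only when one form occurs as a contiguous token sequence inside the other and the new form is compatible with every form already attached to the referent.

```python
def tokens(form):
    # Assumption: forms are compared as lower-cased whitespace tokens.
    return form.lower().split()


def contains(longer, shorter):
    # True if `shorter` occurs as a contiguous run of tokens inside `longer`.
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def partial_match(a, b):
    # One form must be wholly contained in the other; leftover material on
    # both sides (Productos ... SA vs. El grupo ...) makes the match fail.
    ta, tb = tokens(a), tokens(b)
    if len(ta) >= len(tb):
        return contains(ta, tb)
    return contains(tb, ta)


def resolve_pn(form, referents):
    # `referents` is a list of lists: the PN forms used so far per referent.
    for idx, forms in enumerate(referents):
        if form in forms:                      # step 1: exact form match
            return idx
    for idx, forms in enumerate(referents):
        if all(partial_match(form, f) for f in forms):
            forms.append(form)                 # step 2: partial positive match
            return idx
    referents.append([form])                   # step 3: introduce a new referent
    return len(referents) - 1


referents = []
for pn in ["El grupo Roche", "Roche", "Productos Roche SA"]:
    resolve_pn(pn, referents)
```

Under these assumptions Roche attaches to the referent of El grupo Roche, while Productos Roche SA opens a new referent, which is the behaviour the text describes for the form-based procedure.</Paragraph>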
      <Paragraph position="5"> This basic PN resolution procedure, then, handles 16 of the 18 PNs (89%). That leaves 2 cases which it will not handle: Doctor Andreu coreferring to Docteur Andreu, and Productos Roche SA coreferring to su compañía en España. The second problem is not all that rare.</Paragraph>
      <Paragraph position="6"> Here, the PN is being used to corefer to a referent that was initially introduced by a common noun phrase. The first step is to identify the semantic class of the PN, possibly through some independent PN classifying procedure or possibly by looking at any semantic constraints that arise from the context. In the case of Productos Roche SA, for instance, it might be classed as a COMPANY by some independent PN classifying procedure, say, on the basis of the SA, or by inspecting its context.</Paragraph>
      <Paragraph position="7"> (1) ... la operación realizada entre ...
('... the transaction carried out between ...')
In example (1), the expression la operación is used to refer to some transaction that, as the text goes on to report, was carried out between Productos Roche and Unión Explosivos Río Tinto. If transactions are carried out by companies (general semantic knowledge), then Productos Roche and Unión Explosivos must be companies.</Paragraph>
      <Paragraph position="8"> Having established the semantic category of Productos Roche, the next step is to establish a plausible connection between Productos Roche and an established referent of the same semantic category. That is, the procedure is now to inspect all the established referents of the category COMPANY (i.e., the Roche group, Doctor Andreu and Roche's company in Spain). We know from prior text that Roche bought Doctor Andreu and that Roche acquired Doctor Andreu through its subsidiary in Spain (epistemic knowledge). From the current text, we know that Productos Roche and Unión Explosivos were actually involved in the transaction and that Unión Explosivos had been a majority shareholder but, by implication, no longer is (epistemic knowledge). Thus, Unión Explosivos appears to be the seller and, by implication, Productos Roche could be the buyer, i.e., Roche's company in Spain.</Paragraph>
      <Paragraph position="9"> The same procedure can be used to establish that the reference of Doctor Andreu is the same as that of Docteur Andreu: establish the semantic class of Doctor Andreu, then inspect each existing referent of that class to see whether or not a plausible connection can be established.</Paragraph>
    </Section>
    <Section position="2" start_page="5" end_page="6" type="sub_section">
      <SectionTitle>
4.2 Resolving Pronominals
</SectionTitle>
      <Paragraph position="0"> In the sample Spanish text, there are 19 pronominal expressions, 13.9% of the explicit referring expressions or 7.3% of all references.</Paragraph>
      <Paragraph position="1"> Of these, there are 15 explicit forms and 4 ellipted forms. The 15 explicit forms include 6 possessive pronouns, 4 deictic adverbials and 5 definite articles. The 4 ellipted forms include 3 ellipted subjects of finite verbs and 1 ellipted head of a relative complementizer.</Paragraph>
      <Paragraph position="2"> 4.2.1 Explicit pronouns
The basic form-based strategy for resolving pronominal reference is to begin by inspecting, in reverse order of mention, those referring expressions whose forms are compatible with the morphological constraints imposed by the pronominal. This strategy is usually constrained by various syntactic heuristics, such as that a non-reflexive pronoun in object position cannot corefer to the subject, or that a pronominal complement of a noun cannot corefer to the head (e.g., Ferrández et al., 1998). Such a resolution procedure will account for 4 of the 6 cases (67%) in the sample text.</Paragraph>
      <Paragraph position="3"> To resolve the remaining cases, it is necessary to check the referent of the antecedent to see whether it is semantically compatible with the contextual function of the anaphor. So, for instance, in resolving the reference of su (its, his, her or their) in:
El beneficio neto -el mejor de su historia- se elevó a 641,5 millones de
The[=its] net profits -the best in its history- increased to 641.5 million</Paragraph>
      <Paragraph position="5"> the procedure is to first step back through the referring expressions until a third person form is encountered. Here, the first third person referring expression is El beneficio neto (some company's net profits). The procedure next needs to establish through inferencing that the referent of El beneficio neto can serve the function of the anaphor su, that is, can have a history. In this case, a plausible inference cannot be established, and so the procedure moves on to consider the next most recently mentioned referent, the Roche group, which is referred to by the El of El beneficio neto, here used to express a possessor relation. On the basis of ontological knowledge about what companies can or cannot have, su is then understood as coreferring to the referent of El.</Paragraph>
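      <Paragraph> The walk-through for su can be sketched as a reverse-order scan with a morphological filter followed by a semantic check. The sketch below is only illustrative: the Mention class, the toy CAN_HAVE table standing in for ontological knowledge, and the class labels COMPANY and AMOUNT are all assumptions introduced here, not part of the system described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str      # surface form of the referring expression
    person: int    # grammatical person, used as the morphological filter
    referent: str  # label of the referent the expression picks out
    kind: str      # hypothetical ontological class of that referent


# Hypothetical stand-in for ontological knowledge about what can be possessed.
CAN_HAVE = {"COMPANY": {"history", "net profits"}, "AMOUNT": set()}


def resolve_possessive(possessed, person, mentions):
    # Inspect candidates in reverse order of mention.
    for m in reversed(mentions):
        if m.person != person:
            continue                      # fails the morphological constraint
        if possessed in CAN_HAVE.get(m.kind, set()):
            return m.referent             # semantically able to be the possessor
    return None


mentions = [
    Mention("El", 3, "the Roche group", "COMPANY"),
    Mention("El beneficio neto", 3, "the group's net profits", "AMOUNT"),
]
antecedent = resolve_possessive("history", 3, mentions)
```

Here the most recent candidate, El beneficio neto, is rejected because a profit figure is not something assumed to have a history, so the scan continues to the possessor expressed by El.</Paragraph>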
      <Paragraph position="6"> For the ellipted pronominals in the sample Spanish text, syntax will have to identify such ellipted elements in order to trigger the reference resolution process. However, once identified, the basic strategy described above for explicit pronouns should apply unaltered although, unlike possessive pronouns, the morphological constraints of the anaphor must be extracted from the morphosyntactic context. So, for instance, the ellipted subject of:
... cuenta PRO con compañías en más de 50 países ...</Paragraph>
      <Paragraph position="7"> ('... has companies in more than 50 countries ...')</Paragraph>
      <Paragraph position="8"> must be a third person singular referent (given the conjugation of the verb cuenta). In this case, the basic resolution procedure correctly resolves 2 of the 4 cases (50%).</Paragraph>
      <Paragraph position="9"> For the remainder, the semantic function of the ellipted element is also extracted from context, i.e., it functions as the subject of contar con compañías (has companies). Thus, in the example above, it must be something that can own companies. Among the third singular candidate expressions, in reverse order of mention: el diagnóstico (diagnosis), la comercialización (the marketing), la producción (the manufacture), el desarrollo (the development), Basilea (Suiza) (Basel, Switzerland) and sede central (home office), none are potential owners of companies. The next most remote referring expression to be inspected is el grupo Roche (the Roche Group) which, as it turns out, is something that can own companies and, therefore, the PRO is identified as coreferring to the same referent.
Deictic elements, such as the adverbs hoy (today), aquí (here), ahora (now), and so on, are resolved directly to properties of the utterance context: the day of utterance, the place of utterance, the time of utterance, and so on. There were 4 such pronouns in the sample text, 3 referring (hoy, aquí, ahora) and 1 coreferring (hoy). It should be noted that all these elements are, in fact, coreferent with implicit temporal and spatial references of various finite verbs that are used to report certain events or states of affairs.</Paragraph>
      <Paragraph position="10"> Perhaps the most contentious of the pronominal elements to be discussed here are the definite articles of noun phrases which can be contextually interpreted as having the force of possessive adjectives. There are 6 examples of this in the sample text. However, these are by no means all the definite articles found and, in addition, of these six examples, four were translated as possessive adjectives, one as a definite article, the, and one was omitted altogether in translation. In other words, not only is an ambiguity introduced for a very common lexical item, but even when resolved in favor of the possessor interpretation, it may not be translated as a possessive adjective. On the other hand, the major reason for assuming a distinction is that it is very important to establish such relationships in order to understand or translate a document. For instance, if it is not established that the cash flow referred to in:
El "cash flow" se incrementó en un 21 por ciento ...</Paragraph>
      <Paragraph position="11"> ('its cash flow increased by about 21 per cent ...')</Paragraph>
      <Paragraph position="12"> is that of company X, then it will be impossible to determine whether the cash flow referred to later in the text in:
... y el "cash flow" de ...</Paragraph>
      <Paragraph position="13"> ('... and its cash flow ...')</Paragraph>
      <Paragraph position="14"> is that of company X or of some other company. That means that such information will not only be unavailable for use during translation (e.g., selecting a possessive adjective in the target language) but for any other purpose that might come along (e.g., information extraction).</Paragraph>
      <Paragraph position="15"> In any case, the procedure for resolving the reference of definite articles is the basic pronominal resolution procedure except that no morphological constraints can be placed on the antecedent expression. Still, syntactic constraints on the antecedent may be applied. Following this strategy, 5 of the 6 cases (83%) are resolved correctly although it is not clear whether it leads to false positives. Otherwise, potential referents are considered until one is found which can serve the appropriate possessor function. Since a positive connection must be inferred, the likelihood of false positives is greatly decreased.</Paragraph>
    </Section>
    <Section position="3" start_page="6" end_page="9" type="sub_section">
      <SectionTitle>
4.3 Resolving Common Noun Phrases
</SectionTitle>
      <Paragraph position="0"> If proper noun phrases and pronominals were the only type of referring expressions, form-based resolution techniques might prove sufficient. They account, however, for only 23.6% of the explicit referring expressions or 12.4% of all references. It is for the resolution of common noun phrases, clauses and implicit references that an interlingual-based procedure will eventually prove necessary.</Paragraph>
      <Paragraph position="1"> There are 76 common noun expressions in the sample Spanish text. Of these, 42 are definite noun phrases (32 referring, 10 coreferring), 9 are indefinite noun phrases (all 9 referring), and 25 are noun phrases having no article (all 25 referring).</Paragraph>
      <Paragraph position="2"> Since none of the indefinite noun phrases or the noun phrases without articles are used to corefer, an initial basic resolution strategy for common noun phrases begins by inspecting the form of the referring expression. If it is an indefinite noun phrase or noun phrase without article, it is assumed to refer to something new and a new referent is added to the referents in the domain of discourse. That successfully resolves the reference of 34 of the 76 common noun phrases (45%) in the text.</Paragraph>
      <Paragraph position="3"> Of the 42 definite noun phrases, 36 are used to refer (or corefer) to specific individuals or stuff or are inferrably unique given a general knowledge of individuals, stuff or situations that were being discussed. Of these 36, 24 were used to refer to particular individuals or stuff, 8 to processes, 2 to particular groups of objects and 2 to logically unique objects. Of the remaining 6 definite noun phrases, 4 were used to refer to portions (percentages) of stuff and 2 were used to refer to generic classes.</Paragraph>
      <Paragraph position="3"> In regard to the resolution of definite noun phrases, then, the basic strategy is to first identify whether the expression is being used to refer to specific individuals or stuff, to portions (percentages) of stuff or, if possible, to a generic class. So, for instance, given the expression el 7,4 por ciento in:
la rentabilidad sobre las ventas aumentó del 6,3 al 7,4 por ciento.
('its profit over sales increased from 6.3 to 7.4 per cent')</Paragraph>
      <Paragraph position="5"> it is sufficient to identify that the expression is being used to refer to a percentage (of sales) in order to assume that the expression is used to refer to something new. Identifying a generic reference on the basis of form is less obvious but, in any case, this will resolve 4 to 6 of the 42 cases.</Paragraph>
      <Paragraph position="6"> Second, for the 36 definite noun phrases used to refer to a particular individual or stuff, the basic procedure is to match the head noun expression against each of the common noun expressions that have been used previously. If one is found, the complement expressions are then matched. If these are compatible, the NP under consideration is assumed to corefer to the referent of the matching NP. This will successfully resolve 27 of the 36 cases (75%) but leaves 9 incorrectly resolved: 5 setting up new referents when in fact they are coreferring, and 4 false positive cases of coreference.</Paragraph>
      <Paragraph position="7"> The third step, then, is to inspect each referent for semantic compatibility. Semantic information is established on the basis of the expression's form and context. For instance, in looking for a possible referent for la rentabilidad sobre las ventas (its profit over sales ratio) above, it is first necessary to establish that the potential referent must be some measurement of financial performance which has increased during some particular period of time for some particular company.</Paragraph>
      <Paragraph position="8"> At the time a reference for la rentabilidad sobre las ventas is sought, there are some 98 existing referents (53 objects, 34 events, 11 implicit objects). The more recent of these include "641.5 million Swiss Francs", "Roche Group's net profit", "Roche Group's pharmaceutical division", "41% of Roche Group's total sales", "8.69 billion Swiss Francs", "Roche Group's total sales", "1988" and so on. Of these, only "Roche Group's net profit" and "Roche Group's total sales" are possible measurements of a company's financial performance. However, both these measurements should be ontologically distinct from a company's profit over sales as well as from each other. Thus, they fail to satisfy the semantic requirements of a potential referent for the expression. In the end, no semantically appropriate referent will be found among the pool of existing referents and so a new referent is introduced to the pool.</Paragraph>
      <Paragraph position="9"> If an existing referent meets the informational constraints on a potential referent of the expression being processed, the expression is assumed to refer to that referent. If no existing referent satisfies those constraints, the expression is assumed to refer to something new and a new referent is added to the pool of referents.</Paragraph>
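      <Paragraph> The overall control flow for common noun phrases can be sketched roughly as follows. This is an assumption-laden illustration rather than the paper's implementation: noun phrases are plain dictionaries, the semantic labels are invented for the example, and a single caller-supplied predicate stands in for the head match, the complement match and the semantic compatibility inference combined.

```python
def resolve_common_np(np, referents, compatible):
    # Indefinite NPs and NPs without an article are assumed to be new.
    if np["article"] != "definite":
        referents.append(np)
        return len(referents) - 1
    # Definite NPs: inspect existing referents, most recent first.
    for idx in range(len(referents) - 1, -1, -1):
        if compatible(np, referents[idx]):
            return idx
    # No existing referent satisfies the constraints: introduce a new one.
    referents.append(np)
    return len(referents) - 1


def same_measure(np, referent):
    # Hypothetical compatibility test: identical ontological class labels.
    return np["sem"] == referent["sem"]


pool = [
    {"text": "Roche Group's total sales", "article": "definite", "sem": "TOTAL-SALES"},
    {"text": "Roche Group's net profit", "article": "definite", "sem": "NET-PROFIT"},
]
np = {"text": "la rentabilidad sobre las ventas", "article": "definite",
      "sem": "PROFIT-OVER-SALES"}
new_index = resolve_common_np(np, pool, same_measure)
```

Because neither net profit nor total sales is treated as the same kind of measurement as profit over sales, the expression ends up introducing a new referent, mirroring the outcome described above.</Paragraph>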
      <Paragraph position="10">  Noun phrases, of course, are not the only type of constituent that is used to refer to things in the world. Clauses may be used to refer (or corefer) to particular events or states-of-affairs or to classes of events or states-of-affairs.</Paragraph>
      <Paragraph position="11"> These may be finite (main, relative, complement or adverbial clauses), participials (present or passive), infinitival or absolutive. Of the 45 events referred to in the text, there were 3 events that were referred to on more than one occasion. The first, the purchasing of Doctor Andreu, was coreferred to 4 times. But of these, only one was by way of a (finite) clause. The other three were all by way of NPs, 2 explicit and 1 implicit. The second, the announcing of the purchase, was coreferred to only once by way of a NP. The third, Roche's investing in R&amp;D, was coreferred to once by way of an implicit pro-verb introduced for syntactic reasons in the context of a parallel, conjoined structure.</Paragraph>
      <Paragraph position="12">  The only form-based resolution procedure for resolving clausal reference would be to look for prior verbs having the same form and then inspect the complements for contradictions. This procedure might be extended by inspecting prior verb forms that are related by, say, Spanish WordNet (Rodríguez, 1998) or an on-line Spanish thesaurus (if any should exist). However, this extension could also open the door to many false positives.</Paragraph>
      <Paragraph position="13"> Such an approach might possibly resolve as many as 26 of the 27 cases of clausal references to events or states of affairs correctly. In any case, an IL-based approach has the advantage of having the events mentioned in prior text already represented formally and in a language-neutral form. Thus, the need for additional on-line resources for each language is reduced.</Paragraph>
      <Paragraph position="14">  As mentioned, there were 45 events or states-of-affairs referred to in the sample Spanish text which introduce an additional 30 implied referents (5 times, 22 places and 3 actors).</Paragraph>
      <Paragraph position="15"> These events, and the implicit referents they introduce, need to be identified for successfully carrying out the coreference task. They may act as referents that are later referred to in the text or they may serve to assist in constraining or establishing coreference between later expressions and other existing referents. For instance, of the 22 implied locations, 6 are later referred to in the text and, of the 5 implied times, 3 are later referred to.</Paragraph>
      <Paragraph position="16"> Clearly, there is no obvious form-based resolution procedure for such elements, since they have no explicit form. Thus, in order to resolve these implicitly introduced referents, the basic procedure is to treat them as pronominals. That is, every event or state-of-affairs in the TMR has an implicit "at that time" and "at that place" associated with it which has to be resolved as part of the reference task. Beyond the fact that the potential referents must be times and locations respectively, any further constraints will have to be derived from what is known about the event and about its relative (temporal or local) status with respect to the other events which have previously been mentioned. A primary source of information for dealing with such issues will be scripts (Schank &amp; Abelson, 1977). Similarly, the identification of implied actors will be dependent on the ontological (not lexical) definitions of the class of event or state-of-affairs referred to and any additional information that may be extracted from the particular events or states-of-affairs that have been previously mentioned.</Paragraph>
      <Paragraph position="17"> Given the informational constraints gathered, the procedure is then to inspect referents of like type (time, location, actor type) in reverse order of mention until one is found which is compatible with those additional informational constraints.</Paragraph>
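      <Paragraph> As a rough sketch of this last step, each event can be given implicit time and location slots that are then resolved against existing referents of the same type, most recent first. Everything named here is hypothetical: the slot names, the dictionary layout of referents, and the constraint predicate, which merely stands in for whatever script-based or ontological knowledge would be available.

```python
def implicit_slots(event_id):
    # Every event or state-of-affairs carries an implicit time and place.
    return [(event_id, "time"), (event_id, "location")]


def resolve_slot(slot_type, referents, satisfies):
    # Inspect referents of like type in reverse order of mention and return
    # the first one compatible with the additional constraints.
    for ref in reversed(referents):
        if ref["type"] == slot_type and satisfies(ref):
            return ref["label"]
    return None


referents = [
    {"type": "time", "label": "1988"},
    {"type": "location", "label": "Basilea (Suiza)"},
    {"type": "time", "label": "the day of utterance"},
]

# Hypothetical constraint: the event is known to lie in a past calendar year.
when = resolve_slot("time", referents, lambda r: r["label"].isdigit())
where = resolve_slot("location", referents, lambda r: True)
```

If no referent of the right type satisfies the constraints, the sketch returns None, at which point a fuller system would presumably introduce a new implicit referent.</Paragraph>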
      <Paragraph position="18"> Conclusion
The advantages of the interlingual approach to reference resolution include the following:
* only expressions related to actual referents are processed for coreference (i.e., no pleonastic pronouns, no clitic pronouns, no relative pronouns, etc.),
* implicit as well as explicit referents are processed for coreference,
* knowledge-based inferencing (both ontological and epistemic) is available for resolving (many) problematical cases,
* ontologically connected actors, say, the different participants in a sequence of events making up a script, can be used to establish coreference,
* texts in different languages can be processed in the same way,
* all form-based procedures either are or can be implemented in any case.</Paragraph>
      <Paragraph position="19"> The central disadvantages are:
* some surface text level ordering information is lost in the TMR,
* discourse-structural information may be lost in the TMR,</Paragraph>
      <Paragraph position="20"> * the need for large and sophisticated knowledge sources,
* the need for sound and appropriately directed inferencing.</Paragraph>
      <Paragraph position="21"> As a result of the loss of ordering information, strict recency-based resolution procedures cannot be implemented. The referents in the domain are not ordered in terms of when they were introduced. The processing of the different arguments in f-structure does not necessarily correspond to the surface sequence of their mention. This "defect" could possibly be overcome by simply indexing each new TMR object with the prior index plus one (assuming the indexes are integers). The tacit assumption is that, at the level of the clause, first the predicate is processed and then the arguments are processed in left to right order as they appear in f-structure.</Paragraph>
      <Paragraph position="22"> As for the lack of information about the discourse structure, it may be the case that this is a defect of the TMR representation system.</Paragraph>
      <Paragraph position="23"> That is to say, it is not unreasonable to assume that the larger organisational aspects of a text, the topics, their order of presentation, the structure of the argumentation, etc., should in fact be captured in any adequate representation of the text. It has been, however, the goal of TMR to focus on capturing the information content exclusively and not on how the information is presented.</Paragraph>
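      <Paragraph> The indexing idea can be illustrated with a small sketch. It assumes integer indexes starting at 1 and represents TMR objects as plain strings; the function name and the example clauses are invented for illustration and do not reflect the actual TMR data structures.

```python
from itertools import count


def index_clause(predicate, arguments, counter):
    # Predicate first, then arguments left to right as they appear in
    # f-structure; each new object gets the prior index plus one.
    return [(next(counter), obj) for obj in [predicate, *arguments]]


counter = count(1)
indexed = index_clause("increase", ["net profits", "641.5 million"], counter)
indexed += index_clause("have", ["PRO", "companies"], counter)

# A recency ordering can then be recovered by sorting on the index.
by_recency = [obj for _, obj in sorted(indexed, reverse=True)]
```

With such an index in place, a reverse-order scan over referents becomes possible again, although the order recovered is processing order rather than true surface order, as the tacit assumption in the text makes clear.</Paragraph>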
    </Section>
  </Section>
</Paper>