<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2166">
  <Title>Possessive Pronominal Anaphor Resolution in Portuguese Written Texts</Title>
  <Section position="3" start_page="0" end_page="1013" type="intro">
    <SectionTitle>
2. The PPA resolution problem
</SectionTitle>
    <Paragraph position="0"> From the interpretation point of view, PPAs are widely different from other kinds of anaphors, such as personal or demonstrative pronouns. In this section we present some specific characteristics of the Portuguese PPAs &quot;seu/sua/seus/suas&quot; by means of generic natural-language examples. Some of these examples, however, may not carry over to their English versions, which use the pronouns &quot;his/her/their/its&quot;.</Paragraph>
    <Paragraph position="1"> First, we notice the lack of gender or number agreement between PPAs and their antecedents. The English version of example 1 has a trivial solution, based on syntactic constraints, but the Portuguese version is ambiguous: Ex 1: João falou a Maria sobre seu cachorro. (John told Mary about his dog).</Paragraph>
    <Paragraph position="2"> Example 2 shows that PPAs can occur in several grammatical (usually nonsubject) positions. In addition, in example 3, we notice that PPAs can refer to different segments of an &quot;NP-of-NP-of-NP...&quot; chain. This kind of structure, with several NPs in the same chain, is typical in our domain.</Paragraph>
    <Paragraph position="3"> Ex 2: João viu um cachorro trazendo seu jornal | seu filhote. (John saw a dog bringing his newspaper | its puppy).</Paragraph>
    <Paragraph position="4"> Ex 3: O pai do garotinho vendeu sua casa. (The father of the little boy sold his house).</Paragraph>
    <Paragraph position="5"> O dono do cão vendeu seu carro | seu filhote. (The owner of the dog sold his car | its puppy). In some situations, PPAs like those shown in examples 2 and 3 can be solved by applying semantic knowledge, since PPAs establish possessive relationships (in a concrete or figurative sense) between objects in discourse. For example, a human being can usually possess &quot;his car&quot;, but a dog cannot. However, we have found in our corpus several PPAs, namely abstract anaphors, which cannot be related to any particular semantic object. For example, we have PPAs such as &quot;their importance&quot;, &quot;their relevance&quot;, etc. Similarly, we have also found some abstract antecedents, such as &quot;the problem&quot;, &quot;the importance&quot;, etc.</Paragraph>
    <Paragraph position="6"> Finally, we notice that, in our corpus, we have to treat long and complex sentence structures, which are typical of the domain (laws) that we are dealing with. Thus, although PPAs in our corpus are mostly (99%) intrasentential, there is a high number of candidates for each anaphor.</Paragraph>
    <Paragraph position="7"> 3. Factors in PPA resolution This section describes a minimal set of factors in PPA resolution, based on corpus investigation. These factors will be considered in place of traditional syntactic constraints, which are not suitable for our present problem, as shown in section 2. In our proposal, because of the structural complexity of sentences in the domain, we have adopted a practical approach, based on simple heuristic rules, with a view to avoiding syntactic and semantic analysis.</Paragraph>
    <Paragraph position="8"> Similar strategies have been adopted in several recent works on anaphor resolution, such as T. Nasukawa (1994), R. Mitkov (1996), R. Vieira &amp; M. Poesio (1997) and others.</Paragraph>
    <Paragraph position="9"> We have defined 6 simple factors in PPA resolution (F1 to F6) based on syntactic, semantic and pragmatic knowledge, aiming to determine PPA antecedents in our specific domain. As a secondary goal, we also apply our proposal to PPAs in a different domain (see section 5). Factors, stated as heuristic rules, will act as constraints (F1 to F5) or preferences (F6), as established by J. Carbonell (1988).</Paragraph>
    <Section position="1" start_page="1010" end_page="1011" type="sub_section">
      <SectionTitle>
3.1. Syntactic level
</SectionTitle>
      <Paragraph position="0"> Since typical syntactic constraints are not suitable for PPA resolution, in our approach we have limited the role of syntactic knowledge to simple heuristic rules based on surface patterns. Surface patterns are typical expressions in the domain, which give information about PPA antecedents. To each relevant surface pattern, we have associated a heuristic rule. Some of these rules can directly elect, with a high rate of success, the most probable antecedent, whereas others can only exclude a specific candidate: F1 - in the pattern &lt;NP and | or PPA&gt;, &lt;NP&gt; must be elected the most probable antecedent of &lt;PPA&gt;. Ex: &quot;John and his dog&quot;;</Paragraph>
      <Paragraph position="2"> must be elected the most probable antecedent of &lt;PPA&gt;. This rule deals with some cases of syntactic parallelism. Ex: &quot;the death of Suzy, of her children and...&quot;; such as &lt;city owns inhabitants&gt;, &lt;ecosystem owns natural resources&gt; etc.</Paragraph>
      <Paragraph position="3"> In order to apply this kind of knowledge to the whole corpus, we have defined object classes and possible possessive relationships among them. For example, for the anaphor &quot;their hunt&quot; in our corpus, there is a semantic rule which expects only a member of the class &lt;animals&gt; as a suitable antecedent. Typical members of this class would be &quot;birds&quot;, &quot;mammals&quot; and all related expressions found in our corpus.</Paragraph>
      <Paragraph position="4"> Based on this organization we have defined another factor in PPA resolution: F3 - in the pattern &lt;NP of PPA&gt;, &lt;NP&gt; is not a valid candidate for &lt;PPA&gt;. Ex: in &quot;the death of his son&quot;, &quot;death&quot; is not a valid candidate.</Paragraph>
      <Paragraph position="6"> NP&gt;, only the full chain and the last NP can be considered candidates for PPA antecedents, i.e., NPs in the middle of the chain can be discarded. This rule adapts the study developed by L. Kister (1995) for NP chains in French, and it constitutes an important mechanism for reducing the high number of candidates in our current problem.</Paragraph>
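      <Paragraph> The surface-pattern rules can be illustrated with a minimal Python sketch. It assumes the sentence has already been chunked into NPs, connectives and a single PPA marker; the chunking, the function names and the English glosses are hypothetical and not part of the original system. Because the opening of the F4 rule is missing from this text, the reading used here (keep the whole chain and its last NP) is also an assumption.

```python
# Hypothetical sketch of surface-pattern rules F1, F3 and F4.
# Input is a pre-chunked sentence: NPs, connectives and one "PPA" marker.
# The chunking and every name below are assumptions made for illustration.

CONJUNCTIONS = {"and", "or"}


def f1_elect(chunks, ppa_index):
    """F1: in the pattern NP and|or PPA, elect NP as the antecedent."""
    if ppa_index >= 2 and chunks[ppa_index - 1] in CONJUNCTIONS:
        return chunks[ppa_index - 2]
    return None


def f3_exclude(chunks, ppa_index):
    """F3: in the pattern NP of PPA, NP is not a valid candidate."""
    if ppa_index >= 2 and chunks[ppa_index - 1] == "of":
        return chunks[ppa_index - 2]
    return None


def f4_chain_candidates(chain):
    """F4 (assumed reading): keep only the full chain and its last NP."""
    if len(chain) == 1:
        return list(chain)
    return [" of ".join(chain), chain[-1]]
```

For instance, f1_elect(["John", "and", "PPA", "dog"], 2) elects "John", while f3_exclude(["the death", "of", "PPA", "son"], 2) names "the death" as the candidate to discard. Returning None when a pattern does not match leaves the decision to the other factors.</Paragraph>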
    </Section>
    <Section position="2" start_page="1011" end_page="1011" type="sub_section">
      <SectionTitle>
3.2. Semantic level
</SectionTitle>
      <Paragraph position="0"> Heuristic rules based on surface patterns are not sufficient to discriminate among a large set of candidates, as we found in our domain. Thus, we also use semantic knowledge in order to improve the results.</Paragraph>
      <Paragraph position="1"> Our semantic approach considers possessive relationship rules in the form &lt;Obj1 owns Obj2&gt;, used to represent &quot;part-of&quot; relationships between typical entities of the domain, according to J. Pustejovsky's (1995) semantic theory. For example, in our corpus some PPAs can be solved with knowledge F5 - there must be a valid possessive relationship between a PPA and its antecedent.</Paragraph>
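      <Paragraph> As a rough illustration of how possessive relationships between object classes could filter candidates, consider the following Python sketch. The class table, the ownership pairs and the function name are hypothetical; they loosely follow the examples given in the text and are not the authors' actual knowledge base. The choice to leave the candidate set untouched when the possessed noun has no known class (as with abstract anaphors) is likewise an assumption.

```python
# Hypothetical sketch of factor F5: possessive relationships between classes.
# CLASS_OF and OWNS are illustrative tables, not the authors' knowledge base.

CLASS_OF = {
    "birds": "animals",
    "mammals": "animals",
    "hunt": "hunting",
    "city": "city",
    "inhabitants": "inhabitants",
    "ecosystem": "ecosystem",
    "natural resources": "natural resources",
}

# Each pair (owner_class, owned_class) reads "owner_class owns owned_class".
OWNS = {
    ("animals", "hunting"),
    ("city", "inhabitants"),
    ("ecosystem", "natural resources"),
}


def f5_filter(candidates, possessed):
    """Keep the candidates whose class may own the class of the possessed noun."""
    owned_class = CLASS_OF.get(possessed)
    if owned_class is None:
        # Assumption: no class is known (e.g. an abstract anaphor),
        # so the rule does not apply and all candidates are kept.
        return list(candidates)
    kept = []
    for candidate in candidates:
        owner_class = CLASS_OF.get(candidate)
        if (owner_class, owned_class) in OWNS:
            kept.append(candidate)
    return kept
```

With these tables, the anaphor "their hunt" keeps only "birds" out of the candidates "city", "birds" and "ecosystem", mirroring the rule that expects a member of the class of animals.</Paragraph>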
    </Section>
    <Section position="3" start_page="1011" end_page="1013" type="sub_section">
      <SectionTitle>
3.3. Pragmatic level
</SectionTitle>
      <Paragraph position="0"> Working together, surface patterns and possessive relationships can deal with many PPAs found in our corpus, but we still have two problems to be solved: semantic ambiguity among two or more acceptable candidates and abstract anaphors/antecedents, which cannot be solved by simply applying possessive relationship rules.</Paragraph>
      <Paragraph position="1"> For these cases, and possibly for some other cases not covered by the previous rules, we suggest a pragmatic factor, adapted from the centering algorithm of S. Brennan et al. (1987).</Paragraph>
      <Paragraph position="3"> Although the sentence center plays a crucial role in many works on anaphor resolution, usually limiting the number of candidates to be considered, we notice that, because PPAs can refer to almost any NP in the sentence (unlike, for example, personal pronouns, which are often related to the sentence center), pragmatic knowledge plays only a secondary, but still important, role in our approach.</Paragraph>
      <Paragraph position="4"> We have adapted basic aspects of the centering algorithm, considering subject/object preference and domain concept preference, suggested by R. Mitkov (1996), aiming to estimate the most probable center for intrasentential PPAs. Thus, in case of ambiguity among candidates (after applying factors F1 to F5), we will consider the estimated center as the preferred PPA antecedent. This constitutes our final rule: F6 - the sentence center will be preferred among remaining candidates.</Paragraph>
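      <Paragraph> One possible way to estimate the sentence center is sketched below in Python. The text only states that subject/object preference and domain concept preference are considered, so the numeric scores, the tie-breaking by order of appearance and all names are assumptions made for illustration.

```python
# Hypothetical sketch of factor F6: prefer the estimated sentence center.
# The role scores and the domain-concept bonus are assumed values.

ROLE_SCORE = {"subject": 2, "object": 1}


def estimate_center(candidates, domain_concepts):
    """Return the best-scoring candidate; the earliest one wins a tie.

    candidates is a list of (text, grammatical_role) pairs.
    """
    best = None
    best_score = -1
    for text, role in candidates:
        score = ROLE_SCORE.get(role, 0)
        if text in domain_concepts:
            score += 1
        if score > best_score:
            best, best_score = text, score
    return best
```

Under these assumed scores, a subject outranks an object, and an object that is also a domain concept ties with a plain subject, in which case the earlier candidate is returned.</Paragraph>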
      <Paragraph position="5"> 4. A distributed architecture for PPA resolution  Factors have been grouped in three knowledge sources (KSs), as part of a blackboard architecture, based on D. Corkill's (1991) work, as shown in figure 1. KSs are independent modules specialized in different aspects of the PPA resolution problem (surface patterns, possessive relationships, sentence center), providing both knowledge and procedure distribution among autonomous entities (specialists).</Paragraph>
      <Paragraph position="7"> Since in our proposal knowledge and procedure are represented by heuristic rules, KSs have been implemented as reflexive agents, according to S. J. Russell &amp; P. Norvig (1995). A reflexive agent is a rule-based entity, which acts according to the perceived environment (the blackboard structure).</Paragraph>
      <Paragraph position="8"> The blackboard is a global database containing information about the problem (PPA) to be solved: sentence structure information and a set of hypotheses (candidates) to be evaluated by specialists (KSs). The specialists watch the blackboard, looking for a PPA problem to be solved, and evaluate the given data. Specialists can elect, discard or assign preferable candidates, according to their condition-action rules. The resolution process is coordinated by the PPA solver agent, a specialist in PPA resolution. When the PPA solver agent receives a PPA resolution request, it writes the initial data (in our current implementation, for intrasentential PPAs, all previous NPs in the sentence are considered as part of the initial set of candidates) onto the blackboard and activates the specialists. After each contribution, the PPA solver evaluates the number of remaining candidates and the possible need for further contributions. At the end of the cycle, in case of ambiguity, the PPA solver will choose the preferred candidate, as determined by the sentence center specialist.</Paragraph>
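      <Paragraph> The resolution cycle described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the class names, the contribute method and the two toy specialists are hypothetical, and the specialists are activated in a fixed order rather than by watching the blackboard.

```python
# Hypothetical sketch of the blackboard cycle; all names are assumptions.

class Blackboard:
    """Global store: the PPA problem plus the current candidate hypotheses."""

    def __init__(self, ppa, candidates):
        self.ppa = ppa
        self.candidates = list(candidates)
        self.preferred = None


class ExcludeSpecialist:
    """Toy knowledge source that discards candidates listed as invalid."""

    def __init__(self, invalid):
        self.invalid = set(invalid)

    def contribute(self, board):
        board.candidates = [c for c in board.candidates if c not in self.invalid]


class CenterSpecialist:
    """Toy knowledge source that marks a preferred candidate (here, the first)."""

    def contribute(self, board):
        if board.candidates:
            board.preferred = board.candidates[0]


def solve(ppa, previous_nps, specialists):
    """PPA solver: post the data, activate specialists, stop at one candidate."""
    board = Blackboard(ppa, previous_nps)
    for specialist in specialists:
        specialist.contribute(board)
        if len(board.candidates) == 1:
            return board.candidates[0]
    # Ambiguity remains: fall back on the preferred candidate, if any.
    return board.preferred
```

Because each specialist only reads and writes the shared board, a new rule set can be added by appending another object with a contribute method to the list passed to solve.</Paragraph>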
      <Paragraph position="9"> The motivations for adopting a blackboard architecture are the benefits of heterogeneous knowledge distribution and independence among KSs. These benefits will allow us to expand the architecture, adding new factors in PPA resolution or even adding new specialists, dedicated to different anaphoric phenomena.</Paragraph>
    </Section>
  </Section>
</Paper>