<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2143">
  <Title>United Kingdom</Title>
  <Section position="3" start_page="0" end_page="871" type="metho">
    <SectionTitle>
2. The approach
</SectionTitle>
    <Paragraph position="0"> Because avoiding complex syntactic, semantic and discourse analysis is vital for real-world applications, we developed a robust, knowledge-poor approach to pronoun resolution which does not parse and analyse the input in order to identify antecedents of anaphors. It makes use of only a part-of-speech tagger, plus simple noun phrase rules (sentence constituents are identified at the level of noun phrase at most) and operates on the basis of antecedent-tracking preferences (referred to hereafter as &quot;antecedent indicators&quot;). The approach works as follows: it takes as input the output of a part-of-speech tagger run over the text, identifies the noun phrases which precede the anaphor within a distance of 2 sentences, checks them for gender and number agreement with the anaphor and then applies the genre-specific antecedent indicators to the remaining candidates (see next section). The noun phrase with the highest aggregate score is proposed as antecedent; in the rare event of a tie, priority is given to the candidate with the higher score for immediate reference. If immediate reference has not been identified, then priority is given to the candidate with the best collocation pattern score. If this does not help, the candidate with the higher score for indicating verbs is preferred. If still no choice is possible, the most recent of the remaining candidates is selected as the antecedent.</Paragraph>
    <Section position="1" start_page="869" end_page="869" type="sub_section">
      <SectionTitle>
2.1 Antecedent indicators
</SectionTitle>
      <Paragraph position="0"> Antecedent indicators (preferences) play a decisive role in tracking down the antecedent from a set of possible candidates. Candidates are assigned a score (-1, 0, 1 or 2) for each indicator; the candidate with the highest aggregate score is proposed as the antecedent. The antecedent indicators have been identified empirically and are related to salience (definiteness, givenness, indicating verbs, lexical reiteration, section heading preference, &quot;non-prepositional&quot; noun phrases), to structural matches (collocation, immediate reference), to referential distance or to preference of terms. Whilst some of the indicators are more genre-specific (term preference) and others are less genre-specific (&quot;immediate reference&quot;), the majority appear to be genre-independent. In the following we shall outline some of the indicators used and illustrate them with examples.</Paragraph>
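      <Paragraph> The aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout and the example scores are all assumptions made for the sketch.

```python
from typing import Dict

# Hypothetical layout: indicator name -> score for one candidate.
Scores = Dict[str, int]


def aggregate(scores: Scores) -> int:
    """Sum the per-indicator scores (each one of -1, 0, 1 or 2)."""
    allowed = {-1, 0, 1, 2}
    for name, value in scores.items():
        if value not in allowed:
            raise ValueError(f"indicator {name!r} has invalid score {value}")
    return sum(scores.values())


def best_candidate(candidates: Dict[str, Scores]) -> str:
    """Return the candidate with the highest aggregate score.

    Ties are not handled here; max() simply keeps the first maximum.
    """
    return max(candidates, key=lambda phrase: aggregate(candidates[phrase]))


# Illustrative scores only; they are not taken from the paper.
candidates = {
    "the cassette": {"definiteness": 0, "non_prepositional": 0, "referential_distance": 2},
    "the VCR": {"definiteness": 0, "non_prepositional": -1, "referential_distance": 2},
}
print(best_candidate(candidates))  # the cassette
```

Keeping one score per indicator, rather than only the total, makes it possible to consult individual indicators later when two candidates tie.</Paragraph>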
    </Section>
    <Section position="2" start_page="869" end_page="869" type="sub_section">
      <SectionTitle>
Indicating verbs
</SectionTitle>
      <Paragraph position="0"> If a verb is a member of the Verb_set = {discuss, present, illustrate, identify, summarise, examine, describe, define, show, check, develop, review, report, outline, consider, investigate, explore, assess, analyse, synthesise, study, survey, deal, cover}, we consider the first NP following it as the preferred antecedent (scores 1 and 0). Empirical evidence suggests that because of the salience of the noun phrases which follow them, the verbs listed above are particularly good indicators.</Paragraph>
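      <Paragraph> A minimal sketch of the indicating-verbs indicator is given below. It assumes that the input has already been chunked into verbs and noun phrases and that verbs are supplied in their base form; the chunk representation and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# The verb set as listed in the text (base forms).
VERB_SET = {
    "discuss", "present", "illustrate", "identify", "summarise", "examine",
    "describe", "define", "show", "check", "develop", "review", "report",
    "outline", "consider", "investigate", "explore", "assess", "analyse",
    "synthesise", "study", "survey", "deal", "cover",
}

# Assumed representation: ("V", base form) or ("NP", noun phrase text).
Chunk = Tuple[str, str]


def indicating_verb_scores(chunks: List[Chunk]) -> Dict[str, int]:
    """Score 1 for the first NP following an indicating verb, otherwise 0."""
    scores = {text: 0 for kind, text in chunks if kind == "NP"}
    awaiting_np = False
    for kind, text in chunks:
        if kind == "V":
            # Any verb resets the state; only set members arm it.
            awaiting_np = text in VERB_SET
        elif kind == "NP" and awaiting_np:
            scores[text] = 1
            awaiting_np = False
    return scores


chunks = [
    ("NP", "This chapter"), ("V", "describe"), ("NP", "the printer driver"),
    ("V", "install"), ("NP", "the software"),
]
print(indicating_verb_scores(chunks))
```

In this example only "the printer driver" receives a score of 1, because it is the first noun phrase after a verb from the set.</Paragraph>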
      <Paragraph position="1"> Lexical reiteration. Lexically reiterated items are likely candidates for antecedent (an NP scores 2 if it is repeated within the same paragraph twice or more, 1 if it is repeated once and 0 if it is not repeated). Lexically reiterated items include repeated synonymous noun phrases, which may often be preceded by definite articles or demonstratives.</Paragraph>
      <Paragraph position="2"> Also, a sequence of noun phrases with the same head counts as lexical reiteration (e.g. &quot;toner bottle&quot;, &quot;bottle of toner&quot;, &quot;the bottle&quot;).</Paragraph>
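      <Paragraph> The scoring rule for lexical reiteration can be sketched as below. The sketch assumes that the head noun of every noun phrase in the paragraph has already been identified (so that "toner bottle", "bottle of toner" and "the bottle" all contribute the head "bottle"); head detection and synonym matching are outside its scope, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def reiteration_scores(paragraph_heads: List[str]) -> Dict[str, int]:
    """Map each head noun to 2, 1 or 0 according to how often it is repeated.

    An item occurring n times is repeated n - 1 times; the score is capped at 2.
    """
    counts = Counter(head.lower() for head in paragraph_heads)
    return {head: min(count - 1, 2) for head, count in counts.items()}


heads = ["bottle", "bottle", "Bottle", "printer", "drawer", "drawer"]
print(reiteration_scores(heads))  # {'bottle': 2, 'printer': 0, 'drawer': 1}
```
</Paragraph>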
      <Paragraph position="3"> Section heading preference. If a noun phrase occurs in the heading of the section of which the current sentence is part, then we consider it as the preferred candidate (1, 0).</Paragraph>
    </Section>
    <Section position="3" start_page="869" end_page="871" type="sub_section">
      <SectionTitle>
Definiteness
</SectionTitle>
      <Paragraph position="0"> Definite noun phrases in previous sentences are more likely antecedents of pronominal anaphors than indefinite ones (definite noun phrases score 0 and indefinite ones are penalised by -1). We regard a noun phrase as definite if the head noun is modified by a definite article, or by demonstrative or possessive pronouns. This rule is ignored if there are no definite articles, possessive or demonstrative pronouns in the paragraph (this exception is taken into account because some English user guides tend to omit articles).</Paragraph>
      <Paragraph position="1"> Givenness. Noun phrases in previous sentences representing the &quot;given information&quot; (theme) 1 are deemed good candidates for antecedents and score 1 (candidates not representing the theme score 0). In a coherent text (Firbas 1992), the given or known information, or theme, usually appears first, and thus forms a co-referential link with the preceding text. The new information, or rheme, provides some information about the theme.</Paragraph>
      <Paragraph position="2"> 1 We use the simple heuristic that the given information is the first noun phrase in a non-imperative sentence.</Paragraph>
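      <Paragraph> The definiteness indicator, including its paragraph-level exception, can be sketched as follows. The marker list and the decision to look only at the first word of each noun phrase are simplifying assumptions of this sketch, not part of the described system.

```python
from typing import List

# Assumed marker list: definite article, demonstratives, possessives.
DEFINITE_MARKERS = {
    "the", "this", "that", "these", "those",
    "my", "your", "his", "her", "its", "our", "their",
}


def is_definite(phrase: str) -> bool:
    """Treat an NP as definite if its first word is a definite marker."""
    tokens = phrase.split()
    return bool(tokens) and tokens[0].lower() in DEFINITE_MARKERS


def definiteness_scores(paragraph_nps: List[str]) -> List[int]:
    """Return 0 for definite NPs and -1 for indefinite ones.

    If the paragraph contains no definite marker at all, the rule is
    ignored and every NP scores 0.
    """
    flags = [is_definite(phrase) for phrase in paragraph_nps]
    if not any(flags):
        return [0] * len(paragraph_nps)
    return [0 if flag else -1 for flag in flags]


print(definiteness_scores(["the printer", "a cable", "paper"]))  # [0, -1, -1]
print(definiteness_scores(["printer", "cable"]))                 # [0, 0]
```

The second call shows the exception: a paragraph written without articles does not penalise any candidate.</Paragraph>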
      <Paragraph position="3"> &quot;Non-prepositional&quot; noun phrases. A &quot;pure&quot;, &quot;non-prepositional&quot; noun phrase is given a higher preference than a noun phrase which is part of a prepositional phrase (0, -1). Example: Insert the cassette_i into the VCR making sure it_i is suitable for the length of recording.</Paragraph>
      <Paragraph position="4"> Here &quot;the VCR&quot; is penalised (-1) for being part of the prepositional phrase &quot;into the VCR&quot;.</Paragraph>
      <Paragraph position="5"> This preference can be explained in terms of salience from the point of view of centering theory. The latter proposes the ranking &quot;subject, direct object, indirect object&quot; (Brennan et al. 1987) and noun phrases which are parts of prepositional phrases are usually indirect objects.</Paragraph>
      <Paragraph position="6"> Collocation pattern preference. This preference is given to candidates which have an identical collocation pattern with a pronoun (2, 0).</Paragraph>
      <Paragraph position="7"> The collocation preference here is restricted to the patterns &quot;noun phrase (pronoun), verb&quot; and &quot;verb, noun phrase (pronoun)&quot;. Owing to lack of syntactic information, this preference is somewhat weaker than the collocation preference described in (Dagan &amp; Itai 1990). Example: Press the key_i down and turn the volume up... Press it_i again.</Paragraph>
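      <Paragraph> The collocation check can be illustrated with the sketch below. It assumes that each noun phrase and the pronoun have already been paired with the verb next to them; the Collocation type and its field names are hypothetical and are introduced only for this example.

```python
from typing import NamedTuple, Set


class Collocation(NamedTuple):
    verb: str   # base form of the adjacent verb
    order: str  # "np_verb" or "verb_np", the two patterns named in the text


def collocation_score(candidate_patterns: Set[Collocation],
                      pronoun_pattern: Collocation) -> int:
    """Score 2 if the candidate occurs in the same pattern as the pronoun."""
    return 2 if pronoun_pattern in candidate_patterns else 0


# "Press the key down and turn the volume up... Press it again."
key_patterns = {Collocation("press", "verb_np")}
volume_patterns = {Collocation("turn", "verb_np")}
pronoun = Collocation("press", "verb_np")

print(collocation_score(key_patterns, pronoun))     # 2
print(collocation_score(volume_patterns, pronoun))  # 0
```
</Paragraph>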
      <Paragraph position="8"> Immediate reference. In technical manuals the &quot;immediate reference&quot; clue can often be useful in identifying the antecedent.</Paragraph>
      <Paragraph position="9"> The heuristic used is that in constructions of the form &quot;...(You) V_1 NP ... con (you) V_2 it (con (you) V_3 it)&quot;, where con ∈ {and/or/before/after...}, the noun phrase immediately after V_1 is a very likely candidate for antecedent of the pronoun &quot;it&quot; immediately following V_2 and is therefore given preference (scores 2 and 0).</Paragraph>
      <Paragraph position="10"> This preference can be viewed as a modification of the collocation preference. It is also quite frequent with imperative constructions. Example: To print the paper, you can stand the printer_i up or lay it_i flat.</Paragraph>
      <Paragraph position="11"> To turn on the printer, press the Power button_i and hold it_i down for a moment.</Paragraph>
      <Paragraph position="12"> Unwrap the paper_i, form it_i and align it_i, then load it_i into the drawer.</Paragraph>
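      <Paragraph> One way to approximate the immediate-reference pattern over pre-chunked input is sketched below. The chunk kinds ("V", "NP", "CON", "PRON", "OTHER"), the function names and the restriction to the four connectives listed explicitly in the text are assumptions of the sketch; the original list of connectives is open-ended.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str]  # assumed representation: (kind, text)
CONNECTIVES = {"and", "or", "before", "after"}


def immediate_reference(chunks: List[Chunk], pronoun_index: int) -> Optional[str]:
    """Return the NP directly after V_1 for an "it" directly after V_2."""
    kind, text = chunks[pronoun_index]
    if kind != "PRON" or text.lower() != "it":
        return None
    # V_2 must sit directly before the pronoun.
    if pronoun_index == 0 or chunks[pronoun_index - 1][0] != "V":
        return None
    j = pronoun_index - 2
    # Skip an optional explicit "you" between the connective and V_2.
    if j >= 0 and chunks[j][0] == "NP" and chunks[j][1].lower() == "you":
        j -= 1
    if not (j >= 0 and chunks[j][0] == "CON" and chunks[j][1].lower() in CONNECTIVES):
        return None
    # Walk left to the nearest NP that directly follows a verb (V_1 NP).
    for i in range(j - 1, 0, -1):
        if chunks[i][0] == "NP" and chunks[i - 1][0] == "V":
            return chunks[i][1]
    return None


def immediate_reference_score(candidate: str, chunks: List[Chunk],
                              pronoun_index: int) -> int:
    """Score 2 for the NP picked out by the pattern, otherwise 0."""
    return 2 if immediate_reference(chunks, pronoun_index) == candidate else 0


# "... you can stand the printer up or lay it flat."
sentence = [
    ("NP", "you"), ("V", "stand"), ("NP", "the printer"), ("OTHER", "up"),
    ("CON", "or"), ("V", "lay"), ("PRON", "it"), ("OTHER", "flat"),
]
print(immediate_reference(sentence, 6))  # the printer
```

The sketch is stricter than the prose in one respect: it requires the connective to appear directly before V_2 (or before an explicit "you"), so a pronoun introduced only by a comma is not matched.</Paragraph>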
      <Paragraph position="13"> Referential distance. In complex sentences, noun phrases in the previous clause 2 are the best candidates for the antecedent of an anaphor in the subsequent clause, followed by noun phrases in the previous sentence, then by nouns situated 2 sentences further back and finally nouns 3 sentences further back (2, 1, 0, -1). For anaphors in simple sentences, noun phrases in the previous sentence are the best candidates for antecedent, followed by noun phrases situated 2 sentences further back and finally nouns 3 sentences further back (1, 0, -1). Term preference. NPs representing terms in the field are more likely to be the antecedent than NPs which are not terms (score 1 if the NP is a term and 0 if not).</Paragraph>
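      <Paragraph> The referential-distance scores can be written as a small lookup, sketched below. The sketch reads the text as follows, which is an interpretation rather than something the text states outright: a candidate in the previous clause of the same complex sentence scores 2, and candidates one, two or three sentences back score 1, 0 and -1 respectively. The function name and parameters are hypothetical.

```python
def referential_distance_score(sentence_distance: int,
                               in_previous_clause: bool = False) -> int:
    """Score a candidate by how far back it lies from the anaphor.

    sentence_distance is 0 for the anaphor's own sentence, 1 for the
    previous sentence, and so on.
    """
    if sentence_distance == 0:
        if in_previous_clause:
            return 2
        # The text gives no score for other same-sentence candidates.
        raise ValueError("no score specified for this configuration")
    by_distance = {1: 1, 2: 0, 3: -1}
    if sentence_distance not in by_distance:
        raise ValueError("distance outside the range covered by the text")
    return by_distance[sentence_distance]


print(referential_distance_score(0, in_previous_clause=True))  # 2
print(referential_distance_score(1))                           # 1
print(referential_distance_score(3))                           # -1
```
</Paragraph>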
      <Paragraph position="14"> As already mentioned, each of the antecedent indicators assigns a score with a value in {-1, 0, 1, 2}. These scores have been determined empirically and are constantly being updated. The strongest indicators, such as &quot;lexical reiteration&quot;, can assign a score of &quot;2&quot;, whereas the &quot;non-prepositional&quot; noun phrase indicator gives a negative score of &quot;-1&quot; to noun phrases that are part of a prepositional phrase. We should point out that the antecedent indicators are preferences and not absolute factors. There might be cases where one or more of the antecedent indicators do not &quot;point&quot; to the correct antecedent. For instance, in the sentence &quot;Insert the cassette into the VCR_i making sure it_i is turned on&quot;, the indicator &quot;non-prepositional noun phrases&quot; would penalise the correct antecedent. When all preferences (antecedent indicators) are taken into account, however, the right antecedent is still very likely to be tracked down - in the above example, the &quot;non-prepositional noun phrases&quot; heuristic (penalty) would be overturned by the &quot;collocational preference&quot; heuristic.</Paragraph>
    </Section>
    <Section position="4" start_page="871" end_page="871" type="sub_section">
      <SectionTitle>
2.2 Informal description of the algorithm
</SectionTitle>
      <Paragraph position="0"> The algorithm for pronoun resolution can be described informally as follows:
1. Examine the current sentence and the two preceding sentences (if available). Look for noun phrases 3 only to the left of the anaphor 4.
2. Select from the noun phrases identified only those which agree in gender and number 5 with the pronominal anaphor and group them as a set of potential candidates.
3. Apply the antecedent indicators to each potential candidate and assign scores; the candidate with the highest aggregate score is proposed as antecedent. If two candidates have an equal score, the candidate with the higher score for immediate reference is proposed as antecedent. If immediate reference does not hold, propose the candidate with the higher score for collocational pattern. If collocational pattern suggests a tie or does not hold, select the candidate with the higher score for indicating verbs. If this indicator does not hold either, go for the most recent candidate.</Paragraph>
      <Paragraph position="1"> 3 A sentence splitter would already have segmented the text into sentences, a POS tagger would already have determined the parts of speech and a simple phrasal grammar would already have detected the noun phrases.
5 ... languages other than English (e.g. German); on the other hand, there are certain collective nouns in English which do not agree in number with their antecedents (e.g. &quot;government&quot;, &quot;team&quot;, &quot;parliament&quot; etc. can be referred to by &quot;they&quot;; equally some plural nouns (e.g. &quot;data&quot;) can be referred to by &quot;it&quot;) and are exempted from the agreement test. For this purpose we have drawn up a comprehensive list of all such cases; to our knowledge, no other computational treatment of pronominal anaphora resolution has addressed the problem of &quot;agreement exceptions&quot;.</Paragraph>
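      <Paragraph> The final selection step, with its chain of tie-breaks, can be sketched as follows. The sketch assumes that candidate collection and the agreement filter have already been applied and that candidates are listed in text order, so that the last one is the most recent; the dictionary layout, the indicator keys and the function name are hypothetical.

```python
from typing import Dict, List

# Order in which indicators are consulted when aggregate scores tie.
TIE_BREAK_ORDER = ["immediate_reference", "collocation", "indicating_verbs"]


def select_antecedent(candidates: List[Dict]) -> str:
    """Pick the antecedent from candidates given in left-to-right text order.

    Each candidate is {"np": text, "scores": {indicator: score}}.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")

    def total(candidate: Dict) -> int:
        return sum(candidate["scores"].values())

    top = max(total(c) for c in candidates)
    remaining = [c for c in candidates if total(c) == top]
    for indicator in TIE_BREAK_ORDER:
        if len(remaining) == 1:
            break
        best = max(c["scores"].get(indicator, 0) for c in remaining)
        remaining = [c for c in remaining if c["scores"].get(indicator, 0) == best]
    # Still tied: fall back to the most recent candidate.
    return remaining[-1]["np"]


# Illustrative scores only; both candidates have an aggregate of 2.
tied = [
    {"np": "the key", "scores": {"collocation": 2, "referential_distance": 0}},
    {"np": "the volume", "scores": {"referential_distance": 1, "givenness": 1}},
]
print(select_antecedent(tied))  # the key
```

Here neither candidate has an immediate-reference score, so the collocation score decides in favour of "the key".</Paragraph>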
    </Section>
  </Section>
</Paper>