<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1023">
  <Title>Using the Web in Machine Learning for Other-Anaphora Resolution</Title>
  <Section position="3" start_page="3" end_page="3" type="intro">
    <SectionTitle>
2 Data Collection and Preparation
</SectionTitle>
    <Paragraph position="0"> We collected 500 other-anaphors with NP antecedents from the Wall Street Journal corpus (Penn Treebank, release 2). This data sample excludes several types of expressions containing &quot;other&quot;: (a) list-contexts (Ex. 4) and other-than contexts (footnote 2), in which the antecedents are available structurally and thus a relatively unsophisticated procedure would suffice to find them; (b) idiomatic and discourse connective &quot;other&quot;, e.g., &quot;on the other hand&quot;, which are not anaphoric; and (c) reciprocal &quot;each other&quot; and &quot;one another&quot;, elliptic phrases, e.g., &quot;one X ... the other(s)&quot;, and one-anaphora, e.g., &quot;the other/another one&quot;, which behave like pronouns and thus would require a different search method. [Footnote: In parallel, efforts have been made to enrich WordNet by adding information in glosses (Harabagiu et al., 1999).]</Paragraph>
    <Paragraph position="1"> Also excluded from the data set are samples of other-anaphors with non-NP antecedents (e.g., adjectival and nominal pre- and postmodifiers and clauses).</Paragraph>
    <Paragraph position="2"> Each anaphor was extracted in a 5-sentence context. The correct antecedents were manually annotated to create a training/test corpus. For each anaphor, we automatically extracted a set of potential NP antecedents as follows. First, we extracted all base NPs, i.e., NPs that contain no further NPs within them. NPs containing a possessive NP modifier, e.g., &quot;Spain's economy&quot;, were split into a possessor phrase, &quot;Spain&quot;, and a possessed entity, &quot;economy&quot;. We then filtered out null elements and lemmatised all antecedents and anaphors.</Paragraph>
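    <Paragraph position="3"> The candidate-extraction steps described above (base NPs, possessive splitting, null-element filtering, lemmatisation) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names parse_bracketed, tagged_words and candidate_nps are hypothetical, the input is assumed to be a single Penn Treebank style bracketed tree with a labelled root, and plain lowercasing stands in for a real lemmatiser, which can be supplied through the lemmatise parameter.
<![CDATA[
```python
def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                                  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())           # nested constituent
            else:
                children.append(tokens[pos])      # a word (leaf)
                pos += 1
        pos += 1                                  # skip ")"
        return (label, children)

    return read()


def is_np(node):
    """True for NP constituents, including function-tagged ones (NP-SBJ)."""
    return node[0] == "NP" or node[0].startswith("NP-")


def subtrees(node):
    return [child for child in node[1] if isinstance(child, tuple)]


def contains_np(node):
    return any(is_np(c) or contains_np(c) for c in subtrees(node))


def tagged_words(node):
    """Yield (tag, word) pairs, skipping null elements tagged -NONE-."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        if label != "-NONE-":
            yield (label, children[0])
        return
    for child in subtrees(node):
        yield from tagged_words(child)


def candidate_nps(tree, lemmatise=str.lower):
    """Collect candidate NP strings; `lemmatise` is a placeholder (assumption)."""
    found = []

    def emit(pairs):
        # Drop the possessive marker and apply the stand-in lemmatiser.
        phrase = [lemmatise(word) for tag, word in pairs if tag != "POS"]
        if phrase:                                # NPs holding only nulls vanish
            found.append(" ".join(phrase))

    def visit(node):
        kids = subtrees(node)
        if is_np(node):
            head = list(tagged_words(kids[0])) if kids and is_np(kids[0]) else []
            rest = kids[1:]
            simple_rest = not any(is_np(c) or contains_np(c) for c in rest)
            if head and head[-1][0] == "POS" and simple_rest:
                emit(head)                        # possessor, e.g. "Spain"
                emit([p for c in rest for p in tagged_words(c)])  # possessed
                return
            if not contains_np(node):
                emit(tagged_words(node))          # base NP: no NP inside it
                return
        for child in kids:
            visit(child)

    visit(tree)
    return found


if __name__ == "__main__":
    example = ("(S (NP-SBJ (NP (NNP Spain) (POS 's)) (NN economy)) "
               "(VP (VBD grew) (NP (-NONE- *T*-1)) "
               "(PP (IN in) (NP (DT the) (JJ other) (NNS regions)))))")
    print(candidate_nps(parse_bracketed(example)))
    # prints ['spain', 'economy', 'the other regions']
```
]]>
    In the example tree, the possessive NP is split into a possessor and a possessed entity, the NP that holds only a trace is dropped, and the remaining base NP is kept whole. A dictionary-based lemmatiser would additionally reduce &quot;regions&quot; to &quot;region&quot;, which the lowercasing stand-in does not do.</Paragraph>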
  </Section>
</Paper>