File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/w03-1023_intro.xml
Size: 2,095 bytes
Last Modified: 2025-10-06 14:02:02
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1023"> <Title>Using the Web in Machine Learning for Other-Anaphora Resolution</Title> <Section position="3" start_page="3" end_page="3" type="intro"> <SectionTitle> 2 Data Collection and Preparation </SectionTitle> <Paragraph position="0"> We collected 500 other-anaphors with NP antecedents from the Wall Street Journal corpus (Penn Treebank, release 2). This data sample excludes several types of expressions containing &quot;other&quot;: (a) list-contexts (Ex. 4) and other-than contexts (footnote 2), in which the antecedents are available structurally and thus a relatively unsophisticated procedure would suffice to find them; (b) idiomatic and discourse connective &quot;other&quot;, e.g., &quot;on the other hand&quot;, which are not anaphoric; and (c) reciprocal &quot;each other&quot; and &quot;one another&quot;, elliptic phrases, e.g., &quot;one X ... the other(s)&quot;, and one-anaphora, e.g., &quot;the other/another one&quot;, which behave like pronouns and thus would require a different search method. (Footnote: In parallel, efforts have been made to enrich WordNet by adding information in glosses (Harabagiu et al., 1999).)</Paragraph> <Paragraph position="1"> Also excluded from the data set are samples of other-anaphors with non-NP antecedents (e.g., adjectival and nominal pre- and postmodifiers and clauses).</Paragraph> <Paragraph position="2"> Each anaphor was extracted in a 5-sentence context. The correct antecedents were manually annotated to create a training/test corpus. For each anaphor, we automatically extracted a set of potential NP antecedents as follows. First, we extracted all base NPs, i.e., NPs that contain no further NPs within them. NPs containing a possessive NP modifier, e.g., &quot;Spain's economy&quot;, were split into a possessor phrase, &quot;Spain&quot;, and a possessed entity, &quot;economy&quot;. 
We then filtered out null elements and lemmatised all antecedents and anaphors.</Paragraph> </Section> </Paper>
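
The candidate-preparation steps described in this section (splitting possessive NPs, filtering null elements, lemmatising) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes base NPs are already available as lists of (word, tag) pairs using the Penn Treebank tags `POS` (possessive marker) and `-NONE-` (null element). The function names and the toy lemma table `TOY_LEMMAS` are hypothetical, since the section does not name the lemmatiser that was used.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank tag)

# Toy lemma table standing in for a real lemmatiser (assumption: the
# section does not say which lemmatiser was used).
TOY_LEMMAS: Dict[str, str] = {"economies": "economy", "banks": "bank"}


def split_possessive(np: List[Token]) -> List[List[Token]]:
    """Split an NP at the possessive marker (tag POS) into the
    possessor phrase and the possessed entity."""
    for i, (_, tag) in enumerate(np):
        if tag == "POS":
            parts = [np[:i], np[i + 1:]]
            return [p for p in parts if p]
    return [np]


def drop_null_elements(np: List[Token]) -> List[Token]:
    """Remove Treebank null elements, which carry the tag -NONE-."""
    return [(word, tag) for word, tag in np if tag != "-NONE-"]


def lemmatise(np: List[Token]) -> List[str]:
    """Map each word through the toy lemma table, keeping unknown words."""
    return [TOY_LEMMAS.get(word.lower(), word) for word, _ in np]


def prepare_candidates(base_nps: List[List[Token]]) -> List[List[str]]:
    """Split possessives, filter null elements, then lemmatise."""
    candidates = []
    for np in base_nps:
        for part in split_possessive(np):
            cleaned = drop_null_elements(part)
            if cleaned:  # skip NPs that consisted only of null elements
                candidates.append(lemmatise(cleaned))
    return candidates


example = [
    [("Spain", "NNP"), ("'s", "POS"), ("economy", "NN")],
    [("*T*-1", "-NONE-")],
    [("other", "JJ"), ("banks", "NNS")],
]
print(prepare_candidates(example))
```

In this sketch the possessive NP yields two separate candidates, the NP consisting only of a trace is discarded, and the remaining NP is reduced to lemmas. A real pipeline would read the NPs from the Treebank parse trees and use a full lemmatiser instead of the lookup table.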