File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/95/p95-1017_intro.xml

Size: 2,696 bytes

Last Modified: 2025-10-06 14:05:53

<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1017">
  <Title>Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Anaphora resolution is an important but still difficult problem for various large-scale natural language processing (NLP) applications, such as information extraction and machine translation. Thus far, no theories of anaphora have been tested on an empirical basis, and therefore there is as yet no answer to the question of which anaphora resolution algorithm is &quot;best&quot;. Moreover, an anaphora resolution system within an NLP system for real applications must handle:
* degraded or missing input (no NLP system has complete lexicons, grammars, or semantic knowledge, or outputs perfect results), and
* different anaphoric phenomena in different domains, languages, and applications.</Paragraph>
    <Paragraph position="1"> Thus, even if a perfect theory existed, it might not work well with noisy input, or it might not cover all anaphoric phenomena.</Paragraph>
    <Paragraph position="2"> and Pollard's centering approach (Brennan et al., 1987) with Hobbs' algorithm (Hobbs, 1976) on a theoretical basis.</Paragraph>
    <Paragraph position="3"> These requirements have motivated us to develop robust, extensible, and trainable anaphora resolution systems. Previously (Aone and McKee, 1993), we reported our data-driven multilingual anaphora resolution system, which is robust, extensible, and manually trainable. It uses discourse knowledge sources (KS's) that are manually selected and ordered. (Henceforth, we call the system the Manually-Designed Resolver, or MDR.) However, we wanted to develop truly automatically trainable systems, hoping to improve resolution performance and to reduce the overhead of manually constructing and arranging such discourse data.</Paragraph>
    <Paragraph position="4"> In this paper, we first describe one approach we are taking to building an automatically trainable anaphora resolution system. In this approach, we tag corpora with discourse information and use them as training examples for a machine learning algorithm. (Henceforth, we call the system the Machine Learning-based Resolver, or MLR.) Specifically, we have tagged Japanese newspaper articles about joint ventures and used Quinlan's C4.5 decision tree algorithm (Quinlan, 1993). We then evaluate and compare the results of the MLR with those produced by the MDR. Finally, we compare our algorithms with existing theories of anaphora, in particular, theories of Japanese zero pronouns.</Paragraph>
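    <Paragraph position="5"> The MLR idea of turning discourse-tagged text into training examples for a decision tree learner can be illustrated with a small sketch. This is a minimal, simplified illustration, not the authors' system and not Quinlan's C4.5 program. It assumes each example is a hypothetical (anaphor, candidate antecedent) pair described by invented categorical features (sem_match, same_sentence, is_topic) and labeled True if the candidate is the correct antecedent. It uses invented toy data, chooses splits by C4.5's gain-ratio criterion, and omits pruning, continuous attributes, and missing-value handling.

```python
import math
from collections import Counter

# Hypothetical toy training examples (invented for illustration only).
# Each pairs the features of one (anaphor, candidate) pair with a label:
# True if the candidate is the correct antecedent.
TRAIN = [
    ({"sem_match": "yes", "same_sentence": "yes", "is_topic": "yes"}, True),
    ({"sem_match": "yes", "same_sentence": "no", "is_topic": "yes"}, True),
    ({"sem_match": "yes", "same_sentence": "no", "is_topic": "no"}, True),
    ({"sem_match": "no", "same_sentence": "yes", "is_topic": "yes"}, False),
    ({"sem_match": "no", "same_sentence": "no", "is_topic": "no"}, False),
    ({"sem_match": "no", "same_sentence": "yes", "is_topic": "no"}, False),
]


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(examples, feature):
    """Information gain of splitting on `feature`, divided by split info."""
    labels = [y for _, y in examples]
    total = len(examples)
    groups = {}
    for x, y in examples:
        groups.setdefault(x[feature], []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy([x[feature] for x, _ in examples])
    if split_info == 0:
        return 0.0  # feature takes a single value here; useless as a split
    return (entropy(labels) - remainder) / split_info


def build_tree(examples, features):
    """Grow an unpruned tree; internal nodes are (feature, branches, majority)."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain_ratio(examples, f))
    if gain_ratio(examples, best) == 0:
        return majority
    rest = [f for f in features if f != best]
    branches = {}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches, majority)


def classify(tree, x):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        if x.get(feature) not in branches:
            return majority
        tree = branches[x[feature]]
    return tree


if __name__ == "__main__":
    tree = build_tree(TRAIN, ["sem_match", "same_sentence", "is_topic"])
    pair = {"sem_match": "yes", "same_sentence": "no", "is_topic": "no"}
    print(classify(tree, pair))
```

Framing resolution as binary classification of (anaphor, candidate) pairs is an assumption of this sketch; the paper's actual feature set and example encoding are not given in this introduction.</Paragraph>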
  </Section>
</Paper>