<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0705"> <Title>Applying Coreference to Improve Name Recognition</Title> <Section position="3" start_page="0" end_page="1" type="intro"> <SectionTitle> 6 Evaluation (Grishman and Sundheim, 1996). </SectionTitle> <Paragraph position="0"> A wide variety of machine learning methods have been applied to this problem, including Hidden Markov Models (Bikel et al. 1997), Maximum Entropy methods (Borthwick et al. 1998, Chieu and Ng 2002), Decision Trees (Sekine et al. 1998), Conditional Random Fields (McCallum and Li 2003), Class-based Language Models (Sun et al. 2002), Agent-based Approaches (Ye et al. 2002) and Support Vector Machines.</Paragraph> <Paragraph position="1"> However, the performance of even the best of these models has been limited by the amount of labeled training data available to them and the range of features which they employ. (The best results reported for Chinese named entity recognition, on the MET-2 test corpus, are 0.92 to 0.95 F-measure for the different name types; Ye et al. 2002.) In particular, most of these methods classify an instance of a name based on information about that instance alone and on its very local context, typically one or two words preceding and following the name. If a name has not been seen before and appears in a relatively uninformative context, it becomes very hard to classify.</Paragraph> <Paragraph position="2"> We propose to use more global information to improve the performance of name recognition.</Paragraph> <Paragraph position="3"> Some name taggers have incorporated a name cache or similar mechanism which makes use of names previously recognized in the document. In our approach, we perform coreference analysis and then use detailed evidence from other phrases in the document which are coreferential with a given name in order to disambiguate that name. This allows us to perform a richer set of corrections than with a name cache. 
We then go one step further and process similar documents containing instances of the same name, and combine the evidence from these additional instances. At each step we are able to demonstrate a small but consistent improvement in named entity recognition.</Paragraph> <Paragraph position="4"> The rest of the paper is organized as follows. Section 2 briefly describes the baseline name tagger and coreference resolver used in this paper. Section 3 considers methods for assessing the confidence of name tagging decisions. Section 4 examines the distribution of name errors, as a motivation for using coreference information.</Paragraph> <Paragraph position="5"> Section 5 shows the coreference features we use and how they are incorporated into a statistical name filter. Section 6 describes additional rules using coreference to improve name recognition.</Paragraph> <Paragraph position="6"> Section 7 provides the flow graph of the improved system. Section 8 reports and discusses the experimental results while Section 9 summarizes the conclusions.</Paragraph> </Section> </Paper>