<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0428">
  <Title>Named Entity Recognition with Character-Level Models</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> For most sequence-modeling tasks with word-level evaluation, including named-entity recognition and part-of-speech tagging, it has seemed natural to use entire words as the basic input features. For example, the classic HMM view of these two tasks is one in which the observations are words and the hidden states encode class labels. However, because of data sparsity, sophisticated unknown-word models are generally required for good performance. A common approach is to extract word-internal features from unknown words, for example suffix, capitalization, or punctuation features (Mikheev, 1997; Wacholder et al., 1997; Bikel et al., 1997). One then treats the unknown word as a collection of such features. Having such unknown-word models as an add-on is perhaps a misplaced focus: in these tasks, providing correct behavior on unknown words is typically the key challenge.</Paragraph>
    <Paragraph position="1"> Here, we examine the utility of taking character sequences as a primary representation. We present two models in which the basic units are characters and character n-grams, instead of words and word phrases. Earlier papers have taken a character-level approach to named entity recognition (NER), notably Cucerzan and Yarowsky (1999), which used prefix and suffix tries, though to our knowledge incorporating all character n-grams is new. In section 2, we discuss a character-level HMM, while in section 3 we discuss a sequence-free maximum-entropy (maxent) classifier which uses n-gram substring features. Finally, in section 4 we add further features to the maxent model, and chain these models into a conditional Markov model (CMM), as used for tagging (Ratnaparkhi, 1996) or earlier NER work (Borthwick, 1999).</Paragraph>
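    <Paragraph position="2"> As a rough illustration of what substring features of this kind might look like, the Python sketch below enumerates every character n-gram of a single word up to a maximum length. It is not the feature extractor used in this paper: the function name char_ngrams, the boundary symbol "#", and the default maximum length of 3 are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(word, max_n=3, pad="#"):
    """Count every character n-gram of length 1..max_n in a padded word.

    Hypothetical helper for illustration only; the padding symbol and
    the length cap are assumptions, not settings taken from the paper.
    """
    # Boundary marks make word-initial and word-final substrings distinct,
    # so prefix-like and suffix-like n-grams become separate features.
    padded = pad + word + pad
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the padded word.
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


# Example: substring features for one capitalized token.
features = char_ngrams("Grace", max_n=3)
```

    Under these assumptions, a capitalized word-initial substring such as "#Gr" and a word-final substring such as "ce#" both appear as ordinary features, so prefix, suffix, and capitalization cues need no separate unknown-word component.</Paragraph>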
  </Section>
</Paper>