<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0116">
  <Title>Chinese Named Entity Recognition with Conditional Random Fields</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The Named Entity Recognition task in the 2006 SIGHAN Bakeoff includes three corpora: Microsoft Research (MSRA), City University of Hong Kong (CityU), and Linguistic Data Consortium (LDC).</Paragraph>
    <Paragraph position="1"> The corpora contain four types of Named Entities: Person Name, Organization Name, Location Name, and Geopolitical Entity (included only in the LDC corpus).</Paragraph>
    <Paragraph position="2"> We participate in the closed track for all three corpora. In the closed track, no external resources may be used. Therefore, in addition to basic features, we define additional features computed from statistics over the training corpus to take the place of external resources. First, we perform word segmentation using a simple left-to-right maximum matching algorithm, with a word dictionary generated from n-gram statistics, and then define features based on the resulting word boundaries. Second, we generate several lists according to relative position to a Named Entity (NE) and define another type of feature based on these lists. Using these features, we build a Conditional Random Fields (CRFs)-based Named Entity Recognition (NER) system. We use the system to generate n-best results for every sentence and then apply post-processing.</Paragraph>
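    <Paragraph position="3"> The segmentation step can be illustrated with a minimal sketch of left-to-right maximum matching. This is not the system's actual code: the names max_match, boundary_tags, and TOY_DICT, the max_len limit of 4 characters, the single-character fallback, and the B/M/E/S boundary labels are all assumptions made for illustration, and the toy dictionary stands in for one derived from n-gram statistics over the training corpus.

```python
# Hypothetical toy dictionary; the paper derives its dictionary from
# n-gram statistics over the training corpus, which is not shown here.
TOY_DICT = {"北京", "大学", "北京大学", "学生"}


def max_match(sentence, dictionary, max_len=4):
    """Greedy left-to-right (forward) maximum matching segmentation."""
    words = []
    i = 0
    while i != len(sentence):
        # Try the longest candidate first, shrinking until a dictionary hit.
        longest = min(max_len, len(sentence) - i)
        for size in range(longest, 0, -1):
            candidate = sentence[i:i + size]
            # Assumption: a single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def boundary_tags(words):
    """Label each character by its position in its word (assumed B/M/E/S scheme)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


words = max_match("北京大学生", TOY_DICT)
tags = boundary_tags(words)
```

Because the longest dictionary entry wins at each position, the toy sentence is split into "北京大学" followed by the single character "生"; per-character boundary labels of this kind could then serve as input features to the CRF.</Paragraph>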
  </Section>
</Paper>