<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0131">
  <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. POC-NLW Template for Chinese Word Segmentation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Two problems remain in Chinese word segmentation: the resolution of ambiguity, and the identification of so-called out-of-vocabulary (OOV) or unknown words. To address both problems, our system adopts a two-stage statistical word segmentation strategy. The first stage is optional; the whole segmentation can be accomplished by the second stage alone. In the first stage, an n-gram language model is employed to perform basic word segmentation, including disambiguation. In the second stage, a language tagging template named POC-NLW (position of a character within an n-length word) is introduced, and unknown word identification is carried out as template-based character tagging.</Paragraph>
    <Paragraph position="1"> The remainder of this paper is organized as follows. Sections 2 and 3 briefly describe the main methods adopted in our system. Section 4 reports the results of our system at this bakeoff. Finally, conclusions are drawn in Section 5.</Paragraph>
  </Section>
</Paper>