<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0138">
  <Title>Using Part-of-Speech Reranking to Improve Chinese Word Segmentation</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Chinese word segmentation and Part-of-Speech (POS) tagging have commonly been treated as two separate tasks.</Paragraph>
    <Paragraph position="1"> In this paper, we present a system that performs Chinese word segmentation and POS tagging simultaneously. We separately train a segmenter model and a tagger model, both based on linear-chain Conditional Random Fields (CRFs), using lexical, morphological, and semantic features. We propose an approximate joint decoding method that reranks the N-best segmenter output based on POS tagging information. Experimental results on the SIGHAN Bakeoff dataset and the Penn Chinese Treebank show that our reranking method significantly improves both segmentation and POS tagging accuracy.</Paragraph>
  </Section>
</Paper>