File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-3236_abstr.xml

Size: 1,581 bytes

Last Modified: 2025-10-06 13:44:06

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3236">
  <Title>Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Chinese part-of-speech (POS) tagging assigns one POS tag to each word in a Chinese sentence. However, since words are not demarcated in a Chinese sentence, Chinese POS tagging requires word segmentation as a prerequisite. We could perform Chinese POS tagging strictly after word segmentation (one-at-a-time approach), or perform both word segmentation and POS tagging in a combined, single step simultaneously (all-atonce approach). Also, we could choose to assign POS tags on a word-by-word basis, making use of word features in the surrounding context (word-based), or on a character-by-character basis with character features (character-based). This paper presents an in-depth study on such issues of processing architecture and feature representation for Chinese POS tagging, within a maximum entropy framework. We found that while the all-at-once, character-based approach is the best, the one-at-a-time, character-based approach is a worthwhile compromise, performing only slightly worse in terms of accuracy, but taking shorter time to train and run. As part of our investigation, we also built a state-of-the-art Chinese word segmenter, which outperforms the best SIGHAN 2003 word segmenters in the closed track on 3 out of 4 test corpora.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML