<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0138">
  <Title>Using Part-of-Speech Reranking to Improve Chinese Word Segmentation</Title>
  <Section position="4" start_page="205" end_page="205" type="intro">
    <SectionTitle>
2 Algorithm
</SectionTitle>
    <Paragraph position="0"> Given an observed Chinese character sequence X, let S and T denote a segmentation sequence and a POS tagging sequence over X, respectively.</Paragraph>
    <Paragraph position="2"> Our goal is to find a segmentation sequence ^S and a POS tagging sequence ^T that maximize the posterior probability: $(\hat{S}, \hat{T}) = \arg\max_{S,T} P(S, T \mid X)$ (1)</Paragraph>
    <Paragraph position="4"> Applying the chain rule, we can further derive from Equation 1 the following: $(\hat{S}, \hat{T}) = \arg\max_{S,T} P(S \mid X)\, P(T \mid S, X)$ (2)</Paragraph>
    <Paragraph position="6"> Since we have factorized the joint probability in Equation 1 into two terms, we can now model these two components using conditional random fields (Lafferty et al., 2001). Linear-chain CRF models define the conditional probability P(Z|X) by linear-chain Markov random fields. In our case, X is the sequence of characters or words, and Z is the sequence of segmentation labels for characters (START or NON-START, used to indicate word boundaries) or the sequence of POS tags for words (NN, VV, JJ, etc.).</Paragraph>
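    <Paragraph> The START / NON-START labelling described above can be illustrated with a minimal sketch. The function name labels_to_words and the example sentence are assumptions made for illustration and do not come from the paper; the sketch only shows how a label sequence over characters determines word boundaries.

```python
def labels_to_words(chars, labels):
    """Group characters into words using START / NON-START labels."""
    words = []
    for ch, label in zip(chars, labels):
        if label == "START" or not words:
            words.append(ch)   # a START label opens a new word
        else:
            words[-1] += ch    # a NON-START label extends the current word
    return words


# Hypothetical example sentence and labelling.
chars = list("我喜欢北京")
labels = ["START", "START", "NON-START", "START", "NON-START"]
print(labels_to_words(chars, labels))  # ['我', '喜欢', '北京']
```
    </Paragraph>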
    <Paragraph position="7"> The conditional probability is defined as: $P(Z \mid X) = \frac{1}{N(X)} \exp\left( \sum_{t} \sum_{k=1}^{K} \lambda_k f_k(Z, X, t) \right)$</Paragraph>
    <Paragraph position="9"> where N(X) is a normalization term that guarantees that the probabilities of all label sequences sum to one, and f_k(Z, X, t) is the kth local feature function at sequence position t, which maps a pair (X, Z) and an index t to {0,1}.</Paragraph>
    <Paragraph position="10"> (λ_1, ..., λ_K) is a weight vector to be learned from the training set. A large positive value of λ_i means that the ith feature function is likely to take the value 1, whereas a negative value of λ_i means that it is unlikely to take the value 1.</Paragraph>
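    <Paragraph> To make the roles of the normalization term, the binary feature functions and the weight vector concrete, the following toy sketch computes a linear-chain CRF probability by brute-force enumeration of all label sequences. The two feature functions and their weights are hypothetical and chosen by hand rather than learned; a practical implementation would compute N(X) with dynamic programming instead of enumeration.

```python
import itertools
import math

LABELS = ["START", "NON-START"]


def f_start_first(Z, X, t):
    # Hypothetical feature: the first character is labelled START.
    return 1 if t == 0 and Z[t] == "START" else 0


def f_repeat_nonstart(Z, X, t):
    # Hypothetical feature: two NON-START labels in a row.
    return 1 if t != 0 and Z[t - 1] == "NON-START" and Z[t] == "NON-START" else 0


FEATURES = [f_start_first, f_repeat_nonstart]
WEIGHTS = [2.0, -1.0]  # illustrative values, not learned from data


def score(Z, X):
    # Weighted sum of every feature function over every position t.
    return sum(w * f(Z, X, t)
               for t in range(len(X))
               for w, f in zip(WEIGHTS, FEATURES))


def crf_probability(Z, X):
    # N(X): sum of exp(score) over all label sequences of the same length.
    norm = sum(math.exp(score(list(cand), X))
               for cand in itertools.product(LABELS, repeat=len(X)))
    return math.exp(score(Z, X)) / norm


X = list("ABC")
print(crf_probability(["START", "NON-START", "START"], X))
```

    With these weights, label sequences that begin with START receive more probability mass, and sequences with consecutive NON-START labels receive less, which matches the interpretation of positive and negative weights given above.</Paragraph>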
    <Paragraph position="11"> At decoding time, we are interested in finding the segmentation sequence ^S and the POS tagging sequence ^T that maximize the probability defined in Equation 2. Instead of exhaustively searching the whole space of all possible segmentations, we restrict our search to S = {S1,S2,...,SN}, where S is the restricted search space consisting of the N-best decoded segmentation sequences. This N-best list of segmentation sequences, S, can be obtained using a modified Viterbi algorithm and A* search (Schwartz and Chow, 1990).</Paragraph>
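    <Paragraph> The reranking step over the N-best list can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate is assumed to carry its segmentation probability (p_seg) and the probability of its best POS tag sequence (p_tag), and these field names and the numeric values are hypothetical. The candidate with the largest sum of log probabilities, equivalently the largest product of the two probabilities, is selected.

```python
import math


def rerank(nbest):
    """Pick the candidate maximizing log P(S|X) + log P(T|S,X)."""
    return max(nbest,
               key=lambda c: math.log(c["p_seg"]) + math.log(c["p_tag"]))


# Hypothetical 2-best list for a three-character input "ABC".
nbest = [
    {"words": ["AB", "C"], "tags": ["NN", "VV"], "p_seg": 0.5, "p_tag": 0.2},
    {"words": ["A", "BC"], "tags": ["NN", "NN"], "p_seg": 0.3, "p_tag": 0.6},
]

best = rerank(nbest)
print(best["words"], best["tags"])  # ['A', 'BC'] ['NN', 'NN']
```

    In this toy list the second candidate wins (0.3 x 0.6 = 0.18 against 0.5 x 0.2 = 0.10) even though its segmentation probability alone is lower, which is the situation in which POS information changes the chosen segmentation.</Paragraph>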
  </Section>
</Paper>