File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-0138_intro.xml
Size: 2,392 bytes
Last Modified: 2025-10-06 14:03:49
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0138"> <Title>Using Part-of-Speech Reranking to Improve Chinese Word Segmentation</Title> <Section position="4" start_page="205" end_page="205" type="intro"> <SectionTitle> 2 Algorithm </SectionTitle> <Paragraph position="0"> Given an observed Chinese character sequence X, let S and T denote a segmentation sequence and a POS tagging sequence over X.</Paragraph> <Paragraph position="2"> Our goal is to find a segmentation sequence $\hat{S}$ and a POS tagging sequence $\hat{T}$ that maximize the posterior probability:</Paragraph> <Paragraph position="3"> $$(\hat{S}, \hat{T}) = \arg\max_{S,T} P(S, T \mid X) \qquad (1)$$</Paragraph> <Paragraph position="4"> Applying the chain rule, we can further derive from Equation 1 the following:</Paragraph> <Paragraph position="5"> $$(\hat{S}, \hat{T}) = \arg\max_{S,T} P(T \mid S, X) \, P(S \mid X) \qquad (2)$$</Paragraph> <Paragraph position="6"> Since we have factorized the joint probability in Equation 1 into two terms, we can now model these two components using conditional random fields (Lafferty et al., 2001). Linear-chain CRF models define the conditional probability P(Z|X) with a linear-chain Markov random field. In our case, X is the sequence of characters or words, and Z is the sequence of segmentation labels for characters (START or NON-START, used to indicate word boundaries) or of POS tags for words (NN, VV, JJ, etc.).</Paragraph> <Paragraph position="7"> The conditional probability is defined as:</Paragraph> <Paragraph position="8"> $$P(Z \mid X) = \frac{1}{N(X)} \exp\left( \sum_{t} \sum_{k=1}^{K} \lambda_k f_k(Z, X, t) \right)$$</Paragraph> <Paragraph position="9"> where N(X) is a normalization term that guarantees the probabilities of all label sequences sum to one. $f_k(Z, X, t)$ is the kth local feature function at sequence position t. It maps a pair of X and Z and an index t to {0,1}.</Paragraph> <Paragraph position="10"> $(\lambda_1, \ldots, \lambda_K)$ is a weight vector to be learned from the training set. A large positive value of $\lambda_i$ means that the ith feature function is likely to take the value 1, whereas a negative value of $\lambda_i$ means that it is unlikely to take the value 1.</Paragraph> <Paragraph position="11"> At decoding time, we are interested in finding the segmentation sequence $\hat{S}$ and POS tagging sequence $\hat{T}$ that maximize the probability defined in Equation 2. 
Instead of exhaustively searching the whole space of all possible segmentations, we restrict the search to S = {S1, S2, ..., SN}, where S is the restricted search space consisting of the N-best decoded segmentation sequences. This N-best list of segmentation sequences, S, can be obtained using a modified Viterbi algorithm and A* search (Schwartz and Chow, 1990).</Paragraph> </Section> </Paper>
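
The scoring and reranking steps described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`crf_score`, `crf_log_prob`, `rerank`), the two toy feature functions, and the weights are all hypothetical. The normalization term N(X) is computed by brute-force enumeration rather than by dynamic programming, so it is only usable on very short inputs. The N-best list is assumed to be supplied already scored, instead of being produced by a modified Viterbi algorithm and A* search.

```python
import itertools
import math


def f_start_at_beginning(z, x, t):
    # Hypothetical feature: fires when the first character is labeled START.
    return 1 if t == 0 and z[t] == "START" else 0


def f_start_then_nonstart(z, x, t):
    # Hypothetical feature: fires on a START -> NON-START label transition.
    return 1 if t > 0 and z[t - 1] == "START" and z[t] == "NON-START" else 0


FEATURES = [f_start_at_beginning, f_start_then_nonstart]
WEIGHTS = [2.0, 0.5]  # assumed values; in practice learned from training data


def crf_score(z, x, feature_fns, weights):
    """Unnormalized log-score: sum over t and k of lambda_k * f_k(Z, X, t)."""
    return sum(
        w * f(z, x, t)
        for t in range(len(x))
        for f, w in zip(feature_fns, weights)
    )


def crf_log_prob(z, x, labels, feature_fns, weights):
    """log P(Z|X); N(X) is summed over every label sequence (toy sizes only)."""
    n_x = sum(
        math.exp(crf_score(cand, x, feature_fns, weights))
        for cand in itertools.product(labels, repeat=len(x))
    )
    return crf_score(z, x, feature_fns, weights) - math.log(n_x)


def rerank(nbest):
    """Return the hypothesis maximizing log P(S|X) + log P(T|S,X).

    nbest: list of (segmentation, tags, log_p_seg, log_p_tags) tuples,
    one per candidate in the restricted search space.
    """
    return max(nbest, key=lambda h: h[2] + h[3])
```

Calling `rerank` on a list in which the top segmentation has a weak tagging score shows the intended effect: a candidate with lower P(S|X) can win once P(T|S,X) is multiplied in, which is the motivation for searching over segmentation and tagging jointly.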