<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1728"> <Title>Chinese Word Segmentation as LMR Tagging</Title>

                          Right Boundary (R)   Not Right Boundary (M)
  Left Boundary (L)               LR                    LM
  Not Left Boundary (M)           MR                    MM

Table 1: LMR Tagging

<Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Tagging Algorithms </SectionTitle> <Paragraph position="0"> Our algorithm consists of two parts. We first implement two Maximum Entropy taggers, one of which scans the input from left to right and the other from right to left. We then implement a Transformation-Based algorithm to combine the results of the two taggers.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 The Maximum Entropy Tagger </SectionTitle> <Paragraph position="0"> The Maximum Entropy Markov Model (MEMM) has been used successfully in a number of tagging problems. MEMM models are capable of utilizing a large set of features that generative models cannot use; at the same time, MEMM approaches scan the input incrementally, as generative models do.</Paragraph> <Paragraph position="1"> The Maximum Entropy Markov Model used in POS tagging is described in detail in (Ratnaparkhi, 1996), and the LMR tagger here uses the same probability model. The probability model is defined over $H \times T$, where $H$ is the set of possible contexts or "histories" and $T$ is the set of possible tags. The model's joint probability of a history $h$ and a tag $t$ is defined as

$$p(h,t) = \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h,t)}$$

where $\pi$ is a normalization constant, $\{\mu, \alpha_1, \ldots, \alpha_k\}$ are the model parameters and $\{f_1, \ldots, f_k\}$ are known as features, with $f_j(h,t) \in \{0, 1\}$. Each feature $f_j$ has a corresponding parameter $\alpha_j$ that effectively serves as a "weight" of this feature. In the training process, given a sequence of characters $\{c_1, \ldots, c_n\}$ and their LMR tags $\{t_1, \ldots, t_n\}$, the training procedure determines the parameters $\{\mu, \alpha_1, \ldots, \alpha_k\}$ that maximize the likelihood of the training data using $p$:

$$L(p) = \prod_{i=1}^{n} p(h_i, t_i) = \prod_{i=1}^{n} \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h_i, t_i)}$$
</Paragraph> <Paragraph position="2"> The success of the model in tagging depends to a large extent on the selection of suitable features. Given $(h, t)$, a feature must encode information that helps to predict $t$. The features we used in our experiments are instantiations of the feature templates in (1). Feature templates (b) to (e) represent character features, while (f) represents tag features. In the following list, $C_{-2} \ldots C_2$ are characters and $T_{-2} \ldots T_2$ are LMR tags.</Paragraph> <Paragraph position="3"> (1) Feature templates
(a) Default feature
(b) The current character ($C_0$)
(c) The previous (next) two characters ($C_{-2}$, $C_{-1}$, $C_1$, $C_2$)
(d) The previous (next) character and the current character ($C_{-1}C_0$, $C_0C_1$), the previous two characters ($C_{-2}C_{-1}$), and the next two characters ($C_1C_2$)
(e) The previous and the next character ($C_{-1}C_1$)
(f) The tag of the previous character ($T_{-1}$), and the tag of the character two before the current character ($T_{-2}$)</Paragraph> </Section>
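As an illustration of the probability model above, the following is a minimal sketch of how $p(h,t)$ can be computed from binary features and their weights. It is our own rendering of the definitions in (Ratnaparkhi, 1996), not the authors' implementation; all names (joint_probability, features, alphas, mu, pi_norm) are hypothetical.

```python
from typing import Callable, List

def joint_probability(h, t,
                      features: List[Callable],  # binary feature functions f_j(h, t) -> 0 or 1
                      alphas: List[float],        # one weight alpha_j per feature f_j
                      mu: float,                  # model parameter mu
                      pi_norm: float) -> float:   # normalization constant pi
    """Compute p(h, t) = pi * mu * prod_j alpha_j ** f_j(h, t)."""
    p = pi_norm * mu
    for f_j, alpha_j in zip(features, alphas):
        # Since f_j is binary, alpha_j ** f_j(h, t) is alpha_j when the
        # feature fires and 1 otherwise.
        if f_j(h, t):
            p *= alpha_j
    return p
```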
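The feature templates in (1) translate directly into string-valued feature instantiations. The sketch below is a hypothetical rendering (the function name, the string encoding of each feature, and the boundary marker are our assumptions), assuming a left-to-right scan in which only tags to the left of the current position are available.

```python
def extract_features(chars, tags, i):
    """Instantiate feature templates (a)-(f) for the character at position i.

    chars: the characters of the sentence; tags: LMR tags already assigned
    to positions before i during a left-to-right scan.
    """
    def C(j):  # character at offset j from position i, or a boundary marker
        k = i + j
        return chars[k] if 0 <= k < len(chars) else '_'

    def T(j):  # tag at offset j; only tags of preceding positions exist
        k = i + j
        return tags[k] if 0 <= k < i else '_'

    return [
        'default',                               # (a) default feature
        'C0=' + C(0),                            # (b) current character
        'C-2=' + C(-2), 'C-1=' + C(-1),          # (c) previous two characters
        'C1=' + C(1), 'C2=' + C(2),              # (c) next two characters
        'C-1C0=' + C(-1) + C(0),                 # (d) previous + current
        'C0C1=' + C(0) + C(1),                   # (d) current + next
        'C-2C-1=' + C(-2) + C(-1),               # (d) previous two characters
        'C1C2=' + C(1) + C(2),                   # (d) next two characters
        'C-1C1=' + C(-1) + C(1),                 # (e) previous and next character
        'T-1=' + T(-1), 'T-2=' + T(-2),          # (f) tags of the two preceding characters
    ]
```

Note that template (f) makes each tagging decision depend on the two previously assigned tags, which is what commits the tagger to scanning the input in a fixed direction.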
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Transformation-Based Learning </SectionTitle> <Paragraph position="0"> One potential problem with the MEMM is that it can only scan the input in one direction, from left to right or from right to left. It is noted in (Lafferty et al., 2001) that non-generative finite-state models, MEMM models included, share a weakness which they call the Label Bias Problem (LBP): the transitions leaving a given state compete only against each other, rather than against all other transitions in the model. They proposed Conditional Random Fields (CRFs) as a solution to this problem.</Paragraph> <Paragraph position="1"> A partial solution to the LBP is to compute the probability of transitions in both directions. This way we can use two MEMM taggers, one of which scans the input from left to right and the other from right to left. This strategy has been used successfully in (Shen and Joshi, 2003), where pairwise voting (van Halteren et al., 1998) combines the results of two supertaggers that scan the input in opposite directions.</Paragraph> <Paragraph position="2"> Pairwise voting is not suitable in this application, because we must make sure that the LMR tags assigned to consecutive characters are compatible; for example, an LM tag cannot immediately follow an MM tag. Pairwise voting does not use any contextual information, so it cannot prevent incompatible tags from occurring. Therefore, in the experiments described here, we use Transformation-Based Learning (Brill, 1995) to combine the results of the two MEMM taggers. The feature set used in the TBL algorithm is similar to the one used for the NP chunking task in (Ngai and Florian, 2001).</Paragraph> </Section> </Section>
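To make the compatibility constraint of Section 2.2 concrete, here is a minimal sketch of a check over consecutive LMR tags, derived from Table 1: a tag whose right side is M leaves the current word open, so the next character must not start a new word, while a tag whose right side is R closes the word, so the next character must start a new one. This is our illustration, not code from the paper.

```python
# Tags that leave the current word open (no right boundary yet), per Table 1.
OPEN = {'LM', 'MM'}

def compatible(prev_tag: str, next_tag: str) -> bool:
    """Check whether next_tag may follow prev_tag under the LMR scheme."""
    if prev_tag in OPEN:
        # Word still open: the next character must continue it,
        # i.e. carry M on its left side (MR or MM).
        return next_tag in {'MR', 'MM'}
    # Word just closed (LR or MR): the next character must start a new word,
    # i.e. carry L on its left side (LR or LM).
    return next_tag in {'LR', 'LM'}

assert not compatible('MM', 'LM')  # the example from the text: LM cannot follow MM
assert compatible('MM', 'MR')
```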
<Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Experiments </SectionTitle> <Paragraph position="0"> We conducted closed-track experiments on three data sources: the Academia Sinica (AS) corpus, the Beijing University (PKU) corpus and the Hong Kong City University (CityU) corpus. We first split the training data from each of the three sources into two portions: 9/10 of the official training data is used to train the MEMM taggers, and the other 1/10 is held out as the development test data (the development set). The development set is used to estimate the optimal number of iterations in MEMM training. Figures 1, 2 and 3 show the curves of F-scores on the development set with respect to the number of iterations of MEMM training.</Paragraph>

[Figure 1: F-score on the development dataset of the Academia Sinica corpus. The x-axis is the number of training iterations; the y-axis is the F-score.]

[Figure 2: F-score on the development dataset of the Beijing Univ. corpus.]

<Paragraph position="1"> Experiments show that the MEMM models achieve the best results after 500 and 400 rounds (iterations) of training on the AS data and the PKU data respectively. However, the results on the CityU data are less clear-cut: from round 100 through 200, the F-score on the development set stays almost unchanged. We think this is because the CityU data comes from three different sources, which differ in the optimal number of iterations. We decided to train the MEMM taggers for 160 iterations on the Hong Kong City University data.</Paragraph> <Paragraph position="2"> We implemented two MEMM taggers, one scanning the input from left to right and the other from right to left. We then used these two taggers to tag both the training data and the development data, and used the LMR tagging output to train a Transformation-Based learner, using fast TBL (Ngai and Florian, 2001). The middle column in Table 2 shows the F-score on the development set achieved by the MEMM tagger that scans the input from left to right, and the last column gives the results after the Transformation-Based learner is applied. The results show that Transformation-Based Learning yields only slight improvements; it seems that the bidirectional approach does not help much for LMR tagging. Therefore, we submitted only the results of our left-to-right MEMM tagger, retrained on the entire training sets, as our official results.</Paragraph> <Paragraph position="3"> The results on the official test data are similar to those on our development set, except that the F-score on the Beijing Univ. corpus is over 2% lower in absolute terms than what we expected. The reason is that in the training data of the Beijing University corpus all numbers are encoded in GBK, while in the test data many numbers are encoded in ASCII, and these are unknown to our tagger. With this problem fixed, the results on the official test data are compatible with the results on our development set. However, we have withdrawn our segmentation results on the Beijing University corpus.</Paragraph> </Section>
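The encoding mismatch described above amounts to the same numbers appearing as full-width (GBK double-byte) digits in the training data but as half-width ASCII digits in the test data. The sketch below illustrates one possible normalization; it is our own illustration of such a fix (the function name and the mapping direction are assumptions), not the authors' code.

```python
# Full-width digits (U+FF10 through U+FF19) are the forms numbers take in
# GBK-encoded training text; half-width ASCII digits in the test data are
# therefore unknown characters to the tagger.
ASCII_TO_FULLWIDTH = {chr(ord('0') + d): chr(0xFF10 + d) for d in range(10)}

def normalize_digits(text: str) -> str:
    """Rewrite half-width ASCII digits as full-width digits so that numbers
    in the test data match the forms observed in training."""
    return ''.join(ASCII_TO_FULLWIDTH.get(ch, ch) for ch in text)

print(normalize_digits('2003'))  # -> '２００３'
```

</Paper>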