<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1062">
<Title>Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron</Title>
<Section position="7" start_page="0" end_page="0" type="evalu">
<SectionTitle> 5 Experiments </SectionTitle>
<Paragraph position="0"> We applied the voted perceptron and boosting algorithms to the data described in section 2.3. Only features occurring on 5 or more distinct training sentences were included in the model. This resulted in 93,777 distinct features. [Footnote 5: Note that, for reasons of explication, the decoding algorithm we present is less efficient than necessary. For example, when [...] it is preferable to use some book-keeping to [...]] </Paragraph>
<Paragraph position="2"> Figure 5 (caption): Figures in parentheses are relative improvements in error rate over the maximum-entropy model. All figures are percentages. </Paragraph>
<Paragraph position="3"> The two methods were trained on the training portion (41,992 sentences) of the training set. We used the development set to pick the best values for the tunable parameters in each algorithm. For boosting, the main parameter to pick is the number of rounds. We ran the algorithm for a total of 300,000 rounds, and found that the optimal value for F-measure on the development set occurred after 83,233 rounds. For the voted perceptron, the representation Φ(x) was taken to be a vector ⟨βL(x), h_1(x), ..., h_m(x)⟩, where β is a parameter that influences the relative contribution of the log-likelihood term versus the other features. The value of β giving the best results on the development set was selected. Figure 5 shows the results for the three methods on the test set. Both of the reranking algorithms show significant improvements over the baseline: a 15.6% relative reduction in error for boosting, and a 17.7% relative error reduction for the voted perceptron.</Paragraph>
<Paragraph position="4"> In our experiments we found the voted perceptron algorithm to be considerably more efficient in training, at some cost in computation on test examples. Another attractive property of the voted perceptron is that it can be used with kernels, for example the kernels over parse trees described in (Collins and Duffy 2001; Collins and Duffy 2002). (Collins and Duffy 2002) describe the voted perceptron applied to the named-entity data used here, but with kernel-based features rather than the explicit features described in this paper. See (Collins 2002) for additional work using perceptron algorithms to train tagging models, and for a more thorough description of the theory underlying the perceptron algorithm applied to ranking problems.</Paragraph>
</Section>
</Paper>
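
As a rough illustration of the representation described in the section above (the vector ⟨βL(x), h_1(x), ..., h_m(x)⟩, scored by a voted perceptron over the candidates for a sentence), the following Python sketch may help. It is a minimal sketch under stated assumptions: the names phi, rerank, BETA and all toy numbers are hypothetical, the vote is approximated by summing counted scores (equivalent to scoring with the averaged weights), and the paper's actual features, tuned β value, and decoding procedure are not reproduced here.

    # Minimal sketch: reranking candidates with a representation
    # <beta*L(x), h_1(x), ..., h_m(x)> and voted-perceptron-style scoring.
    # All names, constants, and toy data below are illustrative assumptions,
    # not values taken from the paper.

    import numpy as np

    BETA = 0.1          # hypothetical scaling parameter beta
    NUM_FEATURES = 5    # m, the number of explicit (non-log-likelihood) features


    def phi(log_likelihood: float, binary_features: np.ndarray) -> np.ndarray:
        """Build the representation <beta*L(x), h_1(x), ..., h_m(x)>."""
        return np.concatenate(([BETA * log_likelihood], binary_features))


    def rerank(candidates, weight_vectors):
        """Return the index of the highest-scoring candidate.

        `candidates` holds (log_likelihood, binary_features) pairs for the
        candidate taggings of one sentence; `weight_vectors` holds the
        (weights, survival_count) pairs kept by a voted perceptron during
        training.  The vote is taken here by summing counted scores, which
        is equivalent to scoring with the averaged weight vector.
        """
        def score(cand):
            x = phi(*cand)
            return sum(count * np.dot(w, x) for w, count in weight_vectors)

        return max(range(len(candidates)), key=lambda i: score(candidates[i]))


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Three hypothetical candidate taggings of one sentence.
        candidates = [(-12.3, rng.integers(0, 2, NUM_FEATURES).astype(float)),
                      (-11.8, rng.integers(0, 2, NUM_FEATURES).astype(float)),
                      (-13.0, rng.integers(0, 2, NUM_FEATURES).astype(float))]
        # Two hypothetical weight vectors with their survival counts.
        weights = [(rng.normal(size=NUM_FEATURES + 1), 3),
                   (rng.normal(size=NUM_FEATURES + 1), 1)]
        print("best candidate index:", rerank(candidates, weights))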