<?xml version="1.0" standalone="yes"?> <Paper uid="W02-2010"> <Title>Named Entity Recognition as a House of Cards: Classifier Stacking</Title> <Section position="4" start_page="70" end_page="70" type="metho"> <SectionTitle> 3 Breaking-Up the Task </SectionTitle> <Paragraph position="0"> Munoz et al. (1999) examine a different method of chunking, called the Open/Close (O/C) method: two classifiers are used, one predicting opening brackets and one predicting closing brackets. A final optimization stage pairs the opening and closing brackets through a global search.</Paragraph> <Paragraph position="1"> We propose here a method that is similar in spirit to the O/C method, and also to Carreras and Marquez (2001) and Arevalo et al. (2002):
1. In the first stage, detect only the entity boundaries, without identifying their type, using the fnTBL system;
2. Using a forward-backward type algorithm (FB henceforth), determine the most probable type of each entity detected in the first step.
This method has some enticing properties:
- Detecting only the entity boundaries is a simpler problem, since different entity types share common features; Table 3 shows the performance obtained by the fnTBL system, which is noticeably higher than the one shown in Table 2;
- The FB algorithm allows for a global search for the optimum, which is beneficial since both fnTBL and Snow perform only local optimizations;
- The FB algorithm has access to both entity-internal and external contextual features (as first described in McDonald (1996)); furthermore, since the chunks are collapsed, the local area is also larger in span.</Paragraph> <Paragraph position="2"> The input to the FB algorithm consists of a series of chunks, and each chunk is assigned the entity type of maximal probability. These probabilities are computed using the standard Markov assumption of independence and the forward-backward algorithm.</Paragraph> <Paragraph position="3"> Both the internal and the external models use 5-gram language models, smoothed using the modified discounting method of Chen and Goodman (1998). In the case of unseen words, a backoff to the capitalization tag is performed; the corresponding probability is assumed to be exponentially distributed.</Paragraph>
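<Paragraph> As an illustration of the second stage, the following is a minimal Python sketch of marginal (forward-backward) decoding over chunk types. Only the forward-backward structure and the per-chunk max-marginal decision follow the description above; the type set, the toy lexicon-based emission score, and the uniform transition model are illustrative stand-ins for the paper's internal and external 5-gram language models, not the authors' implementation.

```python
# Hedged sketch of the FB typing stage. Hypothetical names throughout: the
# entity types, the lexicon-based emission() and the uniform transition()
# are toy stand-ins for the internal/external 5-gram models of the paper.

TYPES = ["PER", "ORG", "LOC", "MISC"]

def emission(chunk, t):
    """Toy score standing in for P(chunk | type) from the internal model."""
    lexicon = {"PER": {"john", "smith"}, "ORG": {"acme", "inc."},
               "LOC": {"paris"}, "MISC": set()}
    hits = sum(w.lower() in lexicon[t] for w in chunk)
    return (1.0 + hits) / (len(chunk) + len(TYPES))  # add-one smoothing

def transition(prev_t, t):
    """Toy first-order transition P(type_i | type_{i-1}); uniform here."""
    return 1.0 / len(TYPES)

def fb_types(chunks):
    """Assign each chunk the type of maximal posterior (marginal) probability,
    summing over all assignments of the other chunks -- a global search."""
    n = len(chunks)
    # Forward pass: alpha[i][t] ~ P(chunks[0..i], type_i = t).
    alpha = [{t: emission(chunks[0], t) for t in TYPES}]
    for i in range(1, n):
        alpha.append({t: emission(chunks[i], t)
                         * sum(alpha[i - 1][s] * transition(s, t) for s in TYPES)
                      for t in TYPES})
    # Backward pass: beta[i][t] ~ P(chunks[i+1..n-1] | type_i = t).
    beta = [None] * n
    beta[n - 1] = {t: 1.0 for t in TYPES}
    for i in range(n - 2, -1, -1):
        beta[i] = {t: sum(transition(t, s) * emission(chunks[i + 1], s)
                          * beta[i + 1][s] for s in TYPES)
                   for t in TYPES}
    # Per-chunk max-marginal decision (sum-product, not Viterbi's max-product).
    return [max(TYPES, key=lambda t: alpha[i][t] * beta[i][t])
            for i in range(n)]

print(fb_types([["John", "Smith"], ["Acme", "Inc."], ["Paris"]]))
# -> ['PER', 'ORG', 'LOC']
```

Summing over the other chunks' assignments, rather than maximizing over them, is what makes this the forward-backward (sum-product) variant rather than Viterbi decoding, matching the design choice noted below.</Paragraph>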
<Paragraph position="4"> Table 4 shows the results obtained by stacking the FB algorithm on top of fnTBL. Comparing these results with the ones in Table 2, one can observe that the global search does improve the performance: by 3 F-measure points when compared with fnTBL+Snow, and by 5 points when compared with the fnTBL system alone. Also presented in Table 4 is the performance of the algorithm on perfect boundaries: more than 6 F-measure points can be gained by improving the boundary detection alone. Table 5 presents the detailed performance of the FB algorithm on all four data sets, broken down by entity type. It is notable that the best entity type for a chunk is computed by selecting the best entity over all combinations of the other entity assignments in the sentence; this choice better reflects the scoring method, and makes the algorithm more similar to the HMM forward-backward algorithm (Jelinek, 1997, chapter 13) than to the Viterbi algorithm. A quick analysis of the results revealed that most errors were made on unknown words, in both Spanish and Dutch: the accuracy on known words is 97.4%/98.9% (Spanish/Dutch), while the accuracy on unknown words is 83.4%/85.1%. This suggests that lists of entities could be extremely beneficial for the algorithm.</Paragraph> </Section> </Paper>