<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0429">
<Title>Named Entity Recognition using Hundreds of Thousands of Features</Title>
<Section position="5" start_page="0" end_page="0" type="evalu">
<SectionTitle>4 Results</SectionTitle>
<Paragraph position="0">We evaluated our approach using the CoNLL-2003 English and German training and test sets, and the conlleval scoring software. We ran two baseline tests using Thorsten Brants' TnT tagger (2000), and two tests of SVM-Lattice:
1. TnT: the TnT tagger applied as distributed.
2. TnT+subcat: the TnT tagger applied to a refined tag set. Each tag type was subcategorized into about forty subtag types; each instance of a tag in the text was then replaced by the appropriate subtag. For example, a number (e.g., 221) that was part of a location received an I-LOC-alldigits tag; a location word with an initial capital letter (e.g., Baker) received an I-LOC-initcap tag; and one of the 30 most common words (e.g., of) that was part of a location received a (word-specific) I-LOC-of tag. This run served both to calibrate the SVM-Lattice performance scores and to provide input for the SVM-Lattice+ run below.
3. SVM-Lattice: features 1-10 (listed above in the Features section).
4. SVM-Lattice+: features 1-11, using the output of the SVM-Lattice and TnT+subcat runs as input features.
Scores for each English test are shown in Table 1; German tests are shown in Table 2. Table 3 shows the results of the SVM-Lattice+ run in more detail. The results show that the technique performs well, at least compared with the baseline technique provided with the CoNLL-2003 data (whose Test B Fβ=1 scores are 59.61 for English and 30.30 for German).</Paragraph>
</Section>
</Paper>
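To make the TnT+subcat refinement (item 2 above) concrete, here is a minimal Python sketch of the tag-subcategorization step. The alldigits, initcap, and word-specific subtags come from the paper; the subcategorize helper, the COMMON_WORDS stand-in list, the -other fallback class, and the ordering of the shape tests are assumptions for illustration, not the authors' implementation.

    # Hypothetical sketch of the TnT+subcat tag refinement. Subtag names
    # alldigits/initcap and word-specific subtags follow the paper's examples;
    # COMMON_WORDS and the -other fallback are assumed stand-ins.

    COMMON_WORDS = {"of", "the", "in"}  # stand-in for the 30 most common words

    def subcategorize(token: str, tag: str) -> str:
        """Refine a CoNLL tag (e.g., I-LOC) into a word-shape-specific subtag."""
        if tag == "O":
            return tag                       # leave the outside tag unrefined
        if token.lower() in COMMON_WORDS:
            return f"{tag}-{token.lower()}"  # word-specific subtag, e.g., I-LOC-of
        if token.isdigit():
            return f"{tag}-alldigits"        # e.g., 221 -> I-LOC-alldigits
        if token[0].isupper():
            return f"{tag}-initcap"          # e.g., Baker -> I-LOC-initcap
        return f"{tag}-other"                # assumed fallback shape class

    # The paper's three I-LOC examples:
    for tok in ["221", "Baker", "of"]:
        print(tok, "->", subcategorize(tok, "I-LOC"))

Each token's tag in the training text would be replaced by the refined subtag before training TnT, and the subtags predicted by that run could then be fed to SVM-Lattice+ as input features, as the run descriptions above indicate.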