File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/06/w06-1658_evalu.xml
Size: 2,380 bytes
Last Modified: 2025-10-06 13:59:52
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1658"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. Entity Annotation based on Inverse Index Operations</Title> <Section position="7" start_page="498" end_page="499" type="evalu"> <SectionTitle> 5.2 Results </SectionTitle> <Paragraph position="0"> In our first experiment, we annotated the two corpora for 4 annotation types, using 2 JAPE rules for each type. The 4 annotation types were 'Person name', 'Organization', 'Location' and 'Date'. A sample JAPE rule for identifying person names is shown in Figure 9. This rule identifies a sequence of words as a person name when each word in the sequence starts with an upper-case letter and the sequence is immediately preceded by a word from a dictionary of 'INITIAL's. Example words in the 'INITIAL' dictionary are 'Mr.', 'Dr.' and 'Lt.'.</Paragraph> <Paragraph position="1"> Table 1 compares the time taken by the index-based annotator against that taken by GATE for the 8 JAPE rules. The index-based annotator runs 8 to 13 times faster than GATE. Table 2 splits the time reported for the index-based annotator in Table 1 into the time taken to compute postings lists for basic entities and for derived entities (cf. Section 2), for each of the data sets. We can also observe that the speedup is greater for the larger corpus.</Paragraph> <Paragraph position="2"> [Table caption fragment: postings lists of entity types] An important advantage of performing annotations over the inverse index is that index entries for basic entity types can be preserved and reused as users specify additional annotation rules. For instance, the index entry for 'Capsword' might be reused in several annotation rules. In contrast, a document-based annotator has to process each document from scratch for every newly introduced annotation rule. 
To verify this, we introduced 1 additional rule for each of the 4 named entity types. Table 3 compares the time required by the index-based annotator against that required by GATE for annotating the two corpora using the 4 additional rules. For this incremental annotation, the speedup factor rises to 23-37.</Paragraph> <Paragraph position="3"> [Table caption fragment: notations using the two techniques for the additional 4 rules]</Paragraph> </Section> </Paper>