<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0114"> <Title>CATEGORIZING AND STANDARDIZING PROPER NOUNS FOR EFFICIENT INFORMATION RETRIEVAL</Title> <Section position="7" start_page="158" end_page="159" type="concl"> <SectionTitle> 8. Conclusion </SectionTitle> <Paragraph position="0"> To compare our proper noun categorization results to the evaluation of a system with similar goals in the literature, we chose Coates-Stephens' (1992) result on acquiring genus information of proper nouns as the benchmark for our overall precision. While his approach acquires a detailed genus and differentia description for unknown proper nouns, we consider our approach of assigning a category from a classification scheme of 30 classes to an unknown proper noun generally similar in purpose to his acquisition of genus information. However, it should be noted that our method for assigning categories to proper nouns differs from Coates-Stephens' method: we rely more on built-in knowledge bases, while his approach relies more on context.</Paragraph> <Paragraph position="1"> Based on 100 unseen documents containing 535 unknown proper nouns, FUNES (Coates-Stephens, 1992) successfully acquired genus information for 340 proper nouns. Of the 195 proper nouns not acquired, 92 were due to the system's parse failure. Thus, the success rate based on only the proper nouns that the system actually analyzed (340 of 443) was 77%. DR-LINK's proper noun categorizer's overall precision, computed with the same formula, was 93%, including proper nouns which were correctly categorized as miscellaneous.</Paragraph> <Paragraph position="2"> Katoh's (1991) evaluation of his machine-translation system, which was based on translating the 1,000 most frequent names in the AP news corpus, successfully analyzed 94% of the 1,000 names. 
Our precision figure for categorizing person names was 90%.</Paragraph> <Paragraph position="3"> Finally, the evaluation result from Rau's (1991) company name extractor is compared to the precision figure of our company name categorization. Both systems relied heavily on company name suffixes.</Paragraph> <Paragraph position="4"> Rau's result showed a 97.5% success rate for the program's extraction of company names that had company name suffixes. Our system's precision figure was 88%. However, it should be noted that our result is based on all company names, including those which did not have any clear company name suffixes or prefixes.</Paragraph> </Section> </Paper>