<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1062"> <Title>INTERPRETATION OF PROPER NOUNS FOR INFORMATION RETRIEVAL</Title> <Section position="7" start_page="312" end_page="312" type="concl"> <SectionTitle> 6. CONCLUSION </SectionTitle> <Paragraph position="0"> In comparing our proper noun categorization result to others in the literature, we contrasted Coates-Stephens' (1992) result on acquiring genus information of proper nouns with our overall precision. While his approach acquires a detailed genus and differentia description of each unknown proper noun, we consider our approach of assigning an unknown proper noun a category from a classification scheme of 30 classes to be generally similar in purpose to his acquisition of genus information.</Paragraph> <Paragraph position="1"> Based on 100 unseen documents which had 535 unknown proper nouns, FUNES (Coates-Stephens, 1992) successfully acquired genus information of 340 proper nouns. Of the 195 proper nouns not acquired, 92 were due to the system's parse failure. Thus, the success ratio based only on the proper nouns which were analyzed by the system (340 of 443) was 77%. The DR-LINK proper noun categorizer's overall precision, which is computed with the same formula, was 75%, including proper nouns which were correctly categorized as miscellaneous.</Paragraph> <Paragraph position="2"> In Katoh's (1991) evaluation of his machine translation system, which was based on translating the 1,000 most frequent names in the AP news corpus, 94% of the 1,000 names were analyzed successfully. Our precision in categorizing person names was 46%. However, Katoh's system kept a list of 3,000 entries as a system lexicon before the testing.
Thus, a considerable number of the 1,000 most frequent names would have been already known, while the DR-LINK system's proper noun categorizer had only 47 entries of person names in the proper noun knowledge base before the testing.</Paragraph> <Paragraph position="3"> Therefore, we believe that the performance of our person name categorization will improve significantly with the addition of a list of common first names to our knowledge base.</Paragraph> <Paragraph position="4"> Finally, the evaluation result from Rau's (1991) company name extractor is compared to the precision figure of our company name categorization. Both systems relied heavily on company name suffixes. Rau's result showed a 97.5% success ratio for the program's extraction of company names that had company name suffixes. Our system's precision figure was 87%.</Paragraph> <Paragraph position="5"> However, it should be noted that our results are based on all company names, even those which did not have any clear suffixes or prefixes.</Paragraph> </Section> </Paper>