<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0105"> <Title>Identifying Unknown Proper Names in Newswire Text</Title> <Section position="6" start_page="50" end_page="51" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> The system has been run on one million words of text (two years of WSJ training corpus as well as the [Kahaner, 91] email corpus). The identification of person names and geographical locations is in place, as is a rudimentary organization tagger (which does not extract any interesting attributes of the organization). The pegs-based Coreference KS has been implemented, but the breaking of a link from a mention to a peg is not yet propagated to other pegs. We have not yet implemented a treatment of partial dependents, which involves modeling interrelationships among pegs. Problems we are currently working on include conjunctions (e.g. is &quot;AVX and Kyocera&quot; a single entity?), the treatment of partial dependents, and references to sets (e.g. the discourse &quot;Indira Gandhi ... Rajiv Gandhi ... the Gandhis&quot;). We are also investigating the applicability of Bayesian inference networks to the overall problem.</Paragraph> <Paragraph position="1"> We recently conducted an empirical evaluation of the system; details are deferred to a separate paper. In brief, the evaluation was carried out on a test set of 42 hand-tagged WSJ articles, using a scoring program we developed. The hand-tagging marked only the type of the tag (person, organization, or location), ignoring attributes. Scores, given as &lt;precision, recall&gt;, ranged from &lt;76%, 72%&gt; to &lt;84%, 80%&gt;, depending on whether partial matches (e.g. only a fragment of a name in the program's tag, or a title identified as part of a name) were accepted. 
We expect to evaluate the Coreference KS more directly soon; in the meantime, we have observed it to be extremely effective (apart from the exceptions mentioned earlier) for name mentions in the WSJ, especially mentions of people.</Paragraph> <Paragraph position="2"> In conclusion, we have found that treating proper names as potentially context-dependent linguistic expressions can be applied effectively to the problem of unknown name identification in newswire text, especially when combined with local-context-based text skimming. Future directions include determining more precisely the genre limitations of such an approach and porting the system to another language.</Paragraph> </Section> </Paper>