<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0106"> <Title>Customizing a Lexicon to Better Suit a Computational Task</Title> <Section position="9" start_page="65" end_page="65" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> We have discussed two approaches to augmenting and rearranging the components of a lexicon, in effect adding new features to its members, by making use of lexical association information from a large corpus. We have used lexical co-occurrence statistics in combination with a modified lexicon to classify proper names, to associate more specific senses with broadly defined terms, and to classify new words into existing categories, with some degree of success.</Paragraph> <Paragraph position="1"> We have also used these statistics to suggest how a lexicon with a taxonomic structure can be rearranged into more frame-like categories, and have assigned more general main-topic labels to texts based on these categories.</Paragraph> <Paragraph position="2"> One conclusion that may be drawn from this work, especially from the results in Section 4, is that we have provided a mechanism for successfully combining hand-built lexicon information with knowledge-free, statistically derived information. The combined information from the WordNet-derived categories provided the clusters from which WordSpace centroids could be created, and these centroids in turn provided candidate words for improving the categories.</Paragraph> <Paragraph position="3"> In future work, in addition to expanding the evaluation of the results described here, we would like to try reversing the experiment: that is, to start with WordSpace vectors and determine which parts of WordNet should be interlinked into schematic categories.</Paragraph> <Paragraph position="4"> Acknowledgments The authors would like to thank Jan Pedersen for his help and encouragement. We are also indebted to Mike Berry for SVDPACK. 
The first author's research was sponsored in part by the Advanced Research Projects Agency under Grant No. MDA972-92-J-1029 with the Corporation for National Research Initiatives (CNRI), in part by an internship at Xerox Palo Alto Research Center; and this material is based in part upon work supported by the National Science Foundation under Infrastructure Grant No. CDA-8722788. The second author was supported in part by the National Center for Supercomputing Applications under grant BNS930000N.</Paragraph> </Section> </Paper>
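The centroid step summarized in the conclusions (WordNet-derived categories supply clusters, the WordSpace centroid of each cluster is computed, and words near that centroid become candidate additions to the category) can be illustrated with a small sketch. It assumes the word vectors are already available; in WordSpace they would come from corpus co-occurrence statistics, presumably dimension-reduced with SVD given the acknowledgment of SVDPACK. The words, vectors, and the names `TOY_SPACE`, `TOY_CATEGORY`, `centroid`, `cosine`, and `candidate_words` are hypothetical, and ranking candidates by cosine similarity to the mean vector is an assumed choice, not necessarily the authors' exact procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical toy data: 3-dimensional stand-ins for WordSpace vectors.
# Real WordSpace vectors are derived from corpus co-occurrence statistics;
# these numbers are invented purely for illustration.
TOY_SPACE: Dict[str, Vector] = {
    "violin": [0.9, 0.1, 0.0],
    "cello": [0.8, 0.2, 0.1],
    "flute": [0.7, 0.3, 0.0],
    "oboe": [0.75, 0.25, 0.05],
    "tractor": [0.0, 0.2, 0.9],
    "harvest": [0.1, 0.1, 0.8],
}

# Hypothetical category, standing in for one derived from WordNet.
TOY_CATEGORY: List[str] = ["violin", "cello", "flute"]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot compute the centroid of an empty category")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_words(
    category: List[str], space: Dict[str, Vector], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank words outside `category` by cosine similarity to its centroid."""
    # Only category members that actually have a vector contribute.
    members = [w for w in category if w in space]
    center = centroid([space[w] for w in members])
    scored = [(w, cosine(vec, center)) for w, vec in space.items() if w not in category]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    for word, score in candidate_words(TOY_CATEGORY, TOY_SPACE, k=2):
        print(f"{word}\t{score:.3f}")
```

With the toy data, `oboe` ranks first as a candidate for the instrument category, while the farming words score far lower. Using the mean vector keeps the sketch simple; a real system would also need a similarity threshold or manual review before adding candidates to a category.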