<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0113"> <Title>Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches</Title> <Section position="7" start_page="151" end_page="151" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> In this paper we presented a general method for comparing the results of two similarity extraction techniques via gold standards. This method can be used when no application-specific evaluation technique exists, and it provides a relative measurement of techniques against human-generated standard semantic resources. We showed how these gold standards could be processed to produce a tool for measuring overlap between their contents and the results of a semantic extraction method. We applied these gold standard evaluations to two different semantic extraction techniques passed over the same 4-megabyte corpus. The syntax-based technique produced greater overlap with the gold standards derived from thesauri for the characteristic vocabulary of the corpus, while the window-based technique provided relatively better results for rare words.</Paragraph> <Paragraph position="1"> This dichotomous result suggests that no one statistical technique is adapted to all ranges of frequencies of words from a corpus. Everyday experience suggests that frequently occurring events can be more finely analyzed than rarer ones. In the domain of corpus linguistics, the same reasoning can be applied. For frequent words, finer-grained context, such as that provided by even rough syntactic analysis, is rich enough to judge similarity.</Paragraph> <Paragraph position="2"> For less frequent words, reaping more, though less exact, information, such as that given by windows of N words, provides more information about each word. 
For rare words, the context may have to be extended beyond a window, to the paragraph, or section, or entire document level, as Crouch (1990) did for rarely appearing words.</Paragraph> <Paragraph position="3"> Acknowledgements. This research was performed under the auspices of the Laboratory for Computational Linguistics (Carnegie Mellon University) directed by Professor David A. Evans.</Paragraph> </Section> </Paper>