<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1046"> <Title>Evaluating Smoothing Algorithms against Plausibility Judgements</Title> <Section position="5" start_page="3" end_page="4" type="concl"> <SectionTitle> 4 Conclusions </SectionTitle> <Paragraph position="0"> This paper investigated the validity of smoothing techniques by using them to recreate the frequencies of adjective-noun pairs that fail to occur in a 100 million word corpus. We showed that the recreated frequencies are significantly correlated with plausibility judgements. These results were then extended by applying the same smoothing techniques to adjective-noun pairs that occur in the corpus. These recreated frequencies were significantly correlated with the actual frequencies, as well as with plausibility judgements.</Paragraph> <Paragraph position="1"> Our results provide independent evidence for the validity of the smoothing techniques we employed. In contrast to previous work, our evaluation does not presuppose that the recreated frequencies are used in a specific natural language processing task. Rather, we established an independent criterion for the validity of smoothing techniques by comparing them to plausibility judgements, which are known to correlate with co-occurrence frequency. We also carried out a comparison of different smoothing methods, and found that class-based smoothing outperforms distance-weighted averaging.</Paragraph> <Paragraph position="2"> From a practical point of view, our findings provide a very simple account of adjective-noun plausibility. Extending the results of Lapata et al. (1999), we confirmed that co-occurrence frequency can be used to estimate the plausibility of an adjective-noun pair. If no co-occurrence counts are available from the corpus, then counts can be recreated using the corpus and a structured source of taxonomic knowledge (for the class-based approach). 
Distance-weighted averaging can be seen as a 'cheap' way to obtain this sort of taxonomic knowledge. However, this method does not draw upon semantic information only, but is also sensitive to the syntactic distribution of the target word. This explains the fact that distance-weighted averaging yielded a lower correlation with perceived plausibility than class-based smoothing. A taxonomy like WordNet provides a cleaner source of conceptual information, which captures essential aspects of the type of knowledge needed for assessing the plausibility of an adjective-noun combination.</Paragraph> <Paragraph position="3"> Two anonymous reviewers point out that this conclusion only holds for an approach that computes similarity based on adjective-noun co-occurrences. Such co-occurrences might not reflect semantic relatedness very well, due to the idiosyncratic nature of adjective-noun combinations. It is possible that distance-weighted averaging would yield better results if applied to other co-occurrence data (e.g., subject-verb, verb-object), which could be expected to produce more reliable information about semantic similarity.</Paragraph> </Section> </Paper>