<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1046"> <Title>Evaluating Smoothing Algorithms against Plausibility Judgements</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Certain combinations of adjectives and nouns are perceived as more plausible than others. A classical example is strong tea, which is highly plausible, as opposed to powerful tea, which is not. On the other hand, powerful car is highly plausible, whereas strong car is less plausible. It has been argued in the theoretical literature that the plausibility of an adjective-noun pair is largely a collocational (i.e., idiosyncratic) property, in contrast to verb-object or noun-noun plausibility, which is more predictable (Cruse, 1986; Smadja, 1991).</Paragraph> <Paragraph position="1"> The collocational hypothesis has recently been investigated in a corpus study by Lapata et al. (1999). This study investigated potential statistical predictors of adjective-noun plausibility by using correlation analysis to compare judgements elicited from human subjects with five corpus-derived measures: co-occurrence frequency of the adjective-noun pair, noun frequency, conditional probability of the noun given the adjective, the log-likelihood ratio, and Resnik's (1993) selectional association measure.</Paragraph> <Paragraph position="2"> All predictors but one were positively correlated with plausibility; the highest correlation was obtained with co-occurrence frequency. Resnik's selectional association measure surprisingly yielded a significant negative correlation with judged plausibility. 
These results suggest that the best predictor of whether an adjective-noun combination is plausible or not is simply how often the adjective and the noun collocate in a record of language experience.</Paragraph> <Paragraph position="3"> As a predictor of plausibility, co-occurrence frequency has the obvious limitation that it cannot be applied to adjective-noun pairs that never occur in the corpus. A zero co-occurrence count might be due to insufficient evidence or might reflect the fact that the adjective-noun pair is inherently implausible. In the present paper, we address this problem by using smoothing techniques (distance-weighted averaging and class-based smoothing) to recreate missing co-occurrence counts, which we then compare to plausibility judgements elicited from human subjects. By demonstrating a correlation between recreated frequencies and plausibility judgements, we show that these smoothing methods produce realistic frequency estimates for missing co-occurrence data. This approach allows us to establish the validity of smoothing methods independently of any specific natural language processing task.</Paragraph> </Section> </Paper>