<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1113"> <Title>Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features</Title> <Section position="5" start_page="899" end_page="901" type="relat"> <SectionTitle> 3 Related Work </SectionTitle>
<Paragraph position="0"> Breidt (1995) evaluated the usefulness of the point-wise mutual information measure (as suggested by Church and Hanks (1989)) for the extraction of V-N collocations from German text corpora. Several other measures, such as Log-Likelihood (Dunning, 1993), Pearson's χ² (Church et al., 1991), Z-Score (Church et al., 1991) and the Cubic Association Ratio (MI3), have also been proposed. These measures quantify the association between two words, but they do not address the non-compositionality of MWEs. Dekang Lin (1999) proposes a way to automatically identify the non-compositionality of MWEs. He suggests that a possible way to separate compositional phrases from non-compositional ones is to check the existence and mutual-information values of phrases obtained by replacing one of the words with a similar word. According to Lin, a phrase is probably non-compositional if such substitutions are not found in the collocations database or if their mutual-information values are significantly different from that of the phrase. Another way of determining the non-compositionality of V-N collocations is by using the 'distributed frequency of object' (DFO) in V-N collocations (Tapanainen et al., 1998). The basic idea there is that "if an object appears only with one verb (or few verbs) in a large corpus we expect that it has an idiomatic nature" (Tapanainen et al., 1998).</Paragraph>
<Paragraph position="1"> Schone and Jurafsky (2001) applied Latent Semantic Analysis (LSA) to the analysis of MWEs in the task of MWE discovery, by rescoring MWEs extracted from the corpus. An interesting way of quantifying the relative compositionality of an MWE is proposed by Baldwin, Bannard, Tanaka and Widdows (2003).</Paragraph>
<Paragraph position="2"> They use LSA to determine the similarity between an MWE and its constituent words, and claim that higher similarity indicates greater decomposability. In terms of compositionality, an expression is likely to be relatively more compositional if it is decomposable. They evaluate their model on English NN compounds and verb-particles, and show that the model correlates moderately well with WordNet-based decomposability (Baldwin et al., 2003).</Paragraph>
<Paragraph position="3"> McCarthy, Keller and Carroll (2003) judge compositionality according to the degree of overlap between the sets of words most similar to the verb-particle and to its head verb. They show that the correlation between their measures and the human ranking is better than the correlation between the statistical features and the human ranking. We carry out similar experiments in this paper, comparing the correlation of the ranks provided by the SVM-based ranking function with the correlation of the ranks given by the individual features for the V-N collocations. We show that the ranks given by the SVM-based ranking function, which integrates all the features, yield a significantly better correlation than the individual features.</Paragraph>
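To make the association measures surveyed above concrete, here is a minimal sketch of point-wise mutual information computed over verb-object counts. It is not from the paper: the count tables, the function name, and the use of log base 2 are illustrative assumptions only.

```python
from collections import Counter
from math import log2

def pmi_scores(pair_counts, verb_counts, noun_counts, total_pairs):
    """Point-wise mutual information (Church and Hanks, 1989) for each
    verb-object pair: log2( P(v, n) / (P(v) * P(n)) )."""
    scores = {}
    for (verb, noun), count in pair_counts.items():
        p_vn = count / total_pairs
        p_v = verb_counts[verb] / total_pairs
        p_n = noun_counts[noun] / total_pairs
        scores[(verb, noun)] = log2(p_vn / (p_v * p_n))
    return scores

# Toy verb-object counts (illustrative only, not BNC figures).
pairs = Counter({("take", "look"): 50, ("drink", "coffee"): 30, ("take", "coffee"): 2})
verbs = Counter({"take": 52, "drink": 30})
nouns = Counter({"look": 50, "coffee": 32})

print(pmi_scores(pairs, verbs, nouns, total_pairs=sum(pairs.values())))
```

A higher PMI indicates a stronger association between verb and object, but, as noted above, a strong association alone does not tell us whether the collocation is non-compositional.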
<Paragraph position="4"> 4 Data used for the experiments
The data used for the experiments is the British National Corpus of 81 million words. The corpus is parsed using Bikel's parser (Bikel, 2004) and the verb-object collocations are extracted. There are 4,775,697 V-N collocations, of which 1.2 million are unique. All V-N collocations with a frequency above 100 (n=4405) are used in the experiments so that evaluation of the system is feasible. These 4405 V-N collocations were looked up in WordNet, the American Heritage Dictionary and the SAID dictionary (LDC, 2003). Around 400 were found in at least one of the dictionaries. Another 400 were drawn from the rest so that the evaluation set has a roughly equal number of compositional and non-compositional expressions. These 800 expressions were annotated with a rating from 1 to 6 using guidelines developed by the authors; 1 denotes an expression that is totally non-compositional, while 6 denotes one that is totally compositional. A brief explanation of the ratings is as follows:
(1) No word in the expression has any relation to the actual meaning of the expression. Example: "leave a mark".
(2) Can be replaced by a single verb. Example: "take a look".
(3) Although the meanings of both words are involved, at least one of the words is not used in its usual sense. Example: "break news".
(4) Relatively more compositional than (3). Example: "prove a point".
(5) Relatively less compositional than (6). Example: "feel safe".
(6) Completely compositional. Example: "drink coffee".
5 Agreement between the Judges
The data was annotated by two fluent speakers of English. For 765 of the 800 collocations, both annotators gave a rating; for the rest, at least one of the annotators marked the collocation as "don't know". Table 1 illustrates the details of the annotations provided by the two judges.</Paragraph>
<Paragraph position="5"> From Table 1 we see that annotator1 distributed the ratings more uniformly among the collocations, while annotator2 judged a significant proportion of the collocations to be completely compositional. To measure the agreement between the two annotators, we used Kendall's τ (Siegel and Castellan, 1988). τ is the correlation between the rankings of collocations given by the two annotators; it ranges between 0 (little agreement) and 1 (full agreement). τ is defined as
τ = Σ_{i<j} sgn(x_i − x_j) sgn(y_i − y_j) / √( [n(n−1)/2 − Σ_i t_i(t_i−1)/2] · [n(n−1)/2 − Σ_i u_i(u_i−1)/2] ),
where the x_i are the rankings of annotator1 and the y_i are the rankings of annotator2, n is the number of collocations, t_i is the number of values in the i-th group of tied x values and u_i is the number of values in the i-th group of tied y values.</Paragraph>
<Paragraph position="6"> We obtained a τ score of 0.61, which is highly significant. This shows that the annotators were in good agreement with each other in deciding the rating to be given to a collocation. We also compared the rankings of the two annotators using Spearman's rank-correlation coefficient (r_s) (Siegel and Castellan, 1988). We obtained an r_s score of 0.71, indicating good agreement between the annotators.</Paragraph>
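As a small illustration of these agreement measures (not the authors' code), the sketch below computes Kendall's τ (SciPy's tau-b variant, which applies the tie correction in the formula above) and the rank-correlation coefficient r_s on a few invented ratings.

```python
from scipy.stats import kendalltau, spearmanr

# Invented compositionality ratings (1 = totally non-compositional,
# 6 = totally compositional) for a handful of V-N collocations; the study
# itself used the 765 collocations rated by both annotators.
annotator1 = [1, 2, 3, 4, 5, 6, 2, 5, 6, 3]
annotator2 = [1, 3, 3, 5, 6, 6, 1, 4, 6, 2]

tau, tau_p = kendalltau(annotator1, annotator2)   # tau-b, corrects for tied ranks
r_s, rs_p = spearmanr(annotator1, annotator2)     # rank-correlation coefficient
print(f"Kendall's tau = {tau:.2f} (p = {tau_p:.3f}), r_s = {r_s:.2f} (p = {rs_p:.3f})")
```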
<Paragraph position="7"> A couple of examples where the annotators differed are (1) "perform a task", which was rated 3 by annotator1 but 6 by annotator2, and (2) "pay tribute", which was rated 1 by annotator1 but 4 by annotator2.</Paragraph>
<Paragraph position="8"> The 765 samples annotated by both annotators were then divided into a training set and a test set in several different ways to cross-validate the results of ranking (Section 8).</Paragraph>
</Section> </Paper>