<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1113">
  <Title>Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features</Title>
  <Section position="6" start_page="901" end_page="903" type="metho">
    <SectionTitle>
6 Features
</SectionTitle>
    <Paragraph position="0"> Each collocation is represented by a vector whose dimensions are the statistical features obtained from the British National Corpus. The features used in our experiments can be classified as (1) Collocation based features and (2) Context based features.</Paragraph>
    <Section position="1" start_page="901" end_page="902" type="sub_section">
      <SectionTitle>
6.1 Collocation based features
</SectionTitle>
      <Paragraph position="0"> Collocation based features consider the entire collocation as an unit and compute the statistical properties associated with it. The collocation based features that we considered in our experiments are (1) Frequency, (2) Point-wise Mutual Information, (3) Least mutual information difference with similar collocations, (4) Distributed frequency of object and  (5) Distributed frequency of object using the verb information.</Paragraph>
      <Paragraph position="1">  This feature denotes the frequency of a collocation in the British National Corpus. Cohesive expressions have a high frequency. Hence, greater the frequency, the more is the likelihood of the expression to be a MWE.</Paragraph>
      <Paragraph position="2">  Point-wise Mutual information of a collocation (Church and Hanks, 1989) is defined as,</Paragraph>
      <Paragraph position="4"> where, a11 is the verb and a22 is the object of the collocation. The higher the Mutual information of a collocation, the more is the likelihood of the expression to be a MWE.</Paragraph>
      <Paragraph position="5"> 6.1.3 Least mutual information difference with similar collocations (a38 ) This feature is based on Lin's work (Lin, 1999). He suggests that a possible way to separate compositional phrases from non-compositional ones is to check the existence and mutual information values of similar collocations (phrases obtained by replacing one of the words with a similar word). For example, 'eat apple' is a similar collocation of 'eat pear'. For a collocation, we find the similar collocations by substituting the verb and the object with their similar words2. The similar collocation having the least mutual information difference is chosen and the difference in their mutual information values is noted.</Paragraph>
      <Paragraph position="6"> If a collocation a39 has a set of similar collocations</Paragraph>
      <Paragraph position="8"> where a52a61a54a43a56 a17a19a0a62a24 returns the absolute value of a0 and a11 a42 and a22 a42 are the verb and object of the collocation a39 respectively. If similar collocations do not exist for a collocation, then this feature is assigned the highest among the values assigned in the previous equation.</Paragraph>
      <Paragraph position="9"> In this case, a38 is defined as,</Paragraph>
      <Paragraph position="11"> where a11 and a22 are the verb and object of collocations for which similar collocations do not exist. The higher the value of a38 , the more is the likelihood of the collocation to be a MWE.</Paragraph>
      <Paragraph position="12"> 2obtained from Lin's (Lin, 1998) automatically generated thesaurus (http://www.cs.ualberta.ca/a66 lindek/downloads.htm). We obtained the best results (section 8) when we substituted top-5 similar words for both the verb and the object. To measure the compositionality, semantically similar words are more suitable than synomys. Hence, we choose to use Lin's thesaurus (Lin, 1998) instead of Wordnet (Miller et al., 1990).  The distributed frequency of object is based on the idea that &amp;quot;if an object appears only with one verb (or few verbs) in a large corpus, the collocation is expected to have idiomatic nature&amp;quot; (Tapanainen et al., 1998). For example, 'sure' in 'make sure' occurs with very few verbs. Hence, 'sure' as an object is likely to give a special sense to the collocation as it cannot be used with any verb in general. It is defined as,</Paragraph>
      <Paragraph position="14"> where a48 is the number of verbs occurring with the object (a22 ), a11 a1 's are the verbs cooccuring with a22 and a15a28a17a19a11a44a1 a20a23a22a25a24a4a3a6a5 . As the number of verbs (a48 ) increases, the value of a0a6a17a57a22a14a24 decreases. Here, a5 is a threshold which can be set based on the corpus. This feature treats 'point finger' and 'polish finger' in the same way as it does not use the information specific to the verb in the collocation. Here, both the collocations will have the value a0a6a17a8a7a23a15a6a7a49a48a10a9a12a11a65a12a14a13 a13 a24 . The 3 collocations having the highest value of this feature are (1) come true, (2) become difficult and (3) make sure.</Paragraph>
    </Section>
    <Section position="2" start_page="902" end_page="902" type="sub_section">
      <SectionTitle>
6.1.5 Distributed Frequency of Object using the Verb Information (f5)
</SectionTitle>
      <Paragraph position="0"> the Verb information (a15 ) Here, we have introduced an extension to the feature a0 such that the collocations like 'point finger' and 'polish finger' are treated differently and more appropriately. This feature is based on the idea that &amp;quot;a collocation is likely to be idiomatic in nature if there are only few other collocations with the same object and dissimilar verbs&amp;quot;. We define this feature as,</Paragraph>
      <Paragraph position="2"> where a48a20a19a22a21a24a23 is the number of verbs occurring with a22 , a11a14a1 's are the verbs cooccuring with a22 and</Paragraph>
      <Paragraph position="4"> the verb a11 and a11a44a1 . It is calculated using the wordnet similarity measure defined by Hirst and Onge (Hirst and St-Onge, 1998). In our experiments, we considered top-50 verbs which co-occurred with the object</Paragraph>
    </Section>
    <Section position="3" start_page="902" end_page="903" type="sub_section">
      <SectionTitle>
6.2 Context based features
</SectionTitle>
      <Paragraph position="0"> Context based measures use the context of a word/collocation to measure their properties. We represented the context of a word/collocation using a LSA model. LSA is a method of representing words/collocations as points in vector space.</Paragraph>
      <Paragraph position="1"> The LSA model we built is similar to that described in (Schutze, 1998) and (Baldwin et al., 2003). First, 1000 most frequent content words (i.e., not in the stop-list) were chosen as &amp;quot;content-bearing words&amp;quot;. Using these content-bearing words as column labels, the 50,000 most frequent terms in the corpus were assigned row vectors by counting the number of times they occurred within the same sentence as content-bearing words. Principal component analysis was used to determine the principal axis and we get the transformation matrix a27 a0a29a28a30a28a30a28a32a31 a0a29a28a30a28 which can be used to reduce the dimensions of the 1000 dimensional vectors to 100 dimensions.</Paragraph>
      <Paragraph position="2"> We will now describe in Sections 6.2.1 and 6.2.2 the features defined using LSA model.</Paragraph>
      <Paragraph position="3"> 6.2.1 Dissimilarity of the collocation with its constituent verb using the LSA model (a33 ) If a collocation is highly dissimilar to its constituent verb, it implies that the usage of the verb in the specific collocation is not in a general sense. For example, the sense of 'change' in 'change hands' would be very different from its usual sense. Hence, the greater the dissimilarity between the collocation and its constituent verb, the more is the likelihood that it is a MWE. The feature is defined as</Paragraph>
      <Paragraph position="5"> where, a39 is the collocation, a11 a42 is the verb of the collocation and lsa(a0 ) is representation of a0 using the LSA model.</Paragraph>
      <Paragraph position="6">  verb-form of the object using the LSA model (a44 ) If a collocation is highly similar to the verb form of an object, it implies that the verb in the collocation does not contribute much to the meaning of the collocation. The verb either acts as a sort of  support verb, providing perhaps some additional aspectual meaning. For example, the verb 'give' in 'give a smile' acts merely as a support verb. Here, the collocation 'give a smile' means the same as the verb-form of the object i.e., 'to smile'. Hence, the greater is the similarity between the collocation and the verb-form of the object, the more is the likelihood that it is a MWE. This feature is defined as</Paragraph>
      <Paragraph position="8"> where, a39 is the collocation and a11a4a22 a42 is the verb-form of the object a22 a42 . We obtained the verb-form of the object from the wordnet (Miller et al., 1990) using its 'Derived forms'. If the object doesn't have a verbal form, the value of this feature is 0. Table 2 contains the top-6 collocations according to this feature. All the collocations in Table 2 (except 'receive award' which does not mean the same as 'to award') are good examples of MWEs.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="903" end_page="903" type="metho">
    <SectionTitle>
7 SVM based ranking function/algorithm
</SectionTitle>
    <Paragraph position="0"> The optimal rankings on the training data is computed using the average ratings of the two users.</Paragraph>
    <Paragraph position="1"> The goal of the learning function is to model itself according to this rankings. It should take a ranking function a15 from a family of ranking functions a1 that maximizes the empirical a0 (Kendall's Tau). a0 expresses the similarity between the optimal ranking (a12a3a2 ) and the ranking (a12a5a4 ) computed by the function a15 . SVM-Light4 is a tool developed by Joachims (Joachims, 2002) which provides us such a function.</Paragraph>
    <Paragraph position="2"> We briefly describe the algorithm in this section.</Paragraph>
    <Paragraph position="3"> Maximizing a0 is equivalent to minimizing the number of discordant pairs (the pairs of collocations which are not in the same order as in the optimal ranking). This is equivalent to finding the weight  vector a6a7 so that the maximum number of inequalities are fulfilled.</Paragraph>
    <Paragraph position="5"> where a39a43a1 and a39a60a64 are the collocations, a17a57a39a65a1a33a20a23a39a60a64a53a24a16a9 a12a3a2 if the collocation a39 a1 is ranked higher than a39 a64 for the optimal ranking a12 a2 , a13 a17a57a39 a1 a24 and a13 a17a57a39a60a64 a24 are the mapping onto features (section 6) that represent the properties of the V-N collocations a39a65a1 and a39a60a64 respectively and a6a7 is the weight vector representing the ranking function a15a5a17 .</Paragraph>
    <Paragraph position="6"> Adding SVM regularization for margin maximization to the objective leads to the following optimization problem (Joachims, 2002).</Paragraph>
    <Paragraph position="7">  where a23 a1 a63a64 are the (non-negative) slack variables and C is the margin that allows trading-off margin size against training error. This optimization problem is equivalent to that of a classification SVM on pairwise difference vectors a13 a17a57a39 a1 a24 - a13 a17a57a39 a64 a24 . Due to similarity, it can be solved using decomposition algorithms similar to those used for SVM classification (Joachims, 1999).</Paragraph>
    <Paragraph position="8"> Using the learnt function a15a35a34a17a37a36 ( a6  a2 is the learnt weight vector), the collocations in the test set can be ranked by computing their values using the formula below.</Paragraph>
    <Paragraph position="10"/>
  </Section>
class="xml-element"></Paper>