<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1203">
  <Title>Automatic Identification of Non-Compositional Multi-Word Expressions using Latent Semantic Analysis</Title>
  <Section position="3" start_page="0" end_page="12" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Identifying non-compositional (or idiomatic) multi-word expressions (MWEs) is an important subtask for any computational system (Sag et al., 2002), and significant attention has been paid to practical methods for solving this problem in recent years (Lin, 1999; Baldwin et al., 2003; Villada Moirón and Tiedemann, 2006). While corpus-based techniques for identifying collocational multi-word expressions by exploiting statistical properties of the co-occurrence of the component words have become increasingly sophisticated (Evert and Krenn, 2001; Evert, 2004), it is well known that mere co-occurrence does not distinguish compositional from non-compositional expressions well (Manning and Schütze, 1999, Ch. 5).</Paragraph>
    <Paragraph position="1"> While expressions which may potentially have idiomatic meanings can be identified using various lexical association measures (Evert and Krenn, 2001; Evert and Kermes, 2003), other techniques must be used to determine whether or not a particular MWE does, in fact, have an idiomatic use. In this paper we explore the hypothesis that the local linguistic context can provide adequate cues for making this determination and propose one method for doing this.</Paragraph>
    <Paragraph position="2"> We characterize our task by analogy with word-sense disambiguation (WSD) (Schütze, 1998; Ide and Véronis, 1998). As noted by Schütze, WSD involves two related tasks: the general task of sense discrimination--determining what senses a given word has--and the more specific task of sense selection--determining for a particular use of the word in context which sense was intended. For us the discrimination task involves determining for a given expression whether it has a non-compositional interpretation in addition to its compositional interpretation, and the selection task involves determining, in a given context, whether a given expression is being used compositionally or non-compositionally. The German expression ins Wasser fallen, for example, has a compositional interpretation on which it means 'to fall into water' (as in (1)) and a non-compositional interpretation on which it means 'to fail to happen' (as in (2)). (1) Das Kind war beim Baden von einer Luftmatratze ins Wasser gefallen.</Paragraph>
    <Paragraph position="3"> 'The child had fallen into the water from an air mattress while swimming' (2) Die Eröffnung des Skateparks ist ins Wasser gefallen.</Paragraph>
    <Paragraph position="4"> 'The opening of the skatepark was cancelled' The discrimination task, then, is to identify ins Wasser fallen as an MWE that has an idiomatic meaning, and the selection task is to determine that in (1) it is the compositional meaning that is intended, while in (2) it is the non-compositional meaning.</Paragraph>
    <Paragraph position="5"> Following Schütze (1998) and Landauer &amp; Dumais (1997), our general assumption is that the meaning of an expression can be modelled in terms of the words that it co-occurs with: its co-occurrence signature. To determine whether a phrase has a non-compositional meaning we compute whether the co-occurrence signature of the phrase is systematically related to the co-occurrence signatures of its parts. Our hypothesis is that a systematic relationship is indicative of compositional interpretation and lack of a systematic relationship is symptomatic of non-compositionality. In other words, we expect compositional MWEs to appear in contexts more similar to those in which their component words appear than do non-compositional MWEs.</Paragraph>
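The comparison of co-occurrence signatures described above can be illustrated with a minimal sketch. It is not the method used in the paper: the toy corpus, the window size, the pre-merging of each MWE into a single token, the summing of the parts' signatures, and the names `signature`, `cosine` and `compositionality_score` are all assumptions made for illustration only.

```python
from collections import Counter
import math

def signature(sentences, target, window=2):
    """Count the words found within `window` tokens of each `target` occurrence."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def compositionality_score(sentences, phrase, parts, window=2):
    """Similarity between a phrase's signature and the summed signatures of its parts."""
    combined = sum((signature(sentences, p, window) for p in parts), Counter())
    return cosine(signature(sentences, phrase, window), combined)

# Invented toy corpus (assumption); each MWE is pre-merged into a single token.
corpus = [
    ["she", "drives", "a", "red_car", "fast"],
    ["she", "drives", "a", "car", "fast"],
    ["a", "red", "car", "drives", "fast"],
    ["forms", "and", "red_tape", "delay", "permits"],
    ["sticky", "tape", "holds", "paper"],
    ["a", "red", "apple"],
]

score_car = compositionality_score(corpus, "red_car", ["red", "car"])
score_tape = compositionality_score(corpus, "red_tape", ["red", "tape"])
print(score_car, score_tape)
```

In this toy setting the compositional phrase shares contexts with its parts and receives the higher score, while the idiomatic phrase shares none. On real data the signatures would be far noisier, and some smoothing or dimensionality reduction would likely be needed.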
    <Paragraph position="6"> In this paper we describe two experiments that test this hypothesis. In the first experiment we seek to confirm that the local context of a known idiom can reliably distinguish idiomatic uses from non-idiomatic uses. In the second experiment we attempt to determine whether the difference between the contexts in which an MWE appears and the contexts in which its component words appear can indeed serve to tell us whether the MWE has an idiomatic use.</Paragraph>
    <Paragraph position="7"> In our experiments we make use of latent semantic analysis (LSA) as a model of context-similarity (Deerwester et al., 1990). Since this technique is often used to model meaning, we will speak in terms of "meaning" similarity. It should be clear, however, that we are only using the LSA vectors--derived from context of occurrence in a corpus--to model meaning and meaning composition in a very rough way. Our hope is simply that this rough model is sufficient for the task of identifying non-compositional MWEs.</Paragraph>
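For readers unfamiliar with LSA, the following rough sketch shows the general idea of deriving low-dimensional vectors from a term-by-context count matrix with a truncated singular value decomposition and comparing them by cosine. The matrix values, the term labels, the choice of k = 2, and the helper names `lsa_vectors` and `cosine` are invented for illustration. The corpus, weighting scheme and dimensionality actually used in the experiments are not given in this section.

```python
import numpy as np

def lsa_vectors(counts, k):
    """Project the rows of a term-by-context count matrix onto k latent dimensions."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical counts: rows are terms, columns are context words.
terms = ["water", "swim", "cancel", "postpone"]
counts = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 3.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
])

vectors = lsa_vectors(counts, k=2)
sim_related = cosine(vectors[0], vectors[1])    # water vs. swim
sim_unrelated = cosine(vectors[0], vectors[2])  # water vs. cancel
print(sim_related, sim_unrelated)
```

Terms that occur in similar contexts end up close together in the reduced space, which is the sense in which the LSA vectors serve as a rough model of "meaning" similarity.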
  </Section>
</Paper>