<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1041">
  <Title>Automatic Identification of Non-compositional Phrases</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Non-compositional expressions present a special challenge to NLP applications. In machine translation, word-for-word translation of non-compositional expressions can result in very misleading (sometimes laughable) translations. In information retrieval, expansion of words in a non-compositional expression can lead to a dramatic decrease in precision without any gain in recall. Less obviously, non-compositional expressions need to be treated differently from other phrases in many statistical or corpus-based NLP methods. For example, an underlying assumption in some word sense disambiguation systems, e.g., (Dagan and Itai, 1994; Li et al., 1995; Lin, 1997), is that if two words occur in the same context, they are probably similar. Suppose we want to determine the intended meaning of "product" in "hot product". We can find other words that are also modified by "hot" (e.g., "hot car") and then choose the meaning of "product" that is most similar to the meanings of these words. However, this method fails when non-compositional expressions are involved. For instance, if the same algorithm is used to determine the meaning of "line" in "hot line", the words "product", "merchandise", "car", etc., would lead it to choose the "line of product" sense of "line". We present a method for automatic identification of non-compositional expressions using their statistical properties in a text corpus. The intuitive idea behind the method is that the metaphorical usage of a non-compositional expression causes it to have different distributional characteristics from expressions that are similar to its literal meaning.</Paragraph>
  </Section>
</Paper>