<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1041"> <Title>Automatic Identification of Non-compositional Phrases</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Non-compositional expressions present a special challenge to NLP applications. In machine translation, word-for-word translation of non-compositional expressions can result in very misleading (sometimes laughable) translations. In information retrieval, expansion of words in a non-compositional expression can lead to a dramatic decrease in precision without any gain in recall. Less obviously, non-compositional expressions need to be treated differently from other phrases in many statistical or corpus-based NLP methods. For example, an underlying assumption in some word sense disambiguation systems, e.g., (Dagan and Itai, 1994; Li et al., 1995; Lin, 1997), is that if two words occur in the same context, they are probably similar. Suppose we want to determine the intended meaning of &quot;product&quot; in &quot;hot product&quot;. We can find other words that are also modified by &quot;hot&quot; (e.g., &quot;hot car&quot;) and then choose the meaning of &quot;product&quot; that is most similar to the meanings of these words. However, this method fails when non-compositional expressions are involved. For instance, if the same algorithm is used to determine the meaning of &quot;line&quot; in &quot;hot line&quot;, the other words modified by &quot;hot&quot; (&quot;product&quot;, &quot;merchandise&quot;, &quot;car&quot;, etc.) would lead the algorithm to choose the &quot;line of product&quot; sense of &quot;line&quot;. We present a method for automatic identification of non-compositional expressions using their statistical properties in a text corpus. 
The intuitive idea behind the method is that the metaphorical usage of a non-compositional expression causes it to have a different distributional characteristic than expressions that are similar to its literal meaning.</Paragraph> </Section> </Paper>