<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1052">
  <Title>CONCEPTUAL ASSOCIATION FOR COMPOUND NOUN ANALYSIS</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Compound Nouns: Compound nouns (CNs) are a commonly occurring construction in language consisting of a sequence of nouns, acting as a noun; pottery coffee mug, for example. For a detailed linguistic theory of compound noun syntax and semantics, see Levi (1978). Compound nouns are analysed syntactically by means of the rule N → N N applied recursively. Compounds of more than two nouns are ambiguous in syntactic structure. A necessary part of producing an interpretation of a CN is an analysis of the attachments within the compound. Syntactic parsers cannot choose an appropriate analysis, because attachments are not syntactically governed. The current work presents a system for automatically deriving a syntactic analysis of arbitrary CNs in English using corpus statistics.</Paragraph>
    <Paragraph position="1"> Task description: The initial task can be formulated as choosing the most probable binary bracketing for a given noun sequence, known to form a compound noun, without knowledge of the context. E.g.: (pottery (coffee mug)); ((coffee mug) holder).</Paragraph>
    <Paragraph position="2"> Corpus Statistics: The need for wide-ranging lexical-semantic knowledge to support NLP, commonly referred to as the ACQUISITION PROBLEM, has generated a great deal of research investigating automatic means of acquiring such knowledge. Much work has employed carefully constructed parsing systems to extract knowledge from machine-readable dictionaries (e.g., Vanderwende, 1993). Other approaches have used rather simpler statistical analyses of large corpora, as is done in this work.</Paragraph>
    <Paragraph position="3"> Hindle and Rooth (1993) used a rough parser to extract lexical preferences for prepositional phrase (PP) attachment. The system counted occurrences of unambiguously attached PPs and used these to define LEXICAL ASSOCIATION between prepositions and the nouns and verbs they modified. This association data was then used to choose an appropriate attachment for ambiguous cases. The counting of unambiguous cases in order to make inferences about ambiguous ones is adopted in the current work. An explicit assumption is made that lexical preferences are relatively independent of the presence of syntactic ambiguity.</Paragraph>
    <Paragraph position="4"> Subsequently, Hindle and Rooth's work has been extended by Resnik and Hearst (1993). Resnik and Hearst attempted to include information about typical prepositional objects in their association data. They introduced the notion of CONCEPTUAL ASSOCIATION, in which associations are measured between groups of words considered to represent concepts, in contrast to single words. Such class-based approaches are used because they allow each observation to be generalized, thus reducing the amount of data required. In the current work, a freely available version of Roget's thesaurus is used to provide the grouping of words into concepts, which then form the basis of conceptual association. The research presented here can thus be seen as investigating the application of several key ideas in Hindle and Rooth (1993) and in Resnik and Hearst (1993) to the solution of an analogous problem, that of compound noun analysis. However, both of these works were aimed solely at syntactic disambiguation. The goal of semantic interpretation remains to be investigated.</Paragraph>
  </Section>
</Paper>