<?xml version="1.0" standalone="yes"?> <Paper uid="W95-0102"> <Title>Lexical Heads, Phrase Structure and the Induction of Grammar</Title> <Section position="3" start_page="14" end_page="15" type="intro"> <SectionTitle> 2. LINGUISTIC AND STATISTICAL BASIS OF PHRASE STRUCTURE </SectionTitle>
<Paragraph position="0"> Let us look at a particular example. In English, the word sequence &quot;walking on ice&quot; is generally assumed to have an internal structure similar to (A).1 Why (A) and not one of the alternatives (B-H)? The linguist answers: on ice can move and delete as one unit, whereas walking on can not. Thus, &quot;it is on ice that I walked&quot; and &quot;it is walking that I did on ice&quot; and &quot;it is ice that I walked on&quot; are sentences, but there is no equivalent form for relocating walking on. Similarly, &quot;they walked and jumped on ice&quot; is grammatical but &quot;they walked on and jumped on ice&quot; is awkward. Therefore, if movement and conjunction apply only to single constituents, phrase-structures (A-D) explain this evidence but (E-H) do not.</Paragraph>
<Paragraph position="1"> In languages like German, where case is overtly manifested in affix and determiner choice, the noun ice clearly receives case from the preposition rather than the verb. It seems to make for a simpler theory of language if case is assigned through the government relation, which holds between the preposition and the noun in (A-D) but not in (E-H).</Paragraph>
<Paragraph position="2"> The phrase walking on ice acts like a verb: it can conjoin with a verb (&quot;John walked on ice and sang&quot;) and takes verbal modifiers (&quot;John walked on ice slowly&quot;). So it makes little sense to call it a prepositional phrase or a noun phrase, as in (C) or (D). Likewise, on ice does not behave as a noun, so (A) is a better description than (B).</Paragraph>
<Paragraph position="3"> These deductive steps leading to (A) require some assumptions about language: that constituent structure and category labels introduce specific constraints on sentence-building operations, and that the range of hypothetical grammars is small (our enumeration A-H was over grammars of binary rules where the category of a phrase is tied to the category of one of its constituents, its head).</Paragraph>
<Paragraph position="4"> 1 We will be deliberately vague about what such dominance and precedence relations represent; obviously, different researchers have very different conceptions about the relevance and implications of hierarchical phrase-structure. The specific use of the representations is somewhat irrelevant to our immediate discussion, though various interpretations will be discussed throughout the paper.</Paragraph>
<Paragraph position="5"> Statistical phrase-structure models of language,2 such as SCFGs, are motivated by different assumptions about language, principally that a phrase grouping several words is a constraint on co-occurrence that makes it possible to better predict one of those words given another. In terms of language acquisition and parsing, if we assume that a sequence of words was generated from a phrase-structure grammar, this suggests that we can recover its internal structure by grouping sub-sequences of words with high mutual information. This is the approach taken by (Magerman and Marcus, 1990), who use mutual information rather than a grammar to reconstruct the phrase structure of sentences (see the sketch below). The hope is that by searching for a phrase structure or phrase-structure grammar that maximizes the likelihood of an observed sequence, we will find the generating structure or grammar itself.</Paragraph>
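To make the mutual-information heuristic concrete, here is a minimal sketch in Python. It illustrates the general idea only, not the algorithm of (Magerman and Marcus, 1990), who use a generalized mutual information statistic over part-of-speech n-grams; the function names, the token-list input format, and the choice to split recursively at the lowest-PMI boundary are our own simplifications.

```python
import math
from collections import Counter

def make_pmi(sentences):
    """Estimate pointwise mutual information between adjacent words.
    `sentences` is a list of token lists (a hypothetical input format)."""
    unigrams, bigrams, total = Counter(), Counter(), 0
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
        total += len(sent)

    def pmi(a, b):
        if bigrams[(a, b)] == 0:
            return float("-inf")  # unseen pair: an obvious split point
        p_ab = bigrams[(a, b)] / (total - 1)  # crude normalization
        p_a, p_b = unigrams[a] / total, unigrams[b] / total
        return math.log(p_ab / (p_a * p_b))

    return pmi

def bracket(words, pmi):
    """Build a binary phrase structure by recursively splitting the
    sequence at the adjacent pair that is least mutually informative,
    so that high-PMI pairs end up deep inside the same constituent."""
    if len(words) == 1:
        return words[0]
    k = 1 + min(range(len(words) - 1),
                key=lambda i: pmi(words[i], words[i + 1]))
    return (bracket(words[:k], pmi), bracket(words[k:], pmi))

# Example: pmi = make_pmi(corpus); bracket(["walking", "on", "ice"], pmi)
```

Run over a large corpus, such a bracketer will happily glue a noun to a following preposition whenever that pair has high mutual information, which is precisely the failure mode taken up next.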
<Paragraph position="6"> Unfortunately, there is anecdotal and quantitative evidence that simple techniques for estimating phrase-structure grammars by minimizing entropy do not lead to the desired grammars (grammars that agree with structure (A), for instance). (Pereira and Schabes, 1992) explore this topic, demonstrating that a stochastic context-free grammar trained on part-of-speech sequences from English text can have an entropy as low as or lower than another's and yet bracket the text much more poorly (tested against hand annotations). And (Magerman and Marcus, 1990) provide evidence that grouping sub-sequences of events with high mutual information is not always a good heuristic; to prevent their method from mis-bracketing, they must include in their parsing algorithm a list of event sequences (such as noun-preposition) that should not be grouped together in a single phrase. To understand why, we can look at an example from a slightly different domain. (Olivier, 1968) seeks to acquire a lexicon from unsegmented (spaceless) character sequences by treating each word as a stochastic context-free rule mapping a common nonterminal (call it W) to a sequence of letters; a sentence is a sequence of any number of words, and the probability of a sentence is the product over each word of the probability of W expanding to that word. Learning a lexicon consists of finding a grammar that reduces the entropy of a training character sequence. Olivier's learning algorithm soon creates rules such as W → THE and W → TOBE. But it also hypothesizes words like edby. edby is a common English character sequence that occurs in passive constructions like &quot;She was passed by the runner&quot;. Here -ed and by occur together not because they are part of a common word, but because English syntax and semantics place these two morphemes side by side (see the sketch after this section). At a syntactic level, this is exactly why the algorithm of (Magerman and Marcus, 1990) has problems: English places prepositions after nouns not because they are in the same phrase, but because prepositional phrases often adjoin to noun phrases. Any greedy algorithm (such as that of (Magerman and Marcus, 1990) or the context-free grammar induction method of (Stolcke, 1994)) that builds phrases by grouping events with high mutual information will consequently fail to derive linguistically plausible phrase structure in many situations.</Paragraph> </Section> </Paper>
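As an illustration of how entropy minimization coins chunks like edby, here is a minimal greedy chunk-merging sketch in the spirit of (Olivier, 1968), though not his actual algorithm: repeatedly fusing the most frequent adjacent pair of chunks is only a first-order proxy for the merge that most reduces the entropy of the training sequence, and all names and parameters here are illustrative.

```python
from collections import Counter

def induce_lexicon(text, n_merges=200):
    """Greedily grow a lexicon from an unsegmented character sequence
    by fusing the most frequent adjacent pair of chunks (a rough
    stand-in for the merge that most reduces training entropy).
    Given enough passive sentences ('...passedby...'), it coins a
    'word' edby, since -ed and by- co-occur for syntactic reasons,
    not because they form a single lexical item."""
    seq = list(text)  # start from single-character chunks
    for _ in range(n_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # no pair recurs; nothing left worth merging
            break
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return Counter(seq)  # induced chunk inventory with frequencies
```

The same local greed that discovers THE and TOBE also fuses morphemes across word boundaries, mirroring the noun-preposition mis-bracketing discussed above.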