<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1017">
  <Title>A Generative Constituent-Context Model for Improved Grammar Induction</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> Early work on grammar induction emphasized heuristic structure search, where the primary induction is done by incrementally adding new productions to an initially empty grammar (Olivier, 1968; Wolff, 1988). In the early 1990s, attempts were made to do grammar induction by parameter search, where the broad structure of the grammar is fixed in advance and only parameters are induced (Lari and Young, 1990; Carroll and Charniak, 1992).1 However, this appeared unpromising and most recent work has returned to using structure search. Note that both approaches are local. Structure search requires ways of deciding locally which merges will produce a coherent, globally good grammar. To the extent that such approaches work, they work because good local heuristics have been engineered (Klein and Manning, 2001a; Clark, 2001).</Paragraph>
    <Paragraph position="2"> bracketing. Distituent yields and contexts are not shown, but are modeled.</Paragraph>
    <Paragraph position="3"> Parameter search is also local; parameters which are locally optimal may be globally poor. A concrete example is the experiments from (Carroll and Charniak, 1992). They restricted the space of grammars to those isomorphic to a dependency grammar over the POS symbols in the Penn treebank, and then searched for parameters with the inside-outside algorithm (Baker, 1979) starting with 300 random production weight vectors. Each seed converged to a different locally optimal grammar, none of them nearly as good as the treebank grammar, measured either by parsing performance or data-likelihood.</Paragraph>
    <Paragraph position="4"> However, parameter search methods have a potential advantage. By aggregating over only valid, complete parses of each sentence, they naturally incorporate the constraint that constituents cannot cross - the bracketing decisions made by the grammar must be coherent. The Carroll and Charniak experiments had two primary causes for failure. First, random initialization is not always good, or necessary. The parameter space is riddled with local likelihood maxima, and starting with a very specific, but random, grammar should not be expected to work well. We duplicated their experiments, but used a uniform parameter initialization where all productions were equally likely. This allowed the interaction between the grammar and data to break the initial symmetry, and resulted in an induced grammar of higher quality than Carroll and Charniak reported.</Paragraph>
    <Paragraph position="5"> This grammar, which we refer to as DEP-PCFG will be evaluated in more detail in section 4. The second way in which their experiment was guaranteed to be somewhat unencouraging is that a delexicalized dependency grammar is a very poor model of language, even in a supervised setting. By the F1 measure used in the experiments in section 4, an induced dependency PCFG scores 48.2, compared to a score of 82.1 for a supervised PCFG read from local trees of the treebank. However, a supervised dependency PCFG scores only 53.5, not much better than the unsupervised version, and worse than a right-branching baseline (of 60.0). As an example of the inherent shortcomings of the dependency grammar, it is structurally unable to distinguish whether the subject or object should be attached to the verb first. Since both parses involve the same set of productions, both will have equal likelihood.</Paragraph>
  </Section>
class="xml-element"></Paper>