<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0102">
  <Title>Lexical Heads, Phrase Structure and the Induction of Grammar</Title>
  <Section position="8" start_page="23" end_page="24" type="concl">
    <SectionTitle>
5. CONCLUSIONS
</SectionTitle>
    <Paragraph position="0"> We have argued that there is little reason to believe SCFGs of the sort commonly used for grammar induction will ever converge to linguistically plausible grammars, and we have suggested a modification (namely, incorporating mutual information between phrase heads) that should help fix the problem. We have also argued that the standard context-free grammar estimation procedure, the inside-outside algorithm, is essentially incapable of finding an optimal grammar without bracketing help.</Paragraph>
    <Paragraph position="1"> We now suggest that a representation that explicitly represents relations between phrase heads, such as link grammars (Sleator and Temperley, 1991), is far more amenable to language acquisition problems. Let us look one final time at the sequence V P N. There are only three words here, and therefore three heads. Assuming a head-driven bigram model as before, there are only three possibile anlayses of this sequence, which we write by listing the pairs of words that enter into predictive relationships:  To map back into traditional phrase structure grammars, linking two heads X-Y is the same as specifying that there is some phrase XP headed by X which is a sibling to some phrase YP headed by Y. Of course, using this representation all of the optimal phrase structure grammars (A,C,F and H) are identical. Thus we have a representation which has factored out many details of phrase structure that are unimportant as far as minimizing entropy is concerned.</Paragraph>
    <Paragraph position="2"> Simplifying the search space reaps additional benefits. A greedy approach to grammar acquisition that iteratively hypothesizes relations between the words with highest mutual information will first link V to P, then P to N, producing exactly the desired result for this example. And the distance in parse or grammar space between competing proposals is at most one relation (switching V-P to V-N, for instance), whereas three different rule probabilities may need to be changed in the SCFG representation. This suggests that learning algorithms based on this representation are far less likely to encounter local maximums. Finally, since what would have been multiple parse hypotheses are now one, a Viterbi learning scheme is more likely to estimate accurate counts. This is important, given the computational complexity of estimating long-distance word-pair probabilities from unbracketed corpora.</Paragraph>
    <Paragraph position="3"> We have implemented a statistical parser and training mechanism based on the above notions, but results are too preliminary to include here. Stochastic link-grammar based models have been discussed (Lafferty et al., 1992) but the only test results we have seen (Della-Pietra et ai., 1994) assume a very restricted subset of the model and do not explore the &amp;quot;phrase structures&amp;quot; that result from training on English text.</Paragraph>
  </Section>
class="xml-element"></Paper>