<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1010">
  <Title>Supervised Grammar Induction using Training Data with Limited Constituent Information *</Title>
  <Section position="3" start_page="73" end_page="74" type="relat">
    <SectionTitle>
2 Related Work on Grammar Induction
</SectionTitle>
    <Paragraph position="0"> * Grammar induction is the process of inferring the structure of a language by learning from example sentences drawn from the language. The degree of difficulty in this task depends on three factors. First, it depends on the amount of supervision provided. Charniak (1996), for instance, has shown that a grammar can be easily constructed when the examples are fully labeled parse trees. On the other hand, if the examples consist of raw sentences with no extra structural information, grammar induction is very difficult, even theoretically impossible (Gold, 1967). One could take a greedy approach such as the well-known Inside-Outside re-estimation algorithm (Baker, 1979), which induces locally optimal grammars by iteratively improving the parameters of the grammar so that the entropy of the training data is minimized. In practice, however, when trained on unmarked data, the algorithm tends to converge on poor grammar models. For even a moderately complex domain such as the ATIS corpus, a grammar trained on data with constituent bracketing information produces much better parses than one trained on completely unmarked raw data (Pereira and Schabes, 1992). Part of our work explores the in-between case, when only some constituent labels are available. Section 3 defines the different types of annotation we examine.</Paragraph>
    <Paragraph position="1"> Second, as supervision decreases, the learning process relies more on search. The success of the induction depends on the initial parameters of the grammar because a local search strategy may converge to a local minimum. For finding a good initial parameter set, Lari and Young (1990) suggested first estimating the probabilities with a set of regular grammar rules. Their experiments, however, indicated that the main benefit from this type of pretraining is one of run-time efficiency; the improvement in the quality of the induced grammar was minimal.</Paragraph>
    <Paragraph position="2"> Briscoe and Waegner (1992) argued that one should first hand-design the grammar to encode some linguistic notions and then use the re-estimation procedure to fine-tune the parameters, substituting the cost of hand-labeled training data with that of hand-coded grammar. Our idea of grammar adaptation can be seen as a form of initialization. It attempts to seed the grammar in a favorable search space by first training it with data from an existing corpus.</Paragraph>
    <Paragraph position="3"> Section 4 discusses the induction strategies in more detail.</Paragraph>
    <Paragraph position="4"> A third factor that affects the learning process is the complexity of the data. In their study of parsing the WSJ, Schabes et al. (1993) have shown that a grammar trained on the Inside-Outside re-estimation algorithm can perform quite well on short simple sentences but falters as the sentence length increases. To take this factor into account, we perform our experiments  (with ((at most one) stop))))))) is labeled under each category. The third and fourth columns list the percentage break-down of brackets in each category for ATIS and WSJ respectively. on both a simple domain (ATIS) and a complex one (WSJ). In Section 5, we describe the experiments and report the results.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML