<?xml version="1.0" standalone="yes"?>
<Paper uid="E93-1040">
  <Title>Parsing the Wall Street Journal with the Inside-Outside Algorithm</Title>
  <Section position="2" start_page="0" end_page="341" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Most broad coverage natural language parsers have been designed by incorporating hand-crafted rules.</Paragraph>
    <Paragraph position="1"> These rules are also very often further refined by statistical training. Furthermore, it is widely believed that high performance can only be achieved by disambiguating lexically sensitive phenomena such as prepositional attachment ambiguity, coordination or subcategorizadon. null So far, grammar inference has not been shown to be effective for designing wide coverage parsers.</Paragraph>
    <Paragraph position="2"> Baker (1979) describes a training algorithm for stochastic context-free grammars (SCFG) which can be used for grammar reestimation (Fujisaki et al. 1989, Sharrnan et al. 1990, Black et al. 1992, Briscoe and Waegner 1992) or grammar inference from scratch (Lari and Young 1990). However, the application of SCFGs and the original inside-outside algorithm for grammar inference has been inconclusive for two reasons. First, each iteration of the algorithm on a gr,-unmar with n nonterminals requires O(n31wl 3) time per t~ning sentence w. Second, the inferred grammar imposes bracketings which do not agree with linguistic judgments of sentence structure. null Pereira and Schabes (1992) extended the inside-outside algorithm for inferring the parameters of a stochastic context-free grammar to take advantage of constituent bracketing information in the training text.</Paragraph>
    <Paragraph position="3"> Although they report encouraging experiments (90% bracketing accuracy) on h'mguage transcriptions in the Texas Instrument subset of the Air Travel Information System (ATIS), the small size of the corpus (770 bracketed sentences containing a total of 7812 words), its linguistic simplicity, and the computation time required to vain the grammar were reasons to believe that these results may not scale up to a larger and more diverse corpus. null We report grammar inference experiments with this algorithm from the parsed Wall Street Journal corpus.</Paragraph>
    <Paragraph position="4">  The experiments prove the feasibility and effectiveness of the inside-outside algorithm on a htrge corpus.</Paragraph>
    <Paragraph position="5"> Such experiments are made possible by assumi'ng a right br~mching structure whenever the parsed corpus leaves portions of the parsed tree unspecified. This pre-processing of the corpus makes it fully bracketed. By taking adv~mtage of this fact in the implementation of the inside-outside ~dgorithm, its complexity becomes line~tr with respect to the input length (as noted by Pereira and Schabes, 1992) ,and therefore tractable for large corpora. We report experiments using several kinds of initial gr~unmars ~md a variety of subsets of the corpus as training data. When the entire Wall Street Journal corpus was used as training material, the time required for training has been further reduced by using a par~dlel implementation of the inside-outside ~dgorithm.</Paragraph>
    <Paragraph position="6"> The inferred grammar is evaluated by measuring the percentage of compatible brackets of the bracketing imposed by the inferred grammar with the partial bracketing of held out sentences. Surprisingly high bracketing accuracy is achieved with only 1042 sentences as train* ing materi,'d: 94.4% for test sentences shorter th,-m 10 words ~md 90.2% for sentences shorter than 15 words.</Paragraph>
    <Paragraph position="7"> Furthermore, the bracketing accuracy does not drop drastic~dly as longer sentences ,are considered. These results ,are surprising since the training uses part-of-speech tags as the only source of lexical information. This raises questions about the statistical distribution of sentence structures observed in naturally occurring text.</Paragraph>
    <Paragraph position="8"> After having described the training material used, we report experiments using several subsets of the available training material ,and evaluate the effect of the training size on the bracketing perform,'mce. Then, we describe a method for reducing the number of parameters in the inferred gr~unmars. Finally, we suggest a stochastic model for inferring labels on the produced binary br~mching trees.</Paragraph>
  </Section>
class="xml-element"></Paper>