<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1051">
  <Title>AUTOMATIC GRAMMAR ACQUISITION</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Designing and refining a natural language grammar is a difficult and time-intensive task, often consuming months or even years of skilled effort. The resulting grammar is usually not completely satisfactory, failing to cover a significant fraction of the sentences in the intended domain. Conversely, the grammar is likely to overgenerate, leading to multiple interpretations for a single sentence, many of which are incorrect. With the increasing availability of large, machine-readable, parsed corpora such as the University of Pennsylvania Treebank \[Santorini, 90\], it has become worthwhile to consider automatic grammar acquisition through the application of machine learning techniques. By learning a grammar that completely covers a training set for some domain, it is hoped that coverage will also be increased for new sentences in that domain. Additionally, machine learning techniques may be useful in reducing overgeneration through a variety of techniques that have been suggested in recent literature. One suggestion is to introduce local contextual information into a grammar \[Simmons and Yu, 92\], based on the premise that local context provides useful information for selecting among competing grammar rules. A second suggestion is to introduce probabilities in the form of a probabilistic context-free grammar \[Chitaro and Gfishman, 9% based on the premise that a combination of local probability measures provides a useful estimate of the probability of an entire parse.</Paragraph>
    <Paragraph position="1"> J In this work, we Investigate both of these suggestions and compare them with a simple, automatically learned, context-free grammar. In each case, the grammar is acquired from a subset of parsed Wall Street Journal articles taken from the University of Pennsylvania Treebank. We then apply the acquired grammar to the problem of producing a single unambiguous parse for each sentence of an independent test set derived from the same SOUrce.</Paragraph>
  </Section>
class="xml-element"></Paper>