File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/c92-1060_intro.xml

Size: 2,411 bytes

Last Modified: 2025-10-06 14:05:12

<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1060">
  <Title>An Algorithm for Estimating the Parameters of Unrestricted Hidden Stochastic Context-Free Grammars</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> This paper describes an iterative method for estimating the parameters of a hidden stochastic context-free grammar (SCFG). The &quot;hidden&quot; aspect arises from the fact that some information is not available when the grammar is trained. When a parsed corpus is used for training, production probabilities can be estimated by counting the number of times each production is used in the parsed corpus. In the case of a hidden SCFG, the characteristic grammar is defined but the parse trees associated with the training corpus are not available. To proceed in this circumstance, some initial probabilities are assigned, which are then iteratively re-estimated from their current values and the training corpus. They are adjusted to (locally) maximize the likelihood of generating the training corpus. The EM algorithm (Dempster, 1977) embodies the approach just mentioned; the new algorithm can be viewed as its application to arbitrary SCFG's. The use of unparsed training corpora is desirable because changes in the grammar rules could conceivably require manually reparsing the training corpus several times during grammar development. Stochastic grammars enable ambiguity resolution to be performed on the rational basis of most likely interpretation. They also accommodate the development of more robust grammars having high coverage, where the attendant ambiguity is generally higher.</Paragraph>
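The counting estimate for the parsed-corpus case can be illustrated with a minimal sketch. The function name, the representation of a parse as a list of (left-hand side, right-hand side) pairs, and the toy corpus are all assumptions made for illustration; they are not taken from the paper. Each production probability is taken as its count divided by the total count of productions sharing its left-hand side.

```python
from collections import Counter

def estimate_production_probabilities(parsed_corpus):
    """Relative-frequency estimate: count(A -> rhs) / count(A -> anything)."""
    production_counts = Counter()
    lhs_totals = Counter()
    for tree_productions in parsed_corpus:
        for lhs, rhs in tree_productions:
            production_counts[(lhs, rhs)] += 1
            lhs_totals[lhs] += 1
    return {
        (lhs, rhs): count / lhs_totals[lhs]
        for (lhs, rhs), count in production_counts.items()
    }

# Hypothetical toy corpus: each parse is listed as the productions it uses.
toy_corpus = [
    [("S", ("NP", "VP")), ("NP", ("she",)), ("VP", ("runs",))],
    [("S", ("NP", "VP")), ("NP", ("he",)), ("VP", ("runs",))],
]
probabilities = estimate_production_probabilities(toy_corpus)
```

In the hidden case these counts cannot be observed directly, which is why the iterative re-estimation described above replaces them with expected counts under the current probabilities.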
    <Paragraph position="1"> Previous approaches to the problem of estimating hidden SCFG's include parsing schemes in which all derivations of all sentences in the training corpus are enumerated (Fujisaki et al., 1989; Chitrao &amp; Grishman, 1990). An efficient alternative is the Inside/Outside (I/O) algorithm (Baker, 1979) which, like the new algorithm, is limited to cubic complexity in both the number of nonterminals and the length of a sentence. The I/O algorithm requires that the grammar be in Chomsky normal form (CNF). The new algorithm has the same complexity, but does not have this restriction, dispensing with the need to transform to and from CNF.</Paragraph>
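To make the CNF restriction and the cubic cost concrete, the following is a minimal sketch of the inside pass only, for a grammar already in CNF. It is not the paper's new algorithm and not a full Inside/Outside re-estimation; the function name, the dictionary encoding of rules, and the toy grammar are assumptions made for illustration. The three nested loops over span start, span end, and split point give the cubic dependence on sentence length, and the loop over binary rules (at most the cube of the number of nonterminals) gives the dependence on grammar size.

```python
from collections import defaultdict

def inside_probability(words, lexical_rules, binary_rules, start="S"):
    """Probability that `start` derives `words` under a CNF SCFG."""
    n = len(words)
    # inside[(i, j)][A] = probability that nonterminal A derives words[i:j]
    inside = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        for (lhs, terminal), prob in lexical_rules.items():
            if terminal == word:
                inside[(i, i + 1)][lhs] += prob
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (lhs, left, right), prob in binary_rules.items():
                    left_prob = inside[(i, k)].get(left, 0.0)
                    right_prob = inside[(k, j)].get(right, 0.0)
                    if left_prob and right_prob:
                        inside[(i, j)][lhs] += prob * left_prob * right_prob
    return inside[(0, n)].get(start, 0.0)

# Hypothetical ambiguous toy grammar in CNF: S -> S S (0.4), S -> a (0.6).
toy_lexical = {("S", "a"): 0.6}
toy_binary = {("S", "S", "S"): 0.4}
sentence_probability = inside_probability(["a", "a", "a"], toy_lexical, toy_binary)
```

The sentence "a a a" has two parses under this toy grammar, and the chart sums their probabilities without enumerating them separately, which is the efficiency advantage over the exhaustive-derivation schemes mentioned above.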
  </Section>
</Paper>