<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1046">
  <Title>A Trellis-Based Algorithm For Estimating The Parameters Of A Hidden Stochastic Context-Free Grammar</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> The algorithm described in this paper is concerned with using hidden Markov methods for estimation of the parameters of a stochastic context-free grammar from free text.</Paragraph>
    <Paragraph position="1"> The Forward/Backward (F/B) algorithm (Baum, 1972) is capable of estimating the parameters of a hidden Markov model (i.e. a hidden stochastic regular grammar) and has been used with success to train text taggers (Jehnek, 1985).</Paragraph>
    <Paragraph position="2"> In the tagging apphcation the observed symbols are words and their underlying lexical categories are the hidden states of the model.</Paragraph>
    <Paragraph position="3"> A context-free grammar comprises both lexical (terminai) categories and grammatical (nonterminai) categories. One iterative method of estimation in this case involves parsing each sentence in the training corpus and for each derivation, accumulating counts of the number of times each rule is used. This method has been used by Fujisald et ai.</Paragraph>
    <Paragraph position="4"> (1989), and Chitrao &amp; Grishman (1990). A more efficient method is the Inside/Outside algorithm, devised by Baker (1979) for grammars that are expressed in Chomsky normal form. The algorithm described in this paper relaxes the requirement for a grammar to be expressed in a normal form, and it is based on a trellis representation that is closely related to the F/B algorithm, and which reduces to it for finite-state networks.</Paragraph>
    <Paragraph position="5"> The development of the algorithm has various motivations. Grammars must provide a large coverage to accommodate the diversity of expression present in large collections of unrestricted text. As a result they become more ambiguous. A stochastic grammar provides the capability to resolve ambiguity on a probabilistic basis, providing a practical approach to the problem. It also provides a way of modeling conditional dependence for incomplete grammars, or in the absence of any specific structural information. The latter is exemplified by the approach taken in many current taggers, which have a uniform model of second-order dependency between word categories. Kupiec (1989) has experimented with the inclusion of networks to model mixed-order dependencies.</Paragraph>
    <Paragraph position="6"> The use of hidden Markov methods is motivated by the flexibility they afford. Text corpora from any domain can be used for training, and there are no restrictions on a grammar due to conventions used during labehng. The methods also lend themselves to multi-hngual application.</Paragraph>
    <Paragraph position="7"> The representation used by the algorithm can be related to constituent structures used in other parsers such as chart parsers, providing a means of embedding this technique in them.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML