File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/a00-2021_intro.xml

Size: 4,150 bytes

Last Modified: 2025-10-06 14:00:41

<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2021">
  <Title>Exploiting auxiliary distributions in stochastic unification-based grammars</Title>
  <Section position="2" start_page="0" end_page="154" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> &quot;Unification-based&quot; Grammars (UBGs) can capture a wide variety of linguistically important syntactic and semantic constraints. However, because these constraints can be non-local or context-sensitive, developing stochastic versions of UBGs and associated estimation procedures is not as straightforward as it is for, e.g., PCFGs. Recent work has shown how to define probability distributions over the parses of UBGs (Abney, 1997) and efficiently estimate and use conditional probabilities for parsing (Johnson et al., 1999). Like most other practical stochastic grammar estimation procedures, this latter estimation procedure requires a parsed training corpus.</Paragraph>
    <Paragraph position="1"> Unfortunately, large parsed UBG corpora are not yet available. This restricts the kinds of models one can realistically expect to be able to estimate. For example, a model incorporating lexical selectional preferences of the kind described below might have tens or hundreds of thousands of parameters, which one could not reasonably attempt to estimate from a corpus of roughly a thousand clauses.</Paragraph>
    <Paragraph position="2"> * This research was supported by NSF awards 9720368, 9870676 and 9812169.</Paragraph>
    <Paragraph position="3"> However, statistical models of lexical selectional preferences can be estimated from very large corpora based on simpler syntactic structures, e.g., those produced by a shallow parser. While there is undoubtedly disagreement between these simple syntactic structures and the syntactic structures produced by the UBG, one might hope that they are close enough for lexical information gathered from the simpler syntactic structures to be of use in defining a probability distribution over the UBG's structures.</Paragraph>
    <Paragraph position="4"> In the estimation procedure described here, we call the probability distribution estimated from the larger, simpler corpus an auxiliary distribution. Our treatment of auxiliary distributions is inspired by the treatment of reference distributions in Jelinek's (1997) presentation of Maximum Entropy estimation, but in our estimation procedure we simply regard the logarithm of each auxiliary distribution as another (real-valued) feature. Despite its simplicity, our approach seems to offer several advantages over the reference distribution approach. First, it is straightforward to utilize several auxiliary distributions simultaneously: each is treated as a distinct feature. Second, each auxiliary distribution is associated with a parameter that scales its contribution to the final distribution.</Paragraph>
    <Paragraph position="5"> In applications such as ours where the auxiliary distribution may be of questionable relevance to the distribution we are trying to estimate, it seems reasonable to permit the estimation procedure to discount or even ignore the auxiliary distribution. Finally, note that neither Jelinek's nor our estimation procedures require that an auxiliary or reference distribution Q be a probability distribution; i.e., it is not necessary that Q(Ω) = 1, where Ω is the set of well-formed linguistic structures.</Paragraph>
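As a minimal sketch of the idea in the two preceding paragraphs, assume a conditional log-linear model over the candidate parses of a single sentence, with the logarithm of an auxiliary score Q added as one more real-valued feature that has its own weight. The function name, the toy parses and all numeric values below are hypothetical illustrations, not the implementation used in this paper.

```python
import math

def conditional_probs(parses, weights, aux_weight):
    """Conditional distribution over the candidate parses of one sentence.

    Each parse is a (features, q) pair: `features` is a list of ordinary
    feature values and `q` is the positive, possibly unnormalized score
    that the auxiliary distribution Q assigns to the parse.
    """
    scores = []
    for features, q in parses:
        # log Q enters the linear score like any other real-valued feature.
        score = aux_weight * math.log(q)
        score += sum(w * f for w, f in zip(weights, features))
        scores.append(score)
    # Normalize over this sentence's parses only (a softmax).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data: two parses with one ordinary feature each (all values invented).
parses = [([1.0], 0.6), ([0.0], 0.2)]
ignored = conditional_probs(parses, weights=[0.0], aux_weight=0.0)
followed = conditional_probs(parses, weights=[0.0], aux_weight=1.0)
```

With aux_weight set to 0 the auxiliary score is ignored and both toy parses receive probability 0.5. With aux_weight set to 1 and a zero ordinary weight, the result is Q renormalized over the two parses (0.75 and 0.25). Multiplying every q by the same constant leaves the output unchanged, which illustrates why Q need not sum to one.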
    <Paragraph position="6"> The rest of this paper is structured as follows. Section 2 reviews how exponential models can be defined over the parses of UBGs, gives a brief description of Stochastic Lexical-Functional Grammar, and reviews why maximum pseudo-likelihood estimation is both feasible and sufficient for parsing purposes. Section 3 presents our new estimator, and shows how it is related to the minimization of the Kullback-Leibler divergence between the conditional estimated and auxiliary distributions. Section 4 describes the auxiliary distribution used in our experiments, and Section 5 presents the results of those experiments.</Paragraph>
  </Section>
</Paper>