<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3202">
  <Title>Active Learning and the Total Cost of Annotation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Parse selection for Redwoods
</SectionTitle>
    <Paragraph position="0"> We now briefly describe the Redwoods treebanking environment (Oepen et al., 2002), our parse selection models and their performance.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 The Redwoods Treebank
</SectionTitle>
      <Paragraph position="0"> The Redwoods treebank project provides tools and annotated training material for creating parse selection models for the English Resource Grammar (ERG, Flickinger (2000)). The ERG is a hand-built broad-coverage HPSG grammar that provides an explicit grammar for the treebank. Using this approach has the advantage that analyses for within-coverage sentences convey more information than just phrase structure: they also contain derivations, semantic interpretations, and basic dependencies.</Paragraph>
      <Paragraph position="1"> For each sentence, Redwoods records all analyses licensed by the ERG and indicates which of them, if any, the annotators selected as being contextually correct. When selecting such distinguished parses, rather than simply enumerating all parses and presenting them to the annotator, annotators make use of discriminants which disambiguate the parse forest more rapidly, as described in section 3.</Paragraph>
      <Paragraph position="2"> In this paper, we report results using the third growth of Redwoods, which contains English sentences from appointment scheduling and travel planning domains of Verbmobil. In all, there are 5302 sentences for which there are at least two parses and a unique preferred parse is identified. These sentences have 9.3 words and 58.0 parses on average.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Modeling parse selection
</SectionTitle>
      <Paragraph position="0"> As is now standard for feature-based grammars, we mainly use log-linear models for parse selection (Johnson et al., 1999). For log-linear models, the conditional probability of an analysis ti given a sentence with a set of analyses t = {t...} is given as:</Paragraph>
      <Paragraph position="2"> where fj(ti) returns the number of times feature j occurs in analysis t, wj is a weight from model Mk, and Z(s) is a normalization factor for the sentence. The parse with the highest probability is taken as the preferred parse for the model. We use the limited memory variable metric algorithm to determine the weights. We do not regularize our log-linear models since labeled data -necessary to set hyperparameters- is in short supply in AL.</Paragraph>
      <Paragraph position="3"> We also make use of simpler perceptron models for parse selection, which assign scores rather than probabilities. Scores are computed by taking the inner product of the analysis' feature vector with the parameter vector:</Paragraph>
      <Paragraph position="5"> The preferred parse is that with the highest score out of all analyses. We do not use voted perceptrons here (which indeed have better performance) as for the reuse experiments described later in section 6 we really do wish to use a model that is (potentially) worse than a log-linear model.</Paragraph>
      <Paragraph position="6"> Later for AL , it will be useful to map perceptron scores into probabilities, which we do by exponentiating and renormalizing the score:</Paragraph>
      <Paragraph position="8"> Z(s) is again a normalizing constant.</Paragraph>
      <Paragraph position="9"> The previous parse selection models (equations 1 and 3) use a single model (feature set). It is possible to improve performance using an ensemble parse selection model. We create our ensemble model (called a product model) using the productof-experts formulation (Hinton, 1999):</Paragraph>
      <Paragraph position="11"> Note that each individual model Mi is a well-defined distribution usually taken from a fixed set of models. Z(s) is a constant to ensure the product distribution sums to one over the set of possible parses. A product model effectively averages the contributions made by each of the individual models. Though simple, this model is sufficient to show enhanced performance when using multiple models.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Parse selection performance
</SectionTitle>
      <Paragraph position="0"> Osborne and Baldridge (2004) describe three distinct feature sets - configurational, ngram, and conglomerate - which utilize the various structures made available in Redwoods: derivation trees, phrase structures, semantic interpretations, and elementary dependency graphs. They incorporate different aspects of the parse selection task; this is crucial for creating diverse models for use in product parse selection models as well as for ensemble-based AL methods. Here, we also use models created from a subset of the conglomerate feature set: the mrs feature set. This only has features from the semantic interpretations.</Paragraph>
      <Paragraph position="1"> The three main feature sets are used to train three log-linear models - LL-CONFIG, LL-NGRAM, and LL-CONGLOM-- and a product ensemble of those three feature sets, LL-PROD, using equation 4. Additionally, we use a perceptron with the conglomerate feature set, P-CONGLOM. Finally, we include a log-linear model that uses the mrs feature set, LL-MRS, and a perceptron, P-MRS.</Paragraph>
      <Paragraph position="2"> Parse selection accuracy is measured using exact match. A model is awarded a point if it picks some parse for a sentence and that parse is the correct analysis indicated by the corpus. To deal with ties, the accuracy is given as 1/m when a model ranks m parses highest and the best parse is one of them.</Paragraph>
      <Paragraph position="3"> The results for a chance baseline (selecting a parse at random), the base models and the product model are given in Table 1. These are 10-fold cross-validation results, using all the training data for estimation and the test split for evaluation. See section</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>