<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1019">
  <Title>Partial Training for a Lexicalized-Grammar Parser</Title>
  <Section position="6" start_page="147" end_page="149" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> The resource used for the experiments is CCGbank (Hockenmaier, 2003), which consists of normal-form CCG derivations derived from the phrase-structure trees in the Penn Treebank. It also contains predicate-argument dependencies which we use for development and final evaluation.</Paragraph>
    <Section position="1" start_page="147" end_page="147" type="sub_section">
      <SectionTitle>
4.1 Accuracy of Dependency Extraction
</SectionTitle>
      <Paragraph position="0"> Sections 2-21 of CCGbank were used to investigate the accuracy of the partial dependency structures returned by the extraction procedure. Full, correct dependency structures for the sentences in 2-21 were created by running our CCG parser (Clark and Curran, 2004b) over the gold-standard derivation for each sentence, outputting the dependencies. This resulted in full dependency structures for 37,283 of the sentences in sections 2-21.</Paragraph>
      <Paragraph position="1"> Table 1 gives precision and recall values for the dependencies obtained from the extraction procedure, for the 37,283 sentences for which we have full dependency structures. The final column gives the percentage of training sentences for which the partial dependency structures are completely correct. For a given sentence, the extraction procedure returns all dependencies occurring in at least a proportion k of the derivations licenced by the gold-standard lexical category sequence. The lexical category sequences for the sentences in 2-21 can easily be read off the CCGbank derivations.</Paragraph>
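The thresholding step can be sketched in a few lines of Python. The dependency tuples and counts below are invented for illustration, and k is taken as a fraction of the licenced derivations:

```python
def extract_partial(dep_counts, n_derivs, k):
    """Return the dependencies occurring in at least a fraction k of
    the derivations licenced by the gold lexical category sequence.

    dep_counts: dict mapping dependency to the number of derivations
                containing it (a hypothetical representation).
    n_derivs:   total number of licenced derivations.
    k:          threshold in [0, 1]; k = 1.0 keeps only dependencies
                common to every derivation.
    """
    return {d for d, c in dep_counts.items() if c >= k * n_derivs}

# Toy example: 4 licenced derivations, 3 candidate dependencies.
counts = {("saw", "obj", "dog"): 4,
          ("saw", "subj", "man"): 3,
          ("with", "mod", "saw"): 1}
kept = extract_partial(counts, 4, 0.85)
```

Lowering k admits dependencies found in fewer derivations, trading precision for recall, which is the trade-off Table 1 quantifies.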
      <Paragraph position="2"> The derivations licenced by a lexical category sequence were created using the CCG parser described in Clark and Curran (2004b). The parser uses a small number of combinatory rules to combine the categories, along with the CKY chart-parsing algorithm described in Steedman (2000). It also uses some unary type-changing rules and punctuation rules obtained from the derivations in CCGbank.3 The parser builds a packed representation, and counting the number of derivations in which a dependency occurs can be performed using a dynamic programming algorithm similar to the inside-outside algorithm.</Paragraph>
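A minimal sketch of this counting, assuming a simplified packed-forest representation (each node maps to a list of alternatives, each pairing the dependencies it creates with its child nodes) and assuming a dependency is produced at most once per derivation:

```python
from math import prod

def topo_order(forest, root):
    """Topological order of the packed forest (parents before
    children), via reverse postorder depth-first search."""
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for _, children in forest[n]:
            for c in children:
                visit(c)
        order.append(n)
    visit(root)
    order.reverse()
    return order

def deriv_counts(forest, root):
    """For every dependency, count the derivations containing it,
    using inside/outside-style dynamic programming.  The forest
    representation here is a simplification of the parser's actual
    packed charts."""
    # Inside pass: number of sub-derivations rooted at each node.
    inside = {}
    def get_inside(n):
        if n not in inside:
            inside[n] = sum(prod(get_inside(c) for c in children)
                            for _, children in forest[n])
        return inside[n]
    total = get_inside(root)

    # Outside pass: number of derivation contexts above each node.
    outside = {n: 0 for n in forest}
    outside[root] = 1
    for n in topo_order(forest, root):
        for _, children in forest[n]:
            for i, c in enumerate(children):
                rest = prod(inside[s] for j, s in enumerate(children)
                            if j != i)
                outside[c] += outside[n] * rest

    # A dependency's count: derivations passing through the
    # alternatives that create it.
    counts = {}
    for n, alts in forest.items():
        for deps, children in alts:
            through = outside[n] * prod(inside[c] for c in children)
            for d in deps:
                counts[d] = counts.get(d, 0) + through
    return counts, total
```

Because the chart is packed, this runs in time linear in the number of alternatives rather than in the (possibly exponential) number of derivations.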
      <Paragraph position="3"> Table 1 shows that, by varying the value of k, it is possible to get the recall of the extracted dependencies as high as 85.9%, while still maintaining a precision value of over 99%.</Paragraph>
    </Section>
    <Section position="2" start_page="147" end_page="149" type="sub_section">
      <SectionTitle>
4.2 Accuracy of the Parser
</SectionTitle>
      <Paragraph position="0"> The training data for the dependency model was created by first supertagging the sentences in sections 2-21, using the supertagger described in Clark and Curran (2004b).4 The average number of categories assigned to each word is determined by a parameter, b, in the supertagger. A category is assigned to a word if the category's probability is within b of the highest probability category for that word. (Footnote 3: In the absence of derivation data, the use of such rules may appear suspect. However, we argue that the type-changing and punctuation rules could be manually created for a new domain by examining the lexical category data.)</Paragraph>
      <Paragraph position="1"> (Footnote 4: An improved version of the supertagger was used for this paper in which the forward-backward algorithm is used to calculate the lexical category probability distributions.)</Paragraph>
      <Paragraph position="2"> For these experiments, we used a b value of 0.01, which assigns roughly 1.6 categories to each word, on average; we also ensured that the correct lexical category was in the set assigned to each word. (We did not do this when parsing the test data.) For some sentences, the packed charts can become very large. The supertagging approach we adopt for training differs from that used for testing: if the size of the chart exceeds some threshold, the value of b is increased, reducing ambiguity, and the sentence is supertagged and parsed again. The threshold which limits the size of the charts was set at 300,000 individual entries. Two further values of b were used: 0.05 and 0.1.</Paragraph>
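The category assignment and the training-time back-off described above might be sketched as follows. The multiplicative reading of b, the chart_size callback, and all category names and probabilities are assumptions for illustration:

```python
def assign_categories(probs, beta):
    """Assign to a word every category whose probability is within a
    factor beta of the most probable category (the multiplicative
    reading of the supertagger's b parameter)."""
    best = max(probs.values())
    return {c for c, p in probs.items() if p >= beta * best}

def supertag_for_training(sent_probs, chart_size, limit=300000,
                          betas=(0.01, 0.05, 0.1)):
    """Back-off loop for building training charts: if the packed chart
    would exceed `limit` entries, retag with a larger beta (fewer
    categories per word).  `chart_size` is a hypothetical callback
    that parses the tagged sentence and reports the chart size."""
    for beta in betas:
        tags = [assign_categories(p, beta) for p in sent_probs]
        if chart_size(tags) > limit:
            continue            # chart too large: back off to larger beta
        return beta, tags
    return beta, tags           # least ambiguous tagging as a last resort
```

Raising beta tightens the probability cutoff, so each word receives fewer categories and the resulting chart shrinks.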
      <Paragraph position="3"> Packed charts were created for each sentence and stored in memory. It is essential that the packed charts for each sentence contain at least one derivation leading to the gold-standard dependency structure. Not all rule instantiations in CCGbank can be produced by our parser; hence it is not possible to produce the gold standard for every sentence in Sections 2-21. For the full-data model we used 34,336 sentences (86.7% of the total). For the partial-data models we were able to use slightly more, since the partial structures are easier to produce. Here we used 35,709 sentences (k = 0.85).</Paragraph>
      <Paragraph position="4"> Since some of the packed charts are very large, we used an 18-node Beowulf cluster, together with a parallel version of the BFGS training algorithm.</Paragraph>
      <Paragraph position="5"> The training time and number of iterations to convergence were 172 minutes and 997 iterations for the full-data model, and 151 minutes and 861 iterations for the partial-data model (k = 0.85). Approximate memory usage in each case was 17.6 GB of RAM.</Paragraph>
      <Paragraph position="6"> The dependency model uses the same set of features described in Clark and Curran (2004b): dependency features representing predicate-argument dependencies (with and without distance measures); rule instantiation features encoding the combining categories together with the result category (with and without a lexical head); lexical category features, consisting of word-category pairs at the leaf nodes; and root category features, consisting of headword-category pairs at the root nodes. Further  generalised features for each feature type are formed by replacing words with their POS tags.</Paragraph>
      <Paragraph position="7"> Only features which occur more than once in the training data are included, except that the cutoff for the rule features is 10 or more and the counting is performed across all derivations licenced by the gold-standard lexical category sequences. The larger cutoff was used since the productivity of the grammar can lead to large numbers of these features.</Paragraph>
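A sketch of the cutoff logic, with a hypothetical tuple encoding in which the first element names the feature type:

```python
def apply_cutoffs(feature_counts, rule_cutoff=10):
    """Frequency cutoffs on the feature set: keep features seen more
    than once, except rule-instantiation features, which need at
    least `rule_cutoff` occurrences.  The tuple encoding of features
    is a simplification, not the parser's actual one."""
    kept = set()
    for feat, count in feature_counts.items():
        needed = rule_cutoff if feat[0] == "rule" else 2
        if count >= needed:
            kept.add(feat)
    return kept
```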
      <Paragraph position="8"> The dependency model has 548,590 features. In order to provide a fair comparison, the same feature set was used for the partial-data and full-data models.</Paragraph>
      <Paragraph position="9"> The CCG parsing consists of two phases: first the supertagger assigns the most probable categories to each word, and then the small number of combinatory rules, plus the type-changing and punctuation rules, are used with the CKY algorithm to build a packed chart.5 We use the method described in Clark and Curran (2004b) for integrating the supertagger with the parser: initially a small number of categories is assigned to each word, and more categories are requested if the parser cannot find a spanning analysis. The &quot;maximum-recall&quot; algorithm described in Clark and Curran (2004b) is used to find the highest scoring dependency structure.</Paragraph>
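The integration loop can be sketched as follows. The beta schedule and the try_parse callback are illustrative stand-ins, not the paper's exact settings:

```python
def assign_categories(probs, beta):
    """Categories within a factor beta of the word's best category."""
    best = max(probs.values())
    return {c for c, p in probs.items() if p >= beta * best}

def parse_adaptive(sent_probs, try_parse, betas=(0.1, 0.05, 0.01)):
    """Test-time integration of supertagger and parser: start with few
    categories per word (a large beta) and relax beta only when the
    parser finds no spanning analysis."""
    analysis = None
    for beta in betas:
        tags = [assign_categories(p, beta) for p in sent_probs]
        analysis = try_parse(tags)
        if analysis is not None:
            return analysis
    return analysis  # no spanning analysis at any beta
```

Note the direction of the back-off is the opposite of the training-time loop: here ambiguity is increased on failure, whereas during training it is decreased when the chart grows too large.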
      <Paragraph position="10"> Table 2 gives the accuracy of the parser on Section 00 of CCGbank, evaluated against the predicate-argument dependencies in CCGbank.6 The table gives labelled precision, labelled recall and F-score, and lexical category accuracy. Numbers are given for the partial-data model with various values of k, and for the full-data model, which provides an upper bound for the partial-data model. We also give a lower bound, which we obtain by randomly traversing a packed chart top-down, giving equal probability to each conjunctive node in an equivalence class. The precision and recall figures are over those sentences for which the parser returned an analysis (99.27% of Section 00). (Footnote 5: Gold-standard POS tags from CCGbank were used for all the experiments in this paper.)</Paragraph>
      <Paragraph position="11"> (Footnote 6: There are some dependency types produced by our parser which are not in CCGbank; these were ignored for evaluation.)</Paragraph>
      <Paragraph position="12"> The best result is obtained for a k value of 0.85, which produces partial dependency data with a precision of 99.7 and a recall of 81.3. Interestingly, the results show that decreasing k further, which results in partial data with a higher recall and only a slight loss in precision, harms the accuracy of the parser.</Paragraph>
      <Paragraph position="13"> The Random result also dispels any suspicion that the partial-data model is performing well simply because of the supertagger; clearly there is still much work to be done after the supertagging phase.</Paragraph>
      <Paragraph position="14"> Table 3 gives the accuracy of the parser on Section 23, using the best performing partial-data model on Section 00. The precision and recall figures are over those sentences for which the parser returned an analysis (99.63% of Section 23). The results show that the partial-data model is only 1.3% F-score short of the upper bound.</Paragraph>
    </Section>
    <Section position="3" start_page="149" end_page="149" type="sub_section">
      <SectionTitle>
4.3 Further Experiments with Inside-Outside
</SectionTitle>
      <Paragraph position="0"> In a final experiment, we attempted to exploit the high accuracy of the partial-data model by using it to provide new training data. For each sentence in sections 2-21, we parsed the gold-standard lexical category sequences and used the best performing partial-data model to assign scores to each dependency in the packed chart. The score for a dependency was the sum of the probabilities of all derivations producing that dependency, which can be calculated using the inside-outside algorithm. (This is the score used by the maximum-recall parsing algorithm.) Partial dependency structures were then created by returning all dependencies whose score was above some threshold k, as before. Table 4 gives the accuracy of the data created by this procedure. Note how these values differ from those reported in Table 1.</Paragraph>
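This probability-weighted scoring is the standard inside-outside computation over a weighted packed chart. The sketch below assumes a simplified triple representation (weight, dependencies, children) for each alternative; the parser's actual charts and feature-based weights are more involved:

```python
from math import prod

def dep_scores(forest, root):
    """Score each dependency by the summed, normalised probability of
    the derivations producing it (the quantity used by the
    maximum-recall algorithm)."""
    # Inside pass: total weight of sub-derivations under each node.
    inside = {}
    def get_inside(n):
        if n not in inside:
            inside[n] = sum(w * prod(get_inside(c) for c in children)
                            for w, _, children in forest[n])
        return inside[n]
    z = get_inside(root)            # total mass of all derivations

    # Reverse postorder gives parents before children on the DAG.
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for _, _, children in forest[n]:
            for c in children:
                visit(c)
        order.append(n)
    visit(root)

    # Outside pass: weight of derivation contexts above each node.
    outside = {n: 0.0 for n in forest}
    outside[root] = 1.0
    for n in reversed(order):
        for w, _, children in forest[n]:
            for i, c in enumerate(children):
                rest = prod(inside[s] for j, s in enumerate(children)
                            if j != i)
                outside[c] += outside[n] * w * rest

    # Each dependency's score: normalised mass of derivations passing
    # through the alternatives that create it.
    scores = {}
    for n, alts in forest.items():
        for w, deps, children in alts:
            mass = outside[n] * w * prod(inside[c] for c in children)
            for d in deps:
                scores[d] = scores.get(d, 0.0) + mass / z
    return scores
```

With uniform weights this reduces to the derivation-counting procedure of Section 4.1; the weighted version replaces counts with probability mass.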
      <Paragraph position="1"> We then trained the dependency model on this partial data using the same method as before. However, the performance of the parser on Section 00 using these new models was below that of the previous best performing partial-data model for all values of k. We report this negative result because we had hypothesised that using a probability model to score the dependencies, rather than simply the number of derivations in which they occur, would lead to improved performance.</Paragraph>
    </Section>
  </Section>
</Paper>