<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1038"> <Title>Generation of VP Ellipsis: A Corpus-Based Approach</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The Corpus </SectionTitle> <Paragraph position="0"> All our examples are taken from the Wall Street Journal corpus of the Penn Treebank (PTB). We collected both negative and positive examples from Sections 5 and 6 of the PTB. The negative examples were collected using a mixture of manual and automatic techniques. First, candidate examples were identified automatically wherever there were two occurrences of the same verb, separated by fewer than 10 intervening verbs. Then, the collected examples were manually examined to determine whether the two verb phrases had identical meanings or not. If not, the examples were eliminated. (VPE is known to permit various complications, such as &quot;sloppy identity&quot; and &quot;vehicle change&quot;; see (Fiengo and May, 1994) and references therein.) This yielded 111 negative examples.</Paragraph> <Paragraph position="1"> The positive examples were taken from the corpus collected in previous work (Hardt, 1997). This is a corpus of several hundred examples of VPE from the Treebank, identified on the basis of their syntactic analysis. VPE is not annotated uniformly in the PTB: we found several different bracketing patterns and searched for all of them, but one cannot be certain that no other bracketing patterns were used in the PTB. This yielded 15 positive examples in Sections 5 and 6. The negative and positive examples from Sections 5 and 6 - 126 in total - form our basic corpus, which we will refer to as SECTIONS5+6.</Paragraph> <Paragraph position="2"> While not pathologically peripheral, VPE is a fairly rare phenomenon, and 15 positive examples is a fairly small number. We therefore created a second corpus by extending SECTIONS5+6 with positive examples from other sections of the PTB, so that the number of positive examples roughly equals the number of negative examples. Specifically, we included all positive examples from Sections 8 through 13. The result is a corpus with 111 negative examples - those from SECTIONS5+6 - and 121 positive examples (including the 15 positive examples from SECTIONS5+6). We call this corpus BALANCED. Clearly, BALANCED does not reflect the distribution of VPE in naturally occurring text the way SECTIONS5+6 does; we therefore use it only when examining factors affecting VPE in Section 4, and not in the algorithm evaluation in Section 5.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Factors Examined </SectionTitle> <Paragraph position="0"> Each example was coded for several features, each of which has figured implicitly or explicitly in the research on VPE. The following surface-oriented features were added automatically.</Paragraph> <Paragraph position="1"> - Sentential Distance (sed): Measures the distance between possible antecedent and candidate, in sentences. A value of 0 means that the VPs are in the same sentence.</Paragraph> <Paragraph position="2"> - Word Distance (vpd): Measures the distance between possible antecedent and candidate, in words.</Paragraph> <Paragraph position="3"> - Antecedent Length (anl): Measures the length of the antecedent VP, in words.</Paragraph> <Paragraph position="4"> All subsequent features were coded by hand by two of the authors.</Paragraph>
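As an illustration of how the surface-oriented features can be computed automatically, here is a minimal sketch. It assumes the corpus is available as tokenized sentences and that the two VP spans have already been identified; the data layout and function name are our own assumptions, not the paper's tooling.

```python
def surface_features(sentences, antecedent, candidate):
    """Compute the automatic surface features for an (antecedent, candidate) pair.

    sentences: list of sentences, each a list of word tokens.
    antecedent, candidate: (sentence_index, start, end) spans of the two VPs.
    """
    a_sent, a_start, a_end = antecedent
    c_sent, c_start, c_end = candidate

    # sed: distance in sentences (0 = both VPs in the same sentence).
    sed = c_sent - a_sent

    # vpd: distance in words from the end of the antecedent VP
    # to the start of the candidate VP.
    if sed == 0:
        vpd = c_start - a_end
    else:
        vpd = (len(sentences[a_sent]) - a_end) + c_start
        vpd += sum(len(sentences[i]) for i in range(a_sent + 1, c_sent))

    # anl: length of the antecedent VP, in words.
    anl = a_end - a_start

    return {"sed": sed, "vpd": vpd, "anl": anl}
```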
<Paragraph position="5"> The following morphological features were used:</Paragraph> <Paragraph position="6"> - Auxiliaries (in1 and in2): Two features, for the antecedent VP and the candidate VP. The value is the list of full forms of the auxiliaries (and of the verbal particle to) on the antecedent and candidate verbs. This information can be annotated reliably. We use the κ statistic to estimate the reliability of annotation; we assume that values of κ ≥ .8 show reliability, and that values of .67 ≤ κ ≤ .8 show sufficient reliability for drawing conclusions, given that the other variable we are comparing these variables to (VPE) is coded 100% correctly.</Paragraph> <Paragraph position="7"> The following syntactic features were coded:</Paragraph> <Paragraph position="8"> - Voice (vox): Grammatical voice (active/passive) of antecedent and candidate. This information can be annotated reliably.</Paragraph> <Paragraph position="9"> - Syntactic Structure (syn): This feature describes the syntactic relation between the head verbs of the two VPs: conjunction (which includes &quot;conjunction&quot; by juxtaposition of root sentences), subordination, comparative constructions, and as-appositives (for example, the index maintains a level below 50%, as it has for the past couple of months). This information can be annotated reasonably reliably.</Paragraph> <Paragraph position="10"> - Subcategorization frame, for each verb: the standard distinction between intransitive and transitive verbs, with special categories for other subcategorization frames (a total of six possible values). These two features can be annotated highly reliably.</Paragraph> <Paragraph position="11"> We now turn to semantic and discourse features.</Paragraph> <Paragraph position="12"> - Adjuncts (adj): That the arguments have the same meaning is a precondition for VPE, and it is also a precondition for including a negative example in the corpus; therefore, the semantic similarity of the arguments need not be coded. However, we do need to code the semantic similarity of the adjuncts, as they may differ in the case of VPE: in (3) above, the second (elided) VP has the additional adverb historically. We distinguish the following cases: the adjuncts are identical in meaning; they are similar in meaning (of the same semantic category, such as temporal adjuncts); only the antecedent VP or only the candidate VP has an adjunct; the adjuncts are different; there are no adjuncts at all. This information can be annotated at a satisfactory level of reliability.</Paragraph> <Paragraph position="13"> - In-Quotes (qut): Are the antecedent and/or the candidate within a quoted passage, and if so, are they semantically within the same quote? This information can be annotated highly reliably (κ = 1).</Paragraph> <Paragraph position="14"> - Discourse Structure (dst): Are the discourse segments containing the antecedent and the candidate directly related in the discourse structure? Possible values are Y and N. Here, &quot;directly related&quot; means that the two VPs are in the same segment, that the segments are directly related to each other, or that both segments are directly related to the same third discourse segment. For this feature, inter-annotator agreement could not be achieved to a satisfactory degree (κ ≈ .5), but the feature was not identified as useful during machine learning anyway. In future research, we hope to use independently coded discourse structure in order to investigate its interaction with ellipsis decisions.</Paragraph> <Paragraph position="15"> - Polarity (pol): Does the antecedent or the candidate sentence contain the negation marker not or one of its contractions? This information can be annotated highly reliably (κ = 1).</Paragraph> </Section>
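The reliability figures above follow a fixed recipe. The sketch below implements Cohen's κ for two annotators over one categorical feature, together with the thresholds assumed in this section; the data layout (parallel lists of labels) is our own assumption.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's label distribution.
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(dist_a[label] * dist_b[label] for label in dist_a) / (n * n)
    if p_e == 1:  # degenerate case: both annotators always give the same label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

def reliability_band(kappa):
    """The bands assumed in Section 3."""
    if kappa >= 0.8:
        return "reliable"
    if kappa >= 0.67:
        return "sufficient for drawing conclusions"
    return "not reliable"
```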
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Analysis of Data </SectionTitle> <Paragraph position="0"> In this section, we analyze the data to determine which factors correlate with the presence or absence of VPE. We use the ANOVA test (or a linear model in the case of continuous-valued independent variables) and report the probability of the F value. We follow general practice in assuming that a value of p < .05 means that there is a significant correlation.</Paragraph> <Paragraph position="1"> We present results for both of our corpora: the SECTIONS5+6 corpus, consisting only of examples from Sections 5 and 6 of the Penn Treebank, and the BALANCED corpus, containing a balanced number of negative and positive examples. Recall that BALANCED is derived from SECTIONS5+6 by adding positive examples, but no negative examples. Therefore, when summarizing the data, we report three figures: for the negative cases (No VPE), all from SECTIONS5+6; for the positive cases in SECTIONS5+6 (SEC VPE); and for the positive cases in BALANCED (BAL VPE).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Numerical Features </SectionTitle> <Paragraph position="0"> The two distance measures (based on words and based on sentences) are both significantly correlated with the presence of VPE, while the length of the antecedent VP is not. The results are summarized in Figure 1.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Morphological Features </SectionTitle> <Paragraph position="0"> For the two auxiliaries features, we do not get a significant correlation for the auxiliaries on the antecedent VP with either corpus. The situation does not change if we distinguish only two classes, namely the presence or absence of auxiliaries.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Syntactic Features </SectionTitle> <Paragraph position="0"> When VPE occurs, the voice of the two VPs is the same, an effect that is significant only in BALANCED, presumably because of the small number of data points in SECTIONS5+6. The counts are shown in Figure 2. The syntactic structure also correlates with VPE, with the different forms of subordination favoring VPE and the absence of a direct relation disfavoring VPE (p < .00001 for both SECTIONS5+6 and BALANCED). The frequency distributions are shown in Figure 2.</Paragraph> <Paragraph position="1"> Features related to argument structure are not significantly correlated with VPE. However, whether the two argument structures are identical is a factor approaching significance: in the two cases where they differ, no VPE happens (p = .051). More data may make this result more robust.</Paragraph> </Section>
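As a concrete instance of the methodology used throughout this section, the sketch below runs a one-way ANOVA (F test) for a continuous feature such as word distance (vpd) against the VPE/no-VPE split; the numbers are illustrative only, not the corpus data.

```python
from scipy.stats import f_oneway

# Illustrative vpd values for the two classes (not the paper's counts).
vpd_vpe = [3, 5, 8, 11, 14, 6, 9]        # examples realized as VPE
vpd_no_vpe = [22, 35, 18, 41, 27, 30]    # examples left unelided

f_stat, p_value = f_oneway(vpd_vpe, vpd_no_vpe)
# Following the paper's convention, p < .05 counts as significant.
print(f"F = {f_stat:.2f}, p = {p_value:.4f}, significant: {p_value < 0.05}")
```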
<Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Semantic and Discourse Features </SectionTitle> <Paragraph position="0"> If the adjuncts of the antecedent and candidate VPs (matched pairwise) are the same, then VPE is more likely to happen. If only one of the VPs has adjuncts, or if the VPs have different adjuncts, VPE is unlikely to happen. The correlation is significant for both corpora (p < .00001). The distribution is shown in Figure 2.</Paragraph> <Paragraph position="1"> The In-Quotes feature correlates significantly with VPE in both corpora (p < .01 for SECTIONS5+6 and p < .001 for BALANCED). We see that VPE does not often occur across quotes, and that it occurs unusually frequently within quotes, suggesting that it is more common in spoken language than in written language (or, at any rate, in the WSJ).</Paragraph> <Paragraph position="2"> The binary discourse structure feature correlates significantly with VPE (p < .01 for SECTIONS5+6 and p < .00001 for BALANCED), with the presence of a close relation correlating with VPE. Since inter-annotator agreement was not achieved at a satisfactory level, the value of this feature remains to be confirmed.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Algorithms for VPE </SectionTitle> <Paragraph position="0"> The previous section presented a corpus-based static analysis of factors affecting VPE. In this section, we take a computational approach: we would like to use a trainable module that learns rules for deciding whether or not to perform VPE. Trainable components have the advantage of being easily ported to new domains. For this reason we use the machine learning system Ripper (Cohen, 1996). However, before we can use Ripper, we must discuss how our new trainable VPE module fits into the architecture of generation.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 VPE in the Generation Architecture </SectionTitle> <Paragraph position="0"> Tasks in the generation process have been divided into three stages (Rambow and Korelsky, 1992): the text planner has access only to information about communicative goals, the discourse context, and semantics, and generates a non-linguistic representation of text structure and content. The sentence planner chooses abstract linguistic resources (meaning-bearing lexemes, syntactic constructions) and determines sentence boundaries. It passes an abstract lexico-syntactic specification to the Realizer, which inflects, adds function words, and linearizes, thus producing the surface string. (The interface between sentence planner and realizer differs among approaches and can be more or less semantic; we assume an abstract syntactic interface, with structures marked for grammatical function but with no word order represented.) The question arises where in this architecture the decision about VPE should be made. We investigate this question by distinguishing three places for making the VPE decision: in or just after the text planner; in or just after the sentence planner; and in or just after the realizer (i.e., at the end of the whole generation process, if there are no modules after realization, such as prosody). We will refer to these three architecture options as TP, SP, and Real.</Paragraph> <Paragraph position="1"> From the point of view of this study, the three options are distinguished by the subset of the features identified in Section 3 that the algorithm has access to: TP only has access to discourse and semantic features; SP can also use syntactic features, but not morphological features or those that relate to surface ordering; Real can access all features. We summarize the relation between architecture options and features in Figure 3.</Paragraph> </Section>
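The feature subsets summarized in Figure 3 translate directly into a configuration for the learner. A sketch follows; the grouping is our reading of the text (abbreviations as in Section 3, with subcat as our shorthand for the two subcategorization features), and the placement of sed and anl is inferred from the SP rule sets in Section 5.3, which use them.

```python
# Feature groups from Section 3.
SEMANTIC_DISCOURSE = {"adj", "qut", "dst", "pol"}
SYNTACTIC = {"vox", "syn", "subcat"}
MORPHOLOGICAL = {"in1", "in2"}
SURFACE_ORDER = {"vpd"}        # word distance presupposes linear order
STRUCTURAL = {"sed", "anl"}    # fixed once sentence boundaries are set;
                               # the SP rules in Section 5.3 rely on them

AVAILABLE_FEATURES = {
    "TP": SEMANTIC_DISCOURSE,
    "SP": SEMANTIC_DISCOURSE | SYNTACTIC | STRUCTURAL,
    "Real": (SEMANTIC_DISCOURSE | SYNTACTIC | STRUCTURAL
             | MORPHOLOGICAL | SURFACE_ORDER),
}

def features_for(option):
    """Feature names a VPE module may consult at this stage."""
    return AVAILABLE_FEATURES[option]
```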
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Using a Machine Learning Algorithm </SectionTitle> <Paragraph position="0"> We use Ripper to automatically learn rule sets from the data. Ripper is a rule learning program which, unlike some other machine learning programs, supports bag-valued features. (Our only bag-valued feature is the set of auxiliaries, which is not used in the rules we present here.) Using a set of attributes, Ripper greedily learns rule sets that choose one of several classes for each data item. We use two classes, vpe and novpe. By using different parameter settings for Ripper, we obtain different rule sets. These parameter settings are of two types: first, parameters internal to Ripper, such as the number of optimization passes; and second, the specification of which attributes are used. To determine the optimal number of optimization passes, we randomly divided our SECTIONS5+6 corpus into a training part and a test part, with the test corpus representing 20% of the data. We then ran Ripper with different settings for the optimization-pass parameter and determined that the best results are obtained with six passes. We used this setting in all subsequent work with Ripper. The test/training partition used to determine this setting was not used for any other purpose.</Paragraph> <Paragraph position="1"> In the next subsection (Section 5.3), we present and discuss several rule sets, as they bring out different properties of ellipsis. We discuss rule sets trained on and evaluated against the entire set of data from SECTIONS5+6: since our data set is relatively small, we decided not to divide it into distinct training and test sets (except for determining the internal parameter; see above). The fact that these rule sets are obtained by a machine learning algorithm is in some sense incidental here, and while we give the coverage figures for the training corpus, we consider them of mainly qualitative interest. We present three rule sets, one for each of the three architecture options, each with its own set of attributes. We start out with a full set of attributes and successively eliminate the more surface-oriented and syntactic ones. As we will see, the earlier the VPE decision is made, the less reliable it is.</Paragraph>
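The tuning step just described is a one-off held-out search. A minimal sketch, with the learner injected as a function (a hypothetical wrapper around the Ripper binary; Ripper's actual invocation is not shown here):

```python
import random

def tune_optimization_passes(examples, train_fn,
                             candidate_passes=range(1, 11), seed=0):
    """Pick the optimization-pass setting on a single 80/20 split.

    examples: list of (features, label) pairs, labels 'vpe'/'novpe'.
    train_fn(train_set, n_passes) -> classifier: features -> label.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))           # 20% held out, as in the paper
    train, test = shuffled[:cut], shuffled[cut:]

    best_passes, best_err = None, float("inf")
    for n_passes in candidate_passes:
        classify = train_fn(train, n_passes)
        err = sum(classify(f) != y for f, y in test) / len(test)
        if err < best_err:
            best_passes, best_err = n_passes, err
    return best_passes                       # the paper settled on six passes
```

The split is then discarded, precisely so that the tuned setting does not leak into the later evaluation.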
<Paragraph position="2"> In the subsection after next (Section 5.4), we present results using ten-fold cross-validation, for which the quantitative results are meaningful. However, since each run produces ten different rule sets, the qualitative results are, in some sense, not meaningful; we therefore do not give any rule sets there. The cross-validation demonstrates that effective rule sets can be learned even from relatively small data sets.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.3 Algorithms for VP Ellipsis Generation </SectionTitle> <Paragraph position="0"> We present three different rule sets, one for each of the three architecture options. All rule sets must be used in conjunction with a basic screening algorithm, the same one that we used to identify negative examples: there must be two identical verbs with at most ten intervening verbs, and the arguments of the verbs must have the same meaning. The following rule sets can then be applied to determine whether a VPE should be generated or not.</Paragraph> <Paragraph position="1"> We start out with the Real set of features, which is available after realization has been completed, so that all surface-oriented and morphological features are available. Of course, we also assume that all other features are still available at that time, not just the surface features. We obtain the following rule set: Choose VPE if sed<=0 and syn=com (6/0). Choose VPE if vpd<=14, sed<=0, and anl>=3 (7/1). Otherwise default to no VPE (110/2).</Paragraph> <Paragraph position="2"> Each rule (except the first) applies only if the preceding ones do not. The first rule says that if the distance in sentences between the antecedent VP and the candidate VP (sed) is less than or equal to 0, i.e., the candidate and the antecedent are in the same sentence, and the syntactic construction is a comparative, then choose VPE. This rule classifies 6 cases correctly and misclassifies none. The second rule says that if the distance in words between the antecedent VP and the candidate VP is less than or equal to 14, the VPs are in the same sentence, and the antecedent VP contains 3 or more words, then the candidate VP is elided. This rule classifies 7 cases correctly but misclassifies one. Finally, all other cases are not treated as VPE, which misses 2 examples but classifies 110 correctly. This yields an overall training error rate of 2.4% (3 misclassified examples). (Recall that we are here comparing performance against the training set.)</Paragraph> <Paragraph position="3"> We now consider the examples from the introduction, which are repeated here for convenience: (4) In 1980, 18% of federal prosecutions concluded at trial; in 1987, only 9% did. (5) Ernst & Young said Eastern's plan would miss projections by $100 million. Goldman said Eastern would miss the same mark by at least $120 million. (6) In particular Mr Coxon says businesses are paying out a smaller percentage of their profits and cash flow in the form of dividends than they have [VPE] historically.</Paragraph>
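Before walking through these examples, note that the rule set reads as an ordered decision list: each rule fires only if the preceding ones did not. A direct transcription as a sketch, with feature names as in Section 3:

```python
def choose_vpe_real(sed, vpd, anl, syn):
    """Real-architecture rule set of Section 5.3.

    sed: sentence distance; vpd: word distance;
    anl: antecedent VP length in words; syn: syntactic relation.
    """
    if sed <= 0 and syn == "com":              # same sentence, comparative (6/0)
        return True
    if vpd <= 14 and sed <= 0 and anl >= 3:    # close, same sentence,
        return True                            # long antecedent (7/1)
    return False                               # default: no VPE (110/2)
```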
<Paragraph position="4"> Consider example (4). The first rule does not apply (this is not a comparative), but the second does, since both VPs are in the same sentence, the antecedent has three words, and the distance between them is fewer than 14 words. Thus (4) would be generated as a VPE. The first rule does apply to example (6), so it would also be generated as a VPE. Example (5), however, is not caught by either of the first two rules, so it would not yield a VPE. We thus replicate the corpus data for these three examples.</Paragraph> <Paragraph position="5"> We now turn to SP. We assume that we are making the VPE decision before realization, and therefore have access only to syntactic and semantic features, but not to surface features. As a result, distance in words is no longer available as a feature. We obtain the following rule set: Choose VPE if sed<=0 and anl>=3 (10/3). Choose VPE if sed<=0 and adj=sam (3/0). Otherwise default to no VPE (108/2).</Paragraph> <Paragraph position="6"> Here, we first choose VPE if the antecedent and candidate are in the same sentence and the antecedent VP is at least three words long, or if the two VPs are in the same sentence and they have the same adjuncts. In all other cases, we choose not to elide. The training error rate goes up to 3.97%. With this rule set, we can correctly predict a VPE for examples (4) and (6), using the first rule. We do not generate a VPE for (5), since it matches neither of the first two rules.</Paragraph> <Paragraph position="7"> Finally, we consider architecture option TP, in which the VPE decision is made right after text planning, and only semantic and discourse features are available. The rule set is simplified: Choose VPE if adj=sam (6/3). Otherwise default to no VPE (108/9). VPE is only chosen if the adjuncts are the same; in all other cases, VPE is avoided. The training error rate climbs to 9.52%. For our examples, only example (4) generates a VPE, since the adjuncts are the same on the two VPs (the adjunct is elided on the second VP, of course, but it is present in the input representation, which is not shown here); (6) fails to meet the requirements of the first rule, since the second VP has an adjunct of its own, historically.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.4 Quantitative Analysis </SectionTitle> <Paragraph position="0"> In the previous subsection we presented different rule sets. We now show that rule sets can be derived in a consistent manner and tested on a held-out test set with satisfactory results. We take these results to be indicative of performance on unseen data (within the WSJ domain and genre, of course). We use ten-fold cross-validation for this purpose, with the same three sets of possible attributes used above. The results for the three attribute sets are shown in Figure 4 (average error rates for the ten-fold cross-validations). [Figure 4: error rates for the different architectures: after realizer, after sentence planner, and after text planner.] The baseline is obtained by never choosing VPE (which, recall, is relatively rare in the SECTIONS5+6 corpus).</Paragraph>
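The protocol behind Figure 4 is standard ten-fold cross-validation; a generic sketch, with the learner again injected as a function:

```python
import random

def cross_validate(examples, train_fn, k=10, seed=0):
    """Average error rate of train_fn over k folds.

    examples: list of (features, label) pairs;
    train_fn(train_set) -> classifier: features -> label.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]

    errors = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        classify = train_fn(train)
        errors.append(sum(classify(f) != y for f, y in test) / len(test))
    return sum(errors) / k

# The paper's baseline never chooses VPE:
baseline = lambda train: (lambda features: "novpe")
```

For reference, on SECTIONS5+6 the baseline misclassifies exactly the 15 positive examples out of 126, i.e. roughly 12%, which is consistent with the error reductions reported next.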
<Paragraph position="1"> We see that the TP architecture does not do better than the baseline, while SP yields an error reduction of 23% and the Real architecture an error reduction of 35%, for an average error rate of 7.5%.</Paragraph> </Section> </Section> </Paper>