<?xml version="1.0" standalone="yes"?>
<Paper uid="N01-1003">
  <Title>SPoT: A Trainable Sentence Planner</Title>
  <Section position="4" start_page="0" end_page="21" type="metho">
    <SectionTitle>
3 The Sentence Plan Generator
</SectionTitle>
    <Paragraph position="0"> The research presented here is primarily concerned with creating a trainable SPR. A strength of our approach is the ability to use a very simple SPG, as we explain below. The basis of our SPG is a set of clause-combining operations that incrementally transform a list of elementary predicate-argument representations (the DSyntSs corresponding to elementary speech acts, in our case) into a single lexico-structural representation, by combining these representations using the following combining operations. Examples can be found in Figure 4.</Paragraph>
    <Paragraph position="1"> a0 MERGE. Two identical main matrix verbs can be identified if they have the same arguments; the adjuncts are combined.</Paragraph>
    <Paragraph position="2"> a0 MERGE-GENERAL. Same as MERGE, except that one of the two verbs may be embedded.</Paragraph>
    <Paragraph position="3"> a0 SOFT-MERGE. Same as MERGE, except that the verbs need only to be in a relation of synonymy or hyperonymy (rather than being identical).</Paragraph>
    <Paragraph position="4">  a0 SOFT-MERGE-GENERAL. Same as MERGE-GENERAL, except that the verbs need only to be in a relation of synonymy or hyperonymy.</Paragraph>
    <Paragraph position="5"> a0 CONJUNCTION. This is standard conjunction with conjunction reduction.</Paragraph>
    <Paragraph position="6"> a0 RELATIVE-CLAUSE. This includes participial adjuncts to nouns.</Paragraph>
    <Paragraph position="7"> a0 ADJECTIVE. This transforms a predicative use of an adjective into an adnominal construction.</Paragraph>
    <Paragraph position="8"> a0 PERIOD. Joins two complete clauses with a period.  These operations are not domain-specific and are similar to those of previous aggregation components (Rambow and Korelsky, 1992; Shaw, 1998; Danlos, 2000), although the various MERGE operations are, to our knowledge, novel in this form.</Paragraph>
    <Paragraph position="9"> The result of applying the operations is a sentence plan tree (or sp-tree for short), which is a binary tree with leaves labeled by all the elementary speech acts  given in Section 3.</Paragraph>
    <Paragraph position="10"> from the input text plan, and with its interior nodes labeled with clause-combining operations3. Each node is also associated with a DSyntS: the leaves (which correspond to elementary speech acts from the input text plan) are linked to a canonical DSyntS for that speech act (by lookup in a hand-crafted dictionary). The interior nodes are associated with DSyntSs by executing their clausecombing operation on their two daughter nodes. (A PERIOD node results in a DSyntS headed by a period and whose daughters are the two daughter DSyntSs.) If a clause combination fails, the sp-tree is discarded (for example, if we try to create a relative clause of a structure which already contains a period). As a result, the DSyntS for the entire turn is associated with the root node. This DSyntS can be sent to RealPro, which returns a sentence (or several sentences, if the DSyntS contains period nodes). The SPG is designed in such a way that if a DSyntS is associated with the root node, it is a valid structure which can be realized.</Paragraph>
    <Paragraph position="11">  8 are in Figures 5, 6 and 7. For example, consider the sp-tree in Figure 7. Node soft-merge-general merges an implicit-confirmations of the destination city and the origin city. The row labelled SOFT-MERGE in Figure 4 shows the result of applying the soft-merge operation when Args 1 and 2 are implicit confirmations of the origin and destination cities. Figure 8 illustrates the relationship between the sp-tree and the DSyntS for alternative 8. The labels and arrows show the DSyntSs associated with each node in the sp-tree (in Figure 7), and the diagram also shows how structures are composed into larger structures by the clause combining operations. number: sg mood: question  The complexity of most sentence planners arises from the attempt to encode constraints on the application of, and ordering of, the operations, in order to generate a single high quality sentence plan. In our approach, we do not need to encode such constraints. Rather, we generate a random sample of possible sentence plans for each text plan, up to a pre-specified maximum number of sentence plans, by randomly selecting among the operations according to some probability distribution.4</Paragraph>
  </Section>
  <Section position="5" start_page="21" end_page="21" type="metho">
    <SectionTitle>
4 The Sentence-Plan-Ranker
</SectionTitle>
    <Paragraph position="0"> The sentence-plan-ranker SPR takes as input a set of sentence plans generated by the SPG and ranks them. In order to train the SPR we applied the machine learning program RankBoost (Freund et al., 1998), to learn from a labelled set of sentence-plan training examples a set of rules for scoring sentence plans.</Paragraph>
    <Section position="1" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
4.1 RankBoost
</SectionTitle>
      <Paragraph position="0"> RankBoost is a member of a family of boosting algorithms (Schapire, 1999). Freund et al. (1998) describe the boosting algorithms for ranking in detail: for completeness, we give a brief description in this section.</Paragraph>
      <Paragraph position="1"> Each example a1 is represented by a set of a2 indicator functions a3a5a4a7a6 a1a9a8 for a10a12a11a14a13a15a11a16a2 . The indicator functions are calculated by thresholding the feature values (counts) described in section 4.2. For example, one such indicator function might be</Paragraph>
      <Paragraph position="3"> preferences for operations such as SOFT-MERGE and SOFT-MERGE-GENERAL over CONJUNCTION and PERIOD. This allows us to bias the SPG to generate plans that are more likely to be high quality, while generating a relatively smaller sample of sentence plans.</Paragraph>
      <Paragraph position="4"> So a3 a17a34a19a35a19 a6 a1a9a8a36a24a37a10 if the number of pronouns in a1 is a29a38a30 . A single parameter a39 a4 is associated with each indicator function, and the &amp;quot;ranking score&amp;quot; for an example a1 is then calculated asa40</Paragraph>
      <Paragraph position="6"> This score is used to rank competing sp-trees of the same text plan in order of plausibility. The training examples are used to set the parameter values a39 a4 . In (Freund et al., 1998) the human judgments are converted into a training set of ordered pairs of examples a1a44a43a46a45 , where a1 and a45 are candidates for the same sentence, and a1 is strictly preferred to a45 . More formally, the training set a47 is  a32 such pairs: in practice, fewer pairs could be contributed due to different candidates getting tied scores from the annotators.</Paragraph>
      <Paragraph position="7"> Freund et al. (1998) then describe training as a process of setting the parameters a39 a4 to minimize the following  a45a5a8a46a8 where a1 is preferred to a45 will be pushed to be positive, so that the number of ranking errors (cases where ranking scores disagree with human judgments) will tend to be reduced. Initially all parameter values are set to zero. The optimization method then greedily picks a single parameter at a time - the parameter which will make most impact on the loss function - and updates the parameter value to minimize the loss.</Paragraph>
      <Paragraph position="8"> The result is that substantial progress is typically made in minimizing the error rate, with relatively few non-zero parameter values. Freund et al. (1998) show that under certain conditions the combination of minimizing the loss function while using relatively few parameters leads to good generalization on test data examples. Empirical results for boosting have shown that in practice the method is highly effective.</Paragraph>
    </Section>
    <Section position="2" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
4.2 Examples and Feedback
</SectionTitle>
      <Paragraph position="0"> To apply RankBoost, we require a set of example sptrees, each of which have been rated, and encoded in terms of a set of features (see below). We started with a corpus of 100 text plans generated in context in 25 dialogs by the dialog system. We then ran the SPG, parameterized to generate at most 20 distinct sp-trees for each text plan. Since not all text plans have 20 valid sp-trees (while some have many more), this resulted in a corpus of 1868 sentence plans. These 1868 sp-trees, realized by RealPro, were then rated by two expert judges in the context of the transcribed original dialogs (and therefore also with respect to their adequacy given the communicative goals for that turn), on a scale from 1 to 5. The ratings given by the judges were then averaged to provide a rating between 1 and 5 for each sentence plan alternative.</Paragraph>
      <Paragraph position="1"> The ratings assigned to the sentence plans were roughly normally distributed, with a mean of 2.86 and a median of 3. Each sp-tree provided an example input to Rank-Boost, and each corresponding rating was the feedback for that example.</Paragraph>
    </Section>
    <Section position="3" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
4.3 Features Used by RankBoost
</SectionTitle>
      <Paragraph position="0"> Rankboost, like other machine learning programs of the boosting family, can handle a very large number of features. Therefore, instead of carefully choosing a small number of features by hand which may be useful, we generated a very large number of features and let RankBoost choose the relevant ones. In total, we used 3,291 features in training the SPR. Features were discovered from the actual sentence plan trees that the SPG generated through the feature derivation process described below, in a manner similar to that used by Collins (2000).</Paragraph>
      <Paragraph position="1"> The motivation for the features was to capture declaratively decisions made by the randomized SPG. We avoided features specific to particular text plans by discarding those that occurred fewer than 10 times.</Paragraph>
      <Paragraph position="2"> Features are derived from two sources: the sp-trees and the DSyntSs associated with the root nodes of sptrees. The feature names are prefixed with &amp;quot;sp-&amp;quot; or &amp;quot;dsynt-&amp;quot; depending on the source. There are two types of features: local and global. Local features record structural configurations local to a particular node, i.e., that can be described with respect to a single node (such as its ancestors, its daughters, etc.). The value of the feature is the number of times this configuration is found in the sp-tree or DSyntS. Each type of local feature also has a corresponding parameterized or lexicalized version, which is more specific to aspects of the particular dialog in which the text plan was generated.5 Global features record properties of the entire tree. Features and examples are discussed below.</Paragraph>
      <Paragraph position="3"> Traversal features: For each node in the tree, features are generated that record the preorder traversal of the subtree rooted at that node, for all subtrees of all depths (up to the maximum depth). Feature names are constructed with the prefix &amp;quot;traversal-&amp;quot;, followed by the concatenated names of the nodes (starting with the current node) on the traversal path. As an example, consider the sp-tree in Figure 5. Feature SP-TRAVERSAL-SOFT-MERGE*IMPLICIT-CONFIRM*IMPLICIT-CONFIRM has value 1, since it counts the number of subtrees in the sp-tree in which a soft-merge rule dominates two implicit-confirm nodes. In the DSyntS tree for alternative 8 (Figure 8), feature DSYNT-TRAVERSAL-PRONOUN, which counts the number of nodes in the DSyntS tree labelled PRONOUN (explicit or empty), has value 4.</Paragraph>
      <Paragraph position="4"> Sister features: These features record all consecutive sister nodes. Names are constructed with the prefix &amp;quot;sisters-&amp;quot;, followed by the concatenated names of the sister nodes. As an example, consider the sp-tree shown in Figure 7, and the DSyntS tree shown in Figure 8. Feature DSYNT-SISTERS-PRONOUN-ON1 counts the number of times the lexical items PRONOUN and ON1 are sisters in the DSyntS tree; its value is 1 in Figure 8. Another example is feature SP-SISTERS-IMPLICIT-CONFIRM*IMPLICIT-CONFIRM, which describes the configuration of all implicit confirms in the sp-trees in; its value is 2 for all three sp-trees in Figures 5, 6 and 7.</Paragraph>
      <Paragraph position="5"> Ancestor features: For each node in the tree, these features record all the initial subpaths of the path from that node to the root. Feature names are constructed with the prefix &amp;quot;ancestor-&amp;quot;, followed by the concatenated names of the nodes (starting with the current node).</Paragraph>
      <Paragraph position="6"> For example, the feature SP-ANCESTOR*IMPLICITCONFIRM-ORIG-CITY*SOFT-MERGE-GENERAL*SOFT- null MERGE-GENERAL counts the number of times that two soft-merge-general nodes dominate an implicit confirm of the origin city; its value is 1 in the sp-trees of Figures 5 and 6, but 0 in the sp-tree of Figure 7.</Paragraph>
      <Paragraph position="7"> Leaf features: These features record all initial substrings of the frontier of the sp-tree (recall that its frontier consists of elementary speech acts). Names are prefixed with &amp;quot;leaf-&amp;quot;, and are then followed by the concatenated names of the frontier nodes (starting with the current node). The value is always 0 or 1. For example, the sp-trees of Figure 5, 6 and 7 have value 1 for features LEAF-IMPLICIT-CONFIRM</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="21" end_page="21" type="metho">
    <SectionTitle>
AND LEAF-IMPLICIT-CONFIRM*IMPLICIT-CONFIRM,
</SectionTitle>
    <Paragraph position="0"> representing the first two sequences of speech acts on the leaves of the tree. Figure 5 sp-tree has value 1 for features LEAF-IMPLICIT-CONFIRM*IMPLICIT-CONFIRM*REQUEST, and LEAF-IMPLICIT5Lexicalized features are useful in learning lexically specific restrictions on aggregation (for example, for verbs such as kiss).</Paragraph>
    <Paragraph position="1">  CONFIRM. Each of these has a corresponding parameterized feature, e.g. for LEAF-IMPLICIT-CONFIRM, there is a corresponding parameterized feature of LEAF-IMPLICIT-CONFIRM-ORIG-CITY.</Paragraph>
    <Paragraph position="2"> Global Features: The global sp-tree features record, for each sp-tree and for each operation labeling a non-frontier node (i.e., rule such as CONJUNCTION or MERGE-GENERAL), (1) the minimal number of leaves (elementary speech acts) dominated by a node labeled with that rule in that tree (MIN); (2) the maximal number of leaves dominated by a node labeled with that rule (MAX); and (3) the average number of leaves dominated by a node labeled with that rule (AVG). For example, the sp-tree for alternative 8 in Figure 7 has value 2 for SOFT-MERGE-GENERAL-MAX -MIN, and -AVG, but a PERIOD-MAX of 5, PERIOD-MIN of 2 and PERIOD-AVG of 3.5.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="21" end_page="21" type="metho">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> To train and test the SPR we partitioned the corpus into 5 disjoint folds and performed 5-fold cross-validation, in which at each fold, 80% of the examples were used for training an SPR and the other unseen 20% was used for testing. This method ensures that every example occurs once in the test set. We evaluate the performance of the the trained SPR on the test sets of text plans by comparing for each text plan: a89 BEST: The score of the top human-ranked sentence plan(s); a89 SPOT: The score of SPoT's selected sentence plan; a89 RANDOM: The score of a sentence plan randomly selected from the alternate sentence plans.</Paragraph>
    <Paragraph position="1"> Figure 10 shows the distributions of scores for the highest ranked sp-tree for each of the 100 text plans, according to the human experts, according to SPoT, and according to random choice. The human rankings provide a topline for SPoT (since SPoT is choosing among options ranked by the humans, it cannot possibly do better), while the random scores provide a baseline. The BEST distribution shows that 97% of text plans had at least one sentence plan ranked 4 or better. The RANDOM distribution approximates the distribution of rankings for all sentence plans for all examples.</Paragraph>
    <Paragraph position="2"> Because each text plan is used in some fold of 5-fold cross validation as a test element, we assess the significance of the ranking differences with a paired t-test of SPOT to BEST and SPOT to RANDOM.</Paragraph>
    <Paragraph position="3"> A paired t-test of SPOT to BEST shows that there are significant differences in performance (a90a65a24a92a91  a32a22a32a99a98 ). Perfect performance would have meant that there would be no significant difference. However, the mean of BEST is 4.82 as compared with the mean of SPOT of 4.56, for a mean difference of 0.26 on a scale of 1 to 5. This is only a 5% difference in performance. Figure 5 also shows that the main differences are in the lower half of the distribution of rankings; both distributions have a median of 5.</Paragraph>
    <Paragraph position="4"> A paired t-test of SPOT to RANDOM shows that there are also significant differences in performance (a90a82a24</Paragraph>
  </Section>
  <Section position="8" start_page="21" end_page="21" type="metho">
    <SectionTitle>
RANDOM
</SectionTitle>
    <Paragraph position="0"> bution is 2.50 as compared to SPoT's median of 5.0. The mean of RANDOM is 2.76, as compared to the mean of SPOT of 4.56, for a mean difference of 1.8 on a scale of 1 to 5. The performance difference in this case is 36%, showing a large difference in the performance of SPoT and RANDOM.</Paragraph>
    <Paragraph position="1"> We then examined the rules that SPoT learned in training and the resulting RankBoost scores. Figure 2 shows, for each alternative sentence plan, the BEST rating used as feedback to RankBoost and the score that RankBoost gave that example when it was in the test set in a fold. Recall that RankBoost focuses on learning relative scores, not absolute values, so the scores are normalized to range between 0 and 1.</Paragraph>
    <Paragraph position="2"> Figure 9 shows some of the rules that were learned on the training data, that were then applied to the alternative sentence plans in each test set of each fold in order to rank them. We include only a subset of the rules that had the largest impact on the score of each sp-tree. We discuss some particular rule examples here to help the reader understand how SPoT's SPR works, but leave it to the reader to examine the thresholds and feature values in the remainder of the rules and sum the increments and decrements.</Paragraph>
    <Paragraph position="3"> Rule (1) in Figure 9 states that an implicit confirmation as the first leaf of the sp-tree leads to a large (.94) increase in the score. Thus all three of our alternative sp-trees accrue this ranking increase. Rules (2) and (5) state that the occurrence of 2 or more PRONOUN nodes in the DSyntS reduces the ranking by 0.85, and that 3 or more PRONOUN nodes reduces the ranking by an additional 0.34. Alternative 8 is above the threshold for both of these rules; alternative 5 is above the threshold for Rule  (2) and alternative 0 is always below the thresholds. Rule (6) on the other hand increases only the scores of alter- null natives 0 and 5 by 0.33 since alternative 8 is below the threshold for that feature.</Paragraph>
    <Paragraph position="4"> Note also that the quality of the rules in general seems to be high. Although we provided multiple instantiations of features, some of which included parameters or lexical items that might identify particular discourse contexts, most of the learned rules utilize general properties of the sp-tree and the DSyntS. This is probably partly due to the fact that we eliminated features that appeared fewer than 10 times in the training data, but also partly due to the fact that boosting algorithms in general appear to be resistant to overfitting the data (Freund et al., 1998).</Paragraph>
  </Section>
class="xml-element"></Paper>