<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1011">
  <Title>Trainable Sentence Planning for Complex Information Presentation in Spoken Dialog Systems</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Chanpen Thai, which is a Thai restau-
</SectionTitle>
    <Paragraph position="0"> rant, has decent decor. It has good service. It has the best overall quality among the selected restaurants.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 .28
5 Since Chanpen Thai is a Thai restau-
</SectionTitle>
    <Paragraph position="0"> rant, with good service, and it has decent decor, it has the best overall quality among the selected restaurants.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.5 .14
</SectionTitle>
      <Paragraph position="0"/>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Chanpen Thai, which is a Thai restau-
</SectionTitle>
    <Paragraph position="0"> rant, with decent decor and good service, has the best overall quality among the selected restaurants.</Paragraph>
    <Paragraph position="1">  value among the selected restaurants. Above, which is a New American restaurant, with good decor, has good service. Carmine's, which is an Italian restaurant, with good service, has decent decor.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 .73
12 Above and Carmine's o er exceptional
</SectionTitle>
    <Paragraph position="0"> value among the selected restaurants.</Paragraph>
    <Paragraph position="1"> Above has good decor, and Carmine's has decent decor. Above and Carmine's have good service. Above is a New American restaurant. On the other hand, Carmine's is an Italian restaurant. null</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.5 .50
</SectionTitle>
      <Paragraph position="0"> 13 Above and Carmine's o er exceptional value among the selected restaurants.</Paragraph>
      <Paragraph position="1"> Above is a New American restaurant.</Paragraph>
      <Paragraph position="2"> It has good decor. It has good service.</Paragraph>
      <Paragraph position="3"> Carmine's, which is an Italian restaurant, has decent decor and good service.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 .67
20 Above and Carmine's o er exceptional
</SectionTitle>
    <Paragraph position="0"> value among the selected restaurants.</Paragraph>
    <Paragraph position="1"> Carmine's has decent decor but Above has good decor, and Carmine's and Above have good service. Carmine's is an Italian restaurant. Above, however, is a New American restaurant.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.5 .49
</SectionTitle>
      <Paragraph position="0"> 25 Above and Carmine's o er exceptional value among the selected restaurants.</Paragraph>
      <Paragraph position="1"> Above has good decor. Carmine's is an Italian restaurant. Above has good service. Carmine's has decent decor.</Paragraph>
      <Paragraph position="2"> Above is a New American restaurant.</Paragraph>
      <Paragraph position="3"> Carmine's has good service.</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="0" end_page="0" type="metho">
    <SectionTitle>
NR NR
</SectionTitle>
    <Paragraph position="0"> realizations for the comparison in Figure 2. H = Humans' score. SPR = SPR's score. NR = Not generated or ranked The architecture of the spoken language generation module in MATCH is shown in Figure 5. The dialog manager sends a high-level communicative goal to the SPUR text planner, which selects the content to be communicated using a user model and brevity constraints (see (Walker  et al., 2002)). The output is a content plan for a recommendation or comparison such as those in Figures 1 and 2.</Paragraph>
    <Paragraph position="1"> SPaRKy, the sentence planner, gets the content plan, and then a sentence plan generator (SPG) generates one or more sentence plans (Figure 7) and a sentence plan ranker (SPR) ranks the generated plans. In order for the SPG to avoid generating sentence plans that are clearly bad, a content-structuring module rst nds one or more ways to linearly order the input content plan using principles of entity-based coherence based on rhetorical relations (Knott et al., 2001). It outputs a set of text plan trees (tp-trees), consisting of a set of speech acts to be communicated and the rhetorical relations that hold between them. For example, the two tp-trees in Figure 6 are generated for the content plan in Figure 2. Sentence plans such as alternative 25 in Figure 4 are avoided; it is clearly worse than alternatives 12, 13 and 20 since it neither combines information based on a restaurant entity (e.g Babbo) nor on an attribute (e.g. decor).</Paragraph>
    <Paragraph position="2"> The top ranked sentence plan output by the SPR is input to the RealPro surface realizer which produces a surface linguistic utterance (Lavoie and Rambow, 1997). A prosody assignment module uses the prior levels of linguistic representation to determine the appropriate prosody for the utterance, and passes a marked-up string to the text-to-speech module.</Paragraph>
  </Section>
  <Section position="11" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Sentence Plan Generation
</SectionTitle>
    <Paragraph position="0"> As in SPoT, the basis of the SPG is a set of clause-combining operations that operate on tp-trees and incrementally transform the elementary predicate-argument lexico-structural representations (called DSyntS (Melcuk, 1988)) associated with the speech-acts on the leaves of the tree. The operations are applied in a bottom-up left-to-right fashion and the resulting representation may contain one or more sentences. The application of the operations yields two parallel structures: (1) a sentence plan tree (sp-tree), a binary tree with leaves labeled by the assertions from the input tp-tree, and interior nodes labeled with clause-combining operations; and (2) one or more DSyntS trees (d-trees) which re ect the parallel operations on the predicate-argument representations.</Paragraph>
    <Paragraph position="1"> We generate a random sample of possible sentence plans for each tp-tree, up to a prespeci ed number of sentence plans, by randomly selecting among the operations according to a probability distribution that favors preferred operations1. The choice of operation is further constrained by the rhetorical relation that relates the assertions to be combined, as in other work e.g. (Scott and de Souza, 1990).</Paragraph>
    <Paragraph position="2"> In the current work, three RST rhetorical relations (Mann and Thompson, 1987) are used in the content planning phase to express the relations between assertions: the justify relation for recommendations, and the contrast and elaboration relations for comparisons. We added another relation to be used during the content-structuring phase, called infer, which holds for combinations of speech acts for which there is no rhetorical relation expressed in the content plan, as in (Marcu, 1997). By explicitly representing the discourse structure of the information presentation, we can generate information presentations with considerably more internal complexity than those generated in (Walker, Rambow and Rogati, 2002) and eliminate those that violate certain coherence principles, as described in Section 2.</Paragraph>
    <Paragraph position="3"> The clause-combining operations are general operations similar to aggregation operations used in other research (Rambow and Korelsky, 1992; Danlos, 2000). The operations and the 1Although the probability distribution here is hand-crafted based on assumed preferences for operations such as merge, relative-clause and with-reduction, it might also be possible to learn this probability distribution from the data by training in two phases.  constraints on their use are described below.</Paragraph>
    <Paragraph position="4"> merge applies to two clauses with identical matrix verbs and all but one identical arguments. The clauses are combined and the non-identical arguments coordinated. For example, merge(Above has good service;Carmine's has good service) yields Above and Carmine's have good service. merge applies only for the relations infer and contrast.</Paragraph>
    <Paragraph position="5"> with-reduction is treated as a kind of \verbless&amp;quot; participial clause formation in which the participial clause is interpreted with the subject of the unreduced clause. For example, with-reduction(Above is a New American restaurant;Above has good decor) yields Above is a New American restaurant, with good decor. with-reduction uses two syntactic constraints: (a) the subjects of the clauses must be identical, and (b) the clause that undergoes the participial formation must have a havepossession predicate. In the example above, for instance, the Above is a New American restaurant clause cannot undergo participial formation since the predicate is not one of havepossession. with-reduction applies only for the relations infer and justify.</Paragraph>
    <Paragraph position="6"> relative-clause combines two clauses with identical subjects, using the second clause to relativize the rst clause's subject. For example, relative-clause(Chanpen Thai is a Thai restaurant, with decent decor and good service;Chanpen Thai has the best overall quality among the selected restaurants) yields Chanpen Thai, which is a Thai restaurant, with decent decor and good service, has the best overall quality among the selected restaurants. relative-clause also applies only for the relations infer and justify.</Paragraph>
    <Paragraph position="7"> cue-word inserts a discourse connective (one of since, however, while, and, but, and on the other hand), between the two clauses to be combined. cue-word conjunction combines two distinct clauses into a single sentence with a coordinating or subordinating conjunction (e.g.</Paragraph>
    <Paragraph position="8"> Above has decent decor BUT Carmine's has good decor), while cue-word insertion inserts a cue word at the start of the second clause, producing two separate sentences (e.g. Carmine's is an Italian restaurant. HOWEVER, Above is a New American restaurant). The choice of cue word is dependent on the rhetorical relation holding between the clauses.</Paragraph>
    <Paragraph position="9"> Finally, period applies to two clauses to be treated as two independent sentences.</Paragraph>
    <Paragraph position="10"> Note that a tp-tree can have very di erent realizations, depending on the operations of the SPG. For example, the second tp-tree in Figure 6 yields both Alt 11 and Alt 13 in Figure 4. However, Alt 13 is more highly rated than Alt 11. The sp-tree and d-tree produced by the SPG for Alt 13 are shown in Figures 7 and 8. The composite labels on the interior nodes of the sp- null tree indicate the clause-combining relation selected to communicate the speci ed rhetorical relation. The d-tree for Alt 13 in Figure 8 shows that the SPG treats the period operation as part of the lexico-structural representation for the d-tree. After sentence planning, the d-tree is split into multiple d-trees at period nodes; these are sent to the RealPro surface realizer.</Paragraph>
    <Paragraph position="11"> Separately, the SPG also handles referring expression generation by converting proper names to pronouns when they appear in the previous utterance. The rules are applied locally, across adjacent sequences of utterances (Brennan et al., 1987). Referring expressions are manipulated in the d-trees, either intrasententially during the creation of the sp-tree, or intersententially, if the full sp-tree contains any period operations. The third and fourth sentences for Alt 13 in Figure 4 show the conversion of a named restaurant (Carmine's) to a pronoun.</Paragraph>
  </Section>
  <Section position="12" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Training the Sentence Plan
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Ranker
</SectionTitle>
      <Paragraph position="0"> The SPR takes as input a set of sp-trees generated by the SPG and ranks them. The SPR's rules for ranking sp-trees are learned from a labeled set of sentence-plan training examples using the RankBoost algorithm (Schapire, 1999).</Paragraph>
      <Paragraph position="1"> Examples and Feedback: To apply Rank-Boost, a set of human-rated sp-trees are encoded in terms of a set of features. We started with a set of 30 representative content plans for each strategy. The SPG produced as many as 20 distinct sp-trees for each content plan. The sentences, realized by RealPro from these sp-trees, were then rated by two expert judges on a scale from 1 to 5, and the ratings averaged. Each sp-tree was an example input for RankBoost, with each corresponding rating its feedback.</Paragraph>
      <Paragraph position="2"> Features used by RankBoost: RankBoost requires each example to be encoded as a set of real-valued features (binary features have values 0 and 1). A strength of RankBoost is that the set of features can be very large. We used 7024 features for training the SPR. These features count the number of occurrences of certain structural con gurations in the sp-trees and the d-trees, in order to capture declaratively decisions made by the randomized SPG, as in (Walker, Rambow and Rogati, 2002). The features were automatically generated using feature templates. For this experiment, we use two classes of feature: (1) Rule-features: These features are derived from the sp-trees and represent the ways in which merge, infer and cue-word operations are applied to the tp-trees.</Paragraph>
      <Paragraph position="3"> These feature names start with \rule&amp;quot;. (2) Sentfeatures: These features are derived from the DSyntSs, and describe the deep-syntactic structure of the utterance, including the chosen lexemes. As a result, some may be domain speci c.</Paragraph>
      <Paragraph position="4"> These feature names are pre xed with \sent&amp;quot;.</Paragraph>
      <Paragraph position="5"> We now describe the feature templates used in the discovery process. Three templates were used for both sp-tree and d-tree features; two were used only for sp-tree features. Local feature templates record structural con gurations local to a particular node (its ancestors, daughters etc.). Global feature templates, which are used only for sp-tree features, record properties of the entire sp-tree. We discard features that occur fewer than 10 times to avoid those speci c to particular text plans.</Paragraph>
      <Paragraph position="6">  and Compare3 results (N = 180) There are four types of local feature template: traversal features, sister features, ancestor features and leaf features. Local feature templates are applied to all nodes in a sp-tree or d-tree (except that the leaf feature is not used for d-trees); the value of the resulting feature is the number of occurrences of the described con guration in the tree. For each node in the tree, traversal features record the preorder traversal of the subtree rooted at that node, for all subtrees of all depths. An example is the feature \rule traversal assertcom-list exceptional&amp;quot; (with value 1) of the tree in Figure 7. Sister features record all consecutive sister nodes. An example is the feature \rule sisters PERIOD infer RELATIVE CLAUSE infer&amp;quot; (with value 1) of the tree in Figure 7. For each node in the tree, ancestor features record all the initial subpaths of the path from that node to the root. An example is the feature \rule ancestor PERIOD contrast*PERIOD infer&amp;quot; (with value 1) of the tree in Figure 7. Finally, leaf features record all initial substrings of the frontier of the sp-tree. For example, the sp-tree of Figure 7 has value 1 for the feature \leaf #assert-com-list exceptional#assert-comcuisine&amp;quot;. null Global features apply only to the sptree. They record, for each sp-tree and for each clause-combining operation labeling a nonfrontier node, (1) the minimal number of leaves dominated by a node labeled with that operation in that tree (MIN); (2) the maximal number of leaves dominated by a node labeled with that operation (MAX); and (3) the average number of leaves dominated by a node labeled with that operation (AVG).</Paragraph>
      <Paragraph position="7"> For example, the sp-tree in Figure 7 has value 3 for \PERIOD infer max&amp;quot;, value 2 for \PERIOD infer min&amp;quot; and value 2.5 for \PE-</Paragraph>
    </Section>
  </Section>
  <Section position="13" start_page="0" end_page="4" type="metho">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> We report two sets of experiments. The rst experiment tests the ability of the SPR to select a high quality sentence plan from a population of sentence plans randomly generated by the SPG.</Paragraph>
    <Paragraph position="1"> Because the discriminatory power of the SPR is best tested by the largest possible population of sentence plans, we use 2-fold cross validation for this experiment. The second experiment compares SPaRKy to template-based generation.</Paragraph>
    <Paragraph position="2"> Cross Validation Experiment: We repeatedly tested SPaRKy on the half of the corpus of 1756 sp-trees held out as test data for each fold. The evaluation metric is the human-assigned score for the variant that was rated highest by SPaRKy for each text plan for each task/user combination. We evaluated SPaRKy on the test sets by comparing three data points for each text plan: HUMAN (the score of the top-ranked sentence plan); SPARKY (the score of the SPR's selected sentence); and RANDOM (the score of a sentence plan randomly selected from the alternate sentence plans).</Paragraph>
    <Paragraph position="3"> We report results separately for comparisons between two entities and among three or more entities. These two types of comparison are generated using di erent strategies in the SPG, and can produce text that is very di erent both in terms of length and structure.</Paragraph>
    <Paragraph position="4"> Table 1 summarizes the di erence between SPaRKy, HUMAN and RANDOM for recommendations, comparisons between two entities and comparisons between three or more entities. For all three presentation types, a paired t-test comparing SPaRKy to HUMAN to RANDOM showed that SPaRKy was signi cantly better than RANDOM (df = 59, p &lt; .001) and signi cantly worse than HUMAN (df = 59, p &lt; .001). This demonstrates that the use of a trainable sentence planner can lead to sentence plans that are signi cantly better than baseline (RANDOM), with less human e ort than programming templates.</Paragraph>
    <Paragraph position="5"> Comparison with template generation: For each content plan input to SPaRKy, the judges also rated the output of a template-based generator for MATCH. This template-based generator performs text planning and sentence planning (the focus of the current paper), including some discourse cue insertion, clause combining and referring expression generation; the templates themselves are described in (Walker et al., 2002). Because the templates are highly tailored to this domain, this generator can be expected to perform well. Example template-based and SPaRKy outputs for a comparison between three or more items are shown in Figure 9.</Paragraph>
    <Paragraph position="6">  tion results. N = 180 Table 2 shows the mean HUMAN scores for the template-based sentence planning. A paired t-test comparing HUMAN and template-based scores showed that HUMAN was signi cantly better than template-based sentence planning only for compare2 (df = 29, t = 6.2, p &lt; .001). The judges evidently did not like the template for comparisons between two items. A paired t-test comparing SPaRKy and template-based sentence planning showed that template-based sentence planning was signi cantly better than SPaRKy only for recommendations (df = 29, t = 3.55, p &lt; .01). These results demonstrate that trainable sentence planning shows promise for producing output comparable to that of a template-based generator, with less programming e ort and more exibility.</Paragraph>
    <Paragraph position="7"> The standard deviation for all three template-based strategies was wider than for HUMAN or SPaRKy, indicating that there may be content-speci c aspects to the sentence planning done by SPaRKy that contribute to output variation. The data show this to be correct; SPaRKy learned content-speci c preferences about clause combining and discourse cue insertion that a template-based generator can-System Realization H Template Among the selected restaurants, the following o er exceptional overall value.</Paragraph>
    <Paragraph position="8"> Uguale's price is 33 dollars. It has good decor and very good service. It's a French, Italian restaurant. Da Andrea's price is 28 dollars. It has good decor and very good service. It's an Italian restaurant. John's Pizzeria's price is 20 dollars. It has mediocre decor and decent service.</Paragraph>
    <Paragraph position="9"> It's an Italian, Pizza restaurant.</Paragraph>
    <Paragraph position="10">  SPaRKy Da Andrea, Uguale, and John's Pizzeria o er exceptional value among the selected restaurants. Da Andrea is an Italian restaurant, with very good service, it has good decor, and its price is 28 dollars. John's Pizzeria is an Italian , Pizza restaurant. It has decent service. It has mediocre decor. Its price is 20 dollars.</Paragraph>
    <Paragraph position="11"> Uguale is a French, Italian restaurant, with very good service. It has good decor, and its price is 33 dollars.</Paragraph>
    <Paragraph position="12">  items, H = Humans' score not easily model, but that a trainable sentence planner can. For example, Table 3 shows the nine rules generated on the rst test fold which have the largest negative impact on the nal RankBoost score (above the double line) and the largest positive impact on the nal RankBoost score (below the double line), for comparisons between three or more entities. The rule with the largest positive impact shows that SPaRKy learned to prefer that justi cations involving price be merged with other information using a conjunction.</Paragraph>
    <Paragraph position="13"> These rules are also speci c to presentation type. Averaging over both folds of the experiment, the number of unique features appearing in rules is 708, of which 66 appear in the rule sets for two presentation types and 9 appear in the rule sets for all three presentation types. There are on average 214 rule features, 428 sentence features and 26 leaf features. The majority of the features are ancestor features (319) followed by traversal features (264) and sister features (60). The remainder of the features (67) are for speci c lexemes.</Paragraph>
    <Paragraph position="14"> To sum up, this experiment shows that the ability to model the interactions between domain content, task and presentation type is a strength of the trainable approach to sentence planning.</Paragraph>
  </Section>
class="xml-element"></Paper>