<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1604">
  <Title>Real-Time Stochastic Language Generation for Dialogue Systems</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Acorn: System Description
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Input Form
</SectionTitle>
      <Paragraph position="0"> The input to Acorn is a semantic feature-value form rooted on the type of speech act. At the top level, the :speechact feature gives the type (e.g. sa tell, sa yn-question, sa accept, etc.), the :terms feature gives the list of semantic, content-bearing terms, and the :root feature gives the variable of the root term in the utterance. Other features are allowed and often required, such as a :focus for wh-questions. Each term in the :terms list is a feature-value structure based on thematic roles, as used in many other representations (e.g. VerbNet [Kipper et al., 2000]). This utterance input is a syntactically modified version of the domain-independent Logical Form described in [Dzikovska et al., 2003].</Paragraph>
      <Paragraph position="1"> Each term is specified by the features: :indicator, :class, optional :lex, and any other relevant thematic roles (e.g.</Paragraph>
      <Paragraph position="2"> :agent, :theme, etc.). The :indicator indicates the type or function of the term and takes the values THE, A, F, PRO, and QUANTITY-TERM. THE represents a grounded object in the discourse, A represents an abstract object, F is a functional operator, PRO is used for references, and QUANTITY-TERM represents quantities expressed in various scales. There are other indicators, but the details are beyond the scope of this paper. The :class specifies the semantic class of the term, and the :lex is the root lexical item for the term. Lex is an optional feature and is created from the :class if it is not present in the input. In this syntax, a keyword is a symbol preceded by a colon, and a value is any valid symbol, variable, or list.</Paragraph>
      <Paragraph position="3"> [Figure 2: an example input, (utt :speechact sa tell :root v8069 :terms ...), for the utterance 'I want a 2.4 gigahertz computer.' This input provides the lexical items for the utterance, but these are typically absent in most cases.]</Paragraph>
      <Paragraph position="4"> Figure 1 gives the specification of the input, and figure 2 shows an example input to Acorn for the utterance, 'I want a 2.4 gigahertz computer'. Appendix A provides further examples of both semantic and lexical inputs.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Grammar Rules
</SectionTitle>
      <Paragraph position="0"> The grammar rules in Acorn convert the input utterance into word orderings by matching keywords (features) in each term. A unique aspect of Acorn is that the utterance-level features can also be matched at any time; it is often necessary to write a rule based on the current speech act type. The left-hand side (LHS) of a rule showing both options is given here: (grule focus ...)</Paragraph>
      <Paragraph position="2"> Each rule matches keywords in its LHS to the current term and binds the values of the keywords in the term to the variables in the LHS. In the above example, the variable ?s would be bound to the subject of the term, and the variable ?act would be bound to the top-level :speechact value. An LHS element preceded by the :g symbol indicates a top-level (global) feature. In this example, the value sa tell is also specified as a requirement before the rule can match.</Paragraph>
      <Paragraph position="3"> When a rule matches, the right-hand side (RHS) offers several different processing options. As in HALogen, recasting rules (changing a keyword to a new keyword, such as converting a semantic role into a syntactic one), substitution rules (removing a keyword and its value, or just changing its value), and ordering rules (specifying phrasal and word-level ordering in the word forest) are supported. Acorn supports two additional rule types that handle wh-movement and other head features. The first is called empty-creation and its complement is filling. In order to use these rules effectively, a method of passing head and/or foot features is needed. The following describes trickle-down features, followed by a description of the empty-creation and filling rules.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="7" type="metho">
    <SectionTitle>
Trickle-Down Features
</SectionTitle>
    <Paragraph position="0"> A drawback of the grammar phase is that all features in the terms must be explicitly coded in the rules, otherwise they are discarded when ordering rules are applied. Using a simple example of subject-object placement, the following ordering rule places the subject in front of the verb, and the object behind.</Paragraph>
    <Paragraph position="2"> This rule creates three new branches in the word forest, one each for (?s), (?rest), and (?o), joined as a conjunct of three non-terminal nodes.</Paragraph>
    <Paragraph position="4"> Processing of the (?s) and (?o) branches is restarted at the top of the grammar, but they do not contain any features (the ?rest variable is a catch-all variable that represents all features not matched in the LHS). Indeed, it is possible to write rules that explicitly match and pass along every such feature; however, this quickly leads to bloated rules and can slow the matching procedure considerably. It is more intuitive to keep features such as head and foot features hidden from the grammar writer as much as possible. This is accomplished through what we are calling trickle-down features. The syntax for these special-case features includes an asterisk before the name, as in :*gap. Using these features gives the effect of the explicit rule with the ease of use of the simple rule. It essentially trickles the features down until their appropriate place in the input utterance is found. Figure 3 shows the feature 'searching' for its correct path. One use of this is shown in the following examples of the empty-creation and filling rules.</Paragraph>
    <Paragraph position="5"> In figure 3, the gap head feature can be seen percolating to each node, finding its true path (1-&gt;2-&gt;6) to the wh-term what, and linking the filler with the gap (6-&gt;G4).</Paragraph>
    <Paragraph position="6"> Empty-Creation When building the word forest, we often need to create a gap node that will be filled later by movement phenomena, such as in wh-questions. The content of the node may not be known, but through empty-creation, we can instantiate a variable and link it to the current location in the word forest. This variable can then be attached to a special trickle-down feature which is implicitly passed through the grammar. The following is an example of an empty-creation rule:</Paragraph>
    <Paragraph position="8"> ... (-&gt; ?wh-gap (?rest :*gap ?wh-gap))) The first half of the RHS (the g-&gt; rule) creates a global variable and binds a new word forest node label to it. This label is then used in the second half of the RHS, where the node is inserted into the word forest and, as of now, is empty. The variable is then passed as a trickle-down feature :*gap to the current term using the ?rest catch-all variable. This rule is applied to node 1 in figure 3, creating gap node G4 and the ?rest node 2, and passing the :*gap through the forest.</Paragraph>
    <Paragraph position="9"> Filling Filling rules perform the wh-movement needed in wh-questions and many complement structures. Filling in the context of Acorn can be seen as binding a gap variable that has already been created through an empty-creation rule. The following is an example filling rule that completes the above wh-gap example.</Paragraph>
    <Paragraph position="10">  This rule checks that the current term is a wh-term that has a gap head feature. The RHS (the b-&gt; rule) binds the current term to the gap that has already been created, filling the empty node in the word forest. The Filling rule essentially grafts a branch onto a location in the word forest that has previously been created by an Empty-Creation rule. The dotted line in figure 3 is created by such a Filling rule.</Paragraph>
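    <Paragraph> In data-structure terms, the two rule types amount to reserving a node in the word forest and later grafting a branch onto it. The following is a minimal Python sketch of that idea, not of Acorn's actual rule machinery; the class and method names are invented.
class WordForest:
    """Toy word forest with labelled nodes and child lists."""

    def __init__(self):
        self.nodes = {}
        self.next_id = 0

    def new_node(self, children=None):
        label = "N%d" % self.next_id
        self.next_id += 1
        self.nodes[label] = list(children or [])
        return label

    def create_gap(self):
        # empty-creation: reserve an (initially empty) node and return its label
        return self.new_node()

    def fill_gap(self, gap_label, branch_label):
        # filling: graft an already-built branch onto the previously reserved gap node
        self.nodes[gap_label].append(branch_label)

In this picture, the label returned by create_gap travels with the term as a trickle-down feature (the :*gap above) until a wh-term matches a filling rule, which then calls fill_gap with that label and the wh-term's branch.
</Paragraph>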
    <Section position="1" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
3.3 Grammar Over-Generation
</SectionTitle>
      <Paragraph position="0"> One of the main attractions of the two-phased approach is that the grammar in the first phase can be left linguistically unconstrained and over-generates many possibilities for an input utterance. However, the statistical second phase may then be over-burdened with the task of searching it. The converse problem arises when the first stage is too constrained and does not produce enough realizations to be natural and flexible, perhaps removing the need for a stochastic phase entirely. There needs to be a balance between the two stages.</Paragraph>
      <Paragraph position="1"> The processing time is also critical in that over-generation can take too much time to be useful for dialogue.</Paragraph>
      <Paragraph position="2"> The grammar used in HALogen largely relied on the over-generating first phase to ensure full coverage of the output. It also reduced the number of rules in the grammar. Subject-verb agreement was loosely enforced, particularly with subject number. Also, singular and plural nouns were both generated when the input was unspecified, doubling the size of the noun phrase possibilities. One of the biggest over-generations was in morphology. HALogen has its own morphology generator that relies on over-generating algorithms rather than a lexicon to inflect words. The typical word forest then contains many unknown words that are ignored during the stochastic search, but which explode the size of the word forest. Lastly, modifiers are over-generated to appear both in front of and behind the head words.</Paragraph>
      <Paragraph position="3"> Our approach removes the above over-generation and links a lexicon to the grammar for morphology. Subject-verb agreement is enforced where possible without dramatically increasing the grammar size, nouns are only made plural when the input so specifies (under the assumption that the input would contain such semantically critical information), and modifiers are placed in specific locations on certain phrases (e.g. adjectives are always premodifiers for nouns, complements of infinitive verbs are postmodifiers, etc.).</Paragraph>
      <Paragraph position="4"> These changes greatly reduce the runtime of the first phase and directly affect the runtime of the second phase by creating smaller word forests.</Paragraph>
    </Section>
    <Section position="2" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
3.4 Algorithm
Forest Creation
</SectionTitle>
      <Paragraph position="0"> Word forest creation begins with the input utterance, such as the one in figure 2. The top level utterance features are stored in a global feature list, easily accessed by the grammar rules if need be. The :root feature points to the root semantic term given in the list of :terms. This root term is then processed, beginning at the top of the grammar.</Paragraph>
      <Paragraph position="1"> The grammar is pre-processed and each rule is indexed in a hash table of features according to the least popular feature in the rule. For example, if a rule has two features, :theme and :agent, and :agent only appears in 8 rules while :theme  appears in 14, the rule will be added to the list of rules in the :agent bin. During processing of an input term, all of the term's features are extracted and the rules under each feature in the hash table are merged into an ordered subset of the full grammar. This process differs from HALogen and its successors by vastly limiting the number of rules that are checked against each input. Instead of checking 250 rules, we may only check the relevant 20 rules. After a grammar rule matches, the index is queried again with the new term(s) from the RHS of the rule. A new subset of the grammar is created and used to continue processing through the grammar.</Paragraph>
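      <Paragraph> A minimal sketch of this indexing scheme is given below in Python; the rule objects, their lhs_features attribute, and the grammar_position ordering are assumptions made for illustration, not part of Acorn's actual implementation.
from collections import defaultdict

def build_rule_index(rules):
    """Index each grammar rule under its least popular LHS feature."""
    counts = defaultdict(int)
    for rule in rules:
        for feature in rule.lhs_features:
            counts[feature] += 1
    index = defaultdict(list)
    for rule in rules:
        rarest = min(rule.lhs_features, key=lambda f: counts[f])
        index[rarest].append(rule)
    return index

def candidate_rules(term_features, index):
    """Merge the rule lists of every feature present in the input term."""
    merged = []
    for feature in term_features:
        merged.extend(index.get(feature, []))
    # keep grammar order, drop duplicates: only this subset is checked against the term
    return sorted(set(merged), key=lambda r: r.grammar_position)
</Paragraph>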
      <Paragraph position="2"> RHS expansions create (1) ordering constraints, (2) new branches, and (3) feature modifications to the current term.</Paragraph>
      <Paragraph position="3"> Options (1) and (2) are typically done with ordering rules such as the following RHS:</Paragraph>
      <Paragraph position="5"> The variables are either bound from the LHS conditions, or are unbound (conditions that follow the &amp;optional indicator in the LHS) and ignored during RHS expansion. The ?rest variable is a special case variable which refers to the current term and its features that do not appear in the LHS (by default, features in the LHS that are matched are removed from the term, unless they follow a &amp;keep indicator). In the above example, there will be a new conjunction branch with three child nodes in the word forest, as shown in figure 4.</Paragraph>
      <Paragraph position="6"> When this rule is matched, the ?s node will bind its variable, which must point to one of the terms in the input utterance's :terms list. Processing then begins with that term at the top of the grammar, attaching any features in the RHS to it (in this example, :position subject). Once completed, processing will continue with the current term (?rest) until the grammar is exhausted. Finally, the third term (?o ...) will begin at the top of the grammar. As discussed in section 3.2, any trickle-down features in the current term are appended to the three terms when processing begins/continues on each of them.</Paragraph>
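      <Paragraph> The control flow just described can be sketched roughly as follows; this is hypothetical Python, with process_term and add_conjunction standing in for the surrounding forest-creation machinery, and the term representation reduced to a feature dictionary.
def expand_ordering_rule(bindings, trickle_down, process_term, add_conjunction):
    """Sketch of expanding an RHS like ((?s :position subject) (?rest) (?o)).

    process_term(term) restarts the term at the top of the grammar and returns
    a forest node label; add_conjunction(labels) builds the ordered conjunct."""
    branch_labels = []
    for slot, extra in (("?s", {":position": "subject"}), ("?rest", {}), ("?o", {})):
        term = dict(bindings[slot])   # the bound term, or the leftover (?rest) features
        term.update(extra)            # features attached to this slot in the RHS
        term.update(trickle_down)     # :*gap and other trickle-down features carried along
        branch_labels.append(process_term(term))
    return add_conjunction(branch_labels)
</Paragraph>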
      <Paragraph position="7"> A term attempts to match each rule in the grammar until an RHS creates a leaf node. This is accomplished by an RHS expansion into an initial atom that is a string. Finally, inline functions may be used in the grammar. The following example calls the function stringify and its returned value is bound to the ?str variable. These calls are typically used to access the lexicon.</Paragraph>
      <Paragraph position="8"> (grule stringify (:lex ?lex) ;; convert lexical item to string</Paragraph>
      <Paragraph position="10"> The PathFinder module of Acorn is the second stage, responsible for determining the most likely path through the forest.</Paragraph>
      <Paragraph position="11"> In this stage, the hypotheses from the grammar are analyzed and the top word ordering is chosen based on n-gram stochastic models derived from corpora.</Paragraph>
      <Paragraph position="12"> The algorithm we implemented in PathFinder is largely the same as the one described in [Langkilde, 2000]. It is a dynamic programming algorithm that stores the top m phrases at each decision point based on the leading and trailing words in the phrase. When dealing with n-grams, we only need to keep track of the first n-1 and the last n-1 words in each phrase. Our approach not only tracks these 'features', as Langkilde calls them, but PathFinder also sorts the top m phrases and prunes any duplicates. Pruning duplicates offers an advantage in runtime when the phrases are merged with neighboring phrases. The complexity analysis is still O(m*m) = O(m^2), but in practice, pruning reduces the number of phrases to some number less than m.</Paragraph>
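      <Paragraph> For a bigram model (n = 2), the merge step of this dynamic program can be sketched as follows; this is an illustrative Python fragment with a simplified hypothesis representation, and bigram_logprob is an assumed scoring function rather than part of PathFinder's actual interface.
import heapq

def merge_phrases(left_hyps, right_hyps, bigram_logprob, m):
    """Combine hypotheses of two adjacent branches, keeping the top m results.

    A hypothesis is (logprob, first_word, last_word, words); with a bigram model
    only the boundary words matter when two phrases are concatenated."""
    merged = {}
    for lp_l, first_l, last_l, words_l in left_hyps:
        for lp_r, first_r, last_r, words_r in right_hyps:
            score = lp_l + lp_r + bigram_logprob(last_l, first_r)
            key = tuple(words_l + words_r)      # duplicate word sequences are pruned
            best = merged.get(key)
            if best is None or score > best[0]:
                merged[key] = (score, first_l, last_r, words_l + words_r)
    return heapq.nlargest(m, merged.values(), key=lambda h: h[0])
</Paragraph>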
      <Paragraph position="13"> The largest change to the algorithm is that we added dynamic interpolation of language models. PathFinder can load any number of models and interpolate them together during n-gram analysis using an input set of weights. PathFinder also has the capability to use feature-based models and word history models.</Paragraph>
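      <Paragraph> For example, given k loaded models and a weight vector, the interpolated probability of a word amounts to something like the following sketch, where the prob method is an assumed model interface:
def interpolated_prob(word, context, models, weights):
    """Linearly interpolate several n-gram models with the given weights."""
    assert round(sum(weights), 6) == 1.0, "interpolation weights should sum to 1"
    return sum(w * m.prob(word, context) for m, w in zip(models, weights))
</Paragraph>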
      <Paragraph position="14"> Feature models, such as part-of-speech n-grams, model the features of forest leaves instead of the lexical items. The Forest Creation stage is able to output features in addition to lexical items, as seen in the RHS of this forest leaf: N6 :POS NN :BASE COMPUTER -&gt; &quot;COMPUTERS&quot; There are two 'features' on this leaf, pos and base. Parameters can be passed to PathFinder that instruct it to use the features instead of the RHS string when applying a language model to the forest. This option is not evaluated in this paper, but is a promising direction for future work.</Paragraph>
      <Paragraph position="15"> Word history models keep track of the current discourse and monitor word usage, providing a history of word choice and calculating a unigram probability for each word. The PathFinder is updated on each utterance in the dialogue and applies a decaying word history approach, similar to the work in [Clarkson and Robinson, 1997]. This model is not evaluated in this paper, but is useful in portraying the breadth of coverage that a stochastic phase can provide to dialogue.</Paragraph>
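      <Paragraph> A decaying word-history unigram of the kind referred to above could be sketched as follows; the decay schedule, smoothing floor, and class interface are illustrative assumptions, since the paper does not specify them.
class WordHistoryModel:
    """Unigram model over recent dialogue with exponentially decayed counts."""

    def __init__(self, decay=0.9, floor=1e-6):
        self.decay = decay
        self.floor = floor
        self.counts = {}

    def update(self, utterance_words):
        # decay old counts, then add the words of the newest utterance
        for w in self.counts:
            self.counts[w] *= self.decay
        for w in utterance_words:
            self.counts[w] = self.counts.get(w, 0.0) + 1.0

    def prob(self, word):
        total = sum(self.counts.values())
        if total == 0:
            return self.floor
        return max(self.counts.get(word, 0.0) / total, self.floor)
</Paragraph>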
    </Section>
  </Section>
  <Section position="6" start_page="7" end_page="7" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> The three factors that are most important in evaluating dialogue generation are portability, coverage, and speed. Other factors include naturalness, flexibility, and many more, but the above three are evaluated in this paper to address concerns of domain-independent generation and real-time dialogue. In addressing the latter concern by constraining the size of the word forest, it is very easy to lose the former.</Paragraph>
    <Section position="1" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
4.1 The Grammar
</SectionTitle>
      <Paragraph position="0"> Acorn's grammar contains 189 rules and is heavily semantics-based, although the semantic features and concepts are transformed into syntactic features before word ordering is decided. It is possible to input a syntactic utterance, but this evaluation is only concerned with semantic input. The grammar was created within the context of a computer purchasing domain in which the dialogue system is a collaborative assistant that helps the user define and purchase a computer. We had a corpus of 216 utterances from developers of the system who created their own mock dialogues. The grammar was constructed mainly from these parsed utterances. Other domains, such as an underwater robotic mine search and a database query interface, were used to represent as many semantic roles as possible. The list of the main semantic features in Acorn's grammar is provided in figure 5.</Paragraph>
    </Section>
    <Section position="2" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
4.2 Evaluation Methodology
</SectionTitle>
      <Paragraph position="0"> Each utterance in our target dialogues that could be parsed was automatically transformed into the input syntax of Acorn. These inputs were pushed through Acorn, resulting in a single, top-ranked utterance. This utterance was compared to the target utterance using the Generation String Accuracy metric. This metric compares a target string to the generated string and counts the number of word movements (M), substitutions (S), deletions (D), and insertions (I) (not counting deletions and insertions implicitly included in movements).</Paragraph>
      <Paragraph position="1"> The metric is given below (L is the number of tokens in the target string):</Paragraph>
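      <Paragraph position="2"> Following the definitions in [Bangalore et al., 2000], Generation String Accuracy = 1 - (M + S + D + I) / L, while Simple String Accuracy uses the same expression but counts each movement as a deletion plus an insertion rather than as a single movement.</Paragraph>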
      <Paragraph position="3"> Before comparison, all contractions were split into separate lexical items to prevent the metric from penalizing semantically similar phrases (e.g. aren't to are not). The Simple String Accuracy metric was also applied to provide a comparison against studies that may not use the Generation Metric; however, the Generation Metric repairs some of the former's failings, namely double penalization for word movement. More on these and other metrics can be found in [Bangalore et al., 2000].</Paragraph>
    </Section>
    <Section position="3" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
4.3 Domain Independent Evaluation
</SectionTitle>
      <Paragraph position="0"> Acorn was evaluated using the Monroe Corpus [Stent, 2000], a collection of 20 dialogues. Each dialogue is a conversation between two English speakers who were given a map of Monroe County, NY and a description of a task that needed to be solved. There were eight different disaster scenarios ranging from a bomb attack to a broken leg, and the participants were to act as emergency dispatchers. It is a significantly different domain from computer purchasing and was chosen because it offers a corpus that has been parsed by our parser and thus has readily available logical forms for input to Acorn. The lengths of the utterances are shown in figure 6.</Paragraph>
      <Paragraph position="1"> The four dialogues that had most recently been updated to our logical form definitions were chosen for the evaluation. The remaining sixteen are used by PathFinder to build a bigram language model of the domain's dialogue. Two series of tests were run. The first includes the lexical items as input to Acorn and the second only includes the ontology concepts.</Paragraph>
      <Paragraph position="2"> Generation String Accuracy is used to judge the output of the system against the original utterances in the Monroe dialogues. While other generation metrics have been proposed, such as the Bleu metric [Papineni et al., 2001], the Generation String Accuracy metric still provides a measure of system improvement and a comparison against other systems. Bleu requires more than one correct output option to be worthwhile ('quantity leads to quality'), so it is not as applicable when only one target utterance is available.</Paragraph>
    </Section>
    <Section position="4" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
4.4 Domain Specific Evaluation
</SectionTitle>
      <Paragraph position="0"> In order to compare the domain independent evaluation with a domain specific evaluation, the same evaluation described in 4.2 was used on the computer purchasing corpus that includes the logical forms on which Acorn's grammar is based.</Paragraph>
      <Paragraph position="1"> As described in 4.1, the domain is an assistant that collaboratively purchases computers online for the user. There are 132 utterances of length three or more in this corpus. The n-gram models were automatically generated using a hand-formed word grammar of sample sentences. Both Simple and Generation String Accuracy metrics were applied. [Figure 7: ... of Acorn in the Monroe domain. The two baseline metrics and the final Acorn scores are given.]</Paragraph>
    </Section>
    <Section position="5" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
4.5 Baselines
</SectionTitle>
      <Paragraph position="0"> Two baselines were included in the evaluation as comparative measures. The first, named simply baseline, is a random ordering of the lexical inputs to Acorn. Instead of using a grammar to choose the ordering of the input lexical items, the baseline is a simple procedure that traverses the input terms, outputting each lexical item as it encounters it. When there are multiple modifiers on a term, the order in which to follow them is chosen randomly. This baseline is only run when lexical items are provided in the input.</Paragraph>
      <Paragraph position="1"> The second baseline is called Random Path and serves as a baseline before the second phase of Acorn. A random path through the resulting word forest of the first phase of Acorn is extracted and compared against the target utterance. This allows us to evaluate the usefulness of the second stochastic phase. Both these baselines are included in the following results.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="7" end_page="7" type="metho">
    <SectionTitle>
4.6 Results
</SectionTitle>
    <Paragraph position="0"> Two different tests were performed. The first included lexical choice in the input utterances and the second included only the ontology concepts. The accuracy scores for the Monroe domain are shown in figure 7. A semantic input with all lexical items specified scored an average of 0.70 (or 70%) on 325 input utterances. A purely semantic input with just the ontology classes scored 0.62 (or 62%).</Paragraph>
    <Paragraph position="1"> The results from Acorn in the Computer Purchasing Domain are shown in figure 8. Both the lexical and semantic evaluations were run, resulting in average scores of 0.85 (85%) and 0.69 (69%) respectively.</Paragraph>
    <Paragraph position="2"> In order to judge usefulness for a real-time dialogue system, the runtime for both phases of Acorn was recorded for each utterance. We also ran HALogen for comparison. Because its grammar is significantly different from Acorn's and little time was spent conforming it to our logical form, HALogen's output is not directly comparable; however, the runtimes are useful for comparison. The times for both Acorn and HALogen are shown in figure 9. With a purely semantic input, Acorn took 0.16 seconds to build a forest and 0.21 seconds to rank it, for a total time of 0.37 seconds. HALogen took a total time of 19.29 seconds. [Figure 9: runtimes of Acorn and HALogen; both the lexical item and the semantic concept input are shown.]</Paragraph>
    <Paragraph position="3"> HALogen runs quicker when lexical choice is performed ahead of time, finishing in 2.73 seconds. This is mainly due to its over-generation of noun plurals, verb person and number, and morphology.</Paragraph>
    <Paragraph position="4"> Finally, the runtime improvement of using the grammar rule indexing algorithm was analyzed. All utterances of word length five or more with correct parses were chosen from the dialogues to create forests of sufficient size, resulting in 192 tests. Figure 10 shows the average forest building time with the indexing algorithm versus the old approach of checking each grammar rule individually. A 30% improvement was achieved.</Paragraph>
  </Section>
  <Section position="8" start_page="7" end_page="7" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> While it is difficult to quantify, the implementation of trickle-down features and Empty-Creation and Filling rules accommodates well the construction of a grammar that can capture head/foot features. The forest creation algorithm of HALogen and others is much too cumbersome to implement such features within, and representing lexical movement is impossible without them.</Paragraph>
    <Paragraph position="1"> The above result of 62% coverage in a new domain is comparable to, and arguably better than, those given by Langkilde [Langkilde-Geary, 2002]. [Figure 10: ... matching rules versus the rule indexing approach described in this paper. The average runtime for 192 word forests is shown.] This paper uses a semantic utterance input which is most similar to the Min spec test of Langkilde. The Min spec actually included both the lexical choice and the surface syntactic roles (such as logical-subject, instead of theme or agent), resulting in a Simple String Accuracy of 55.3%. Acorn's input is even more abstract, including only the semantic roles. Its lexical input, most similar to the Min spec but still more abstract with thematic roles, received 70%. This comparison should only be taken at face value since dialogue utterances are shorter than those in the WSJ, but it provides assurance that a constrained grammar can produce good output even with a more abstract input. It must also be noted that the String Accuracy approaches do not take into account synonyms and paraphrases that are semantically equivalent.</Paragraph>
    <Paragraph position="2"> These results also evaluate the effect the stochastic phase of this approach has on the overall results. The average score of a random path through the word forest (the result of the first grammar-based phase) was only 0.40 (40%). After PathFinder chooses the most probable path, the average is 0.62 (62%). We can conclude that the grammar is still over-generating possible realizations and that this approach does require the second stochastic phase to choose a realization based on previously seen corpora.</Paragraph>
    <Paragraph position="3"> The difference between the results in the known domain (computer purchasing) and the new domain (Monroe rescue) is 85% to 70% (69% to 62% without lexical items). While the difference is too great to claim domain independence on a semantic input, one of the main advantages of the over-generation grammar is that it requires less work to construct a new grammar when domains are switched. Here we see 70% achieved for zero invested time. A study that analyzes the time it takes a programmer to reach 85% has yet to be done.</Paragraph>
    <Paragraph position="4"> The runtime improvement of our approach is more drastic than originally thought possible. An average runtime of 0.37 seconds is decidedly within the time constraints of an effective dialogue system. While the 30% improvement from grammar indexing is also significant, the larger gains appear to result from finer morphology and person/number agreement between verbs and their subjects. Compared with the 19.29 seconds of the previous implementation, this shows that a middle ground between over-generation and statistical determination is a viable solution.</Paragraph>
    <Paragraph position="5"> Finally, more work is needed to produce better output. The majority of errors in this approach are modifier placement choices. Without a formal grammar, the final placement decisions are ultimately decided by an n-gram language model, resulting in short-sighted decisions. Even though 85% from a semantic input is a good result, modifiers tend to be the one area that falls behind. Several examples of this can be seen in Appendix B where some poor generations are shown.</Paragraph>
  </Section>
</Paper>