<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1034">
  <Title>Learning to Generate Naturalistic Utterances Using Reviews in Spoken Dialogue Systems</Title>
  <Section position="4" start_page="265" end_page="266" type="metho">
    <SectionTitle>
2 Learning a Generation Dictionary
</SectionTitle>
    <Paragraph position="0"> Our automatically created generation dictionary consists of triples (U,R,S) representing a mapping between the original utterance U in the user review, its semantic representation R(U), and its syntactic structure S(U). Although templates are widely used in many practicalsystems (Seneff and Polifroni, 2000; Theune, 2003), we derive syntactic structures to represent the potential realizations, in order to allow aggregation, and other syntactic transformations of utterances, as well as contextspecificprosodyassignment(Walkeretal., 2003; Moore et al., 2004).</Paragraph>
    <Paragraph position="1"> The method is outlined briefly in Fig. 1 and de- null scribed below. It comprises the following steps: 1. Collect user reviews on the web to create a population of utterances U.</Paragraph>
    <Paragraph position="2"> 2. To derive semantic representations R(U): * Identify distinguished attributes and construct a domain ontology; * Specify lexicalizations of attributes; * Scrape webpages' structured data for named-entities; * Tag named-entities.</Paragraph>
    <Paragraph position="3"> 3. Derive syntactic representations S(U).</Paragraph>
    <Paragraph position="4"> 4. Filter inappropriate mappings.</Paragraph>
    <Paragraph position="5"> 5. Add mappings (U,R,S) to dictionary.</Paragraph>
    <Section position="1" start_page="265" end_page="265" type="sub_section">
      <SectionTitle>
2.1 Creating the corpus
</SectionTitle>
      <Paragraph position="0"> We created a corpus of restaurant reviews by scraping 3,004 user reviews of 1,810 restaurants posted at we8there.com (http://www.we8there.com/), where each individual review includes a 1-to-5 Likert-scale rating of different restaurant attributes. The corpus consists of 18,466 sentences.</Paragraph>
    </Section>
    <Section position="2" start_page="265" end_page="265" type="sub_section">
      <SectionTitle>
2.2 Deriving semantic representations
</SectionTitle>
      <Paragraph position="0"> The distinguished attributes are extracted from the webpages for each restaurant entity. They include attributes that the users are asked to rate, i.e. food, service, atmosphere, value,andoverall, which have scalar values. In addition, other attributes are extracted from the webpage, such as the name, foodtype and location of the restaurant, which have categorical values. The name attribute is assumed to correspond to the restaurant entity. Given the distinguished attributes, a Dist. Attr. Lexicalization food food, meal service service, staff, waitstaff, wait staff, server, waiter, waitress atmosphere atmosphere, decor, ambience, decoration value value, price, overprice, pricey, expensive, inexpensive, cheap, affordable, afford overall recommend, place, experience, establish-</Paragraph>
    </Section>
    <Section position="3" start_page="265" end_page="266" type="sub_section">
      <SectionTitle>
ment
</SectionTitle>
      <Paragraph position="0"> tributes.</Paragraph>
      <Paragraph position="1"> simple domain ontology can be automatically derived by assuming that a meronymy relation, represented by the predicate 'has', holds between the entity type (RESTAURANT) and the distinguished attributes. Thus, the domain ontology consists of the relations:  We assume that, although users may discuss other attributes of the entity, at least some of the utterancesin the reviewsrealizethe relations specified in the ontology. Our problem then is to identify these utterances. We test the hypothesis that, if an utterance U contains named-entities corresponding to the distinguished attributes, thatR for that utterance includes the relation concerning that attribute in the domain ontology.</Paragraph>
      <Paragraph position="2"> We define named-entities for lexicalizations of the distinguished attributes, starting with the seed word for that attribute on the webpage (Table 1).  Fornamed-entityrecognition, weuseGATE(Cunningham et al., 2002), augmented with named-entity lists for locations, food types, restaurant names, and food subtypes (e.g. pizza), scraped from the we8there webpages.</Paragraph>
      <Paragraph position="3"> We also hypothesizethat the rating givenfor the distinguished attribute specifies the scalar value of the relation. For example, a sentence containing food or meal is assumed to realize the relation 'RESTAURANT has foodquality.',andthe value of the foodquality attribute is assumed to be the value specified in the user rating for that attribute, e.g. 'RESTAURANT has foodquality = 5' in Fig. 1. Similarly, the other relations in Fig. 1 are assumed to be realized by the utterance &amp;quot;The best Spanish food in New York&amp;quot; because it contains  In future, we will investigate other techniques for bootstrapping these lexicalizations from the seed word on the webpage.</Paragraph>
      <Paragraph position="4">  tences filtered and retained by each filter.</Paragraph>
      <Paragraph position="5"> one FOODTYPE named-entity and one LOCATION named-entity. Values of categorical attributes are replaced by variables representing their type before the learned mappings are added to the dictionary, as shown in Fig. 1.</Paragraph>
    </Section>
    <Section position="4" start_page="266" end_page="266" type="sub_section">
      <SectionTitle>
2.3 Parsing and DSyntS conversion
</SectionTitle>
      <Paragraph position="0"> We adopt Deep Syntactic Structures (DSyntSs) as a format for syntactic structures because they can be realized by the fast portable realizer RealPro (Lavoie and Rambow, 1997). Since DSyntSs are a type of dependency structure, we first process the sentences with Minipar (Lin, 1998), and then convert Minipar's representation into DSyntS. Since user reviews are different from the newspaper articles on which Minipar was trained, the output of Minipar can be inaccurate, leading to failure in conversion. We check whether conversion is successful in the filtering stage.</Paragraph>
    </Section>
    <Section position="5" start_page="266" end_page="266" type="sub_section">
      <SectionTitle>
2.4 Filtering
</SectionTitle>
      <Paragraph position="0"> The goal of filtering is to identify U that realize the distinguished attributes and to guarantee high precision for the learned mappings. Recall is less important since systems need to convey requested information as accurately as possible. Our procedurefor derivingsemanticrepresentationsis based on the hypothesisthat ifU containsnamed-entities that realizethe distinguishedattributes, thatRwill include the relevant relation in the domain ontology. We also assume that if U contains named-entities that are not covered by the domain ontology, or words indicating that the meaning of U depends on the surrounding context, that R will not completely characterizesthe meaning ofU,andso U should be eliminated. We also require an accurate S for U. Therefore, the filters described below eliminate U that (1) realize semantic relations not in the ontology; (2) contain words indicating that its meaning depends on the context; (3) contain unknown words; or (4) cannot be parsed accurately. null No Relations Filter: The sentence does not contain any named-entities for the distinguished attributes.</Paragraph>
      <Paragraph position="1"> Other Relations Filter: The sentence contains named-entities for food subtypes, person  relation mappings.</Paragraph>
      <Paragraph position="2"> names, country names, dates (e.g., today, tomorrow, Aug. 26th) or prices (e.g., 12 dollars), or POS tag CD for numerals. These indicate relations not in the ontology.</Paragraph>
      <Paragraph position="3"> Contextual Filter: The sentence contains indexicals such as I, you, that or cohesive markers of rhetorical relations that connect it to some part of the preceding text, which means that the sentence cannot be interpreted out of context. These include discourse markers, such as list item markers with LS as the POS tag, that signal the organization structure of the text (Hirschberg and Litman, 1987), as well as discourse connectives that signal semantic and pragmatic relations of the sentence with other parts of the text (Knott, 1996), such as coordinatingconjunctions at the beginning of the utterance like and and but etc., and conjunct adverbs such as however, also, then.</Paragraph>
      <Paragraph position="4"> Unknown Words Filter: The sentence contains words not in WordNet (Fellbaum, 1998) (which includes typographical errors), or POS tags contain NN (Noun), which may indicate an unknown named-entity, or the sentence has more than a fixed length of words,  indicating that its meaning may not be estimated solely by named entities.</Paragraph>
      <Paragraph position="5"> Parsing Filter: The sentence fails the parsing to DSyntS conversion. Failures are automatically detected by comparing the original sentence with the one realized by RealPro taking the converted DSyntS as an input.</Paragraph>
      <Paragraph position="6"> We apply the filters, in a cascading manner, to the 18,466 sentences with semantic representations.</Paragraph>
      <Paragraph position="7"> As a result, we obtain 512 (2.8%) mappings of (U,R,S). After removing 61 duplicates, 451 distinct (2.4%) mappings remain. Table 2 shows the number of sentences eliminated by each filter.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="266" end_page="268" type="metho">
    <SectionTitle>
3 Objective Evaluation
</SectionTitle>
    <Paragraph position="0"> We evaluate the learned expressions with respect to domain coverage, linguistic variation and generativity. null  We used 20 as a threshold.</Paragraph>
    <Section position="1" start_page="267" end_page="267" type="sub_section">
      <SectionTitle>
3.1 Domain Coverage
</SectionTitle>
      <Paragraph position="0"> To be usable for a dialogue system, the mappings must have good domain coverage. Table 3 shows the distribution of the 327 mappings realizing a single scalar-valued relation, categorized by the associatedrating score.</Paragraph>
      <Paragraph position="1">  For example, there are 57 mappings with R of 'RESTAURANT has foodquality=5,' and a large number of mappings for both the foodquality and servicequality relations. Although we could not obtain mappings for some relations such as price={1,2}, coverage for expressing a single relation is fairly complete. There are also mappings that express several relations. Table 4 shows the counts of mappings for multi-relation mappings, with those containing a food or service relation occurring more frequentlyas inthesingle scalar-valuedrelationmappings. We found only 21 combinations of relations, which is surprising given the large potential number of combinations (There are 50 combinations if we treat relations with different scalar values differently). We also find that most of the mappingshavetwo or threerelations, perhaps suggesting that system utterances should not express too many relations in a single sentence.</Paragraph>
    </Section>
    <Section position="2" start_page="267" end_page="267" type="sub_section">
      <SectionTitle>
3.2 Linguistic Variation
</SectionTitle>
      <Paragraph position="0"> We also wish to assess whether the linguistic variation of the learned mappings was greater than what we could easily have generated with a hand-crafted dictionary, or a hand-crafted dictionary augmented with aggregation operators, as in  There are two other single-relation but not scalar-valued mappings that concern LOCATION in our mappings.</Paragraph>
      <Paragraph position="1"> (Walker et al., 2003). Thus, we first categorized the mappings by the patterns of the DSyntSs. Table 5 shows the most common syntactic patterns (more than 10 occurrences), indicating that 30% of the learned patterns consist of the simple form &amp;quot;X is ADJ&amp;quot;whereADJ is an adjective, or &amp;quot;X is RB ADJ,&amp;quot; where RB is a degree modifier. Furthermore, up to 55% of the learned mappings could be generated from these basic patterns by the application of a combination operator that coordinates multiple adjectives, or coordinates predications over distinct attributes. However, there are 137 syntactic patterns in all, 97 with unique syntactic structures and 21 with two occurrences, accounting for 45% of the learned mappings. Table 6 shows examplesof learned mappingswith distinctsyntactic structures. It would be surprising to see this type of variety in a hand-crafted generation dictionary.</Paragraph>
      <Paragraph position="2"> In addition, the learned mappings contain 275 distinct lexemes, with a minimum of 2, maximum of 15, and mean of 4.63 lexemes per DSyntS, indicating that the method extracts a wide variety of expressions of varying lengths.</Paragraph>
      <Paragraph position="3"> Another interesting aspect of the learned mappings is the wide variety of adjectival phrases (APs)inthecommonpatterns. Tables7and8 show the APs in singlescalar-valuedrelation mappings for food and service categorized by the associated ratings. Tables for atmosphere, value and overall can be found in the Appendix. Moreover, the meanings for some of the learned APs are very specific to the particular attribute, e.g. cold and burnt associated with foodquality of 1, attentive and prompt for servicequality of 5, silly and inattentive for servicequality of 1. and mellow for atmosphere of 5. In addition, our method places the adjectival phrases (APs) in the common patterns on a more fine-grained scale of 1 to 5, similar to the strengthclassificationsin (Wilsonet al., 2004), in contrast to other automatic methods that classify expressions into a binary positive or negative polarity (e.g. (Turney, 2002)).</Paragraph>
    </Section>
    <Section position="3" start_page="267" end_page="268" type="sub_section">
      <SectionTitle>
3.3 Generativity
</SectionTitle>
      <Paragraph position="0"> Our motivation for deriving syntactic representations for the learned expressions was the possibility of using an off-the-shelf sentence planner to derive new combinations of relations, and apply aggregation and other syntactic transformations.</Paragraph>
      <Paragraph position="1"> We examined how many of the learned DSyntSs can be combined with each other, by taking every pair of DSyntSs in the mappings and applying the built-in merge operation in the SPaRKy generator (Walker et al., 2003). We found that only 306 combinations out of a potential 81,318  [atmosphere=5, overall=5] This is a quiet little place with great atmosphere.</Paragraph>
      <Paragraph position="2"> [atmosphere=5,food=5, overall=5, service=5, value=5] The food, service and ambience of the place are all fabulous and the prices are downright cheap.</Paragraph>
      <Paragraph position="3">  hand for relations in square brackets) whose syntactic patterns occurred only once.</Paragraph>
      <Paragraph position="4"> combinations (0.37%) were successful. This is because the merge operation in SPaRKy requires that the subjects and the verbs of the two DSyntSs are identical, e.g. the subject is RESTAURANT and verb is has, whereas the learned DSyntSs often place the attribute in subject position as a definite noun phrase. However, the learned DSyntS can be incorporated into SPaRKy using the semantic representations to substitute learned DSyntSs into nodes in the sentence plan tree. Figure 2 shows some example utterances generated by SPaRKy with its originaldictionary and exampleutterances when the learned mappings are incorporated. The resulting utterances seem more natural and colloquial; we examine whether this is true in the next section.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="268" end_page="269" type="metho">
    <SectionTitle>
4 Subjective Evaluation
</SectionTitle>
    <Paragraph position="0"> We evaluate the obtained mappings in two respects: the consistency between the automatically derived semantic representation and the realizafood=1 awful, bad, burnt, cold, very ordinary food=2 acceptable, bad, flavored, not enough, very bland, very good food=3 adequate, bland and mediocre, flavorful but cold, pretty good, rather bland, very good food=4 absolutely wonderful, awesome, decent, excellent, good, good and generous, great, outstanding, rather good, really good, traditional, very fresh and tasty, very good, very very good food=5 absolutely delicious, absolutely fantastic, absolutely great, absolutely terrific, ample, well seasoned and hot, awesome, best, delectable and plentiful, delicious, delicious but simple, excellent, exquisite, fabulous, fancy but tasty, fantastic, fresh, good, great, hot, incredible, just fantastic, large and satisfying, outstanding, plentiful and outstanding, plentiful and tasty, quick and hot, simply great, so delicious, so very tasty, superb, terrific, tremendous, very good, wonderful  valued relation mappings for foodquality.</Paragraph>
    <Paragraph position="1"> tion, and the naturalness of the realization.</Paragraph>
    <Paragraph position="2"> For comparison, we used a baseline of hand-crafted mappings from (Walker et al., 2003) except that we changed the word decor to atmosphere and added five mappings for overall.</Paragraph>
    <Paragraph position="3"> For scalar relations, this consists of the realization &amp;quot;RESTAURANT has ADJ LEX&amp;quot; where ADJ is mediocre, decent,good,very good,orexcellent for rating values 1-5, and LEX is food quality, service, atmosphere, value,oroverall depending on the relation. RESTAURANT is filled with the name of a restaurant at runtime. For example, 'RESTAURANT has foodquality=1' is realized as &amp;quot;RESTAURANT has mediocre food quality.&amp;quot; The location and food type relations are mapped to &amp;quot;RESTAURANT is located in LOCATION&amp;quot; and &amp;quot;RESTAURANT is a FOODTYPE restaurant.&amp;quot; The learned mappings include 23 distinct semantic representationsfor a single-relation (22 for scalar-valued relations and one for location) and 50 for multi-relations. Therefore, using the hand-crafted mappings, we first created 23 utterances for the single-relations. We then created three utterancesforeachof50multi-relationsusingdiffer- null ent clause-combining operations from (Walker et al., 2003). This gave a total of 173 baseline utterances, which together with 451 learned mappings,  service=1 awful, bad, great, horrendous, horrible, inattentive, forgetful and slow, marginal, really slow, silly and inattentive, still marginal, terrible, young service=2 overly slow, very slow and inattentive service=3 bad, bland and mediocre, friendly and knowledgeable, good, pleasant, prompt, very friendly service=4 all very warm and welcoming, attentive, extremely friendly and good, extremely pleasant, fantastic, friendly, friendly and helpful, good, great, great and courteous, prompt and friendly, really friendly, so nice, swift and friendly, very friendly, very friendly and accommodating service=5 all courteous, excellent, excellent and friendly, extremely friendly, fabulous, fantastic, friendly, friendly and helpful, friendly and very attentive, good, great, great, prompt and courteous, happy and friendly, impeccable, intrusive, legendary, outstanding, pleasant, polite, attentive and prompt, prompt and courteous, prompt and pleasant, quick and cheerful, stupendous, superb, the most attentive, unbelievable, very attentive, very congenial, very courteous, very friendly, very friendly and helpful, very friendly and pleasant, very friendly and totally personal, very friendly and welcoming, very good, very helpful, very timely, warm and friendly, wonderful Table 8: Adjectival phrases (APs) in single scalar-valued relation mappings for servicequality. yielded 624 utterances for evaluation.</Paragraph>
    <Paragraph position="4"> Ten subjects, all native English speakers, evaluated the mappings by reading them from a webpage. For each system utterance, the subjects were asked to express their degree of agreement, on a  scale of 1 (lowest) to 5 (highest), with the statement (a) The meaning of the utterance is consis- null tent with the ratings expressing their semantics, and with the statement (b) The style of the utterance is very natural and colloquial.Theywere asked not to correct their decisions and also to rate each utterance on its own merit.</Paragraph>
    <Section position="1" start_page="269" end_page="269" type="sub_section">
      <SectionTitle>
4.1 Results
</SectionTitle>
      <Paragraph position="0"> Table 9 shows the means and standard deviations of thescores forbaselinevs. learned utterancesfor consistency and naturalness. A t-test shows that theconsistencyofthe learnedexpressionis significantly lower than the baseline (df=4712, p &lt; .001) but that their naturalness is significantly higher than the baseline (df=3107, p &lt; .001). However, consistency is still high. Only 14 of the learned utterances (shown in Tab. 10) have a mean consistency score lower than 3, which indicates that, by and large, the human judges felt that the inferred semantic representations were consistent with the meaning of the learned expressions. The correlation coefficient between consistency and naturalness scores is 0.42, which indicates that consis-Original SPaRKy utterances * Babbo has the best overall quality among the selected restaurants with excellent decor, excellent service and superb food quality.</Paragraph>
      <Paragraph position="1"> * Babbo has excellent decor and superb food quality with excellent service. It has the best overall quality among the selected restaurants.</Paragraph>
      <Paragraph position="2"> | Combination of SPaRKy and learnedDSyntS * Because the food is excellent, the wait staff is professional and the decor is beautiful and very comfortable, Babbo has the best overall quality among the selected restaurants.</Paragraph>
      <Paragraph position="3"> * Babbo has the best overall quality among the selected restaurants because atmosphere is exceptionally nice, food is excellent and the service is superb.</Paragraph>
      <Paragraph position="4"> * Babbo has superb food quality, the service is exceptional and the atmosphere is very creative.Ithasthe  aged over 10 subjects.</Paragraph>
      <Paragraph position="5"> tency does not greatly relate to naturalness.</Paragraph>
      <Paragraph position="6"> We also performed an ANOVA (ANalysis Of VAriance) of the effect of each relation in R on naturalness and consistency. There were no significant effects except that mappings combining food, service, and atmosphere were significantly worse (df=1, F=7.79, p=0.005). However, there is a trend for mappings to be rated higher for the food attribute (df=1, F=3.14, p=0.08) and the value attribute (df=1, F=3.55, p=0.06) for consistency, suggesting that perhaps it is easier to learn some mappings than others.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML