<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1405">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Individuality and Alignment in Generated Dialogues</Title>
  <Section position="5" start_page="25" end_page="25" type="metho">
    <SectionTitle>
3 The CRAG System Overview
</SectionTitle>
    <Paragraph position="0"> The system described in the following sections (CRAG-2) is the successor to CRAG-1 which is detailed in Isard et al. (2005). The system generates a dialogue between two computer agents on the subject of opinions about a film. CRAG-2 uses the OPENCCG parsing and generation framework (White, 2004; White, 2006). The realiser component takes a logical form as input and outputs a list of candidate sentences ranked using one or more language models. In CRAG-2, we use the OPENCCG generator to massively over-generate paraphrases, and the combination of n-gram models described in Section 4 to choose the best utterance according to a character's personality and agenda, and the dialogue history.</Paragraph>
  </Section>
  <Section position="6" start_page="25" end_page="27" type="metho">
    <SectionTitle>
4 N-Grams: Personality and Alignment
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="25" end_page="26" type="sub_section">
      <SectionTitle>
Modelling
4.1 N-Gram Language Models
</SectionTitle>
      <Paragraph position="0"> The basic assumption underlying CRAG-2 is that personality, as well as alignment behaviour, can be modelled by the combination of a variety of n-gram language models.</Paragraph>
      <Paragraph position="1"> Language models are trained on a corpus and subsequently used to compute probability scores for word sequences. An n-gram language model approximates the probability of a word given its history of the preceding n−1 words. According to the chain rule, word probabilities are then combined by multiplication. Equation (1) shows a trigram model that takes into account two words of context to predict the probability of a word sequence w_1^n:</Paragraph>
      <Paragraph position="3"> (1) P(w_1^n) ≈ ∏_{i=1}^{n} P(w_i | w_{i-2} w_{i-1})</Paragraph>
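As a minimal sketch of equation (1), a trigram model scores a sequence by multiplying the conditional probability of each word given its two predecessors. The probability table, the BOS padding symbol, and the floor value for unseen trigrams below are illustrative assumptions, not part of CRAG-2.

```python
from math import prod

# Hypothetical conditional probabilities P(w_i | w_{i-2}, w_{i-1});
# BOS is an assumed start-of-sentence padding symbol.
TRIGRAM_P = {
    ("BOS", "BOS", "the"): 0.5,
    ("BOS", "the", "music"): 0.2,
    ("the", "music", "was"): 0.4,
    ("music", "was", "bad"): 0.1,
}

def sequence_probability(words, table, floor=1e-6):
    """Equation (1): multiply P(w_i | w_{i-2} w_{i-1}) over the sequence."""
    padded = ["BOS", "BOS"] + list(words)
    return prod(
        table.get((padded[i - 2], padded[i - 1], padded[i]), floor)
        for i in range(2, len(padded))
    )

p = sequence_probability(["the", "music", "was", "bad"], TRIGRAM_P)
```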
    </Section>
    <Section position="2" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
4.2 Avoiding the Length Effect
</SectionTitle>
      <Paragraph position="0"> Because word probabilities are always less than 1, each multiplication decreases the total, so under this standard model longer sentences always receive lower scores (the length effect). We therefore calculate the probability of a sentence as the geometric mean of the probabilities of its words, as shown in (2): (2) P(w_1^n) ≈ (∏_{i=1}^{n} P(w_i | w_{i-2} w_{i-1}))^{1/n}</Paragraph>
      <Paragraph position="2"/>
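A sketch of equation (2): the length-normalised score is the n-th root of the product of the per-word probabilities, so a candidate is not penalised merely for contributing more factors below 1. The probability values are illustrative.

```python
from math import prod

def geometric_mean_score(word_probs):
    """Equation (2): n-th root of the product of the n per-word probabilities."""
    n = len(word_probs)
    return prod(word_probs) ** (1.0 / n)

# Raw products penalise the longer sentence; geometric means treat both alike.
short = geometric_mean_score([0.4, 0.4])            # product 0.16
long_ = geometric_mean_score([0.4, 0.4, 0.4, 0.4])  # product ~0.0256
```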
    </Section>
    <Section position="3" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
4.3 Linear Combination of Language Models
</SectionTitle>
      <Paragraph position="0"> OPENCCG supports the linear combination of language models, where each model is assigned a weight. For uniform interpolation of two language models P_a and P_b, each receives equal weight: (3) P(w_i|w_{i-2}^{i-1}) = 0.5 P_a(w_i|w_{i-2}^{i-1}) + 0.5 P_b(w_i|w_{i-2}^{i-1})</Paragraph>
      <Paragraph position="2"> In the more general case, the language models are assigned weights λ_i, which must sum to 1: (4) P(w_i|w_{i-2}^{i-1}) = λ_1 P_a(w_i|w_{i-2}^{i-1}) + λ_2 P_b(w_i|w_{i-2}^{i-1}). For example, setting λ_1 = 0.9 and λ_2 = 0.1 assigns a high weight to the first language model.</Paragraph>
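A minimal sketch of the weighted mixture in equation (4), with uniform interpolation as the equal-weight special case. The component probabilities stand in for P_a and P_b and are illustrative.

```python
def interpolate(p_a, p_b, lam_1, lam_2):
    """Equation (4): weighted mixture of two model probabilities."""
    assert abs(lam_1 + lam_2 - 1.0) < 1e-9, "weights must sum to 1"
    return lam_1 * p_a + lam_2 * p_b

# Uniform interpolation is the special case lam_1 = lam_2 = 0.5.
uniform = interpolate(0.2, 0.4, 0.5, 0.5)   # 0.3
# Setting lam_1 = 0.9 favours the first model.
biased = interpolate(0.2, 0.4, 0.9, 0.1)    # 0.22
```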
    </Section>
    <Section position="4" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
4.4 OPENCCG N-Gram Ranking
</SectionTitle>
      <Paragraph position="0"> In the OPENCCG framework, language models can be used to influence the chart-based realisation process. The agenda of edges is re-sorted according to the score an edge receives with respect to a language model. For CRAG-2, many paraphrases are generated from a given logical form, and they are then ranked in order of probability according to the combination of n-gram models appropriate for the character and stage of the dialogue.</Paragraph>
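The ranking step can be sketched as sorting the over-generated paraphrases by their model score so that the best candidate comes first. The scoring function here is a hypothetical placeholder for the combined n-gram models, not the OPENCCG chart re-sorting implementation.

```python
# Hypothetical combined-model scores for generated paraphrases.
scores = {
    "The music was bad.": 0.42,
    "I hated the music.": 0.31,
    "One thing I hated was the music.": 0.18,
}

def rank_candidates(candidates, score):
    """Sort candidate realisations so the highest-scoring comes first."""
    return sorted(candidates, key=score, reverse=True)

ranked = rank_candidates(list(scores), scores.get)
best = ranked[0]
```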
    </Section>
    <Section position="5" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
5 CRAG-2 Personality and Alignment Models
</SectionTitle>
      <Paragraph position="0"> We use the SRILM toolkit (Stolcke, 2002) to compute our language models. All models (except for the cache language model described in Section 5.4) are trigram models with backoff to bigrams and unigrams.</Paragraph>
      <Paragraph position="1"> We have experimented with two strategies for creating personality models. Since we want to study the effects of alignment as well as personality, it is essential that the two characters in a dialogue be linguistically distinct from one another; otherwise alignment effects could not be observed. The first strategy uses language typical of each personality trait, and the second the language of one individual. In both cases, the language models described in the following sections are combined as described in Section 5.5.</Paragraph>
    </Section>
    <Section position="6" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
5.1 Building a Personality
</SectionTitle>
      <Paragraph position="0"> Nowson (2006) performed a study on language use in weblogs. The weblog authors were asked to complete personality questionnaires based on the five-factor model (see Section 2.1). All weblog authors scored High or Medium on the Openness dimension, so we have no data for typical Low Open language.</Paragraph>
      <Paragraph position="1"> We divided the data into High, Medium and Low for each personality dimension, and trained language models so that we would be able to assess the probability of a word sequence given a personality type. This means that each individual weblog is used 5 times, once for each dimension.</Paragraph>
      <Paragraph position="2"> For each personality dimension, the system simplifies a character's personality setting x by assigning a value of High (x &gt; 70), Medium (30 &lt; x ≤ 70) or Low (x ≤ 30). The five models corresponding to the character's assigned personality are uniformly interpolated to give the final personality model. Since we have no model for Low Openness, a character with a low Openness score simply receives the uniform interpolation of the other four models.</Paragraph>
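The discretisation and interpolation above can be sketched as follows. The trait names, the two-trait setup, and the per-band word probabilities are illustrative assumptions; only the thresholds and the uniform averaging (with the missing Low-Openness band simply dropped) follow the text.

```python
def band(x):
    """Discretise a 0-100 trait score: High (>70), Medium (30,70], Low (<=30)."""
    if x > 70:
        return "High"
    return "Medium" if x > 30 else "Low"

def personality_probability(trait_scores, band_models, word):
    """Uniformly interpolate the per-trait models that exist for this character."""
    probs = []
    for trait, score in trait_scores.items():
        model = band_models.get((trait, band(score)))
        if model is not None:        # e.g. no ("O", "Low") model exists
            probs.append(model[word])
    return sum(probs) / len(probs)

# Illustrative two-trait setup; real characters use all five dimensions.
band_models = {
    ("E", "High"): {"great": 0.3},
    ("N", "Low"): {"great": 0.1},
}
p = personality_probability({"E": 80, "N": 20, "O": 10}, band_models, "great")
```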
    </Section>
    <Section position="7" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
5.2 Borrowing a Personality
</SectionTitle>
      <Paragraph position="0"> Our second strategy was to train n-gram models on language of the individuals from the CRAG-1 corpus (Isard et al., 2005) and to use one of these models for each character in the dialogue.</Paragraph>
    </Section>
    <Section position="8" start_page="26" end_page="27" type="sub_section">
      <SectionTitle>
5.3 Base Language Model
</SectionTitle>
      <Paragraph position="0"> In the case of building a personality, a base language model is obtained by combining a language model computed from the corpus collected for the CRAG-1 system and a general language model based on data from the Switchboard corpus (Stolcke et al., 2000). The combined base model alone would rank the utterances without any bias for personality or alignment. When we are borrowing a personality, the base model is calculated from the Switchboard corpus alone.</Paragraph>
    </Section>
    <Section position="9" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
5.4 Cache Language Model
</SectionTitle>
      <Paragraph position="0"> We simulate alignment by computing a cache language model based on the utterance that was generated immediately before. This dialogue history cache model is the uniform interpolation of word- and class-based n-gram models, where classes act as a backoff mechanism when there is no exact word match. Classes group together lexical items with similar semantic properties, e.g.:
* good, bad: quality-adjective
* loved, hated: opinion-verb
Details of this approach can be found in Brockmann et al. (2005).</Paragraph>
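A toy sketch of the cache idea: a word-based and a class-based component built from the previous utterance are uniformly interpolated, so the class component can fire even when there is no exact word overlap. The binary hit scores and the tiny class inventory (taken from the examples above) are illustrative simplifications of real n-gram caches.

```python
# Illustrative semantic classes following the paper's examples.
WORD_CLASS = {
    "good": "quality-adjective", "bad": "quality-adjective",
    "loved": "opinion-verb", "hated": "opinion-verb",
}

def cache_score(word, previous_utterance):
    """Uniform interpolation of a word-based and a class-based cache hit."""
    prev_words = previous_utterance.split()
    word_hit = 1.0 if word in prev_words else 0.0
    cls = WORD_CLASS.get(word)
    class_hit = 1.0 if cls and any(WORD_CLASS.get(w) == cls
                                   for w in prev_words) else 0.0
    return 0.5 * word_hit + 0.5 * class_hit

s_class_only = cache_score("hated", "i loved the music")  # class backoff fires
s_exact = cache_score("loved", "i loved the music")       # word and class match
```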
    </Section>
    <Section position="10" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
5.5 Combining the Language Models
</SectionTitle>
      <Paragraph position="0"> The system uses weights to combine all the models described above. First the base and personality models are interpolated to produce a base-personality model, and then the cache model is introduced to add alignment effects.</Paragraph>
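The two-stage combination can be sketched as follows; the specific weight values, and the reading of the alignment weight as the cache model's share, are illustrative assumptions rather than the system's actual settings.

```python
def combined_probability(p_base, p_personality, p_cache,
                         personality_weight=0.5, align_weight=0.1):
    """Interpolate base and personality models first, then mix in the cache."""
    base_personality = ((1 - personality_weight) * p_base
                        + personality_weight * p_personality)
    return (1 - align_weight) * base_personality + align_weight * p_cache

# base-personality = 0.5*0.2 + 0.5*0.4 = 0.3; final = 0.9*0.3 + 0.1*1.0 = 0.37
p = combined_probability(0.2, 0.4, 1.0)
```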
    </Section>
  </Section>
  <Section position="7" start_page="27" end_page="27" type="metho">
    <SectionTitle>
6 Dialogue and Utterance Specifications
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
6.1 Character Specification
</SectionTitle>
      <Paragraph position="0"> Two computer characters are parameterised for their personality by specifying values (on a scale from 0 to 100) for the five dimensions: Extraversion (E), Neuroticism (N), Openness (O), Agreeableness (A), and Conscientiousness (C). Their alignment behaviour is set to a value between 0 (low propensity to align) and 1 (high propensity to align). Also, each character receives an agenda of topics they wish to discuss, along with polarities (positive/negative) that indicate their opinion on the respective topic.</Paragraph>
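An illustrative character specification following the parameters above: five trait values on a 0-100 scale, an alignment propensity in [0, 1], and an agenda of (topic, polarity) pairs. The character name, values, and topics are invented for the sketch.

```python
stan = {
    "personality": {"E": 80, "N": 65, "O": 50, "A": 40, "C": 75},
    "alignment": 0.1,                       # propensity to align, in [0, 1]
    "agenda": [("film", "negative"),
               ("music", "negative"),
               ("script", "negative")],     # topics with opinion polarities
}

def validate(character):
    """Check that a specification respects the ranges given in Section 6.1."""
    traits_ok = all(0 <= v <= 100 for v in character["personality"].values())
    align_ok = 0.0 <= character["alignment"] <= 1.0
    agenda_ok = all(pol in ("positive", "negative")
                    for _, pol in character["agenda"])
    return traits_ok and align_ok and agenda_ok

valid = validate(stan)
```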
    </Section>
    <Section position="2" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
6.2 Utterance Design
</SectionTitle>
      <Paragraph position="0"> The character with the higher Extraversion score begins the dialogue, and their first topic is selected. Once an utterance has been generated, the other character is selected, and the system applies the algorithm shown in (5) to decide which topic should come next. This process continues until there are no topics left on the agenda of the current speaker.</Paragraph>
      <Paragraph position="1"> (5) if (A &lt; 46) or (C &lt; 46) or  (no. of utts about this topic = 2) then take next topic from own agenda else continue on same topic The system creates a simple XML representation of the character's utterance, using the specified topic and polarity. An example using the topic music and polarity negative is shown in Figure 1. At this point the system also decides which discourse connectives may be appropriate, based on the previous topic and polarity.</Paragraph>
    </Section>
    <Section position="3" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
6.3 OPENCCG Logical Forms
</SectionTitle>
      <Paragraph position="0"> Following the method described in Foster and White (2004), the basic utterance specification is transformed, using stylesheets written in the XSL transformation language, into an OPENCCG logical form. We make use of the facility for defining optional and alternative inputs and underspecified semantics to massively over-generate candidate utterances. A fragment of the logical form which results from the transformation of Figure 1 is shown in Figure 2. We also include some fragments of canned text from the CRAG corpus in our OPENCCG lexicon.</Paragraph>
      <Paragraph position="1"> We also add optional interjections (i mean, you know, sort of ) and conversational markers (right, but, and, well) where appropriate given the discourse history.</Paragraph>
      <Paragraph position="2"> When the full logical form is processed by the OPENCCG system, the output consists of sentences of the types shown below: (I think) the music was bad.</Paragraph>
      <Paragraph position="3"> (I think) the music was not (wasn't) good.</Paragraph>
      <Paragraph position="4"> I did not (didn't) like the music.</Paragraph>
      <Paragraph position="5"> I hated the music.</Paragraph>
      <Paragraph position="6"> One thing I did not (didn't) like was the music.</Paragraph>
      <Paragraph position="7"> One thing I hated was the music.</Paragraph>
      <Paragraph position="8"> The fragmentary logical form in Figure 2 would create all possible paraphrases from:  and optional expressions, we create up to 3000 possibilities per utterance, and the best candidate is chosen by the specific combination of n-gram models appropriate for the given personality and dialogue history, as described in Section 4. Our OPENCCG lexicon is based on the core English lexicon included with the system and we have added vocabulary appropriate to the movie domain, and extended the range of grammatical constructions where necessary.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="27" end_page="29" type="metho">
    <SectionTitle>
7 Output and Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="27" end_page="29" type="sub_section">
      <SectionTitle>
7.1 Output
</SectionTitle>
      <Paragraph position="0"> In this section, we provide some example outputs from the CRAG-2 system, using characters based on participants from our corpus (see Section 5.2). Stan is higher on the Extraversion, Neuroticism, and Conscientiousness scales than Eddie. The characters' personalities and agendas are summarised in Figure 3.</Paragraph>
      <Paragraph position="1"> We show three example dialogues between Stan and Eddie. In the first (Figure 4) neither character aligns with the other at all, while in the second (Figure 5) Stan has a slight tendency towards alignment and in the third (Figure 6) a more pronounced tendency. In system terms, this means that in the first dialogue the cache model was given weight 0, while in the second and third the cache model was given weights 0.05 and 0.1 respectively for Stan's utterances. It can be seen that in the first dialogue, Eddie's use of humour in utterance 4 is followed by Stan's comedy, and dialogue in utterance 6 is followed by script. In the second dialogue, Stan aligns his first lexical choice with Eddie's but not his second, while in the third dialogue he aligns both. The syntactic structures of Stan's utterances in the second and third dialogues also mirror Eddie's progressively more closely.</Paragraph>
      <Paragraph position="2"> 1. Stan: I didn't think much of the film you know.</Paragraph>
      <Paragraph position="3"> 2. Eddie: I mean I didn't like kind of like the story.
3. Stan: Yeah there were so many little pathways that they didn't take you know.</Paragraph>
      <Paragraph position="4"> 4. Eddie: I mean I didn't think the humour was really funny.
5. Stan: Yeah you know the comedy was a bit dull.</Paragraph>
      <Paragraph position="5"> 6. Eddie: I mean I didn't like kind of like the dialogue.
7. Stan: I mean the script was a bit dull.</Paragraph>
      <Paragraph position="6"> Figure 4: Zero Alignment
1. Stan: I didn't think much of the film you know.</Paragraph>
      <Paragraph position="7"> 2. Eddie: I mean I didn't like kind of like the story.
3. Stan: Yeah there were so many little pathways that they didn't take you know.</Paragraph>
      <Paragraph position="8"> 4. Eddie: I mean I didn't think the humour was really funny.
5. Stan: I mean the humour was a bit dull.</Paragraph>
      <Paragraph position="9"> 6. Eddie: I mean I didn't like kind of like the dialogue.
7. Stan: I mean the script was a bit dull.</Paragraph>
      <Paragraph position="10"> Figure 5: Little Alignment from Stan
1. Stan: I didn't think much of the film you know.</Paragraph>
      <Paragraph position="11"> 2. Eddie: I mean I didn't like kind of like the story.
3. Stan: I mean the story was a bit dull.</Paragraph>
      <Paragraph position="12"> 4. Eddie: I mean I didn't think the humour was really funny.
5. Stan: I mean the humour was a bit dull.</Paragraph>
      <Paragraph position="13"> 6. Eddie: I mean I didn't like kind of like the dialogue.
7. Stan: I mean the dialogue was a bit dull.</Paragraph>
      <Paragraph position="14">  To further illustrate the differences between the dialogues with and without alignment, we provide some utterance rankings. We show candidates for the fifth utterance in each dialogue. Table 1 shows sentences from the example generated without alignment, corresponding to utterance 5 (Stan)  from Figure 4. We show the first five occurrences of different sentence structures (see Section 6.3), with their rank and their geometric mean adjusted scores.</Paragraph>
      <Paragraph position="15"> Table 2 shows the top five sentences from the fifth utterance from Figure 5 (little alignment), and Table 3 those from Figure 6 (more alignment). It can be seen that when more alignment is present, the syntactic structure used by the previous speaker rises higher in the rankings.</Paragraph>
    </Section>
    <Section position="2" start_page="29" end_page="29" type="sub_section">
      <SectionTitle>
7.2 Evaluation
</SectionTitle>
      <Paragraph position="0"> We have not yet evaluated CRAG-2, but we have evaluated CRAG-1. The method was to generate a set of dialogues, systematically contrasting characters with extreme settings for the personality dimensions (High/Low Extraversion, Neuroticism, and Psychoticism).</Paragraph>
      <Paragraph position="1">  Human subjects were asked to fill in a questionnaire to determine their personality. They were then given a selection of dialogues to read. After each dialogue, they were asked to rate their perception of the interaction and of the characters involved by assigning scores to a number of adjectives related to the personality dimensions.</Paragraph>
      <Paragraph position="2"> It was found that subjects could recognise differences in the Extraversion level of the language. Also, the personality setting of a character influenced the perception of its and its dialogue partner's personality (Kahn, 2006).</Paragraph>
      <Paragraph position="3"> We plan a similar evaluation for CRAG-2 to be able to compare human raters' impressions of dialogues generated by the two systems. We also plan to evaluate CRAG-2 internally by varying the weight given to the underlying language models, and observing the effects this has on the resulting ranking of the generated utterances.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="29" end_page="30" type="metho">
    <SectionTitle>
8 Related Work
</SectionTitle>
    <Paragraph position="0"> Related work in NLG involves either personality or alignment. So far as we can tell, there is little work on the latter. Varges (2005) suggests that "a word similarity-based ranker could align the generation output (i.e. the highest-ranked candidate) with previous utterances in the discourse context", but there is no report yet on an implementation of this proposal. A rather different approach is suggested by Bateman and Paris (2005), who discuss initial work on alignment, mediated by a process of register-recognition. Regarding generation with personality, the most influential work is probably Hovy's PAULINE system, which varies both content selection and realisation according to an individual speaker's goals and attitudes (Hovy, 1990).</Paragraph>
    <Paragraph position="1"> In her extremely useful survey of work on affective (particularly, emotional) natural language generation, Belz (2003) notes that the complexity of PAULINE's rule system means that numerous rule interactions can lead to unpredictable side effects.</Paragraph>
    <Paragraph position="2"> In response, Paiva and Evans (2004) take a more empirical line on style generation, which is closer to that pursued here. Other relevant work includes Loyall and Bates (1997), who explicitly propose that personality and emotion could be used in generation, but Belz observes that technical descriptions of Hap and the Oz project suggest that the proposals were not implemented. Walker et al.'s (1997) system produces linguistic behaviour which is much more varied than our current system is capable of; but there, variation is driven by a model of social relations (based on Brown and Levinson), rather than by personality. The NECA project subsequently developed methods for generating scripts for pairs of dialogue agents (Piwek and van Deemter, 2003), supported by the MIAU platform (Rist et al., 2003). The VIRTUALHUMAN project is a logical successor to this work, and its ALMA platform provides an integrated approach to affective generation, covering emotion, mood and personality (Gebhard, 2005).</Paragraph>
  </Section>
</Paper>