<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2014">
  <Title>Dialogue Act Tagging for Instant Messaging Chat Sessions</Title>
  <Section position="3" start_page="79" end_page="80" type="metho">
    <SectionTitle>
2 Issues in Instant Messaging Dialogue
</SectionTitle>
    <Paragraph position="0"> There are several differences between IM and transcribed spoken dialogue. The dialogue act classifier described in this paper is dependent on preprocessing tasks to resolve the issues discussed in this section. null Sequences of words in textual dialogue are grouped into three levels. The first level is a Turn, consisting of at least one Message, which consists of at least one Utterance, defined as follows: Turn: Dialogue participants normally take turns writing.</Paragraph>
    <Paragraph position="1"> Message: A message is defined as a group of words that are sent from one dialogue participant to the other as a single unit. A single turn can span multiple messages, which sometimes leads to accidental interruptions as discussed inSS2.2.</Paragraph>
    <Paragraph position="2"> Utterance: This is the shortest unit we deal with and can be thought of as one complete semantic unit-something that has a meaning. This can be a complete sentence or as short as an emoticon (e.g. &amp;quot;:-)&amp;quot; to smile).</Paragraph>
    <Paragraph position="3"> Several lines from one of the dialogues in our corpus are shown as an example denoted with Turn, Message, and Utterance boundaries in Table 1.</Paragraph>
    <Section position="1" start_page="79" end_page="79" type="sub_section">
      <SectionTitle>
2.1 Utterance Segmentation
</SectionTitle>
      <Paragraph position="0"> Because dialogue acts work at the utterance level and users send messages which may contain more than one utterance, we first need to segment the messages by detecting utterance boundaries. Messages in our data were manually labelled with one or more dialogue act depending on the number of utterances each message contained. Labelling in this fashion had the effect of also segmenting messages into utterances based on the dialogue act boundaries.</Paragraph>
    </Section>
    <Section position="2" start_page="79" end_page="80" type="sub_section">
      <SectionTitle>
2.2 Synchronising Messages in IM Dialogue
</SectionTitle>
      <Paragraph position="0"> The end of a turn is not always obvious in typed dialogue. Users often divide turns into multiple messages, usually at clause or utterance boundaries, which can result in the end of a message being mistaken as the end of that turn. This ambiguity can lead to accidental turn interruptions which cause messages to become unsynchronised. In these cases each participant tends to respond to an earlier message than the immediately previous one, making the conversation seem somewhat incoherent when read as a transcript. An example of such a case is shown in Table 1 in which Customer replied to message 10 with message 12 while Sally was still completing turn 6 with message 11. If the resulting discourse is read sequentially it would seem that the customer ignored the information provided in message 11. The time between messages shows that only 1 second elapsed between messages 11 and 12, so message 12 must in fact be in response to message 10.</Paragraph>
      <Paragraph position="1"> Message Mi is defined to be dependent on message Md if the user wrote Mi having already seen and presumably considered Md. The importance of unsynchronised messages is that they result in the dialogue acts also being out of order, which is  problematic when using bigram or higher-order n-gram language models. Therefore, messages are re-synchronised as described inSS3.2 before training and classification.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="80" end_page="81" type="metho">
    <SectionTitle>
3 The Dialogue Act Labelling Task
</SectionTitle>
    <Paragraph position="0"> The domain being modelled is the online shopping assistance provided as part of the MSN Shopping site. People are employed to provide live assistance via an IM medium to potential customers who need help in finding items for purchase. Several dialogues were collected using this service, which were then manually labelled with dialogue acts and used to train our statistical models.</Paragraph>
    <Paragraph position="1"> There were 3 aims of this task: 1) to obtain a realistic corpus; 2) to define a suitable set of dialogue act tags; and 3) to manually label the corpus using the dialogue act tag set, which is then used for training the statistical models for automatic dialogue act classification.</Paragraph>
    <Section position="1" start_page="80" end_page="80" type="sub_section">
      <SectionTitle>
3.1 Tag Set
</SectionTitle>
      <Paragraph position="0"> We chose 12 tags by manually labelling the dialogue corpus using tags that seemed appropriate from the 42 tags used by Stolcke et al. (2000) based on the Dialog Act Markup in Several Layers (DAMSL) tag set (Core and Allen, 1997). Some tags, such as UN-INTERPRETABLE and SELF-TALK, were eliminated as they are not relevant for typed dialogue. Tags that were difficult to distinguish, given the types of utterances in our corpus, were collapsed into one tag.</Paragraph>
      <Paragraph position="1"> For example, NO ANSWERS, REJECT, and NEGA-TIVE NON-NO ANSWERS are all represented by NO-ANSWER in our tag set.</Paragraph>
      <Paragraph position="2"> The Kappa statistic was used to compare inter-annotator agreement normalised for chance (Siegel and Castellan, 1988). Labelling was carried out by three computational linguistics graduate students with 89% agreement resulting in a Kappa statistic of 0.87, which is a satisfactory indication that our corpus can be labelled with high reliability using our tag set (Carletta, 1996).</Paragraph>
      <Paragraph position="3"> A complete list of the 12 dialogue acts we used is shown in Table 2 along with examples and the frequency of each dialogue act in our corpus.</Paragraph>
      <Paragraph position="4">  and frequencies given as percentages of the total number of utterances in our corpus.</Paragraph>
    </Section>
    <Section position="2" start_page="80" end_page="81" type="sub_section">
      <SectionTitle>
3.2 Re-synchronising Messages
</SectionTitle>
      <Paragraph position="0"> The typing rate is used to determine message dependencies. We calculate the typing rate by time(Mi)[?]time(Md) length(Mi) , which is the elapsed time be-tween two messages divided by the number of characters in Mi. The dependent message Md may be the immediately preceding message such that d = i[?]1 or any earlier message where 0 &lt; d &lt; i with the first message being M1. This algorithm is shown in Algorithm 1.</Paragraph>
      <Paragraph position="1"> Algorithm 1 Calculate message dependency for</Paragraph>
      <Paragraph position="3"> The typing threshold in Algorithm 1 was calculated by taking the 90th percentile of all observed typing rates from approximately 300 messages that had their dependent messages manually labelled resulting in a value of 5 characters per second. We found that 20% of our messages were unsynchro- null nised, giving a baseline accuracy of automatically detecting message dependencies of 80% assuming that Md = Mi[?]1. Using the method described, we achieved a correct dependency detection accuracy of 94.2%.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="81" end_page="81" type="metho">
    <SectionTitle>
4 Training on Speech Acts
</SectionTitle>
    <Paragraph position="0"> Our goal is to perform automatic dialogue act classification of the current utterance given any previous utterances and their tags. Given all available evidence E about a dialogue, the goal is to find the dialogue act sequence U with the highest posterior probability P(U|E) given that evidence. To achieve this goal, we implemented a naive Bayes classifier using bag-of-words feature representation such that the most probable dialogue act ^d given a bag-of-words input vector -v is taken to be:</Paragraph>
    <Paragraph position="2"> where vj is the jth element in -v, D denotes the set of all dialogue acts and P(-v) is constant for all d[?]D.</Paragraph>
    <Paragraph position="3"> The use of P(d) in Equation 3 assumes that dialogue acts are independent of one another. However, we intuitively know that if someone asks a YES-NO-QUESTION then the response is more likely to be a YES-ANSWER rather than, say, CONVENTIONAL-CLOSING. This intuition is reflected in the bigram transition probabilities obtained from our corpus.1 To capture this dialogue act relationship we trained standard n-gram models of dialogue act history with add-one smoothing for the calculation of P(vj|d). The bigram model uses the posterior probability P(d|H) rather than the prior probability P(d) in Equation 3, where H is the n-gram context vector containing the previous dialogue act or previous 2 dialogue acts in the case of the trigram model.</Paragraph>
  </Section>
  <Section position="6" start_page="81" end_page="82" type="metho">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> Evaluation of the results was conducted via 9-fold cross-validation across the 9 dialogues in our corpus using 8 dialogues for training and 1 for testing.</Paragraph>
    <Paragraph position="1"> Table 3 shows the results of running the experiment with various models replacing the prior probability, P(d), in Equation 3. The Min, Max, and Mean columns are obtained from the cross-validation technique used for evaluation. The baseline used for this task was to assign the most frequently observed dialogue act to each utterance, namely, STATEMENT.</Paragraph>
    <Paragraph position="2"> Omitting P(d) from Equation 3 such that only the likelihood (Equation 2) of the naive Bayes formula is used resulted in a mean accuracy of 80.1%.</Paragraph>
    <Paragraph position="3"> The high accuracy obtained with only the likelihood reflects the high dependency between dialogue acts and the actual words used in utterances. This dependency is represented well by the bag-of-words approach. Using P(d) to arrive at Equation 3 yields a slight increase in accuracy to 80.6%.</Paragraph>
    <Paragraph position="4"> The bigram model obtains the best result with 81.6% accuracy. This result is due to more accurate predictions with P(d|H). The trigram model produced a slightly lower accuracy rate, partly due to a lack of training data and to dialogue act adjacency pairs not being dependent on dialogue acts further removed as discussed inSS4.</Paragraph>
    <Paragraph position="5"> In order to gauge the effectiveness of the bigram and trigram models in view of the small amount of training data, hit-rate statistics were collected during testing. These statistics, presented in Table 3, show the percentage of conditions that existed in the various models. Conditions that did not exist were not counted in the accuracy measure during evaluation.</Paragraph>
    <Paragraph position="6"> The perplexities (Cover and Thomas, 1991) for the various n-gram models we used are shown in  decreased perplexity, comes when moving from the unigram to bigram models as expected. However, the large difference between the bigram and trigram models is somewhat unexpected given the theory of adjacency pairs. This may be a result of insufficient training data as would be suggested by the lower tri-gram hit rate.</Paragraph>
  </Section>
class="xml-element"></Paper>