<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0313">
  <Title>A Two-level Approach to Coding Dialogue for Discourse Structure: Activities of the 1998 DRI Working Group on Higher-level Structures*</Title>
  <Section position="3" start_page="101" end_page="103" type="metho">
    <SectionTitle>
2 The coding scheme
</SectionTitle>
    <Paragraph position="0"> The&amp;quot; coding scheme used for pre-meeting coding exercises is defined in (Nakatani and Traum, 1999), which was distributed to the group members prior to coding assignments. As mentioned above, this included two levels of coding, common ground units (CGUs) at the meso-level, and intentional/informational units (IUs) at the macro-level. Here we provide a brief summary of these coding schemes. Interested parties are referred to the manual (Nakatani and Traum, 1999) for detailed instructions and examples. There are three stages of coding, which must be performed in sequence. First, a preparatory tokenization phase, in which the dialogue is segmented into speaker turns and utterance tokens within the turns, each token being given a label. This was used as input for the coding of CGUs, in which utterance tokens were gathered together in units of tokens which together served to add some material to the common ground. Finally, the results of CGU coding was used as input for IU Coding, in which hierarchical intentional structure was built from either CGUs or smaller IUs. Each of these processes is briefly described in the subsections below.</Paragraph>
    <Section position="1" start_page="101" end_page="102" type="sub_section">
      <SectionTitle>
2.1 Common Ground Units (CGUs)
</SectionTitle>
      <Paragraph position="0"> A Common Ground Unit (CGU) contains all and only the utterance tokens needed to ground (that is, make part of the common ground) some bit of content. This content will include the initial token of the unit, plus whatever additional content is added by subsequent tokens in the unit and added to the common ground at the same time as the initiating token. The main coherence principle for CGUs is  thus not directly related to the coherence of the content itself (this kind of coherence is handled at the micro and macro levels), but whether the content is added to the common ground in the same manner (e.g., with the same acknowledgment utterance).</Paragraph>
      <Paragraph position="1"> CGUs will require at least some initiating material by one conversational participant (the initiator), presenting the new content, as well as generally some feedback (Allwood et al., 1992), or acknowledgment, by the other participant.</Paragraph>
      <Paragraph position="2"> The following principles in (1) summarize the decision procedures for how to code an utterance token with respect to existing or new CGUs:</Paragraph>
      <Paragraph position="4"> If the token contains new content, and there is no accessible ungrounded CGU, the contents of which could be acknowledged together with the current token then create a new CGU, and add this token to it.</Paragraph>
      <Paragraph position="5"> if there is an accessible CGU for which the  current token: (a) acknowledges the content (b) repairs the content (c) cancels the CGU (in this case, also put a * before the CGU marker, to indicate that it is canceled).</Paragraph>
      <Paragraph position="6"> (d) continues the content, in such a fashion that all content could be grounded to- null gether (with the same acknowledgment) then add this token to the CGU otherwise, do not add this token to the CGU Note that these rules are not mutually exclusive: more than one may apply, so that a token can be added to more than one CGU.</Paragraph>
      <Paragraph position="7"> CGUs are similar in many respects to other meso-level coding schemes, such as initiative-response in the LINDA coding scheme (Ahrenberg et al., 1990; Dahlb~ck and JSnsson, 1998), or conversational games (Carletta et al., 1997). However, there are some important differences. In terms of content, CGUs cover only grounding, while the LINDA scheme covers initiative more generally, and the I-ICRC game structure codes achievement of dialogue purposes. Several authors (e.g., (Allwood et al., 1992; Clark, 1994; Dillenbourg et al., 1996), consider multiple levels of coordination in dialogue, including roughly those of contact, perception, understanding, and attitudinal reaction. Grounding (which is what CGUs capture) is mainly concerned with the understanding level (and also the perception of messages), while there is a large part of the notion of response that is concerned with attitudinal reaction and not strictly mutual understanding. There are also differences in the structuring mechanisms used. In the LINDA coding scheme, IR units consist of trees, which may contain embedded IR units as constituents. The HCRC scheme does not require a strict tree structure, but also allows embedded games, when one game is seen as subordinate to the main purpose of another. In contrast, CGUs are &amp;quot;fiat&amp;quot; structures, consisting only of a set of utterances which work together to add some material to common ground. Moreover, a single utterance can be part of multiple (non-nested) CGUs. For example, except for very short reactions which are expressed in the same locution with the feed-back signal of understanding, the grounding of the reaction itself will also constitute a separate CGU. More concretely, consider a suggestion followed by a refinement by another speaker. The refinement indicates understanding of the original, and is thus part of the prior CGU, which presents the original, but it also'introduces new material (the refinement itself), and thus also initiates a new CGU, which requires further signals of understanding to he added to the common ground.</Paragraph>
      <Paragraph position="8"> Both of these differences in content and structuring mechanisms can lead to differences in the kinds of units that would be coded for a given dialogue fragment. For example, a question/answer/followup sequence might be one IR-unit or game but two CGUs (one to ground the question, and one to ground the answer). Likewise, a unit including a repair might be coded as two (embedded) IR-units or games, but only a single CGU.</Paragraph>
      <Paragraph position="9"> It remains an open question as to whether CGUs or one of these other meso-level units might be the most appropriate building block for macro-level intentional structure. One reason to think that CGUs might be more appropriate, though, is the use of non-hierarchical units, which avoids the question of which level of unit to use as starting point.</Paragraph>
    </Section>
    <Section position="2" start_page="102" end_page="103" type="sub_section">
      <SectionTitle>
2.2 Intentional/Informational Units (IUs)
</SectionTitle>
      <Paragraph position="0"> Macro-level of discourse structure coding involves reasoning about the relationships amongst the pieces of information that have been established as common ground. This is achieved by performing a topicstructure or planning-based analysis of the content of the CGUs, to produce a hierarchy of CGUs in a well-formed tree data structure. Such analysis proceeds in similar fashion to the intention-based methodology outlined in (Nakatani et al., 1995), but there are some crucial differences. The coding scheme of (Nakatani et al., 1995) was developed for mono-</Paragraph>
      <Paragraph position="2"> logic discourse, and is not directly applicable to dialogue. In particular, there is the general problem in dialogue, of associating the individual intentions of the participants with the overall structure. We use CGUs as a starting point helps establish the relevant intentions as a kind of joint intentional structure. While CGU analysis concentrates on establishing what is being said at the level of information exchange, macro-level analysis goes beyond this to establish relationships at a higher-level, namely relationships amongst CGUs (instead of utterancetokens) and relationships amongst groups of CGUs.</Paragraph>
      <Paragraph position="3"> These relationships may be both informational and intentional. Thus, we refer to groupings of CGUs at the lowest level of macro-structure as I-UNITS (IUs), where 'T' stands for either informational or intentional. null IU trees are created by identifying certain kinds of discourse relations. Following (Grosz and Sidner, 1986), macro-level analysis captures two fundamental intentional relations between I-units, those of domination (or parent-child) and satisfaction-precedence (or sibling) relations. The corresponding informational relations are generates and enables (Pollack, 1986; Goldman, 1970). More concretely, the domination relation can be elaborated in a planning-based framework as holding between a subsidiary plan and its parent, in which the completion of one plan contributes to the completion of its parent plan; the satisfaction-precedence relation can be elaborated as the temporal dependency between two plans (Lochbaum, 1994). As is often the case, when a temporal dependency cannot be strictly established, two IUs will be placed in a sibling relationship by virtue of their each being in a subsidiary relationship with the same dominating IU.</Paragraph>
      <Paragraph position="4"> I-unit analysis consists of identifying the higher-level intentional/informational structure of the dialogue, where each I-unit (IU) in the macro structure achieves a joint (sub)goal or conveys information necessary to achieve a joint (sub)goal. The following schema captures the decision process for IU coding: * Establish problem to be collaboratively solved, or joint goal.</Paragraph>
      <Paragraph position="5"> * Negotiate how to achieve joint goal.</Paragraph>
      <Paragraph position="6"> This may involve:  1. Deciding which (of possibly several) recipe(s) for action to use, 2. Deciding how to implement a recipe in the participants' domain by instantiating or identifying constraints and parameters of the recipe (e.g. deciding which of two engines to move to the orange warehouse), 3. Breaking the plan down into subplans, whose own achievements can be similarly negotiated at the subtask level.</Paragraph>
      <Paragraph position="7"> * Confirm achievement of (or failure to achieve) joint goal.</Paragraph>
      <Paragraph position="8">  This schema explicitly accommodates the inferential interface between the intentional and informational levels of analysis. For example, intentional and informational relations blend as siblings at the level of choosing and implementing a recipe and breaking down a plan into subplans. This reflects the simple fact that achieving a goal via action requires knowledge of the world (e.g. identification of objects), knowledge of how to act in the world (i.e. knowledge of recipes), and knowledge of how to reason about complex relations among actions (i.e. the ability to plan and re-plan). In sum, the blending of intentional and informational relations in IU coding is an original theoretical aspect of this coding scheme.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="103" end_page="106" type="metho">
    <SectionTitle>
3 Coding exercises
</SectionTitle>
    <Paragraph position="0"> In order to familiarize the group members with the coding schemes and provide some initial data for discussion, several coding exercises were performed, divided into two sets of two dialogues each - first TOOT and TRAINS, second Verbmobil (IU on common provided CGUs) and Maptask (only a fragment, no IU coding). These dialogues are all roughly characterizable as &amp;quot;task-oriented&amp;quot;, although the tasks are quite varied.</Paragraph>
    <Paragraph position="1"> The TRAINS dialogue was taken from the TRAINS-93 Corpus by the University of Rochester (Heeman and Allen, 1994; Heeman and Allen, 1995).</Paragraph>
    <Paragraph position="2"> TRAINS dialogs deal with tasks involving manufacturing and shipping goods in a railroad freight system. TRAINS dialogs consist of two human speakers, the system and the user. The user is given a problem to solve and a map of the world. The system is given a more detailed map and acts as a planning assistant to the user. Additional online information about the dialogues can be found at http ://WWl~. cs. ro chest or. edu/res eazch/speech/ 93dialogs/and about the trains project as a whole at hl:~;p://www, cs. rochester, edu/reseaxch/trains/ Toot dialogues are Human-Computer spoken dialogues, in which the computer system (S) finds Am-</Paragraph>
    <Paragraph position="4"> trek rail schedules via internet, according to specifications provided by the human user (U). The Toot system is described in (Litman et el., 1998). The dialogue we used for coding, was provided by Diane Litman of AT&amp;T Research.</Paragraph>
    <Paragraph position="5"> The Verbmobil project is a long term effort to develop a mobile translation system for spontaneous speech in face-to-face situations. The current domain of focus is scheduling business meetings. To support this goal, some English human-human dialogs were collected in this domain. More information about the Verbmobil project can be found on-line at http://~ww, dfki. uni-sb, do/verbmobPS1/.</Paragraph>
    <Paragraph position="6"> In the dialogue we coded, the two speakers try to establish a time and place for a meeting.</Paragraph>
    <Paragraph position="7"> The DCIEM Map Task dialogs from which the one we coded (d204), was drawn were collected in Canada and consist of pairs of Canadian army reservists collaborating to solve a problem. Both reservists have a map but the maps are not identical in terms of the landmarks present. One participant is designated the direction giver, G and has a path marked on his map. The goal is for the other participant, the direction follower, F to trace this route on his map even though he can only communicate with G via speech; i.e., these are not face to face conversations. Only the opening portion of the dialogue was coded, due to the length. More information about the DCIEM Map Task corpus can be found online at http ://www. here. ed. ac. uk/Si~ o/MAPTASKD, html.</Paragraph>
    <Paragraph position="8"> A fragment taken from the Verbmobil Dialogue, along with CGU and IU coding for this fragment is shown in Figure 1. Note that some utterances (e.g., A.11.1) appear in multiple cgus (serving an acknowl* edgment function for one and a proposal function for the other), and some utterances (e.g., B.12.2) do not appear in any.</Paragraph>
    <Section position="1" start_page="104" end_page="105" type="sub_section">
      <SectionTitle>
3.1 CGU Coding Analysis
</SectionTitle>
      <Paragraph position="0"> The inter-coder reliability of CGU coding was quite variable between the different dialogues and for different stretches within some of the dialogues. Resuits ranged from segments in which all coders coded identically to a few segments (for Maptask and Toot) in which all coders coded some aspect differently.</Paragraph>
      <Paragraph position="1"> This section outlines some of the qualitative and quantitative analysis done on the CGU coding for the four dialogues presented in the previous section.</Paragraph>
      <Paragraph position="2">  It was a bit challenging to devise a meaningful measure of inter-coder reliability for the CGU coding task. While it is simple to count how many coders chose to include a particular unit, there is no  easy way to devise an expected agreement for such a unit. Table 2 shows the average ratio of coders per CGU coded by any of the coders. It is not clear how to interpret this number, however, since if a particular unit was included only by a small amount of coders, that means that there was fairly high agreement among the other coders not to include it.</Paragraph>
      <Paragraph position="3">  Simply marking down boundary points of units would also not work well, since CGUs are allowed to be both overlapping and discontinuous. Instead, a pseudo-grounding acts scheme was induced, considering whether an utterance token begins, continues or completes a CGU. This is fueled by the observation that, while a token could appear in multiple CGUs, it doesn't generally perform the same function in each of them. This is not explicitly ruled out but does seem to be the case, perhaps with one or two exceptions. So, each token is scored as to whether or not it appeared (1) as the first token in a CGU (2) as the last token in a CGU and/or (3) in a CGU in neither the first or last position.</Paragraph>
      <Paragraph position="4"> This system seems sufficient to count as the same  all identified CGUs that are the same, and to assess penalties for all codings that differ, though it is not clear that the weighting of penalties is necessarily optimal (e.g., leaving out a middle counts only one point of disagreement, but leaving out an end counts as two, since the next to last, gets counted as an end rather than a middle).</Paragraph>
      <Paragraph position="5"> From this, it was possible to compute agreement and expected agreement (by examining the relative frequencies of these tags), and thus Kappa (Siegel and Castellan, 1988). The numbers for the group as a whole are shown in table 1 Systematic individual pairwise agreement or cluster analysis was not performed, however some of the pairwise numbers are above 0.8 for some dialogues.</Paragraph>
      <Paragraph position="6"> From this table it is clear that the ending points of CGUs in verbmobil has fairly high agreement, as does the TRAINS dialogue overall, whereas Maptask has fairly low agreement, especially for CGU beginnings.</Paragraph>
    </Section>
    <Section position="2" start_page="105" end_page="106" type="sub_section">
      <SectionTitle>
3.2 IU Coding Analysis
</SectionTitle>
      <Paragraph position="0"> IU analysis was carried out on the Toot, Trains and Verbmobil dialogues. However, as noted, only the IU analysis on Verbmobil was conducted starting with uniform IUs for all the coders. Thus, the reliability for IU coding could be quantitatively measured for the Verbmobil dialogue only. Nine coders provided IU trees starting from identical CGUs.</Paragraph>
      <Paragraph position="1"> Following the methodology in (ttirschberg and Nakatani, 1996), we measured the reliability of coding for a linearized version of the IU tree, by calculating the reliability of coding of IU beginnings using the kappa metric. We calculated the observed pair-wise agreement of CGUs marked as the beginnings of IUs, and factored out the expected agreement estimated from the actual data, giving the pairwise kappa score.</Paragraph>
      <Paragraph position="2"> Table 3 gives the raw data on coders marking of IU beginnings. For each CGU, a &amp;quot;1&amp;quot; indicates that it was marked as an IU-initial CGU by a given coder.</Paragraph>
      <Paragraph position="3"> A &amp;quot;0&amp;quot; indicates that it was not marked as IU-initial. Table 4 shows the figures on observed pairwise agreement, or the percentage of the time both coders agreed on the assignment of CGUs to IU-initial po- null sition.</Paragraph>
      <Paragraph position="4"> We calculated the expected probability of agreement for IU-initial CGUs to be P(E)=.375, based on the actual Verbmobil codings. Given P(E), kappa scores can be computed. Table 5 shows the kappa scores measuring the reliability of the codings for each pair of labelers.</Paragraph>
      <Paragraph position="5"> As the kappa scores show, there is some individual variation in IU coding reliability. On average, however, the kappa score for pairwise coding on IU-initial CGUs is .64, which is moderately reliable but shows room for improvement.</Paragraph>
      <Paragraph position="6"> By examining Table 3, it can be Seen that there was in fact always a decisive majority label for each  CGU, i.e. there are no CGUs on which the coders were split into two groups of four and five in their coding decision for IU-initial CGUs. A weaker reliability metric on the pooled data from nine coders, therefore, would provide a reliable majority coding on this dialogue (see (Passonneau and Litman, 1997) for discussion of how reliability is computed for pooled coding data). In fact, for the group of six coders who showed the most inter-coder agreement, the average palrwise kappa score is .80, which is highly reliable.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML