File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/i05-4006_metho.xml

Size: 10,154 bytes

Last Modified: 2025-10-06 14:09:43

<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-4006">
  <Title>Construction of Structurally Annotated Spoken Dialogue Corpus</Title>
  <Section position="3" start_page="40" end_page="42" type="metho">
    <SectionTitle>
2 Spoken Dialogue Corpus with Layered
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="40" end_page="41" type="sub_section">
      <SectionTitle>
Intention Tags
</SectionTitle>
      <Paragraph position="0"> The Center for Integrated Acoustic Information Research (CIAIR), Nagoya University, has been compiling a database of in-car speech and dialogue since 1999, in order to achieve robust spoken dialogue systems in actual usage environments (Kawaguchi, 2004; Kawaguchi, 2005). The corpus has been recorded from more than 800 subjects. Each subject had conversations with three types of dialogue system: a human operator, a Wizard of Oz system, and a conversational system.</Paragraph>
      <Paragraph position="1"> In this project, a system was specially built in a Data Collection Vehicle (DCV), shown in Figure 1, and was used for the synchronous recording of multi-channel audio data, multi-channel video data, and vehicle-related data. All dialogue data were transcribed according to transcription standards in compliance with CSJ (Corpus of Spontaneous Japanese) (Maekawa, 2000) and were assigned discourse tags such as fillers, hesitations, and slips. An example of a transcript is shown in Figure 2. Utterances were divided into utterance units at pauses of 200 ms or more.</Paragraph>
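      <Paragraph> The pause-based segmentation into utterance units can be sketched as follows. This is only an illustration: the input format (token, start time, end time in milliseconds), the function name split_utterance_units, and the toy tokens are assumptions, not the corpus format.</Paragraph>
      <Paragraph>
```python
from typing import List, Tuple

# Hypothetical input format: (token, start_ms, end_ms) triples in time order.
Word = Tuple[str, int, int]


def split_utterance_units(words: List[Word], min_pause_ms: int = 200) -> List[List[str]]:
    """Start a new unit whenever the silence before a token is min_pause_ms or more."""
    units: List[List[str]] = []
    prev_end = None
    for token, start, end in words:
        if prev_end is None or start - prev_end >= min_pause_ms:
            units.append([])          # first token, or a long enough pause
        units[-1].append(token)
        prev_end = end
    return units


# Toy timings (invented): a 20 ms gap, then a 250 ms gap.
toy = [("w1", 0, 300), ("w2", 320, 400), ("w3", 650, 1100)]
print(split_utterance_units(toy))
```
      </Paragraph>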
      <Paragraph position="2"> These dialogues are annotated with speech act tags called Layered Intention Tags (LIT) (Irie, 2004(a)), which indicate the intentions behind the speaker's utterances. An LIT consists of four layers: &quot;Discourse act&quot;, &quot;Action&quot;, &quot;Object&quot;, and &quot;Argument&quot;. Figure 3 shows a part of the organization of LIT. As Figure 3 shows, each lower-layer tag depends on the tag in the layer above it. In principle, one LIT is given to one utterance unit.</Paragraph>
      <Paragraph position="3"> 35,421 utterance units have been tagged by hand (Irie, 2004(a)).</Paragraph>
      <Paragraph position="4"> In this research, we use parts of the restaurant guide dialogues between a driver and a human operator. An example of the dialogue corpus with LIT is shown in Table 1. In the Speaker column, &quot;D&quot; marks a driver's utterance and &quot;O&quot; an operator's. We used the Discourse act, Action, and Object layers and prefixed them with the speaker symbol, giving extended tags such as &quot;D+Request+Search+Shop&quot;. There are 41 types of extended LIT. We omitted the &quot;Argument&quot; layer because it is too detailed to express the dialogue structure.</Paragraph>
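      <Paragraph> As a minimal sketch of how an extended LIT could be assembled from the speaker symbol and the first three layers, consider the following. The class name LIT, the function name extended_lit, and the argument value used in the example are hypothetical; only the resulting tag format comes from the text.</Paragraph>
      <Paragraph>
```python
from typing import NamedTuple


class LIT(NamedTuple):
    """The four layers of a Layered Intention Tag."""
    discourse_act: str
    action: str
    obj: str
    argument: str   # fourth layer; not used in the extended tag


def extended_lit(speaker: str, lit: LIT) -> str:
    """Join speaker symbol, Discourse act, Action and Object with '+'."""
    if speaker not in ("D", "O"):   # D = driver, O = operator
        raise ValueError("speaker must be 'D' or 'O'")
    return "+".join([speaker, lit.discourse_act, lit.action, lit.obj])


# The argument value below is a placeholder, not a tag from the corpus.
print(extended_lit("D", LIT("Request", "Search", "Shop", "SomeArgument")))
```
      </Paragraph>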
      <Paragraph position="5">  (I want to have a Hotpot.) 282 O hai kono tikaku desu to tyankonabe to oden kaiseki ato syabusyabu nado ga gozai masu ga.</Paragraph>
    </Section>
    <Section position="2" start_page="41" end_page="41" type="sub_section">
      <SectionTitle>
Statement Exhibit SearchResult
</SectionTitle>
      <Paragraph position="0"> (Well, there are restaurants near here that serve sumo wrestler's stew, Japanese hotpot, and sliced beef boiled with vegetables.)</Paragraph>
    </Section>
    <Section position="3" start_page="41" end_page="42" type="sub_section">
      <SectionTitle>
3.1 Dialogue structure
</SectionTitle>
      <Paragraph position="0"> In this research, we assume that the fundamental unit of a dialogue is an utterance to which one LIT is given. To make the structural analysis of the dialogue more efficient, we express the dialogue structure as a binary tree. Based on observations of the restaurant guide task, we defined a category called POD (Part-Of-Dialogue), which focuses on the subject being dealt with in each part of the dialogue.</Paragraph>
      <Paragraph position="1"> As a result, 11 types of POD were defined (Table 2). Each node of a structural tree is labeled with a POD or an LIT. The dialogue-structural tree for Table 1 is shown in Figure 4.</Paragraph>
      <Paragraph position="2"> 3.2 The design policy of dialogue structure
 To treat a dialogue as an LIT sequence, the LIT assignment process (Irie, 2004(b)) normally has to be carried out first. Furthermore, repairs and corrections are eliminated because they are not given an LIT. In this research, we used the LIT sequences provided in the corpus. The dialogue structure was then annotated in the following way.</Paragraph>
      <Paragraph position="3"> Merging utterances: When two adjoining utterances, such as a request and its answer, can be regarded as a pair, they are merged under an appropriate POD. In Table 1, for example, the utterance &quot;Should I make a reservation?&quot; (#286) is a request and the answer to #286 is &quot;No, a reservation is not necessary&quot; (#287). In this way, utterances are combined with the</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="42" end_page="42" type="metho">
    <SectionTitle>
POD &quot;S INFO&quot;.
</SectionTitle>
    <Paragraph position="0"> When the LITs of two adjacent utterances correspond, the utterances are paired and merged under the same LIT. The utterances &quot;Fresh and roe&quot; (#280) and &quot;I want to have Hotpot&quot; (#281) both concern choosing the style of restaurant and are given the same LIT.</Paragraph>
    <Paragraph position="1"> Therefore they are combined under the LIT &quot;D+Statement+Select+Genre&quot;.</Paragraph>
    <Paragraph position="2"> Merging partial dialogues: When two adjoining partial dialogues (i.e., partial trees) together compose another partial dialogue, they are merged under an appropriate POD. In Table 1, for example, a search dialogue (from #277 to #285, SRCH) and a shop information dialogue that supports the search (from #286 to #287, S INFO) are combined and labeled with the POD &quot;SLCT&quot;.</Paragraph>
    <Paragraph position="3"> When the PODs of two adjacent partial dialogues correspond, the dialogues are merged under the same POD. Two search dialogues (one from #277 to #282, the other from #283 to #285) are combined under the same POD &quot;SRCH&quot;.</Paragraph>
    <Paragraph position="4"> The root of the tree: The POD of the root of the tree is &quot;GUIDE&quot;, because the domain of the corpus is the restaurant guide task.</Paragraph>
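    <Paragraph> The merging operations above can be illustrated with a small binary-tree data structure. This is a sketch under stated assumptions: the names Node, leaf, merge and to_list are hypothetical, and the LITs of utterances #286 and #287 are placeholders because they are not given in this section.</Paragraph>
    <Paragraph>
```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Node:
    """One node of a binary dialogue-structural tree."""
    label: str                           # a POD or an extended LIT
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    utterance_id: Optional[int] = None   # set on leaves only


def leaf(utterance_id: int, lit: str) -> Node:
    """An utterance carrying one LIT."""
    return Node(label=lit, utterance_id=utterance_id)


def merge(label: str, left: Node, right: Node) -> Node:
    """Combine two adjoining subtrees under a POD or a shared LIT."""
    return Node(label=label, left=left, right=right)


def to_list(node: Node) -> Union[str, List]:
    """Render a tree as a nested list: [label, left, right]."""
    if node.utterance_id is not None:
        return f"#{node.utterance_id} {node.label}"
    return [node.label, to_list(node.left), to_list(node.right)]


GENRE = "D+Statement+Select+Genre"
# Two adjacent utterances with the same LIT are merged under that LIT.
same_lit = merge(GENRE, leaf(280, GENRE), leaf(281, GENRE))
# A request and its answer are merged under a POD.
# "LIT_286" and "LIT_287" are placeholders, not tags from the corpus.
s_info = merge("S INFO", leaf(286, "LIT_286"), leaf(287, "LIT_287"))

print(to_list(same_lit))
print(to_list(s_info))
```
    </Paragraph>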
  </Section>
  <Section position="5" start_page="42" end_page="44" type="metho">
    <SectionTitle>
4 Construction of Structurally Annotated Spoken Dialogue Corpus
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="42" end_page="42" type="sub_section">
      <SectionTitle>
4.1 Work environment and procedures
</SectionTitle>
      <Paragraph position="0"> We built a dialogue parser as a support environment for annotating dialogue structures.</Paragraph>
      <Paragraph position="1"> Applying the dialogue-structural rules, which are obtained from annotated structural trees (like the one in Figure 4), the parser analyzes an input LIT sequence and outputs all possible dialogue-structural trees. An annotator then chooses the correct tree from the outputs. When the outputs do not include the correct tree, the annotator corrects a wrong tree by rewriting its list form. In this way, we make the annotation more efficient.</Paragraph>
      <Paragraph position="2"> price, reservation, menu, area, fixed holiday.</Paragraph>
      <Paragraph position="3"> SLCT selecting a restaurant or parking space.</Paragraph>
      <Paragraph position="4"> SRCH searching for a restaurant.</Paragraph>
      <Paragraph position="5"> SRCH RQST requesting a search.</Paragraph>
      <Paragraph position="6"> RSRV making a reservation.</Paragraph>
      <Paragraph position="7"> RSRV DTL extracting reservation information such as time, number of people, etc.</Paragraph>
      <Paragraph position="8"> RSRV RQST requesting a reservation.</Paragraph>
      <Paragraph position="9"> The dialogue parser was implemented using bottom-up chart parsing (Kay, 1980). The structural rules were extracted from all annotated dialogues. In the environment outlined above, we built the corpus by bootstrapping. That is, we  1. output the dialogue structures with the parser,</Paragraph>
      <Paragraph position="10"> 2. had an annotator choose and correct the dialogue structure, and</Paragraph>
      <Paragraph position="11"> 3. extracted structural rules from the resulting dialogue-structural trees.</Paragraph>
      <Paragraph position="12"> We repeated these procedures and increased the structural rules incrementally, so that the dialogue parser improved its operational performance.</Paragraph>
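      <Paragraph> The interplay of rule extraction and parsing can be sketched as below. This is not the authors' implementation: it assumes that every structural rule is binary (parent, left child, right child), uses a simplified CKY-style bottom-up chart rather than the full algorithm of (Kay, 1980), and all names (extract_rules, parse, the toy seed tree) are hypothetical. Trees are nested tuples, and a leaf is an LIT string.</Paragraph>
      <Paragraph>
```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple[str, "Tree", "Tree"]]   # leaf = LIT string
Rule = Tuple[str, str, str]                     # (parent, left, right)


def label(tree: Tree) -> str:
    return tree if isinstance(tree, str) else tree[0]


def extract_rules(tree: Tree) -> Set[Rule]:
    """Collect one binary rule per internal node of an annotated tree."""
    if isinstance(tree, str):
        return set()
    parent, left, right = tree
    own = {(parent, label(left), label(right))}
    return own | extract_rules(left) | extract_rules(right)


def parse(lits: List[str], rules: Set[Rule], root: str = "GUIDE") -> List[Tree]:
    """Return every tree over the LIT sequence whose top label is root."""
    by_children: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for parent, left, right in sorted(rules):
        by_children[(left, right)].append(parent)
    n = len(lits)
    # chart[(i, j)] holds every tree that covers lits[i:j]
    chart: Dict[Tuple[int, int], List[Tree]] = defaultdict(list)
    for i, lit in enumerate(lits):
        chart[(i, i + 1)].append(lit)
    for width in range(2, n + 1):               # shorter spans first
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):           # split point
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        key = (label(left), label(right))
                        for parent in by_children.get(key, []):
                            chart[(i, j)].append((parent, left, right))
    return [t for t in chart[(0, n)] if label(t) == root]


# Toy seed tree (invented structure, tag names taken from the text).
D = "D+Statement+Select+Genre"
O = "O+Statement+Exhibit+SearchResult"
seed = ("GUIDE", (D, D, D), O)
rules = extract_rules(seed)

# A longer sequence is ambiguous; an annotator would pick one candidate.
candidates = parse([D, D, D, O], rules)
print(len(candidates))
for tree in candidates:
    rules |= extract_rules(tree)   # step 3: feed chosen trees back in
```
      </Paragraph>
      <Paragraph> In a real bootstrap loop only the tree chosen or corrected by the annotator would be fed back into the rule set; the loop above adds every candidate purely to keep the sketch short.</Paragraph>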
    </Section>
    <Section position="2" start_page="42" end_page="44" type="sub_section">
      <SectionTitle>
4.2 Structurally annotated dialogue corpus
</SectionTitle>
      <Paragraph position="0"> We built a structurally annotated dialogue corpus in the environment described in Section 4.1, using the restaurant guide dialogues in the CIAIR corpus. The corpus includes 789 dialogues consisting of 8150 utterances. A dialogue consists of 11.61 utterances on average. Table 3 shows the details.</Paragraph>
      <Paragraph position="1">  I see. Please guide me there.</Paragraph>
      <Paragraph position="2"> No, reservation is not necessary.</Paragraph>
      <Paragraph position="3"> Should I make a reservation? How about this? &quot;MARU&quot; restaurant is suitable. I love Japanese Hotpot.</Paragraph>
      <Paragraph position="4"> Well, there are restaurants near here that serve sumo wrestler's stew, Japanese hotchpotch and sliced beef boiled with vegetables.</Paragraph>
      <Paragraph position="5"> I want to have Hotpot.</Paragraph>
      <Paragraph position="6"> Fresh and row.</Paragraph>
      <Paragraph position="7"> Which kind do you like? Let me see.</Paragraph>
      <Paragraph position="8"> I'd like to eat sea bream.</Paragraph>
      <Paragraph position="9">
        number of dialogues: 789
        number of utterances: 8150
        number of structural rules: 297
        utterances per one dialogue: 11.61
        number of dialogue-structural tree types: 659
        number of LIT sequence types: 657
      </Paragraph>
    </Section>
  </Section>
</Paper>