<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2107"> <Title>Statistical Method of Recognizing Local Cohesion in Spoken Dialogues</Title> <Section position="3" start_page="634" end_page="634" type="metho"> <SectionTitle> 2 Local Cohesion between Utterances </SectionTitle> <Paragraph position="0"> The discourse structure in task-oriented dialogues has two types of cohesion: global cohesion and local cohesion. Global cohesion is a top-down structured context based on a hierarchy of topics determined by the domain (e.g., hotel reservation or flight cancellation). Using this cohesion, a task-oriented dialogue is segmented into several subdialogues according to the topic. On the other hand, local cohesion is a bottom-up structured context consisting of coherence relations between utterances, such as question-response or response-confirmation. Unlike global cohesion, local cohesion does not have a hierarchy. This paper focuses on local cohesion.</Paragraph> <Paragraph position="1"> Figure 1 shows a Japanese conversation between a person and a hotel staff member, which is an example of a task-oriented dialogue; the person is making a hotel reservation. The first column represents global cohesion and the second column represents local cohesion. For example, the pair of U3 and U4 has local cohesion, because coherence relations hold between words in the two utterances, as follows: c1) a speech act pattern between &quot;onegaiitashimasu (requirement)&quot; in U3 and &quot;desu (response)&quot; in U4 c2) semantic coherence between nouns, &quot;hinichi (date)&quot; in U3, and &quot;hachi-gatsu to-ka (August 10th)&quot; and &quot;ju-ni-nichi (12th)&quot; in U4 In the same way, (U4, U5) and (U5, U6) have local cohesion.
Thus, U3 to U6 are built up as one structure and form a subdialogue with the topic &quot;date&quot;.</Paragraph> <Paragraph position="2"> As this example shows, whether or not two utterances have local cohesion with one another is determined by coherence relations between the speech act types in the utterances, between the verbs in them, and between the nouns in them. In recognizing local cohesion, our method uses these three coherence relations.</Paragraph> </Section> <Section position="4" start_page="634" end_page="637" type="metho"> <SectionTitle> 3 Our Approach </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="634" end_page="635" type="sub_section"> <SectionTitle> 3.1 Utterance Model with Local Cohesion </SectionTitle> <Paragraph position="0"> In this paper, we approximate an utterance in a dialogue by a three-tuple: U = (SPEECH_ACT, VERB, NOUNS) (1)</Paragraph> <Paragraph position="2"> where SPEECH_ACT is the speech act type, VERB is the main verb in the utterance, and NOUNS is a set of nouns in the utterance (e.g., a subject noun and an object noun for the main verb). Figure 2 shows a dialogue with our utterance model.</Paragraph> <Paragraph position="4"> As mentioned in Section 2, when the i-th utterance Ui (1 ≤ i ≤ j-1) and the j-th utterance Uj in a dialogue have local cohesion with one another, (SPEECH_ACT_i, SPEECH_ACT_j), (VERB_i, VERB_j) and (NOUNS_i, NOUNS_j) have coherence</Paragraph> <Paragraph position="5"> relations. Therefore, the plausibility of local cohesion between Ui and Uj can be formally defined as: cohesion(Ui, Uj) = λ1 * cohesion_speechact(SPEECH_ACT_i, SPEECH_ACT_j) + λ2 * cohesion_verb(VERB_i, VERB_j) + λ3 * cohesion_noun(NOUNS_i, NOUNS_j) (2)</Paragraph> <Paragraph position="7"> where λ1, λ2 and λ3 are nonnegative weights contributing to local cohesion, and cohesion_speechact, cohesion_verb and cohesion_noun are functions giving the plausibility of coherence relations between speech act types, verbs and nouns respectively. The problem of deciding an utterance that has local cohesion with Uj
can then be formally defined as finding the utterance with the highest plausibility of local cohesion for Uj, which is the result of the following function: U_opt = arg max_{1 ≤ i ≤ j-1} cohesion(Ui, Uj) (3) As the first step, this paper uses only the speech act types in the calculation (i.e., λ1 = 1, λ2 = λ3 = 0). This is because the speech act types are more powerful in finding local cohesion than the verbs or the nouns, for two reasons: r1) The speech act types are independent of the domain.</Paragraph> <Paragraph position="8"> r2) The speech act types are stable, while the nouns and the verbs are sometimes omitted in utterances in spoken dialogues.</Paragraph> <Paragraph position="9"> Thus, Equation (2) is reduced to: cohesion(Ui, Uj) = cohesion_speechact(SPEECH_ACT_i, SPEECH_ACT_j) (4)</Paragraph> <Paragraph position="11"> In order to calculate Equation (4), two kinds of information, answering the following questions, are required as discourse knowledge: q1) What expressions in an utterance indicate a speech act type? q2) What speech act patterns have local cohesion? We automatically acquire this discourse knowledge from a corpus annotated with local cohesion. Accordingly, our method is composed of two processes: 1) identifying the expressions that indicate a speech act type (called speech act expressions) in an utterance, and 2) calculating the plausibility of the speech act patterns by using the dialogue corpus annotated with local cohesion.</Paragraph> </Section> <Section position="2" start_page="635" end_page="636" type="sub_section"> <SectionTitle> 3.2 Identifying Speech Act Expressions in an Utterance </SectionTitle> <Paragraph position="0"> The first process in our method identifies the speech act expression in each utterance by longest-pattern matching against a set of speech act expressions.
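As an illustration of Equations (2) and (3) and of the longest-match identification just described, here is a minimal Python sketch. It is not the paper's implementation: the Utterance fields follow Section 3.1, but the three coherence functions, the default weights, and all helper names are our own illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Utterance:
    speech_act: str   # SPEECH_ACT: speech act type
    verb: str         # VERB: main verb
    nouns: frozenset  # NOUNS: set of nouns in the utterance

# Illustrative coherence functions; the paper estimates the speech act
# one from an annotated corpus (Section 3.3) and sets the other two aside.
def cohesion_speechact(a, b):
    return 1.0 if (a, b) == ("requirement", "response") else 0.0

def cohesion_verb(a, b):
    return 1.0 if a == b else 0.0

def cohesion_noun(a, b):
    # simple set-overlap score between the two noun sets
    return len(a.intersection(b)) / max(len(a.union(b)), 1)

def cohesion(ui, uj, l1=1.0, l2=0.0, l3=0.0):
    """Equation (2): weighted sum of the three coherence relations.
    The paper's first step uses l1=1, l2=l3=0 (speech acts only)."""
    return (l1 * cohesion_speechact(ui.speech_act, uj.speech_act)
            + l2 * cohesion_verb(ui.verb, uj.verb)
            + l3 * cohesion_noun(ui.nouns, uj.nouns))

def find_cohesive(history, uj):
    """Equation (3): the earlier utterance maximizing cohesion with uj."""
    return max(history, key=lambda ui: cohesion(ui, uj))

def identify_endexpr(utterance, set_endexpr):
    """Longest-match identification of the utterance-final speech act
    expression, per features f1/f2 of Section 3.2."""
    for n in range(len(utterance), 0, -1):
        if utterance[-n:] in set_endexpr:
            return utterance[-n:]
    return None
```

With l1=1 and l2=l3=0, find_cohesive reduces to comparing speech act patterns only, which is exactly the first-step setting chosen above.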
The words can be collected by automatically extracting fixed expressions from the ends of utterances, because speech act expressions in Japanese have two features: f1) The speech act expressions form fixed patterns.</Paragraph> <Paragraph position="1"> f2) The speech act expressions lie at the end of the utterance. For example, &quot;desu&quot; in &quot;Futari desu&quot; (U8) in Figure 1 represents the speech act type &quot;response&quot;. We call these expressions ENDEXPR expressions.</Paragraph> <Paragraph position="2"> For the automatic extraction, we use a slight modification of the cost-criteria-based method \[Kita 94\], which uses the product of the frequency and length of expressions, because it easily handles languages, such as Japanese, that are written without word delimiters. Kita et al. extract expressions in decreasing order of cost-criteria value. We do so in decreasing order of expression length, keeping fixed expressions whose cost-criteria values are above a certain threshold. For more details of our extraction method, see \[Katoh 95\].</Paragraph> <Paragraph position="3"> When the ENDEXPR expressions, which are listed in the set of speech act expressions (represented as Set-ENDEXPR), are denoted by the symbol ENDEXPR, we can approximate the speech act types as SPEECH_ACT = ENDEXPR.</Paragraph> <Paragraph position="4"> Thus, Equation (4) is transformed to: cohesion(Ui, Uj) = cohesion_endexpr(ENDEXPR_i, ENDEXPR_j) (5)</Paragraph> <Paragraph position="6"> where cohesion_endexpr is a function giving the plausibility of coherence relations between the ENDEXPR expressions.</Paragraph> </Section> <Section position="3" start_page="636" end_page="636" type="sub_section"> <SectionTitle> 3.3 Calculating the Plausibility of Local Cohesion </SectionTitle> <Paragraph position="0"> The second process is to calculate the plausibility of local cohesion between utterances from the dialogue corpus using a statistical method.
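The statistical calculation can be made concrete under one plausible reading of the pseudo-mutual information defined next (Equation (6)): f is estimated from ENDEXPR bigrams of utterance pairs annotated as locally cohesive, its counterpart f' from pairs that are not, and the plausibility is their difference. The estimator below is our sketch, not the paper's code; the eps guard for zero counts is our addition.

```python
import math
from collections import Counter

def pmi_from(pairs, eps=1e-9):
    """Relative-frequency estimate of a (pseudo-)mutual-information
    score from a list of ENDEXPR bigrams (a, b)."""
    pair_c = Counter(pairs)
    left_c = Counter(a for a, _ in pairs)
    right_c = Counter(b for _, b in pairs)
    n = max(len(pairs), 1)
    def score(a, b):
        p_ab = pair_c[(a, b)] / n
        p_a = left_c[a] / n
        p_b = right_c[b] / n
        # eps keeps the log finite for unseen expressions (our addition)
        return math.log((p_ab + eps) / (p_a * p_b + eps))
    return score

def make_cohesion_endexpr(cohesive_bigrams, noncohesive_bigrams):
    """One reading of Equation (6): f from bigrams of locally cohesive
    utterance pairs, f' from non-cohesive pairs; plausibility = f - f'."""
    f = pmi_from(cohesive_bigrams)
    f_bar = pmi_from(noncohesive_bigrams)
    return lambda a, b: f(a, b) - f_bar(a, b)
```

A bigram seen mostly in cohesive pairs then scores high, and one seen mostly in non-cohesive pairs scores low, matching the intended use of Equation (6).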
In this paper, we define the plausibility of local cohesion, i.e., Equation (5), as follows: cohesion_endexpr(ENDEXPR_i, ENDEXPR_j) = f(ENDEXPR_i ⊕ ENDEXPR_j) - f'(ENDEXPR_i ⊕ ENDEXPR_j) (6)</Paragraph> <Paragraph position="2"> The f and f' are modified functions of mutual information in Information Theory. We call them pseudo-mutual information. P(.) is the relative frequency over pairs of utterances with local cohesion, and P'(.) is that over pairs of utterances without local cohesion. ENDEXPR_i ⊕ ENDEXPR_j means that ENDEXPR_j appears next to ENDEXPR_i. We call a series of two ENDEXPR expressions (i.e., ENDEXPR_i ⊕ ENDEXPR_{i+1}) an ENDEXPR bigram.</Paragraph> <Paragraph position="3"> The larger the value of Equation (6) for two utterances, the more plausible the local cohesion between them. For example, in speech recognition applications, the optimal result is the candidate with the largest plausibility value among those obtained from a speech pattern recognition module.</Paragraph> </Section> <Section position="4" start_page="636" end_page="637" type="sub_section"> <SectionTitle> 3.4 Smoothing Methods </SectionTitle> <Paragraph position="0"> Although the statistical approach is easy to implement, it has a major problem: the sparse-data problem. Indeed, Equation (6) gives very small values in some cases. In order to overcome this problem (i.e., to interpolate the plausibility), we propose two smoothing techniques.</Paragraph> <Paragraph position="1"> \[Smoothing Method 1\] Interpolate the plausibility by using partial fixed expressions in Set-ENDEXPR.
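A minimal sketch of this interpolation, assuming the partial fixed expressions of an ENDEXPR are its utterance-final substrings that themselves appear in Set-ENDEXPR; the weight table plays the role of the nonnegative parameters and is supplied by the caller (all names here are illustrative, not the paper's):

```python
def partial_expressions(endexpr, set_endexpr):
    """Utterance-final substrings of endexpr that are themselves in
    Set-ENDEXPR, longest first."""
    return [endexpr[-n:] for n in range(len(endexpr), 0, -1)
            if endexpr[-n:] in set_endexpr]

def smoothed_cohesion(ei, ej, raw_score, set_endexpr, weights):
    """Smoothing Method 1: linearly interpolate the raw plausibility over
    all pairs of partial expressions; weights is a dict keyed by suffix
    pair, playing the role of the parameters mu (they should sum to 1)."""
    return sum(weights[(a, b)] * raw_score(a, b)
               for a in partial_expressions(ei, set_endexpr)
               for b in partial_expressions(ej, set_endexpr))
```

When the full bigram is rare, mass assigned to shorter suffixes such as &quot;masu-ka&quot; or &quot;ka&quot; keeps the estimate usable, which is the point of the method.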
For example, in Japanese, &quot;itadake-masu-ka&quot; can be segmented into the smaller morphemes &quot;masu-ka&quot; and &quot;ka&quot;, and the original morpheme &quot;itadake-masu-ka&quot; is interpolated by these two morphemes as follows: cohesion_endexpr(&quot;itadake-masu-ka&quot;, &quot;desu&quot;) = μ1 * (f - f')(&quot;itadake-masu-ka&quot; ⊕ &quot;desu&quot;) + μ2 * (f - f')(&quot;masu-ka&quot; ⊕ &quot;desu&quot;) + μ3 * (f - f')(&quot;ka&quot; ⊕ &quot;desu&quot;)</Paragraph> <Paragraph position="3"> where μ1, μ2 and μ3 (μ1 + μ2 + μ3 = 1) are nonnegative parameters.</Paragraph> <Paragraph position="4"> Formally, if we take the partial fixed expressions in Set-ENDEXPR and denote the relation</Paragraph> <Paragraph position="6"> &quot;smaller&quot; by the symbol &quot;&lt;&quot; (e.g., &quot;ka&quot; &lt; &quot;masu-ka&quot; &lt; &quot;itadake-masu-ka&quot;), we can interpolate the original plausibility by using these smaller morphemes: cohesion_endexpr(ENDEXPR_i, ENDEXPR_j) = Σ_k Σ_l μ_kl * (f - f')(ENDEXPR_i^k ⊕ ENDEXPR_j^l)</Paragraph> <Paragraph position="8"> where ENDEXPR_i^1 &lt; ENDEXPR_i^2 &lt; ... &lt; ENDEXPR_i^m and ENDEXPR_j^1 &lt; ENDEXPR_j^2 &lt; ... &lt; ENDEXPR_j^n. \[Smoothing Method 2\] Interpolate the plausibility by using the speech act types themselves. For example, &quot;itadake-masu-ka (requirement)&quot; and &quot;desu (response)&quot; are interpolated by the relation of the speech act types (not the speech act expressions), i.e., &quot;requirement-response&quot;, as follows: cohesion_endexpr(&quot;itadake-masu-ka&quot;, &quot;desu&quot;) = μ0 * (f - f')(&quot;itadake-masu-ka&quot; ⊕ &quot;desu&quot;) + μ1 * cohesion_speechact_type(&quot;requirement&quot;, &quot;response&quot;)</Paragraph> <Paragraph position="10"> where μ0 and μ1 (μ0 + μ1 = 1) are nonnegative parameters.</Paragraph> <Paragraph position="11"> These speech act types are automatically constructed by clustering ENDEXPRs based on ENDEXPR bigrams, and then the type bigrams are re-calculated from the ENDEXPR bigrams.</Paragraph> <Paragraph position="12"> Formally, when the speech act types are denoted by SACT_TYPE, we can interpolate an original plausibility by using these type patterns: cohesion_endexpr(ENDEXPR_i, ENDEXPR_j) = μ0 * (f - f')(ENDEXPR_i ⊕ ENDEXPR_j) + μ1 * cohesion_speechact_type(SACT_TYPE_i, SACT_TYPE_j), where cohesion_speechact_type is a function giving the plausibility of coherence relations between the speech act types.</Paragraph> <Paragraph position="13"> The former method must use n × m parameters (i.e., μ_kl), and the latter must produce the speech act types.
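Smoothing Method 2 can be sketched the same way; endexpr_type stands for the mapping from each ENDEXPR to its automatically clustered speech act type, and the default weights are illustrative, not the paper's values:

```python
def type_bigrams(endexpr_bigrams, endexpr_type):
    """Re-calculate type bigrams from ENDEXPR bigrams (Section 3.4)."""
    return [(endexpr_type[a], endexpr_type[b]) for a, b in endexpr_bigrams]

def smoothed_by_type(ei, ej, raw_score, endexpr_type, type_score,
                     mu0=0.7, mu1=0.3):
    """Smoothing Method 2: interpolate the ENDEXPR-bigram plausibility
    with that of the corresponding speech act type pattern, using
    nonnegative weights with mu0 + mu1 = 1."""
    return (mu0 * raw_score(ei, ej)
            + mu1 * type_score(endexpr_type[ei], endexpr_type[ej]))
```

Because many ENDEXPR expressions share a type, the type-pattern statistics are far denser than the raw bigram statistics, which is what makes this back-off useful.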
We chose the former method for our first experiments, because it was easier to implement.</Paragraph> </Section> </Section> </Paper>