<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1030"> <Title>Analysis System of Speech Acts and Discourse Structures Using Maximum Entropy Model*</Title> <Section position="3" start_page="231" end_page="234" type="intro"> <SectionTitle> 2 Statistical models </SectionTitle> <Paragraph position="0"> We construct two statistical models: one for speech act analysis and the other for discourse structure analysis. We integrate the two models using the maximum entropy model. In the following subsections, we describe these models in detail.</Paragraph> <Section position="1" start_page="231" end_page="231" type="sub_section"> <SectionTitle> 2.1 Speech act analysis model </SectionTitle> <Paragraph position="0"> Let $U_{1,n}$ denote a dialogue consisting of a sequence of $n$ utterances, $U_1, U_2, \ldots, U_n$, and let $S_i$ denote the speech act of $U_i$. With these notations, $P(S_i \mid U_{1,i})$ means the probability that $S_i$ becomes the speech act of utterance $U_i$ given the sequence of utterances $U_1, U_2, \ldots, U_i$.</Paragraph> <Paragraph position="1"> We can approximate this probability by the product of a sentential probability and a contextual probability:

$$P(S_i \mid U_{1,i}) \approx P(U_i \mid S_i) \times P(S_i \mid S_{1,i-1}) \quad (1)$$

It has been widely believed that there is a strong relation between the speaker's speech act and the surface utterances expressing that speech act (Hinkelman (1989), Andernach (1996)). That is, the speaker utters the sentence that best expresses his/her intention (speech act) so that the hearer can easily infer what the speaker's speech act is. The sentential probability $P(U_i \mid S_i)$ represents the relationship between the speech acts and the features of surface sentences. Therefore, we approximate the sentential probability using the syntactic pattern $P_i$ of utterance $U_i$:

$$P(U_i \mid S_i) \approx P(P_i \mid S_i) \quad (2)$$

The contextual probability $P(S_i \mid S_{1,i-1})$ is the probability that an utterance with speech act $S_i$ is uttered given that utterances with speech acts $S_1, S_2, \ldots, S_{i-1}$ were previously uttered. Since it is impossible to consider all preceding speech acts $S_1, S_2, \ldots, S_{i-1}$ as contextual information, we use the n-gram model.</Paragraph> <Paragraph position="2"> Generally, dialogues have a hierarchical discourse structure, so we approximate the context as the speech acts of the n utterances that are hierarchically recent to the current utterance. An utterance A is hierarchically recent to an utterance B if A is adjacent to B in the tree structure of the discourse (Walker (1996)).</Paragraph> <Paragraph position="3"> Equation (3) represents the approximated contextual probability in the case of using a trigram, where $U_j$ and $U_k$ ($j < k < i$) are hierarchically recent to the utterance $U_i$:

$$P(S_i \mid S_{1,i-1}) \approx P(S_i \mid S_j, S_k) \quad (3)$$

As a result, the statistical model for speech act analysis is represented in equation (4):

$$P(S_i \mid U_{1,i}) \approx P(P_i \mid S_i) \times P(S_i \mid S_j, S_k) \quad (4)$$

</Paragraph> </Section>
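As an illustration of how equation (4) can be applied, the following Python sketch scores candidate speech acts for an utterance given its syntactic pattern and the speech acts of its two hierarchically recent predecessors. It uses simple relative-frequency estimates in place of the maximum entropy estimates introduced in section 2.4, and all class and method names are hypothetical, not from the paper:

    from collections import defaultdict

    class SpeechActModel:
        """A sketch of equation (4): score a candidate speech act S_i by
        P(P_i | S_i) * P(S_i | S_j, S_k). Relative-frequency estimates stand in
        for the maximum entropy estimates of section 2.4."""

        def __init__(self):
            self.pattern_act = defaultdict(int)  # counts of (S_i, P_i) pairs
            self.act = defaultdict(int)          # counts of S_i
            self.trigram = defaultdict(int)      # counts of (S_j, S_k, S_i)
            self.bigram = defaultdict(int)       # counts of (S_j, S_k)

        def train(self, corpus):
            # corpus: per utterance, a syntactic pattern (a tuple of features),
            # its annotated speech act, and the speech acts (S_j, S_k) of the
            # two hierarchically recent utterances.
            for pattern, act, (sj, sk) in corpus:
                self.pattern_act[(act, pattern)] += 1
                self.act[act] += 1
                self.trigram[(sj, sk, act)] += 1
                self.bigram[(sj, sk)] += 1

        def score(self, pattern, act, sj, sk):
            sentential = self.pattern_act[(act, pattern)] / max(self.act[act], 1)     # P(P_i | S_i)
            contextual = self.trigram[(sj, sk, act)] / max(self.bigram[(sj, sk)], 1)  # P(S_i | S_j, S_k)
            return sentential * contextual

        def best_act(self, pattern, sj, sk, candidates):
            # equation (4): pick the speech act that maximizes the product
            return max(candidates, key=lambda a: self.score(pattern, a, sj, sk))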
<Section position="2" start_page="231" end_page="233" type="sub_section"> <SectionTitle> 2.2 Discourse structure analysis model </SectionTitle> <Paragraph position="0"> We define a set of discourse segment boundaries (DSBs) as the markers for discourse structure tagging. A DSB represents the relationship between two consecutive utterances in a dialogue. Table 3 shows the DSBs and their meanings, and Figure 3 shows an example of DS and DSB analysis:

Figure 3. An example dialogue annotated with DS and DSB:
1) User : I would like to reserve a room. [DS 1, DSB NULL]
2) Agent : What kind of room do you want? [DS 1.1, DSB SS]
3) User : What kind of room do you have? [DS 1.1.1, DSB SS]
4) Agent : We have single and double rooms. [DS 1.1.1, DSB DC]
5) User : How much are those rooms? [DS 1.1.2, DSB 1B]
6) Agent : Single costs 30,000 won and double costs 40,000 won. [DS 1.1.2, DSB DC]
7) User : A single room, please. [DS 1.1, DSB 1E]

Since the DSB of an utterance represents a relationship between the utterance and the previous utterance, the DSB of utterance 1 in the example dialogue becomes NULL. By comparing utterance 2 with utterance 1 in Figure 3, we know that a new sub-dialogue starts at utterance 2; therefore the DSB of utterance 2 becomes SS. Similarly, the DSB of utterance 3 is SS. Since utterance 4 is a response to utterance 3, utterances 3 and 4 belong to the same discourse segment, so the DSB of utterance 4 becomes DC. Since the sub-dialogue of one level (i.e., the DS 1.1.1) consisting of utterances 3 and 4 ends and a new sub-dialogue starts at utterance 5, the DSB of utterance 5 becomes 1B. Finally, utterance 7 is a response to utterance 2, i.e., the sub-dialogue consisting of utterances 5 and 6 ends and the segment 1.1 is resumed; therefore the DSB of utterance 7 becomes 1E.</Paragraph> <Paragraph position="1"> We construct a statistical model for discourse structure analysis using DSBs. In the training phase, the model transforms the discourse structure (DS) information in the corpus into DSBs by comparing the DS information of each utterance with that of the previous utterance. After the transformation, we estimate probabilities for the DSBs. In the analysis phase, the goal of the system is simply to determine the DSB of the current utterance using those probabilities. Now we describe the model in detail.</Paragraph> <Paragraph position="2"> Let $G_i$ denote the DSB of $U_i$. With this notation, $P(G_i \mid U_{1,i})$ means the probability that $G_i$ becomes the DSB of utterance $U_i$ given the sequence of utterances $U_1, U_2, \ldots, U_i$. As shown in equation (5), we can approximate $P(G_i \mid U_{1,i})$ by the product of a sentential probability and a contextual probability:

$$P(G_i \mid U_{1,i}) \approx P(U_i \mid G_i) \times P(G_i \mid U_{1,i-1}, G_{1,i-1}) \quad (5)$$

In order to analyze the discourse structure, we consider the speech act of each corresponding utterance. Thus we can approximate each utterance by its speech act in the sentential probability $P(U_i \mid G_i)$:

$$P(U_i \mid G_i) \approx P(S_i \mid G_i) \quad (6)$$

To simplify notation, let $F_i$ be the pair of the speech act and the DSB of $U_i$:

$$F_i = (S_i, G_i) \quad (7)$$

We can then approximate the contextual probability $P(G_i \mid U_{1,i-1}, G_{1,i-1})$ as in equation (8), in the case of using a trigram:

$$P(G_i \mid U_{1,i-1}, G_{1,i-1}) \approx P(G_i \mid F_{i-2}, F_{i-1}) \quad (8)$$

As a result, the statistical model for discourse structure analysis is represented as equation (9):

$$P(G_i \mid U_{1,i}) \approx P(S_i \mid G_i) \times P(G_i \mid F_{i-2}, F_{i-1}) \quad (9)$$

</Paragraph> </Section>
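A parallel Python sketch for equation (9) follows, again with relative-frequency estimates standing in for the maximum entropy estimates of section 2.4; the padding convention for the first two utterances and all names are our own assumptions:

    from collections import defaultdict

    class DiscourseStructureModel:
        """A sketch of equation (9): score a candidate DSB G_i by
        P(S_i | G_i) * P(G_i | F_{i-2}, F_{i-1}), where F_i = (S_i, G_i)."""

        def __init__(self):
            self.act_dsb = defaultdict(int)  # counts of (G_i, S_i) pairs
            self.dsb = defaultdict(int)      # counts of G_i
            self.trigram = defaultdict(int)  # counts of (F_{i-2}, F_{i-1}, G_i)
            self.bigram = defaultdict(int)   # counts of (F_{i-2}, F_{i-1})

        def train(self, dialogues):
            # Each dialogue is a list of F_i = (speech_act, dsb) pairs, obtained
            # by transforming the DS annotations of the corpus into DSBs.
            pad = ("<pad>", "<pad>")  # assumed padding for the first utterances
            for dialogue in dialogues:
                history = [pad, pad]
                for act, dsb in dialogue:
                    f2, f1 = history[-2], history[-1]
                    self.act_dsb[(dsb, act)] += 1
                    self.dsb[dsb] += 1
                    self.trigram[(f2, f1, dsb)] += 1
                    self.bigram[(f2, f1)] += 1
                    history.append((act, dsb))

        def score(self, act, dsb, f2, f1):
            sentential = self.act_dsb[(dsb, act)] / max(self.dsb[dsb], 1)             # P(S_i | G_i)
            contextual = self.trigram[(f2, f1, dsb)] / max(self.bigram[(f2, f1)], 1)  # P(G_i | F_{i-2}, F_{i-1})
            return sentential * contextual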
<Section position="3" start_page="233" end_page="233" type="sub_section"> <SectionTitle> 2.3 Integrated dialogue analysis model </SectionTitle> <Paragraph position="0"> Given a dialogue $U_{1,n}$, $P(S_i, G_i \mid U_{1,i})$ means the probability that $S_i$ and $G_i$ will be, respectively, the speech act and the DSB of an utterance $U_i$ given the sequence of utterances $U_1, U_2, \ldots, U_i$. Using the chain rule, we can rewrite this probability as in equation (10):

$$P(S_i, G_i \mid U_{1,i}) = P(S_i \mid U_{1,i}) \times P(G_i \mid U_{1,i}, S_i) \quad (10)$$

On the right-hand side (RHS) of equation (10), the first term is equal to the speech act analysis model shown in section 2.1. The second term can be approximated by the discourse structure analysis model shown in section 2.2, because the discourse structure analysis model is formulated by considering utterances and speech acts together. Finally, the integrated dialogue analysis model can be formulated as the product of the speech act analysis model and the discourse structure analysis model:

$$P(S_i, G_i \mid U_{1,i}) \approx P(P_i \mid S_i) \times P(S_i \mid S_j, S_k) \times P(S_i \mid G_i) \times P(G_i \mid F_{i-2}, F_{i-1}) \quad (11)$$

</Paragraph> </Section> <Section position="4" start_page="233" end_page="234" type="sub_section"> <SectionTitle> 2.4 Maximum entropy model </SectionTitle> <Paragraph position="0"> All terms in the RHS of equation (11) are represented by conditional probabilities. We estimate the probability of each term using the following representative equation:

$$P(a \mid b) = \frac{P(a, b)}{\sum_{a'} P(a', b)} \quad (12)$$

We can evaluate $P(a, b)$ using the maximum entropy model shown in equation (13) (Reynar and Ratnaparkhi, 1997):

$$P(a, b) = \pi \prod_{j=1}^{k} \alpha_j^{f_j(a, b)} \quad (13)$$

In equation (13), $a$ is either a speech act or a DSB depending on the term, $b$ is the context (or history) of $a$, $\pi$ is a normalization constant, and $\alpha_j$ is the model parameter corresponding to each feature function $f_j$.</Paragraph> <Paragraph position="1"> In this paper, we use two kinds of feature functions: unified feature functions and separated feature functions. The former use the whole context $b$ as shown in equation (12), and the latter use partial contexts split up from the whole context to cope with data sparseness problems. Equations (14) and (15) show examples of these feature functions for estimating the sentential probability of the speech act analysis model:

$$f(a, b) = \begin{cases} 1 & \text{iff } a = \textit{response} \text{ and } b = [\text{User}, [\text{decl}, \text{pvd}, \text{future}, \text{no}, \text{will}, \text{then}]] \\ 0 & \text{otherwise} \end{cases} \quad (14)$$

$$f(a, b) = \begin{cases} 1 & \text{iff } a = \textit{response} \text{ and } b = [\text{User}, [\text{decl}]] \\ 0 & \text{otherwise} \end{cases} \quad (15)$$

Equation (14) represents a unified feature function constructed with a syntactic pattern having all syntactic features, and equation (15) represents a separated feature function constructed with only one feature, named Sentence Type, among all the syntactic features in the pattern. The interpretation of the unified feature function shown in equation (14) is as follows: if the current utterance is uttered by "User", the syntactic pattern of the utterance is [decl, pvd, future, no, will, then], and the speech act of the current utterance is response, then $f(a,b) = 1$; else $f(a,b) = 0$. We can construct five more separated feature functions using the other syntactic features.</Paragraph> <Paragraph position="2"> The feature functions for the contextual probability can be constructed in a similar way to those for the sentential probability. They are unified feature functions with feature trigrams and separated feature functions with distance-1 bigrams and distance-2 bigrams.</Paragraph> <Paragraph position="3"> Equation (16) shows an example of a unified feature function, and equations (17) and (18), which are derived by separating the condition on $b$ in equation (16), show examples of separated feature functions for the contextual probability of the speech act analysis model:

$$f(a, b) = \begin{cases} 1 & \text{iff } a = \textit{response} \text{ and } b = [\text{User}: \textit{request}, \text{Agent}: \textit{ask-ref}] \\ 0 & \text{otherwise} \end{cases} \quad (16)$$

where $b$ is the information of $U_j$ and $U_k$ defined in equation (3).

$$f(a, b) = \begin{cases} 1 & \text{iff } a = \textit{response} \text{ and } b = [\text{Agent}: \textit{ask-ref}] \\ 0 & \text{otherwise} \end{cases} \quad (17)$$

$$f(a, b) = \begin{cases} 1 & \text{iff } a = \textit{response} \text{ and } b = [\text{User}: \textit{request}] \\ 0 & \text{otherwise} \end{cases} \quad (18)$$

Similarly, we can construct feature functions for the discourse structure analysis model. For the sentential probability of the discourse structure analysis model, the unified feature function is identical to the separated feature function, since the whole context includes only a speech act.</Paragraph> <Paragraph position="4"> Using the separated feature functions, we can alleviate the data sparseness problem that arises when there are not enough training examples to which the unified feature function is applicable.</Paragraph> </Section> </Section> </Paper>
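To illustrate equations (13)-(15), the following Python sketch evaluates $P(a, b)$ as a product of parameters raised to binary feature functions. The alpha values are invented for illustration; in practice they are learned from the training corpus, and the normalization constant is taken as 1 here for simplicity:

    def unified_feature(a, b):
        """Equation (14): fires on the speaker plus the full syntactic pattern."""
        return 1 if (a == "response"
                     and b == ("User", ("decl", "pvd", "future", "no", "will", "then"))) else 0

    def separated_feature(a, b):
        """Equation (15): fires on a single syntactic feature (Sentence Type only)."""
        speaker, pattern = b
        return 1 if a == "response" and speaker == "User" and pattern[0] == "decl" else 0

    def me_probability(a, b, features, alphas, pi=1.0):
        """Equation (13): P(a, b) = pi * prod_j alpha_j ** f_j(a, b)."""
        p = pi
        for f, alpha in zip(features, alphas):
            p *= alpha ** f(a, b)
        return p

    # A full pattern unseen in training leaves the unified feature silent, but the
    # separated feature (Sentence Type = decl) still fires: this is how the
    # split-up features cope with data sparseness.
    b = ("User", ("decl", "pvd", "past", "no", "none", "none"))
    print(me_probability("response", b, [unified_feature, separated_feature], [3.2, 1.7]))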