<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1068"> <Title>Improving Translation through Contextual Information</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Ambiguity in Speech Translation </SectionTitle> <Paragraph position="0"> For any given utterance out of what we can loosely call context, there is usually more than one possible interpretation. A speaker's utterance of an elliptical expression, like the figure &quot;twelve fifteen&quot;, might have a different meaning depending on the context of situation, the way the conversation has evolved until that point, and the previous speaker's utterance.</Paragraph> <Paragraph position="1"> &quot;Twelve fifteen&quot; could be the time &quot;a quarter after twelve&quot;, the price &quot;one thousand two hundred and fifteen&quot;, the room number &quot;one two one five&quot;, and so on. Although English can conflate all those possible meanings into one expression, the translation into other languages usually requires more specificity.</Paragraph> <Paragraph position="2"> If this is a problem for any human listener, the problem grows considerably when it is a parser doing the disambiguation. In this paper, I explain how we can use discourse knowledge to help a parser disambiguate among different possible parses for an input sentence, with the final goal of improving the translation in an end-to-end speech translation system.</Paragraph> <Paragraph position="3"> The work described was conducted within the JANUS multi-lingual speech-to-speech translation system, designed to translate spontaneous dialogue in a limited domain (Lavie et al., 1996). The machine translation component of JANUS handles these problems using two different approaches: the Generalized Left-to-Right parser GLR* (Lavie and Tomita, 1993) and Phoenix, the latter being the focus of this paper.</Paragraph> <Paragraph position="4"> *The author gratefully acknowledges support from the &quot;la Caixa&quot; Fellowship Program, ATR Interpreting Laboratories, and Project Enthusiast.</Paragraph> </Section> <Section position="3" start_page="0" end_page="510" type="metho"> <SectionTitle> 2 Disambiguation through Contextual Information </SectionTitle> <Paragraph position="0"> This project addresses the problem of choosing the most appropriate semantic parse for any given input. The approach is to combine discourse information with the set of possible parses provided by the Phoenix parser for an input string. The discourse module selects one of these possibilities. The decision is to be based on: 1. The domain of the dialogue. JANUS deals with dialogues restricted to a domain, such as scheduling an appointment or making travel arrangements. The general topic provides some information about what types of exchanges, and therefore speech acts, can be expected.</Paragraph> <Paragraph position="1"> 2. The macro-structure of the dialogue up to that point. We can divide a dialogue into smaller, self-contained units that provide information on what phases are over or yet to be covered: Are we past the greeting phase? If a flight was reserved, should we expect a payment phase at some point in the rest of the conversation? 3. The structure of adjacency pairs (Schegloff and Sacks, 1973), together with the responses to speech functions (Halliday, 1994; Martin, 1992).</Paragraph> <Paragraph position="2"> If one speaker has uttered a request for information, we expect some sort of response to that -- an answer, a disclaimer or a clarification.</Paragraph>
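<Paragraph> To make the adjacency-pair expectations concrete, the following minimal sketch (illustrative only; the speech-act labels are assumptions, not the system's actual inventory) maps a first pair part to the responses the local history would expect:</Paragraph>
# Hypothetical adjacency-pair table: maps the speech act of the
# previous segment (first pair part) to its expected responses.
# Labels are illustrative, not the actual JANUS/Phoenix inventory.
ADJACENCY_PAIRS = {
    "request_information": ["give_information", "disclaim", "clarify"],
    "suggest": ["accept", "reject", "clarify"],
    "greeting": ["greeting"],
}

def expected_responses(previous_act):
    """Return the speech acts the local discourse history predicts next."""
    return ADJACENCY_PAIRS.get(previous_act, [])

print(expected_responses("request_information"))
# -> ['give_information', 'disclaim', 'clarify']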
<Paragraph position="3"> The domain of the dialogues, named the travel planning domain, consists of dialogues where a customer makes travel arrangements with a travel agent or a hotel clerk to book hotel rooms, flights or other forms of transportation. They are task-oriented dialogues, in which the speakers have specific goals of carrying out a task that involves the exchange of both information and services.</Paragraph> <Paragraph position="4"> Discourse processing is structured in two different levels: the context module keeps a global history of the conversation, from which it will be able to estimate, for instance, the likelihood of a greeting once the opening phase of the conversation is over. A more local history predicts the expected response in any adjacency pair, such as a question-answer sequence. The model adopted here is that of a two-layered finite state machine (henceforth FSM), and the approach is that of late-stage disambiguation, where as much information as possible is collected before proceeding on to disambiguation, rather than restricting the parser's search earlier on.</Paragraph> </Section> <Section position="4" start_page="510" end_page="510" type="metho"> <SectionTitle> 3 Representation of Speech Acts in Phoenix </SectionTitle> <Paragraph position="0"> Writing the appropriate grammars and deciding on the set of speech acts for this domain is also an important part of this project. The selected speech acts are encoded in the grammar -- in the Phoenix case, a semantic grammar -- the tokens of which are concepts that the segment in question represents.</Paragraph> <Paragraph position="1"> Any utterance is divided into SDUs -- Semantic Dialogue Units -- which are fed to the parser one at a time. SDUs represent a full concept, expression, or thought, but not necessarily a complete grammatical sentence. Let us take an example input, and a possible parse for it: (1) Could you tell me the prices at the Holiday Inn? [request] (COULD YOU ...)</Paragraph> <Paragraph position="2"> The top-level concepts of the grammar are speech acts themselves, the ones immediately after are further refinements of the speech act, and the lower-level concepts capture the specifics of the utterance, such as the name of the hotel in the above example.</Paragraph>
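<Paragraph> As an illustration of this layered concept structure, the following sketch (an assumed representation with assumed concept names, not the actual Phoenix grammar) encodes a parse as a nested concept tree whose root is the speech act:</Paragraph>
# Hypothetical nested-concept view of a Phoenix-style parse for (1).
# Each node is (concept, children); all names are illustrative.
parse = ("request",                 # top-level concept = speech act
         [("give_information",      # refinement of the speech act
           [("prices", []),         # specifics of the utterance
            ("hotel_name", [])])])

def speech_act(tree):
    """The speech act is the top-level concept of the parse."""
    concept, _children = tree
    return concept

print(speech_act(parse))  # -> request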
</Section> <Section position="5" start_page="510" end_page="511" type="metho"> <SectionTitle> 4 The Discourse Processor </SectionTitle> <Paragraph position="0"> The discourse module processes the global and local structure of the dialogue in two different layers. The first one is a general organization of the dialogue's subparts; the layer under that processes the possible sequence of speech acts in a subpart. The assumption is that negotiation dialogues develop in a predictable way -- this assumption was also made for scheduling dialogues in the Verbmobil project (Maier, 1996) -- with three clear phases: initialization, negotiation, and closing. We will call the middle phase in our dialogues the task performance phase, since it is not always a negotiation per se. Within the task performance phase very many subdialogues can take place, such as information-seeking, decision-making, payment, clarification, etc.</Paragraph> <Paragraph position="1"> Discourse processing has frequently made use of sequences of speech acts as they occur in the dialogue, through bigram probabilities of occurrences, or through modelling in a finite state machine (Maier, 1996; Reithinger et al., 1996; Iida and Yamaoka, 1990; Qu et al., 1996). However, taking into account only the speech act of the previous segment might leave us with insufficient information to decide -- as is the case in some elliptical utterances which do not follow a strict adjacency pair sequence: (2) (talking about flight times...) S1 I can give you the arrival time. Do you have that information already? S2 No, I don't.</Paragraph> <Paragraph position="2"> S1 It's twelve fifteen.</Paragraph> <Paragraph position="3"> If we are parsing the segment &quot;It's twelve fifteen&quot;, and our only source of information is the previous segment, &quot;No, I don't&quot;, we cannot possibly find the referent for &quot;twelve fifteen&quot;, unless we know we are in a subdialogue discussing flight times, and arrival times have been previously mentioned.</Paragraph> <Paragraph position="4"> Our approach aims at obtaining information both from the subdialogue structure and the speech act sequence by modelling the global structure of the dialogue with a FSM, with opening and closing as initial and final states, and other possible subdialogues in the intervening states. Each one of those states contains a FSM itself, which determines the allowed speech acts in a given subdialogue and their sequence. For a picture of the discourse component here proposed, see Figure 1.</Paragraph> <Paragraph position="5"> Let us look at another example where the use of information on the previous context and on the speaker alternance will help choose the most appropriate parse and thus achieve a better translation. The expression &quot;okay&quot; can be a prompt for an answer (3), an acceptance of a previous offer (4) or a backchanneling element, i.e., an acknowledgement that the previous speaker's utterance has been understood (5).</Paragraph> <Paragraph position="6"> (3) S1 So we'll switch you to a double room, okay? (4) S1 So we'll switch you to a double room.</Paragraph> <Paragraph position="7"> S2 Okay.</Paragraph> <Paragraph position="8"> (5) S1 The double room is $90 a night.</Paragraph> <Paragraph position="9"> S2 Okay, and how much is a single room? In example (3), we will know that &quot;okay&quot; is a prompt, because it is uttered by the speaker after he or she has made a suggestion. In example (4), it will be an acceptance because it is uttered after the previous speaker's suggestion. And in (5) it is an acknowledgment of the information provided. The correct assignment of speech acts will provide a more accurate translation into other languages.</Paragraph> <Paragraph position="10"> To summarize, the two-layered FSM models a conversation through transitions of speech acts that are included in subdialogues.</Paragraph>
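<Paragraph> As a concrete rendering of this organization, the sketch below (all phase names, states, and speech-act labels are assumptions for illustration, not the system's actual inventory) models the upper layer as transitions between subdialogue phases, gives each phase its own speech-act FSM, and makes a late-stage choice among the parser's candidate speech acts:</Paragraph>
# A minimal sketch of the two-layered FSM; all labels are assumed.
# Upper layer: transitions between subdialogue phases of the dialogue.
PHASE_TRANSITIONS = {
    "opening": ["task_performance"],
    "task_performance": ["task_performance", "closing"],
    "closing": [],
}

# Lower layer: one FSM per phase, mapping (state, speech_act) -> next state.
SPEECH_ACT_FSM = {
    "opening": {("start", "greeting"): "greeted",
                ("greeted", "greeting"): "done"},
    "task_performance": {("start", "request_information"): "asked",
                         ("asked", "give_information"): "answered",
                         ("answered", "acknowledge"): "done"},
    "closing": {("start", "farewell"): "done"},
}

def allowed_acts(phase, state):
    """Speech acts the lower-layer FSM permits from this state."""
    return [act for (s, act) in SPEECH_ACT_FSM[phase] if s == state]

def choose(candidates, phase, state):
    """Late-stage disambiguation: keep the parser's candidate speech acts
    that the current subdialogue's FSM allows. Where the FSM allows all
    candidates or none of them, the paper backs off to unigram/bigram
    speech-act probabilities; this sketch just takes the first candidate."""
    allowed = [c for c in candidates if c in allowed_acts(phase, state)]
    return (allowed or candidates)[0]

# An ambiguous "okay" (acceptance vs. acknowledgement) after an answer:
print(choose(["accept", "acknowledge"], "task_performance", "answered"))
# -> acknowledge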
<Paragraph position="11"> When the parser returns an ambiguity in the form of two or more possible speech acts, the FSM will help decide which one is the most appropriate given the context.</Paragraph> <Paragraph position="12"> There are situations where the path followed in the two layers of the structure does not match the parse possibility we are trying to accept or reject. One such situation is the presence of clarification and correction subdialogues at any point in the conversation. In that case, the processor will try to jump to the upper layer, in order to switch the subdialogue under consideration. We also take into account the situation where there is no possible choice, either because the FSM does not restrict the choice -- i.e., the FSM allows all the parses returned by the parser -- or because the model does not allow any of them. In either of those cases, the transition is determined by unigram probabilities of the speech act in isolation, and bigrams of the combination of the speech act we are trying to disambiguate plus its predecessor.</Paragraph> </Section> </Paper>