<?xml version="1.0" standalone="yes"?> <Paper uid="E83-1030"> <Title>A MULTILEVEL APPROACH TO HANDLE NON-STANDARD INPUT</Title> <Section position="3" start_page="0" end_page="184" type="metho"> <SectionTitle> I THE INCREMENTAL, MULTILEVEL PARSING FORMALISM </SectionTitle> <Paragraph position="0"> In recent NLU systems great importance is placed on processing non-standard input.1) The present paper reports on our experience in the project &quot;Procedural Dialogue Models&quot;, in which we reconstruct task-oriented dialogues that were uttered in rather colloquial German.2) To this end we have developed an incremental multilevel parsing formalism (Christaller/Metzing 82, Gehrke 82, Gehrke 83), based on an extension of the concept of cascaded ATNs (Woods 80). This formalism (see fig. A) organizes the interaction of several independent processing components, five in our case. The processing components need not be ATNs; it is up to the user of the formalism to choose the tool that best suits her/him for the specific task.</Paragraph> <Paragraph position="1"> * The project is funded by the Deutsche Forschungsgemeinschaft.</Paragraph> <Paragraph position="2"> 1) See e.g. session VIII in ACL 82, Carbonell 83, Kwasny 80, Sondheimer/Weischedel 80; for handling of ellipsis see Weischedel/Sondheimer 82, Wahlster et al. 83.</Paragraph> <Paragraph position="3"> 2) The dialogues that we are working with were recorded in the City of Frankfurt/Main (Klein 79).</Paragraph> <Paragraph position="4"> The first level, an ATN, is responsible for the syntactic analysis. Its main purpose is to detect phrases as well as wh- and imperative structures and to determine the syntactic status a phrase may have in the utterance. On this level the analysis of an utterance can reach a permissible final state even if no complete sentence structure is derived.
The decision whether such a final state is permissible is made on the pragmatic level.</Paragraph> <Paragraph position="5"> The semantic interpretation is carried out by a case-oriented production rule system. In keeping with the incremental manner of processing there are two definitions of case slots: 1. a general one for a tentative categorization of phrases before the main verb is detected, and 2. a specific one, connected with the respective verb frame.</Paragraph> <Paragraph position="6"> This double definition of case slots enables the parsing formalism to make a minimal interpretation of parts of the utterance when the verb is missing and thus to give suggestions for filling this gap.</Paragraph> <Paragraph position="7"> The QUESTION-ANSWER-INTERACTION-component is an ATN. It has to categorize an utterance as a question, as a part of an answer, or as one of the communication-maintaining categories such as assurance, confirmation etc. This component is also responsible for recognizing a dialogue within a dialogue, e.g. when some clarification of that dialogue takes place.</Paragraph> <Paragraph position="8"> Finally the TASK-COMMUNICATION-component is itself a two-level cascade. One stage, the TASK-INTERACTION-component, provides the formalism with a dialogue scheme that presumably is applicable to most types of information-giving dialogues. The other stage, the TASK-SPECIFICATION-component, is responsible for the task-specific categorization, in this case direction giving with categories such as route description or place description. We divided this component into two stages, both realized as ATNs, 1. in order to achieve greater modularity between the components (processing other types of task-oriented dialogues may require changing only the TASK-SPECIFICATION-component on the pragmatic level), and 2. 
because each level contributes one category to the utterance or a part of it, which avoids double categorizations at one level.</Paragraph> <Paragraph position="9"> The pragmatic components are supported by knowledge sources (KS) that hold, for each participant, her/his knowledge of the world, the partner and the course of the dialogue, depending on the task. The processing components exchange their results via a common KS (a kind of blackboard). Only control information is transmitted by the cascade. The parsing formalism is written in MacLISP and in FLAVORS (diPrimio/Christaller 83), an object-oriented language embedded in MacLISP.</Paragraph> </Section> <Section position="4" start_page="184" end_page="184" type="metho"> <SectionTitle> II THE DIALOGUE CORPUS </SectionTitle> <Paragraph position="0"> The dialogues that we are dealing with are real task-oriented dialogues. The majority of utterances in these dialogues contain non-standard constructions or are in some sense incomplete. There are dialect words, word duplications, self-corrections and interjections. On the other hand they do not contain complicated sentence structures such as subordinations, complex noun phrases, etc. The translation of one of our dialogues (see fig. B) may give some impression of these non-standard features.</Paragraph> <Paragraph position="1"> An extreme approach to the problem of non-standard utterances would be, in our case, to take the dialogues in the corpus, just as they are, as the standard. But this would only be an ad hoc solution, lacking generality. Thus we burden the pragmatic components with the decision whether an utterance is acceptable or not.</Paragraph> </Section> <Section position="5" start_page="184" end_page="185" type="metho"> <SectionTitle> III HANDLING OF NON-STANDARDS ON THE WORD LEVEL </SectionTitle> <Paragraph position="0"> Dialect words are handled as words of the standard speech, i.e. they occur in the lexicon. 
Duplication of words is recognized during the read process: the current word is compared with its predecessor. If they are identical and if they belong to only one syntactic category, then the next word is processed directly.</Paragraph> <Paragraph position="1"> Otherwise a flag is set, stating that there is possibly a duplication of words to analyse. Such words are analysed as usual, but the syntactic category of the preceding word may not be used. This condition may cause a new problem, namely when a participial construction occurs within a noun phrase, e.g. &quot;die die Strasse ueberquerende Frau&quot;. Comparable to this problem are constructions in English that begin with &quot;that that ...&quot;. Luckily such constructions do not occur in our corpus, but this problem has to be kept in mind.
X: Could you please tell me, how I can come to the old opera? to
Y: What?
X: the old opera
Y: to the old opera; straight ahead, yes. Come on, I show
X: yes, yes (10 sec. pause)
Y: it to you. ahead to the Kaufhof. To the
X: yes
Y: right there is the Kaufhof, isn't it? and there you stay on the
X: yes, the eh
Y: right side, straight on through the Fressgass' it is new
X: eh mhm
Y: it's just in a new shape, the Fressgass', yes then you will
X: thank you
Y: reach directly the opera square, that is the opera ruin.
X: very much.
Y:
Fig. B: a sample translation</Paragraph> <Paragraph position="2"> If the analysis runs into an error, then the status quo ante is reestablished and the current word is discarded as a duplication. Cases of self-correction on the word level, when a word is replaced by another word of the same syntactic category or by the same word with an altered inflection, are recognized during the read process as well. They can be treated in a similar way, the difference being that the preceding word is discarded and the differing features of the current word are taken. But no rule is without exceptions. 
The rare case of two succeeding nouns, e.g. in proper names (names of streets or buildings), is captured in the lexicon, while groups of prepositions or adverbs are permissible.</Paragraph> </Section> <Section position="6" start_page="185" end_page="186" type="metho"> <SectionTitle> IV HANDLING OF INCOMPLETE UTTERANCES </SectionTitle> <Paragraph position="0"> In handling utterances that are in some sense incomplete we have the great advantage that they have been uttered in a specific context. A linguistic analysis of the dialogues shows furthermore that some types of answers, especially route descriptions and partial goal determinations, tend to be elliptical. In the cases mentioned the degree of ellipsis ranges from omitting the optional SOURCE case slot, to omitting the AGENT case slot, up to uttering only a GOAL case slot.</Paragraph> <Paragraph position="1"> Due to the incremental manner of parsing, the SEMANTIC-component is triggered as soon as a partial analysis of an utterance is obtained. There a phrase is tentatively categorized, depending on case markers (ending, preposition); auxiliary verbs mark tense or mood, etc. Some deictic adverbs such as &quot;hier&quot; (&quot;here&quot;) could act as a SOURCE case slot for MOVE-verbs.</Paragraph> <Paragraph position="2"> Categorized phrases are sent to the QUESTION-ANSWER-INTERACTION-component. When the end of an utterance is recognized (sentence markers; colons can act as end markers too), the SEMANTIC-component tests for completion. If a main verb and/or an obligatory case slot is missing, then a procedure is triggered to fill this gap. 
This inference procedure first inspects the current states of the pragmatic components to gather information as to which categories they expect next and whether the partial analysis meets the requirements of the respective category.</Paragraph> <Paragraph position="3"> This information is then used by various inference rules to determine the missing verb or case slot.</Paragraph> <Paragraph position="4"> Let us consider some examples: 1. &quot;vor bis zum Kaufhof.&quot; (&quot;ahead to the Kaufhof.&quot;) Expectations of the pragmatic components: [...] partial goal determination, goal declaration. SEMANTIC-comp.: &quot;zum Kaufhof&quot; is categorized as a GOAL case slot.</Paragraph> <Paragraph position="5"> The categories goal declaration and place description can be discarded, because their requirements are not met. Since an explicit goal (building, street connection etc.) is uttered, the requirements of partial goal determination are fulfilled first. This category requires a verb of the field MOVE, e.g. &quot;gehen&quot; (&quot;to go&quot;). The GOAL case slot matches one of the requirements of the verb, but an AGENT is still missing. Since the utterance is part of a dialogue and is directed from the person who was asked to give directions to the person who asked for them, a reference to the latter, &quot;sie&quot; (&quot;you&quot;), is taken as AGENT.</Paragraph> <Paragraph position="6"> 2. &quot;gradaus durch die Fressgass'&quot; (&quot;straight on through the Fressgass'&quot;) The expectations of the pragmatic components are the same as above. &quot;durch die Fressgass'&quot; is categorized as a PATH case slot. In this case a route description is checked first and again a MOVE-verb is taken as a candidate for the verb. The PATH case slot matches its requirements and the adverb &quot;gradaus&quot; is a possible description of the manner of moving. The AGENT case slot is found as above.</Paragraph> <Paragraph position="7"> 3. Finally, a rather curious example. 
One of our dialogues starts with the following sequence: Y: Yes? X: to the old opera? Here Y must have recognized, presumably by eye contact, that X wants to get into contact with him. X's answer, itself a question, is quite impolite but understandable. Syntactically this utterance is an elliptical question (rising voice when uttered), and on the semantic stage it can be categorized as a GOAL case slot, depending on &quot;zur&quot; and the fact that the NP refers to a building. Since it stands at the beginning of a task-oriented dialogue with no task fixed so far, it is categorized as a goal declaration. A complete version of this utterance may be &quot;How can I get to the old opera?&quot; Another possible interpretation may be that X only wants to be confirmed in her/his assumption that she/he is on the right way to the goal. In this case a correct answer would have been simply &quot;yes&quot;. But which interpretation holds true cannot be decided with the available information.</Paragraph> <Paragraph position="8"> V CONCLUSION It has been shown how some types of ill-formed input are handled, especially with the help of semantic constraints and pragmatic considerations. At present, our work in this field focuses on handling self-corrections above the word level, such as the one in line 5 of the sample translation.</Paragraph> </Section> <Section position="7" start_page="186" end_page="186" type="metho"> <SectionTitle> Acknowledgements </SectionTitle> <Paragraph position="0"> I would like to thank D. Metzing, T.</Paragraph> <Paragraph position="1"> Christaller and B. Terwey, without whose cooperation this work would not have been possible.</Paragraph> </Section> </Paper>