<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0401">
<Title>Discourse particles and routine formulas in spoken language translation</Title>
<Section position="5" start_page="3" end_page="3" type="metho">
<SectionTitle> 4 Routine formulas </SectionTitle>
<Paragraph position="0"> We pointed out that the particles investigated here have at least one reading in which the discourse usage is central, rather than a semantic contribution to propositional content. This difference points to the notion of &quot;idiomatic&quot; meaning, and -- not surprisingly -- the discourse functions introduced above can often also be realized by idiomatic phrases. Without going into detail here, we merely give a few examples, again taken from the VERBMOBIL domain. In all these cases (and many others), the &quot;literal&quot; compositional meaning is not the point of using the phrase, and they typically cannot be translated word-by-word.</Paragraph>
<Paragraph position="1"> As fillers, we often find phrases like Ich würde denken, ... or Ich muß sagen, ... In English, the translation I must say, ... is not wrong but not conventionally used in this context. Similarly, the German Wenn ich da mal nachsehe, ... should not be translated with the conditionality preserved (hence not If I look this up, ...), but by the common phrase Let me see, ...</Paragraph>
<Paragraph position="2"> The function check can be realized by phrases like Sehe ich das richtig?, which also should not be translated literally (Do I see that correctly?) but by a conventional phrase such as Am I right? Repair markers can also be phrasal, as in X, oder besser gesagt, Y or in X, nein, ich wollte sagen Y.</Paragraph>
<Paragraph position="3"> Again, literal translations should give way to conventionalized English formulas; hence X, no, I wanted to say Y is less felicitous than X, no, I meant Y.</Paragraph>
</Section>
<Section position="6" start_page="3" end_page="7" type="metho">
<SectionTitle> 5 Towards automatic translation </SectionTitle>
<Paragraph position="0"> Since the problems associated with discourse particles are largely absent when processing written language, computational linguistics has for most of its history not dealt with them. In SLT, however, they cannot be avoided, especially when working with a language rich in particles, such as German. Given the youth of the field, and the fact that particles at first sight do not exactly seem to be the most important challenge for translating spoken language, it comes as no surprise that there are no satisfactory solutions in implemented systems yet.</Paragraph>
<Paragraph position="1"> In the VERBMOBIL prototype that was completed last year, a number of particles are considered ambiguous between scopal/modal/focusing adverb on the one hand, and &quot;pragmatic adverb&quot; on the other. This class of &quot;pragmatic adverbs&quot; loosely corresponds to the &quot;discourse usage&quot; we have investigated above. The translation framework of VERBMOBIL is strongly lexeme-based; thus, for any particle in the German source utterance, the transfer component seeks a corresponding English word on the basis of the reading determined. Typically, the ConEval module is asked to determine the class of a particle, whereupon transfer chooses a target word. As an exception, in some contexts a pragmatic adverb is suppressed in the translation.</Paragraph>
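To make the lexeme-based procedure concrete, the following Python sketch shows the kind of lookup involved. It is purely illustrative: the readings, context features, and English correspondents in the table are our own assumptions for exposition, not the actual VERBMOBIL lexicon or ConEval interface.

```python
from typing import Optional

# Minimal sketch of lexeme-based particle transfer (illustrative only;
# the readings and correspondents are assumptions, not VERBMOBIL data).
# (particle, reading) -> English correspondent; None means "suppress".
TRANSFER_TABLE = {
    ("doch", "modal_adverb"):     "after all",
    ("doch", "pragmatic_adverb"): None,   # often best left untranslated
    ("ja",   "answer_particle"):  "yes",
    ("ja",   "pragmatic_adverb"): None,
    ("schon", "focus_adverb"):    "already",
}

def classify(particle: str, context: dict) -> str:
    """Stand-in for the ConEval module: pick a reading from contextual
    clues. Crude illustrative rule: non-initial, unstressed occurrences
    are taken as pragmatic adverbs."""
    if not context.get("utterance_initial") and not context.get("stressed"):
        return "pragmatic_adverb"
    return context.get("default_reading", "modal_adverb")

def transfer_particle(particle: str, context: dict) -> Optional[str]:
    """Lexeme-based transfer: determine the reading, then look up a
    target word; None means the particle is dropped from the output."""
    reading = classify(particle, context)
    return TRANSFER_TABLE.get((particle, reading))

# 'ja' as an unstressed pragmatic adverb is suppressed in the translation:
print(transfer_particle("ja", {"utterance_initial": False, "stressed": False}))  # None
```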
<Paragraph position="2"> This procedure is a start, but it cannot deal with all the facets of meaning found in discourse particles, as outlined above. On the basis of corpus studies, both [Schmitz, Fischer 1995] and [Ripplinger, Alexandersson 1996] already demonstrated that many German particles have a whole range of English correspondents, of which VERBMOBIL at present manages only very few.</Paragraph>
<Paragraph position="3"> To improve the translations, for the second phase of the VERBMOBIL project we propose to build upon the framework of discourse functions. The purpose of computing discourse functions in analysis is twofold: it supports disambiguation (not only of the discourse particles, but also of the surrounding words) and the computation of the dialogue act underlying the utterance; and it helps in segmentation, i.e., breaking an utterance into portions that serve as complete units for further processing. In translation, the information on discourse function is important for deciding whether to translate a particle at all, and how to do so: by inserting a corresponding target-language particle, or by modifying the syntactic structure or intonation contour of the target utterance.</Paragraph>
<Paragraph position="4"> Given the wide variety of information required for determining discourse functions (listed in section 2.3), the task is best performed in tandem with building up the conceptual representation of the utterance, i.e., in the ConEval module. The decision as to what discourse function to associate with a particle is seldom a strict one (not even for the human analyst). Instead, the different clues from syntax, semantics, prosody, and world knowledge are typically weak and have to be weighed against each other in the light of the complete utterance. Therefore, we tackle the problem with the same mechanism we use for identifying dialogue acts: a set of weighted default rules, implemented in FLEX [Quantz et al. 1996] as an extension to the standard description logic language. The rules are matched against the utterance representation, and the accumulated weights decide on the most likely discourse function. We are currently in the process of defining this rule set.</Paragraph>
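As a rough illustration of how weighted default rules can combine such weak clues, consider the following Python sketch. The rules, weights, and feature names are invented for this example; the actual rule set is formulated in FLEX over the description-logic representation of the utterance.

```python
# Toy illustration of weighted default rules for assigning a discourse
# function; all rules, weights, and features are invented assumptions.
from collections import defaultdict

# Each rule: (condition on the utterance representation, function, weight).
RULES = [
    (lambda u: u["prosody"] == "hesitation",                         "filler", 1.5),
    (lambda u: u["particle"] == "oder" and u["utterance_final"],     "check",  2.5),
    (lambda u: u["follows_self_interruption"],                       "repair", 3.0),
    (lambda u: u["particle"] == "oder" and not u["utterance_final"], "none",   1.0),
]

def discourse_function(utterance: dict) -> str:
    """Accumulate the weights of all matching rules; the function with
    the highest total wins. Weak clues thus reinforce or override each
    other, rather than any single clue deciding alone."""
    scores = defaultdict(float)
    for condition, function, weight in RULES:
        if condition(utterance):
            scores[function] += weight
    return max(scores, key=scores.get) if scores else "none"

u = {"particle": "oder", "utterance_final": True,
     "prosody": "neutral", "follows_self_interruption": False}
print(discourse_function(u))  # -> "check" (tag-question use: "..., oder?")
```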
<Paragraph position="5"> The result will be more fine-grained information on discourse particles than is available in the system now. The transfer and generation modules can use the discourse function to decide whether a lexical correspondent should be produced in the target language, and if so, which one, and at what position in the utterance. Particles that are mere fillers can be removed entirely from the translation, and similarly those particles that are used to smooth the intonation contour in German. Whether restarts and self-repairs get translated or are merged into a single coherent utterance is an open question. In many cases, it would not be difficult to replace the &quot;corrected&quot; portion of an utterance with the portion that &quot;overwrites&quot; it, thereby sparing the hearer from reworking the correction herself.</Paragraph>
<Paragraph position="6"> As for routine formulas, they first of all cause the standard problems of idiomatic phrases: they need to be recognized as a single unit of meaning, so that they can be translated en bloc. This presupposes lexical representations that adequately describe the possible variants of the expression, e.g., whether additional modifiers may be inserted into a phrase. When processing written language, this is difficult enough -- with speech and the additional uncertainties of word recognition, the problems are even harder. For the time being, a comprehensive treatment of routine formulas and other idioms does not seem feasible.</Paragraph>
<Paragraph position="7"> Regarding the overall system architecture, the deep-analysis phase, as we have described it, is not necessary for each and every utterance -- if the input allows for a standard transfer-based translation (e.g., because it doesn't contain ambiguous particles), that will typically be sufficient. This essentially amounts to a mixed-depth analysis in the translation process -- an important question that we cannot discuss further here.</Paragraph>
</Section>
</Paper>