<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-2020">
  <Title>PROPOSALS FOR A HIERARCHY OF FORMAL TRANSLATION MODELS</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROPOSALS FOR A HIERARCHY OF FORMAL TRANSLATION MODELS
Klaus-Jurgen Engelberg
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Sprachwissenschaft, BRD
</SectionTitle>
    <Paragraph position="0"> The present deplorable state of the art in the field of machine translation seems largely due to a fundamental lack of the formal translation models needed in natural language processing. From the methodological point of view it appears difficult to delineate a borderline between translation theory and modern theoretical linguistics (availing itself of model-theoretic semantics) or full natural language understanding systems as developed in Artificial Intelligence research. It seems plausible to postulate that any prospective translation theory should draw on ideas from both fields. Unfortunately, problems discussed in painstaking detail in linguistics, like differences in quantifier scope, appear to be of lesser concern to a translator (since these ambiguities may well remain present in the target language); nor does a full or deep understanding seem necessary in many cases, where standard syntactic phrasing may suffice. More specifically, we regard disambiguation, the mandatory insertion of lexical items not conventionally implied in the source language, and coreference/anaphora resolution as the crucial problem areas of machine translation.</Paragraph>
    <Paragraph position="1"> In this paper, we will endeavour - in this preliminary draft only in a very sketchy manner - to set up a hierarchy of formal translation models ordered according to their increasing systematic disambiguation power for certain types of texts.</Paragraph>
    <Paragraph position="2"> Quite analogously to complexity considerations in mathematics, the power of a translation system is assumed to be measured by the amount of storage needed for the lexical component (AI people might call this long-term memory) and/or for the transient or dynamic data (short-term memory) built up during the interpreting process of a particular text. Any model will be capable of translating only certain restricted types of texts in a systematic manner and with satisfactory results, but the idea is that any model will also contain the components of lower levels of complexity. This is to make sure that, in cases in which disambiguation on purely syntactic grounds is possible, no such process via &quot;deep&quot; semantic representations will be attempted. The rationale, of course, is to utilize ever larger portions of contextual (or rather co-textual) information for these ends. As the reader will notice, powerful translation systems have to incorporate more and more knowledge of the world into the database, as becomes apparent from the famous example: The soldiers shot the women. They fell down.</Paragraph>
    <Paragraph position="3"> Les soldats abattirent les femmes. Ils/elles? tombèrent.</Paragraph>
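    <Paragraph><![CDATA[ The layering described above, in which a cheaper level is tried first and a deeper one only when it fails, can be sketched roughly as follows. This is an illustrative sketch, not the author's system: the two level functions, the role labels and the LIKELY_EVENTS table are all hypothetical, and the French pronoun helper covers only the plural case of the soldiers example.

```python
# Hypothetical world knowledge: (role of a candidate in the previous clause,
# verb of the clause containing the pronoun). Illustrative only.
LIKELY_EVENTS = {("patient-of-shoot", "fall")}


def syntactic_level(pronoun, candidates):
    """Lower level: succeed only if number agreement leaves one candidate."""
    matching = [c for c in candidates if c["number"] == pronoun["number"]]
    return matching[0] if len(matching) == 1 else None


def knowledge_level(pronoun, candidates):
    """Higher level: consult the (assumed) knowledge-of-the-world table."""
    for candidate in candidates:
        if (candidate["role"], pronoun["verb"]) in LIKELY_EVENTS:
            return candidate
    return None


# Levels ordered by increasing cost; deeper levels run only on failure.
LEVELS = [syntactic_level, knowledge_level]


def resolve(pronoun, candidates):
    """Return (antecedent, name of the level that resolved it)."""
    for level in LEVELS:
        antecedent = level(pronoun, candidates)
        if antecedent is not None:
            return antecedent, level.__name__
    return None, None


def french_pronoun(antecedent):
    """Pick the plural French subject pronoun from the antecedent's gender."""
    return "elles" if antecedent["gender"] == "f" else "ils"
```

With two plural candidates (soldiers and women) the syntactic level cannot decide, so the sketch falls through to the knowledge level; with one singular and one plural candidate the syntactic level already suffices and the deeper level is never consulted.
]]></Paragraph>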
    <Paragraph position="4"> Syntactic methods. Level Syn1: Word-to-word translation. This is out for apparent reasons (although a full bilingual dictionary would require a considerable amount of storage space in a computer). Level Syn2: Constituent-preserving translation. These models utilize the immediate syntactic context (e.g. valency of verbs) for disambiguation purposes. In such a system a rule may look like</Paragraph>
    <Paragraph position="6"> At any rate, a valency-oriented lexicon would be helpful in the following models, too. The search strategy would be longest match first.</Paragraph>
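    <Paragraph><![CDATA[ A longest-match-first search over a valency-oriented lexicon might look roughly like the sketch below. The lexicon entries and the function name are assumptions made for illustration; the paper does not specify a lexicon format or a language pair.

```python
# Hypothetical valency-oriented lexicon: each key is a source-language
# pattern (a verb plus the complement it governs), each value a
# target-language rendering. The entries are illustrative only.
LEXICON = {
    ("look",): "regarder",
    ("look", "for"): "chercher",
    ("look", "after"): "s'occuper de",
    ("give",): "donner",
    ("give", "up"): "abandonner",
}

MAX_PATTERN_LEN = max(len(key) for key in LEXICON)


def translate_longest_match(tokens):
    """Translate left to right, always trying the longest pattern first."""
    output = []
    i = 0
    while i != len(tokens):
        # Try the longest candidate span first, then shrink it.
        for length in range(min(MAX_PATTERN_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + length])
            if span in LEXICON:
                output.append(LEXICON[span])
                i += length
                break
        else:
            # No pattern matched: pass the word through unchanged.
            output.append(tokens[i])
            i += 1
    return output
```

Because the two-word pattern is tried before the one-word pattern, "look for" is rendered by its own entry rather than by the entry for bare "look".
]]></Paragraph>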
    <Paragraph position="7"> Level Syn3: Tree-to-tree translation. Unbounded translations allow for reordering of arbitrarily long portions of a sentence. We think it reasonable to assume that a quarter-century of Generative Grammar research in linguistics will have produced enough theoretical and practical apparatus to deal with any type of tree restructuring that may be needed in direct syntactic translations between natural languages (also cf. the French system GETA).</Paragraph>
    <Paragraph position="8"> Semantic methods. Level Sem1: Case-grammar oriented translations. There are several MT systems that impose heavy restrictions on the possible arguments of verbs by encoding semantic features in the lexicon (e.g. METEO in Canada). By this, of course, disambiguation can take place only within the limits of a single sentence or clause.</Paragraph>
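    <Paragraph><![CDATA[ Restricting verb arguments through semantic features in the lexicon can be illustrated with a small sketch. The feature names, the English-to-German entries and the function name are assumptions chosen for the example; they are not taken from METEO or any other system.

```python
# Hypothetical noun lexicon: each noun carries a set of semantic features.
NOUN_FEATURES = {
    "child": {"human", "animate"},
    "dog": {"animal", "animate"},
}

# Hypothetical verb lexicon: each candidate translation states the features
# it requires of the subject. Candidates are checked in order.
VERB_FRAMES = {
    "eat": [
        ({"human"}, "essen"),
        ({"animal"}, "fressen"),
    ],
}


def choose_verb_translation(verb, subject):
    """Return the first translation whose restrictions the subject satisfies."""
    features = NOUN_FEATURES.get(subject, set())
    for required, target in VERB_FRAMES.get(verb, []):
        if required.issubset(features):
            return target
    # No frame fits: this level cannot disambiguate the verb.
    return None
```

The check never looks beyond the verb and its own argument, which mirrors the limitation noted above: disambiguation stays inside a single clause.
]]></Paragraph>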
    <Paragraph position="9"> Level Sem2: Translations using coherence relations. The basis of this approach is the assumption that there exist finitely many determined and computable coherence relations between two subsequent sentences and/or clauses in certain types of texts (sometimes called the cohesive-ties approach). There may even be indications of these relations at the surface level of the discourse, e.g. &quot;whereas&quot; suggesting CONTRAST or &quot;then&quot; suggesting TIME-SEQUENCE; other relations may be ELABORATION, EFFECT, CAUSE (Hirer /1981/). Processing of these texts could be done by semantic finite state automata that would accept only highly constrained discourses in which no abrupt shifts of focus would be allowed. At least at this level of complexity it seems necessary to assume that the vocabulary should be organized - in addition to the usual lexicographic order - as a sort of semantic network containing all types of sense relations, like the super-/subset relation, antonymy, converseness and time-sequence, existing even between several-place verbs.</Paragraph>
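    <Paragraph><![CDATA[ A finite state automaton over coherence relations could be sketched as below. The relation names come from the text; the cue-word table, the default to ELABORATION when no cue word is present, and the transition table are all assumptions made only to have something concrete to run.

```python
# Surface cue words and the relation they suggest. "whereas" and "then" are
# from the text; "because" is an added, assumed entry.
CUE_WORDS = {
    "whereas": "CONTRAST",
    "then": "TIME-SEQUENCE",
    "because": "CAUSE",
}

# Hypothetical transition table: which relation may follow which state.
TRANSITIONS = {
    "START": {"ELABORATION", "TIME-SEQUENCE", "CONTRAST", "CAUSE"},
    "ELABORATION": {"ELABORATION", "TIME-SEQUENCE", "CAUSE"},
    "TIME-SEQUENCE": {"TIME-SEQUENCE", "EFFECT", "CONTRAST"},
    "CONTRAST": {"ELABORATION", "TIME-SEQUENCE"},
    "CAUSE": {"EFFECT", "TIME-SEQUENCE"},
    "EFFECT": {"TIME-SEQUENCE", "ELABORATION"},
}


def relation_of(clause):
    """Guess the relation of a clause to its predecessor from cue words."""
    for word in clause.lower().split():
        word = word.strip(".,;:!?")
        if word in CUE_WORDS:
            return CUE_WORDS[word]
    # Assumption: a clause without a cue word elaborates the previous one.
    return "ELABORATION"


def accepts(clauses):
    """Accept a discourse only if every relation is an allowed transition."""
    state = "START"
    # The first clause has no predecessor, so relations start at the second.
    for clause in clauses[1:]:
        relation = relation_of(clause)
        if relation not in TRANSITIONS[state]:
            return False
        state = relation
    return True
```

A discourse is rejected as soon as one pair of adjacent clauses stands in a relation the table does not allow in the current state, which is one simple way to model the ban on abrupt shifts.
]]></Paragraph>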
    <Paragraph position="10"> Level Sem3: Translations using story trees. These models dynamically build up a tree-like macrostructure for a text in which arbitrarily deep embeddings of themes and sub-themes are represented. In this approach, coherence relations between entire portions of text or paragraphs could be established, thus allowing for coreference across long distances in a text (vide Rumelhart /1975/). This process may be facilitated by what Y. Wilks chose to call &quot;paraplates&quot; in the database.</Paragraph>
    <Paragraph position="11"> Level Sem4: Translations using semantic networks. This model is designed for texts that are not as orderly as assumed in the previous levels. A semantic network as the dynamic macrostructure of a text would allow for multiple views or thematic structures associated with a portion of a text. To make this effective, a very rich fabric of various types of associative links would be needed in the database.</Paragraph>
    <Paragraph position="12"> Level Sem5: Frame-based translations. &quot;Frames&quot; or &quot;scripts&quot; have been widely discussed in the AI community in the past 10 years or so. The idea seems to be to aggregate, in an object-centred way, all sorts of information linked with a particular &quot;stereotypical situation&quot; into a structured entity called a &quot;frame&quot;. This approach would, in principle, allow one - by default reasoning - to recover information not explicitly mentioned in the text. In particular, this may be helpful when translating into a Western language from Russian, in which the definite/indefinite or known/unknown distinction in nouns is lacking. Consider the translation problems in the following example (drawing on Schank's favourite script): Petr posel v restoran. Oficiant podal emu menju. =&gt; Peter went to a restaurant. The waiter handed him the menu. Scripts could account for associations induced by &quot;spatial-temporal contiguities&quot; as present in this example. Doubts as to the feasibility of MT based on frames, except possibly in very restricted areas of discourse, have come from various quarters. First, the coding effort could turn out to be enormous. Second, an intricate problem seems to be how to find out which script is relevant to the current portion of text.</Paragraph>
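    <Paragraph><![CDATA[ The default reasoning in the restaurant example can be sketched as follows. The script table, its role lists and the function name are hypothetical, and nouns are given as English glosses rather than Russian forms; the sketch also sidesteps the second doubt raised above by assuming that a single trigger noun selects the script.

```python
# Hypothetical script database: a trigger noun activates a stereotypical
# situation together with the roles that belong to it.
SCRIPTS = {
    "restaurant": {"waiter", "menu", "bill", "table"},
    "train": {"conductor", "ticket", "platform"},
}


def assign_definiteness(nouns):
    """Label each noun, in text order, as 'definite' or 'indefinite'."""
    known = set()  # entities mentioned or implied by a script so far
    result = []
    for noun in nouns:
        label = "definite" if noun in known else "indefinite"
        result.append((noun, label))
        known.add(noun)
        # Default reasoning: a script trigger makes its roles known.
        known.update(SCRIPTS.get(noun, set()))
    return result
```

Once "restaurant" has been mentioned, "waiter" and "menu" count as known although the text never introduced them, which yields "a restaurant" but "the waiter" and "the menu" in the English rendering.
]]></Paragraph>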
  </Section>
</Paper>