<?xml version="1.0" standalone="yes"?>
<Paper uid="E89-1038">
  <Title>A NEW VIEW ON THE PROCESS OF TRANSLATION</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The projects involved
2.1 Eurotra-D Analysis Module
</SectionTitle>
    <Paragraph position="0"> The German analysis module of our proposed MT system is based on the Eurotra Engineering Framework (Bech and Nygaard, 1988) enhanced by a semantic component derived from systemic theory. The general Eurotra philosophy for translation is described elsewhere (Arnold et al., 1986, 1987). The essentials of the Eurotra-D approach are to be found in Steiner, Schmidt, and Zelinsky-Wibbelt (1988). The Eurotra system is a transfer-based multi-lingual MT-system.</Paragraph>
    <Paragraph position="1"> It is stratificational in the sense that analysis and synthesis proceed through two syntactic levels (configurational and functional) and one semantic level, called the Interface Structure (IS). These interface representations are semantically interpreted dependency structures; they are described in more detail in Section 3.3. Each level is defined by a level-specific grammar and a lexicon. The connection between adjacent levels is established with translator-rules which define a tree-to-tree mapping between level representations. The main operation involved in the mapping is unification, i.e. the unification between already built objects and rules. Transfer between two languages takes place as a translation between the interface level representations of the source language (SL) and the target language (TL).</Paragraph>
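    <Paragraph> The unification step that underlies the mapping between levels can be illustrated with a minimal sketch. It is not the Eurotra implementation: the function name unify, the dictionary encoding of feature structures, and the sample feature bundles are all assumptions made for illustration.

```python
def unify(a, b):
    """Unify two feature structures given as (possibly nested) dicts.

    Returns the merged structure, or None when two values clash.
    This is an illustrative sketch, not the Eurotra unifier.
    """
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            # Feature only present in b: simply add it.
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            # Both values are structures: unify them recursively.
            merged = unify(a_val, b_val)
            if merged is None:
                return None
            result[key] = merged
        elif a_val != b_val:
            # Conflicting values: unification fails.
            return None
    return result


# An already built object and a rule pattern (both invented for illustration).
built = {"cat": "np", "nb": "sing"}
rule = {"cat": "np", "role": "arg1"}
print(unify(built, rule))          # compatible: features are merged
print(unify(built, {"cat": "v"}))  # clash on cat: prints None
```

A rule applies to an object only when unification succeeds; the merged structure then carries the information contributed by both sides.</Paragraph>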
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Penman Generation Module
</SectionTitle>
      <Paragraph position="0"> The English generation component of our proposed MT system is Penman (Mann, 1983). Penman has been designed to be a portable, reusable text generation facility which can be embedded in many kinds of computational systems. The linguistic core of Penman is Nigel (Mann and Matthiessen, 1983), a large systemic-functional grammar of English based on the work of Halliday (1985) with contributions made by several other systemic linguists. Nigel is a large network of interdependent points of minimal grammatical contrast, called systems. Each of these systems defines a collection of alternatives called grammatical features. The semantic interface of the Nigel grammar is defined by a set of inquiries that control choices of grammatical features by mediating the flow of information between the grammar and external sources of information.</Paragraph>
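      <Paragraph> The interplay of systems, grammatical features and inquiries can be sketched as a toy network traversal. The two systems, the inquiry names command-q and question-q, and the answer values below are hypothetical and are not taken from Nigel; only the general mechanism (an inquiry answer selects one feature per system, and a system may depend on an earlier choice) follows the description above.

```python
# Toy system network. Each system has an optional entry condition (a feature
# that must already be selected), an inquiry, and a table from inquiry answers
# to grammatical features. All names here are hypothetical.
SYSTEMS = [
    {"name": "MOOD", "entry": None, "inquiry": "command-q",
     "choices": {"command": "imperative", "nocommand": "indicative"}},
    {"name": "INDICATIVE-TYPE", "entry": "indicative", "inquiry": "question-q",
     "choices": {"question": "interrogative", "noquestion": "declarative"}},
]


def traverse(answers):
    """Select one feature per reachable system, driven by inquiry answers."""
    selected = []
    for system in SYSTEMS:
        entry = system["entry"]
        if entry is not None and entry not in selected:
            # Entry condition not met: this system is never entered.
            continue
        answer = answers[system["inquiry"]]
        selected.append(system["choices"][answer])
    return selected


print(traverse({"command-q": "nocommand", "question-q": "noquestion"}))
print(traverse({"command-q": "command"}))
```

The second call never consults question-q, because the INDICATIVE-TYPE system is only reachable once the feature indicative has been chosen.</Paragraph>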
      <Paragraph position="1"> Penman also provides structure for some of these external sources of information, including a conceptual hierarchy of relations and entities, called the upper model. The upper model is typically used to mediate between the organisation of knowledge found in an application domain and the kind of organisation that is most convenient for implementing the grammar's inquiries. We have made crucial use of the upper model in constructing our combination of the two components. In effect, the upper model can often mediate between the results of the MT analysis, expressed in ET-D Interface Structures, and the input that must be specified for Penman, expressed in the Penman Sentence Plan Language (SPL) (Kasper, 1989). Each of these information sources, the upper model, the Penman SPL, and the ET-D Interface Structures will now be described in detail.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Components of the
German-English Interface
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Penman's Upper Model
</SectionTitle>
      <Paragraph position="0"> Perhaps the crucial task for text generation is to be able to control linguistic resources so as to make the generated text conform to what is to be expressed.</Paragraph>
      <Paragraph position="1"> In Penman this is the responsibility of the grammar's inquiry semantics. Furthermore, a large subset of Penman's inquiries are taxonomic. These relate particular instances of what is to be expressed to the categories of semantic organisation that the grammar's semantics requires. These categories, and the relationships among them, constitute the upper model.</Paragraph>
      <Paragraph position="2"> The upper model serves to organize the propositional content that needs to be expressed in text; in systemic-functional linguistics, this range of meaning is called ideational. Many ideational inquiries can be expressed in terms of the classifications of concepts that the upper model provides. These classifications form an inheritance hierarchy that organises concepts according to how they may be expressed in English.</Paragraph>
      <Paragraph position="3"> Thus, when an application domain for which Penman is to generate language connects its concepts to those of the upper model, a single inheritance hierarchy is formed from which the grammar's inquiries can determine information about how any particular domain concept may be expressed in English. We refer to this single inheritance hierarchy formed from the application domain model and the upper model as the combined model. Inquiries that need to determine whether an application domain model concept belongs to the class defined by some upper model concept can then rely on simple inheritance inferences. For example, this type of inference allows Penman to ascertain that a domain entity is a process, rather than an object, and so should be expressed as a verb rather than as a nominal phrase. Much finer distinctions are drawn by the actual upper model, which currently contains approximately 200 concepts.</Paragraph>
      <Paragraph position="4"> By virtue of their positions in the inheritance hierarchy, entities in the combined model also inherit roles from their ancestors. These can serve to define, for example, the types of participants that processes may have, or the types of qualities that may be ascribed to particular objects. Both inheritance of class membership and of roles find significant use in the construction and interpretation of expressions in the Penman interface notation SPL.</Paragraph>
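      <Paragraph> The two kinds of inheritance described above, class membership and roles, can be sketched with a small combined model. The concept names, the role names and the helper functions are hypothetical; the actual upper model draws much finer distinctions.

```python
# Hypothetical fragment of a combined model: child concept -> parent concept.
PARENT = {
    "process": "thing",
    "object": "thing",
    "apply": "process",       # invented application-domain concept
    "technology": "object",   # invented application-domain concept
}

# Roles declared directly on a concept; descendants inherit them.
ROLES = {
    "process": ["participant"],
    "object": ["quality"],
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(concept, upper_concept):
    """Simple inheritance inference: does concept fall under upper_concept?"""
    return upper_concept in ancestors(concept)


def inherited_roles(concept):
    """Collect the roles declared on the concept and on its ancestors."""
    roles = []
    for c in ancestors(concept):
        roles.extend(ROLES.get(c, []))
    return roles


def word_class(concept):
    """Processes are expressed as verbs, everything else as nominals."""
    return "verb" if is_a(concept, "process") else "nominal"


print(word_class("apply"), inherited_roles("apply"))
print(word_class("technology"), inherited_roles("technology"))
```

An inquiry that asks whether a domain concept is a process therefore needs nothing more than a walk up the hierarchy.</Paragraph>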
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Penman Interface Notation -
SPL
</SectionTitle>
      <Paragraph position="0"> Penman accepts demands for text to be generated in the SPL notation. SPL expressions are lists of terms describing the types of entities and the particular features of those entities to be expressed in English. The types of SPL terms are interpreted with respect to the knowledge base of general conceptual categories defined in the upper model. When the concepts of Penman's upper model are instantiated by more specific concepts from an application program's knowledge base (i.e. world knowledge specific to the domain of the application), then application concepts can be used directly in the SPL expression. The features of SPL terms are either semantic relations to be expressed, drawn from the relations/roles defined by the combined model, or direct specifications of responses to Penman's inquiries. This latter possibility provides for the input of information from other sources of knowledge known to be necessary for controlling generation, e.g. text planning information and speaker-hearer models. These types of meaning fall outside the kind of taxonomic, 'ideational' meanings defined in the upper model and so require separate treatment. Currently we specify information of this type as direct responses to Penman's inquiries since the inquiries are not limited to ideational meanings.</Paragraph>
      <Paragraph position="1"> SPL representations as a whole are used as input specifications by Penman's inquiries and hence are able to drive sentence generation in a way that is fully responsive to required communicative goals. [Figure 1: SPL specification; surviving caption text: "... a trailing position in the industrial application of many high technologies."]</Paragraph>
      <Paragraph position="2"> An example of an SPL specification for a sentence is shown in Figure 1 (see footnote 2). In this expression we can see a collection of SPL variables (H1, N2, N3, N4, ...) which have types drawn from concepts and relations of the combined model for English and German described in the next section; these types include g-associative, Europa, Rueckstand and Anwenden.</Paragraph>
      <Paragraph position="3"> The semantic relations to be expressed and direct inquiry responses are prefixed with a colon, e.g. :speech-act, :identifiability-q, and :g-affected; those ending with -q and -id denote Penman inquiries.</Paragraph>
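      <Paragraph> The naming convention for SPL keywords can be checked mechanically. The helper below is a sketch under the convention stated above (a leading colon, and a -q or -id suffix for inquiries); the function name is an assumption and is not part of Penman.

```python
def is_inquiry_keyword(keyword):
    """Return True when an SPL keyword names a Penman inquiry.

    Follows the convention that keywords start with a colon and that
    inquiry keywords end in -q or -id. Hypothetical helper, not Penman code.
    """
    if not keyword.startswith(":"):
        raise ValueError("SPL keywords are prefixed with a colon")
    return keyword.endswith("-q") or keyword.endswith("-id")


for kw in [":speech-act", ":speech-act-id", ":identifiability-q", ":g-affected"]:
    print(kw, is_inquiry_keyword(kw))
```

Keywords that fail the test are semantic relations or abbreviating keywords such as :name.</Paragraph>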
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Eurotra-D Interface Representations
</SectionTitle>
      <Paragraph position="0"> The Eurotra-D interface representations (ET-D IS) are semantically interpreted dependency structures. They represent dependency relationships between constituents by structural embedding, and additional linguistic information in their feature structures, including semantic relations and semantic (lexical) features, such as time, diathesis, modality, mood, topic, focus, determination and number. An example of an IS representation is given in Figure 2. In this representation we can see at the topmost node the features s_TENSE and s_ASPECT, which are used to compute the appropriate time information for the SPL expression. The German simul/durative ('present') has to be expressed in English with a 'present perfect' construction. The feature nclass=proper is responsible for the fact that in the SPL expression we can simply use the keyword macro :name, which indicates any proper-noun lexical item. The features d_isframe and argi, 1 &lt; i &lt; 4, are used to determine the process type (g-associative) and its roles (g-attribuant, g-associated). The feature g-scope in the SPL representation is inserted from the IS feature d_pform=in of the NP governed by Anwendung.</Paragraph>
      <Paragraph position="1"> These features refer to categories of an upper model that we have constructed for German (UMG); the UMG is essentially a re-expression of the transitivity relations worked out in Fawcett (1987).</Paragraph>
      <Paragraph position="2"> Just as for the Penman upper model for English, which we shall now label UME, the German upper model is not a representation of a particular sentence: it is a representation of concepts into which IS roles and role configurations are mapped. The UMG concepts then also stand in inheritance relationships to each other.</Paragraph>
      <Paragraph position="3"> 2 This specification shows the finest level of detail of grammar control that may be given in an SPL expression. In practice, when using SPL it is possible to abbreviate or to default commonly used combinations of inquiry responses; thus, for example, it is possible to replace all of the :speech-act, :speech-act-id, and :event-time features shown in Figure 1 with the more coarse-grained specification :speech-act assert :tense present-in-past. For more details see Kasper (1989).</Paragraph>
      <Paragraph position="5"> Furthermore, a concept in UMG may have slots (roles) which can be filled by other concepts of specified types (role restrictions). Roles of the German IS grammar are linked to concepts of UMG through the specialize predicate. When an IS is expressed in an SPL representation, the roles (sr features) of IS are mostly substituted by the corresponding UMG concepts.</Paragraph>
      <Paragraph position="6"> Roles as well as features of IS may also be mapped into inquiry responses during transfer into SPL, as described in Section 4.3. The fact that for the time being the UMG is almost isomorphic to a representation of the predicate-argument part of the German IS grammar is more due to time constraints than to any far reaching claims about the mutual relationships between an IS and an Upper Model, although the nature of that relationship is interesting and is receiving study in its own right.</Paragraph>
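      <Paragraph> The substitution of IS roles by UMG concepts can be sketched as a table lookup. The pairing of the sr values attr and associated with g-attribuant and g-associated is an assumption inferred from the names used in this section, and the dictionary encoding of IS nodes and the function name are hypothetical.

```python
# Assumed links from IS role values (sr features) to UMG role concepts.
ROLE_TO_UMG = {
    "attr": "g-attribuant",
    "associated": "g-associated",
}


def roles_to_spl(is_nodes):
    """Pair each UMG role keyword with the index of the IS node that fills it.

    Nodes without an sr feature, or with an sr value that has no UMG link,
    are skipped. Illustrative sketch only.
    """
    pairs = []
    for node in is_nodes:
        sr = node.get("sr")
        if sr in ROLE_TO_UMG:
            pairs.append((":" + ROLE_TO_UMG[sr], node["index"]))
    return pairs


# Three simplified IS nodes; only cat, sr and index are kept.
nodes = [
    {"cat": "np", "sr": "attr", "index": 9},
    {"cat": "np", "sr": "associated", "index": 22},
    {"cat": "pp", "index": 8},
]
print(roles_to_spl(nodes))
```

The modifier node is left out here because it carries no sr feature; features of that kind are handled by the other transfer levels.</Paragraph>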
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 The Nature of the
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Translation Process
</SectionTitle>
      <Paragraph position="0"> It is important from a conceptual point of view to keep apart the three levels of representation involved here: ET-D IS, UMG, and a description of the German sentence in SPL. The basic form of the translation process is to transfer ET-D IS representations into Penman SPL representations. As ET-D IS and Penman SPL representations are both feature-based dependency structures, the formal aspects of the transfer from ET-D IS into Penman SPL are not very complicated. Determining an appropriate mapping for the content of particular values within ET-D IS representations is by far the more challenging aspect of this translation process.</Paragraph>
      <Paragraph position="1"> The translation process is achieved by employing three principal levels of transfer, which are described in detail below. The product of this multi-level transfer is an SPL representation of the English translation of the original German sentence, which may then drive generation by Penman as in any other application domain. The translation process as a whole is summarised in Figure 3. The general strategy of this translation process should also generalise to future applications in a multi-lingual MT environment.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Upper model transfer
</SectionTitle>
      <Paragraph position="0"> Preparatory to being able to transfer IS representations into corresponding SPL expressions for German sentences, a mapping needs to be established between the categories of UMG and appropriate categories of Penman's English-specific upper model (UME). As an initial approximation, and one which makes maximal use of mechanisms already developed for driving Penman, we take the concepts of UMG as specialising the concepts of UME.</Paragraph>
      <Paragraph position="1"> This mapping only needs to be defined once; it is then available for all IS representations that need to be transferred.</Paragraph>
      <Paragraph position="3">
{cat=s, s_TENSE=simul, s_ASPECT=durative, stype=main, d_vform=finite, d_diath=act}
{cat=v, vfeat=stat, role=gov, nb=sing, humarg2=nonhum, humarg1=hum, ers_frame=c0c1, d_mood=indicative, d_lu=haben, d_is_rno=r1, d_isframe=arg12, arg2=associated, arg1=attr, abstrarg2=abstr, abstrarg1=abstr}
{cat=np, wh=no, sr=attr, role=arg1, nb=sing, msdefs=msabs, index=9, hum=hum, d_gender=neuter, cs=no, argtype=full, abstr=abstr}
{cat=n, wh=no, role=gov, nform=full, nclass=proper, nb=sing, hum=hum, ers_frame=null, d_lu=europa, d_is_rno=r1, d_isframe=arg0, d_gender=neuter, count=mass, abstr=abstr}
{cat=np, wh=no, sr=associated, role=arg2, nb=sing, msdefs=msindef, index=22, hum=nonhum, dem=no, cs=no, argtype=full, abstr=abstr}
{cat=n, wh=no, role=gov, nform=full, nclass=common, nb=sing, hum=nonhum, ers_frame=c4, d_pformarg1=in, d_lu=rueckstand, d_is_rno=r1, d_isframe=arg1, count=mass, abstr=abstr}
{cat=np, wh=no, role=arg1, nb=sing, msdefs=msdef, index=20, hum=nonhum, d_pform=in, d_gender=fem, dem=no, cs=no, argtype=full, abstr=abstr}
{cat=n, wh=no, role=gov, nform=full, nclass=common, nb=sing, hum=nonhum, ers_frame=c2, d_pformarg3=durch, d_pformarg2=auf, d_morphsrce=deverb, d_lu=anwendung, d_is_rno=r1, d_isframe=arg123, d_gender=fem, count=mass, abstr=abstr}
{cat=np, wh=no, role=arg1, nb=plu, msdefs=msabs, index=17, hum=nonhum, d_gender=fem, cs=no, argtype=full, abstr=abstr}
{cat=n, wh=no, role=gov, nform=full, nclass=common, nb=plu, hum=nonhum, ers_frame=null, d_lu=spitzentechnologie, d_is_rno=r1, d_isframe=arg0, d_gender=fem, count=count, abstr=abstr}
{cat=ap, role=mod, nb=plu, msdefs=msabs, d_gender=fem}
{cat=adj, role=gov, nb=plu, ers_frame=null, d_lu=viel, d_isframe=arg0, d_gender=fem, deg=base}
{cat=ap, role=mod, nb=sing, msdefs=msdef, d_gender=fem}
{cat=adj, role=gov, nb=sing, ers_frame=null, d_lu=industriell, d_isframe=arg0, d_gender=fem, deg=base}
{cat=pp, role=mod, top=yes, index=8}
{cat=p, role=gov, ers_frame=comp, d_lu=seit, d_isframe=arg1}
{cat=np, wh=no, role=arg1, nb=sing, msdefs=msdef, index=7, hum=nonhum, d_gender=fem, dem=no, cs=no, argtype=full, abstr=abstr}
{cat=n, wh=no, role=gov, nform=full, nclass=common, nb=sing, hum=nonhum, ers_frame=null, d_lu=wiederaufbauphase, d_is_rno=r1, d_isframe=arg0, d_gender=fem, count=mass, abstr=abstr}
{cat=pp, role=mod, index=5}
{cat=p, role=gov, ers_frame=comp, d_lu=nach, d_isframe=arg1}
{cat=np, wh=no, role=arg1, nb=sing, msdefs=msdef, index=4, hum=nonhum, dem=no, cs=no, argtype=full,</Paragraph>
      <Paragraph position="5"> Translation of UMG categories (and hence, indirectly, of the IS semantic features) subsequently takes the form of inferencing over the inheritance relationships in the combined UMG&amp;UME model. This is the standard way in which the general grammatical resources of Penman are made responsive to knowledge from particular application domains. Here, the German upper model is simply being made to play the role of a Penman application domain.</Paragraph>
      <Paragraph position="6"> Let us give an example of this type of transfer.</Paragraph>
      <Paragraph position="7"> In the example sentence whose IS representation was shown in Figure 2, we have the prepositional phrase Seit der Wiederaufbauphase .... Seit, as a German preposition in one of its readings, is linked into UMG as a concept that specializes a more general relation 'g-spatio-temporal' in UMG. The UMG 'g-spatio-temporal' is further linked, by the preparatory mapping already defined between the English and German upper models, to a UME concept 'static-spatial', and this UME category guides the responses to Penman's inquiries to consider all the grammatical constructs and lexical items of English that Nigel has available for realizing this concept. In particular, one of the English realizations may be the English preposition since, which is thus one candidate for an acceptable translation. Because the prepositional phrase is a modifier of the main process (indicated by the role feature and the fact that the main process and the modifier are siblings in the IS representation), we have to use in SPL a ':relations' construct to state this dependence.</Paragraph>
      <Paragraph position="8"> In SPL this is a special keyword which is used for information that does not determine a unique inquiry response without reference to other contextual information. Apart from the specific example given here, the translation through the UMG&amp;UME combination opens the way to relatively free, but still acceptable translations, and thus provides the framework for discussing the notion of an acceptable translation, as different from, say, a simple paraphrase. Note, in particular, that syntactic category need not be preserved in this translation process, which is important for the translation of, say, relative clauses in German into NP or PP modifiers in English, translation of pre-modifiers of German into post-modifiers in English, etc., all of which are classical translation problems between these and other languages.</Paragraph>
      <Paragraph position="9"> At present, lexical transfer is also largely handled as a side-effect of transfers of this type.</Paragraph>
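      <Paragraph> The seit example can be traced through a miniature combined model. The concept names are those quoted in the text; the dictionary encoding, the function name, and the restriction to a single English realization are assumptions made for illustration.

```python
# Links quoted in the text: seit specializes g-spatio-temporal (UMG), which is
# in turn mapped onto the UME concept static-spatial.
SPECIALIZES = {
    "seit": "g-spatio-temporal",
    "g-spatio-temporal": "static-spatial",
}

# English realizations per UME concept. Only "since" is named in the text;
# a real grammar would offer more candidates.
REALIZATIONS = {
    "static-spatial": ["since"],
}


def translation_candidates(german_item):
    """Climb the combined hierarchy until a concept with realizations is found."""
    concept = german_item
    while concept is not None:
        if concept in REALIZATIONS:
            return REALIZATIONS[concept]
        concept = SPECIALIZES.get(concept)
    return []


print(translation_candidates("seit"))
print(translation_candidates("unknown-item"))
```

Because the lookup goes through concepts rather than word pairs, the lexical choice falls out of the upper model transfer, which is why lexical transfer can be treated as a side-effect.</Paragraph>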
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Semantic feature transfer
</SectionTitle>
      <Paragraph position="0"> Semantic features of the ET-D IS representation may also be transferred into sets of Penman inquiry responses. This type of transfer is used for semantic information of kinds not appropriate for inclusion in an upper model, e.g., textual organisation information, non-hierarchical conceptual information and speech act information. Penman has a rich variety of inquiries dealing with such information and so makes available a large set of resources and capabilities for any system that requires English as output.</Paragraph>
      <Paragraph position="1"> Information of these kinds is notoriously difficult for the usual types of syntactic transfer strategies. Determiner selection, and in particular the correct translation of the indefinite and definite articles, is another case of this. For example, the IS semantic features representing determination are translated into the inquiry responses that are responsible for controlling determiner selection in Nigel as follows: {def =</Paragraph>
      <Paragraph position="3"> Thus, the features expressing definiteness in IS are mapped into inquiry responses giving information about whether a given phrase is identifiable; those features expressing number are mapped into responses concerning whether the concept is to be expressed as a single entity or as several distinct entities. These are some of the semantic dimensions around which Nigel organises the selection of determiners and quantifiers in English (for a fuller account of Nigel's treatment, see: Bateman and Matthiessen, 1988; also, for an account of the ET-D approach, see: Steiner, Winter and Zelinsky-Wibbelt, 1987). It is this level of information at which meaning is preserved in translation, and not the syntactic level of determiner selection; this is clearly shown by the fact that translation between languages with and without articles is possible.</Paragraph>
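      <Paragraph> The mapping from determination features to inquiry responses can be sketched as follows. The original mapping table did not survive in this text, so everything concrete here is an assumption: the feature values definite and indefinite, the inquiry name :multiplicity-q, and all response values are hypothetical; only :identifiability-q is a keyword mentioned elsewhere in the paper.

```python
def determination_responses(features):
    """Translate IS determination features into inquiry responses.

    Hypothetical sketch: definiteness decides identifiability, number decides
    whether one entity or several are to be expressed.
    """
    responses = {}

    definiteness = features.get("def")
    if definiteness == "definite":
        responses[":identifiability-q"] = "identifiable"
    elif definiteness == "indefinite":
        responses[":identifiability-q"] = "notidentifiable"

    number = features.get("nb")
    if number == "sing":
        responses[":multiplicity-q"] = "unitary"
    elif number == "plu":
        responses[":multiplicity-q"] = "multiple"

    return responses


print(determination_responses({"def": "definite", "nb": "sing"}))
print(determination_responses({"def": "indefinite", "nb": "plu"}))
```

Nothing in the output mentions an article: the choice between the, a, or no determiner at all is left to the target grammar.</Paragraph>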
      <Paragraph position="4"> Another area which is translated in this way in the present system is the area of time. Both the Eurotra approach to time (cf. van Eynde, 1988) and the Nigel approach (cf. Matthiessen, 1984) grew out of a critical appraisal of the Reichenbachian framework, although they took quite different directions from there, with Matthiessen following essentially SFG lines. Still, enough common ground has been preserved in order to make a transfer of ET-D time features (i.e. semantic), rather than tense features (morpho-syntactic), an interesting and possible enterprise. Tenses encode complex relationships between time of speaking, reference time, and time of event, in interaction with adverbials in particular, and it is only with the help of a type of transfer that gives access to this level of detail that we can arrive at the English 'present perfect tense' as a translation of the German 'present' plus a time adverbial. For example, in Figure 1, we can see the inquiry responses under the features :speaking-time-id and :event-time that convey this information to Nigel. These are the results of interpreting the features s_TENSE and s_ASPECT in the IS representation shown in Figure 2. While we are not claiming that a direct mapping of tenses into tenses in SL-TL transfer is necessarily impossible, it would seem considerably more complex and translationally implausible than encoding the meaning expressed by tenses, as we have done here in terms of inquiry responses.</Paragraph>
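      <Paragraph> The one time-related case discussed in the text can be written down as a small decision function. This is a deliberately narrow sketch: the function name and the boolean flag for the since-type adverbial are assumptions, the fallback to a plain present is an assumption, and the real system works through inquiry responses rather than tense labels.

```python
def english_tense(s_tense, s_aspect, has_since_adverbial):
    """Pick an English tense label from ET-D time features.

    Covers only the case discussed in the text: German simul/durative
    together with a since-type time adverbial yields the present perfect.
    """
    if (s_tense, s_aspect) == ("simul", "durative") and has_since_adverbial:
        return "present perfect"
    if s_tense == "simul":
        # Assumed default for simultaneous time without such an adverbial.
        return "present"
    # Other feature combinations are outside the scope of this sketch.
    return None


print(english_tense("simul", "durative", True))
print(english_tense("simul", "durative", False))
```

The decision depends on semantic time features and on the adverbial, not on the German tense morphology, which is the point made in the paragraph above.</Paragraph>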
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Morpho-syntactic transfer
</SectionTitle>
      <Paragraph position="0"> It is also possible for morpho-syntactic features of the ET-D IS representation to be directly translated into corresponding grammatical features of the Nigel grammar; e.g. ET-D active/passive to Nigel active-process/passive-process. This type of transfer is very close to the idea of IS ⇒ IS transfer in Eurotra, but is used sparingly in the present application. Most of the morpho-syntactic features present in the IS representations do not need to be used directly since the semantic features give sufficient and more appropriate information for translation.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Perspectives for MT and
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Text Generation
</SectionTitle>
      <Paragraph position="0"> Combining the resources of the ET-D German analysis component with the Penman English generator has created an interesting research environment for asking questions about transfer strategies in MT. As is well known, the transfer process in an MT environment places complex requirements on both the linguistic theories involved and on the theories of translation.</Paragraph>
      <Paragraph position="1"> Perhaps the most refreshing aspect of the endeavour has been the new perspective which one gets on old problems, which suddenly seem to lose the air of having a range of often tried and well known, but essentially unsatisfactory solutions.</Paragraph>
      <Paragraph position="2"> One whole class of questions relates to what should be preserved in a translation process, as different from, say, processes of paraphrasing or summarising. One possible answer to this is that what needs to be preserved at least is the truth value of sentences and their translations. While this may serve as a useful bottom line from which to start, it has long been recognised to be no more than that. Many researchers argue that we also need to preserve the essential features of thematic structure and information structure. For most projects at this time, this problem is difficult to address because the linguistic models embodied in them do not foreground that type of information. However, with ET-D's interest in topic and focus, and with Nigel's fairly comprehensive treatment of theme, there is a very immediate way of making these aspects of linguistic information an accessible part of the translation process. In the translation pair represented by Figures 1 and 2, for example, we can see that the IS semantic feature top=yes indicating thematic prominence has been transferred into the inquiry response specification :circumstantial-theme-q(S9 H1) context.</Paragraph>
      <Paragraph position="3"> This calls for the grammar to prepose the constituent realising S9, i.e. the Since-clause, into sentence-initial thematic position, rather than letting it appear later in the sentence as it would when non-thematic.</Paragraph>
      <Paragraph position="4"> The function of predicate-argument structures, especially in connection with semantic cases, is another interesting research topic (as suggested by Somers (1986)) which can be addressed in the present context, especially as the two components involved share their essential notions of predicate-argument structures from systemic linguistics.</Paragraph>
      <Paragraph position="5"> Our first translations in this research environment are still sentence-based; however, in the longer term we will concentrate our research interests on issues concerning text structure. The Penman group intends to extend the Penman environment to cover the interpersonal and textual metafunctions of SFG. Although these extensions will be made primarily for text generation, they should also be of interest for the design of a text-based MT analysis.</Paragraph>
      <Paragraph position="6"> In summary, then, we have introduced the projects involved, and the structure of the German-English transfer mechanism, offering specific examples of the transfer process for some of the features present in the IS analysis.</Paragraph>
    </Section>
  </Section>
</Paper>