<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1068">
  <Title>DESIGN OF A MACHINE TRANSLATION SYSTEM FOR A SUBLANGUAGE</Title>
  <Section position="4" start_page="0" end_page="334" type="metho">
    <SectionTitle>
2. THE SUBLANGUAGE
</SectionTitle>
    <Paragraph position="0"> The corpus is taken from a weekly publication by the Swiss government announcing federal job openings. The volume of this publication amounts to ca. 10,000 words per week; however, many of the advertisements are carried for several weeks. All job ads are published in the three national languages, German, French and Italian, with German usually serving as the source language (SL) and French and Italian as the target languages (TLs).</Paragraph>
    <Paragraph position="1"> The study is hence based on a collection of texts already translated by human translators. The ads are grouped by profession, e.g. academic, technical and administrative. At present, the corpus is limited to the domain of administrative positions, an example of which is given in Figure 1.</Paragraph>
    <Section position="1" start_page="0" end_page="334" type="sub_section">
      <SectionTitle>
Verwaltungsbeamtin
Fonctionnaire d'administration
Funzionaria amministrativa
</SectionTitle>
      <Paragraph position="0"> Führen des Sekretariates eines Sektionschefs. Ausfertigen von Korrespondenzen und Berichten nach Diktat und Vorlage in deutscher, französischer und englischer Sprache. Abgeschlossene kaufmännische Lehre oder Handelsschulbildung, Berufserfahrung erwünscht. Sprachen: Deutsch, Französisch, Englisch in Wort und Schrift. Italienisch und/oder Spanisch erwünscht.</Paragraph>
      <Paragraph position="1"> Diriger le secrétariat d'un chef de section. Dactylographier de la correspondance allemande, française et anglaise et des rapports sous dictée ou d'après manuscrits. Certificat d'employée de commerce ou diplôme d'une école de commerce. Expérience professionnelle désirée. Langues: le français, l'allemand et l'anglais parlés et écrits. Connaissances de l'italien ou de l'espagnol, voire des deux souhaitées.</Paragraph>
      <Paragraph position="2"> Dirigere il segretariato di un capo sezione. Stesura di corrispondenza e rapporti secondo dettato o manoscritto. Tirocinio commerciale o formazione commerciale. Pratica pluriennale. Lingue: tedesco, francese, inglese (orale e scritto). Buone nozioni dell'italiano e/o dello spagnolo auspicate.</Paragraph>
      <Paragraph position="3"> Figure 1. Advertisement for an administrative position (&quot;Die Stelle&quot;, 1981).</Paragraph>
      <Paragraph position="4"> The corpus exhibits many of the textual features generally used to characterize a sublanguage, i.e. (i) limited subject matter, (ii) lexical and syntactic restrictions, and (iii) high frequency of certain constructions. As can be seen from the example, the style of the sublanguage is distinguished by complex nominal dependencies with various levels of coordination. In addition, most sentences are incomplete in that they consist of a series of nominal phrases and do not contain a main verb; neither relative clauses nor other dependent clauses occur. The importance of nominal constituents is reflected in the statistics of the German texts: over 55% of the words in the corpus are nouns, 11% adjectives, 11% prepositions and 17% conjunctions, whereas verbs make up only 1% of the corpus. A comparison with the statistics of the French and Italian translations reveals approximately the same distribution, except for infinitival verbs. The higher frequency of verbs in French and Italian is due to a preference for infinitival phrases in place of deverbal nominal constructions. Apart from this difference, the major textual characteristics carry over from the source to the target sublanguages, thereby facilitating machine translation.</Paragraph>
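      <Paragraph>The percentages above are relative frequencies of part-of-speech categories. Below is a minimal Python sketch of that computation. It assumes the corpus has already been tagged with coarse categories. The tag names (`N`, `ART`) and the function `pos_distribution` are hypothetical. The one-phrase sample is the hand-tagged opening of Figure 1 and serves only as an illustration, not as data from the study.

```python
from collections import Counter

def pos_distribution(tagged_tokens: list[tuple[str, str]]) -> dict[str, float]:
    """Return the percentage of tokens carrying each (hypothetical) category tag."""
    if not tagged_tokens:
        return {}
    counts = Counter(tag for _word, tag in tagged_tokens)
    total = len(tagged_tokens)
    return {tag: 100.0 * n / total for tag, n in counts.items()}

# Illustrative input only: the opening phrase of Figure 1, tagged by hand.
sample = [("Führen", "N"), ("des", "ART"), ("Sekretariates", "N"),
          ("eines", "ART"), ("Sektionschefs", "N")]
print(pos_distribution(sample))  # {'N': 60.0, 'ART': 40.0}
```

Run over the full tagged German corpus, the same function would yield the kind of distribution reported above.</Paragraph>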
    </Section>
  </Section>
  <Section position="5" start_page="334" end_page="334" type="metho">
    <SectionTitle>
3. BRIEF DESCRIPTION OF THE SYSTEM
</SectionTitle>
    <Paragraph position="0"> Modern transfer-based MT systems are based on the following design principles: (i) modularity, e.g. separation of linguistic data and algorithms, (ii) multilinguality, i.e. independent analysis, transfer, and generation phases, and (iii) formalized specification of the linguistic model (Hutchins, 1982). Although only a prototype, the system was designed in accordance with these principles.</Paragraph>
    <Paragraph position="1"> As to modularity, the software used is a general-purpose rule-based transducer developed especially for MT (Shann, Cochard, 1984). This software tool not only allows for the separation of data and algorithms but also provides great flexibility in the organization of grammars and subgrammars and in the control of the computational processes applied to them.</Paragraph>
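    <Paragraph>The following toy Python sketch shows what separating data from algorithms means in practice. The grammar is plain data: a list of rewrite rules over category sequences. A single generic procedure applies that data. This is an illustration under stated assumptions and does not reproduce the formalism of the transducer cited above. The names `apply_rules` and `GN_RULES`, the rule format, and every category name except GN are hypothetical.

```python
Rule = tuple[tuple[str, ...], str]  # (sequence of categories, category that replaces it)

def apply_rules(categories: list[str], rules: list[Rule]) -> list[str]:
    """Rewrite a category sequence left to right.

    At each position the rules are tried in list order and the first match wins;
    unmatched categories are copied unchanged. Patterns must be non-empty.
    """
    result: list[str] = []
    i, n = 0, len(categories)
    while i != n:
        for pattern, replacement in rules:
            if tuple(categories[i:i + len(pattern)]) == pattern:
                result.append(replacement)
                i += len(pattern)
                break
        else:  # no rule matched at position i
            result.append(categories[i])
            i += 1
    return result

# Linguistic data kept apart from the algorithm: a tiny "subgrammar" for GN.
GN_RULES: list[Rule] = [
    (("PREP", "ART", "N"), "GN"),
    (("ART", "N"), "GN"),
]

print(apply_rules(["ART", "N", "PREP", "ART", "N"], GN_RULES))  # ['GN', 'GN']
```

Replacing `GN_RULES` with a different rule list changes the grammar without touching `apply_rules`. That independence is what the separation of data and algorithms buys.</Paragraph>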
    <Paragraph position="2"> As a multilingual system, it is not directly oriented towards any specific language pair: the same German analysis module provides the input to both the German-French and the German-Italian transfer modules. Separate French and Italian generation modules use only language-specific knowledge to produce the final translation. However, the German analysis is indirectly influenced by target-language considerations: the interface structure between analysis and transfer was defined to take advantage of the similarities between the three languages and to accommodate their differences.</Paragraph>
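    <Paragraph>The module layout described above can be sketched in Python as follows. There is one German analysis, one transfer module per language pair, and one generator per target language. This is a minimal sketch under heavy assumptions. The interface structure is reduced to a flat list of (category, lemma) pairs. Analysis and generation are placeholders. The names `analyse_german`, `make_transfer`, `generate`, `TRANSFER` and `GENERATE` are hypothetical. The lexical pairs are taken from Figure 1.

```python
from typing import Callable

# Stand-in for the interface structure (IS): a flat list of (category, lemma) pairs.
IS = list[tuple[str, str]]

def analyse_german(lemmas: list[str]) -> IS:
    """Placeholder German analysis: labels every lemma as a noun (hypothetical)."""
    return [("N", lemma) for lemma in lemmas]

def make_transfer(lexicon: dict[str, str]) -> Callable[[IS], IS]:
    """Build a transfer module from a bilingual lexicon; categories are kept as-is."""
    def transfer(structure: IS) -> IS:
        return [(cat, lexicon.get(lemma, lemma)) for cat, lemma in structure]
    return transfer

def generate(structure: IS) -> str:
    """Placeholder generator standing in for the separate French and Italian modules."""
    return " ".join(lemma for _cat, lemma in structure)

TRANSFER = {
    "fr": make_transfer({"Sekretariat": "secrétariat", "Sektionschef": "chef de section"}),
    "it": make_transfer({"Sekretariat": "segretariato", "Sektionschef": "capo sezione"}),
}
GENERATE = {"fr": generate, "it": generate}

def translate(lemmas: list[str], target: str) -> str:
    structure = analyse_german(lemmas)  # one analysis, shared by both language pairs
    return GENERATE[target](TRANSFER[target](structure))

print(translate(["Sekretariat", "Sektionschef"], "fr"))  # secrétariat chef de section
print(translate(["Sekretariat", "Sektionschef"], "it"))  # segretariato capo sezione
```

Adding a further target language would mean adding one transfer entry and one generator. The German analysis stays untouched.</Paragraph>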
  </Section>
  <Section position="6" start_page="334" end_page="335" type="metho">
    <SectionTitle>
4. LINGUISTIC APPROACH: MINIMAL BUT SUFFICIENT DEPTH
</SectionTitle>
    <Paragraph position="0"> Since the sublanguage under investigation displays restricted syntactic structures within a limited semantic domain, a grammar specifically tailored to these job advertisements can be defined. Moreover, the linear series of nominal phrases, as well as the almost one-to-one lexical equivalences found in the SL and TL texts, suggest that a shallow analysis without a semantic component is sufficient for adequate translation. The flat tree representation resulting from such a minimal-depth approach makes no claim to linguistic generalizability for purposes other than the translation of this particular sublanguage.</Paragraph>
    <Section position="1" start_page="334" end_page="334" type="sub_section">
      <SectionTitle>
4.1 Computational considerations
</SectionTitle>
      <Paragraph position="0"> In a transfer-based MT system, actual translation takes place in transfer and can be described as the computational manipulation of tree structures. In the absence of any formal theory of translation for MT, and given the relatively well-developed analysis techniques currently available, a major concern in MT research is to minimize the computation necessary in the transfer phase. A flat tree representation is one way of simplifying the structures to be processed; an interface representation defined to accommodate SL and TL structures in the same manner, thus avoiding tree-structure manipulation, is another. The representation of the linguistic data in this system follows directly from these two considerations.</Paragraph>
    </Section>
    <Section position="2" start_page="334" end_page="335" type="sub_section">
      <SectionTitle>
4.2 Flat trees
</SectionTitle>
      <Paragraph position="0"> The fact that the linearity of the surface structure constituents carries over from the SL to the TLs justifies the adoption of a minimal depth analysis. The analysis is restricted to the identification of the phrasal constituents and their internal structure; dependencies holding between constituents are only partially computed. Thus, the interface structure resulting from analysis and serving as input to transfer does not reflect a linguistically correct dependency structure.</Paragraph>
      <Paragraph position="1"> Instead, the interface structure (IS) respects the linear surface order of the constituents (with the exception of predicate groups, see below) in a flat tree representation. In a flat tree, the major phrasal constituents, in particular the prepositional phrases, are not attached at the node on which they depend linguistically but at specified nodes higher up in the tree. Schematically, the differences can be illustrated as follows:</Paragraph>
      <Paragraph position="3"> The flat tree representation applies to all three major phrasal constituents defined for this corpus: (i) nominal phrases proper, (ii) deverbal nominal phrases, and (iii) verbal phrases. Samples taken from the corpus are given below to illustrate each of the three constituent structures.</Paragraph>
      <Paragraph position="4"> (i) Nominal phrases proper have a standard noun phrase as their head, possibly followed by a linear sequence of prepositional phrases. (GN stands for both standard NPs and PPs.) followed by a linear sequence of GNs. (F~ encompasses predicative participles, predicative adjectives, and infinitival predicates; the few finite verbs in the corpus (0.4%) are not treated.)</Paragraph>
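      <Paragraph>The contrast between a linguistic dependency tree and a flat tree can be sketched in Python. The `flatten` function below lifts every nested GN so that it sits directly under one top node, keeping surface order. It is a minimal sketch and assumes post-nominal modifiers only, meaning a node's own material always precedes its nested GNs. The `Node` class, the function name and the root label `NP` are hypothetical. Only GN comes from the text. The example phrase is the opening of Figure 1.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    word: str = ""
    children: list["Node"] = field(default_factory=list)

def flatten(phrase: Node) -> Node:
    """Lift every nested GN to sit directly under one top node, in surface order.

    Assumes post-nominal modifiers: a node's own material precedes its nested GNs.
    """
    flat: list[Node] = []

    def walk(node: Node) -> None:
        own = [c for c in node.children if c.label != "GN"]
        flat.append(Node("GN", children=own))
        for child in node.children:
            if child.label == "GN":
                walk(child)

    walk(phrase)
    return Node("NP", children=flat)

# Linguistic dependency: each genitive GN hangs off the noun before it.
deep = Node("GN", children=[
    Node("N", "Führen"),
    Node("GN", children=[
        Node("ART", "des"), Node("N", "Sekretariates"),
        Node("GN", children=[Node("ART", "eines"), Node("N", "Sektionschefs")]),
    ]),
])

flat = flatten(deep)
print([[c.word for c in gn.children] for gn in flat.children])
# [['Führen'], ['des', 'Sekretariates'], ['eines', 'Sektionschefs']]
```

In the flat result, all three GNs are sisters. The dependency of "eines Sektionschefs" on "Sekretariates" is no longer encoded, which matches the partial computation of dependencies described above.</Paragraph>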
    </Section>
    <Section position="3" start_page="335" end_page="335" type="sub_section">
      <SectionTitle>
4.3 Normalized tree structures
</SectionTitle>
      <Paragraph position="0"> In order to further minimize manipulation of structure in transfer, the interface representation is also normalized for two important categories in the sublanguage, namely deverbal nominal phrases (GDEV) and noun and prepositional phrases (GN). The structures are defined such that they remain valid for both the source and the target languages.</Paragraph>
    </Section>
    <Section position="4" start_page="335" end_page="335" type="sub_section">
      <SectionTitle>
4.3.1 Deverbal nominal phrases
</SectionTitle>
      <Paragraph position="0"> A marked stylistic difference between the SL and the TLs, occurring with high frequency in the corpus, is the translation of a German deverbal noun into an infinitive in French and Italian. Since the German deverbal noun usually serves as the head of a complex nominal structure with several complements, translating it into a target-language infinitive changes the type of complement structure accordingly. Fully linearizing the complements of the deverbal noun yields a format that also accommodates the infinitival construction required in the target language.</Paragraph>
      <Paragraph position="1"> Structural transfer is thus reduced to renaming the nodes; the normalized tree structure remains the same, as can be seen in the SL and TL representations shown below.</Paragraph>
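      <Paragraph>Transfer as pure node renaming can be sketched in Python as a recursive copy. The copy changes labels and words but never the tree geometry. This is a minimal sketch, not the system's transfer rules. GDEV and GN come from the text. The head labels `DEVN` (German deverbal noun) and `INF` (target infinitive), the `NIL` preposition value placement, the functions and the two-entry lexicon are illustrative assumptions. The phrase "Führen des Sekretariates" / "Diriger le secrétariat" is taken from Figure 1, with the article omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    word: str = ""
    children: list["Node"] = field(default_factory=list)

def shape(node: Node) -> list:
    """The bare tree geometry, ignoring labels and words."""
    return [shape(child) for child in node.children]

def transfer(node: Node, relabel: dict[str, str], lexicon: dict[str, str]) -> Node:
    """Rename nodes and translate words; the tree geometry is copied unchanged."""
    return Node(
        relabel.get(node.label, node.label),
        lexicon.get(node.word, node.word),
        [transfer(child, relabel, lexicon) for child in node.children],
    )

# German "Führen des Sekretariates": deverbal head plus linearized complement.
source = Node("GDEV", children=[
    Node("DEVN", "Führen"),
    Node("GN", children=[Node("PREP", "NIL"), Node("N", "Sekretariat")]),
])
target = transfer(source,
                  relabel={"DEVN": "INF"},
                  lexicon={"Führen": "diriger", "Sekretariat": "secrétariat"})

print(target.children[0].label, target.children[0].word)  # INF diriger
print(shape(source) == shape(target))                     # True
```

Because the normalized GDEV structure is the same on both sides, `transfer` needs no rule that adds, deletes or moves a node.</Paragraph>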
      <Paragraph position="2"> Certain noun phrases in German (e.g. genitive attributes) are translated into prepositional phrases in French and Italian. To avoid structural transfer of noun phrases into prepositional phrases and vice versa, a normalized form for noun phrases has been defined which reserves a position in the tree for a preposition. For standard noun phrases, a special value (NIL) fills the empty preposition slot. In the transfer phase, a translation from a noun phrase to a prepositional phrase or vice versa is therefore merely a change in the value of the preposition slot, without any change in the tree structure.</Paragraph>
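      <Paragraph>The NIL preposition slot can be sketched in Python as a fixed record. Transfer only overwrites field values in this record. This is a minimal sketch under stated assumptions. The `GN` record, its `case` feature, `transfer_gn` and the single rule mapping a German genitive to French "de" are hypothetical. Non-NIL prepositions are simply copied here for brevity. The example pair "eines Sektionschefs" / "d'un chef de section" comes from Figure 1.

```python
from dataclasses import dataclass, replace

NIL = "NIL"  # value filling the empty preposition slot of a standard noun phrase

@dataclass(frozen=True)
class GN:
    prep: str   # a preposition, or NIL for a plain NP
    case: str   # hypothetical case feature, e.g. "gen" for a German genitive
    noun: str

def transfer_gn(gn: GN, lexicon: dict[str, str]) -> GN:
    """Translate a GN into French; only slot values change, never the structure."""
    prep = gn.prep
    if prep == NIL and gn.case == "gen":
        prep = "de"  # genitive attribute -> PP with "de" (assumed rule)
    return replace(gn, prep=prep, noun=lexicon.get(gn.noun, gn.noun))

# German "eines Sektionschefs" (genitive NP) -> French "d'un chef de section" (PP).
german = GN(prep=NIL, case="gen", noun="Sektionschef")
french = transfer_gn(german, {"Sektionschef": "chef de section"})
print(french)  # GN(prep='de', case='gen', noun='chef de section')
```

Using `dataclasses.replace` on a frozen record makes the point explicit: the NP-to-PP change is a new value in an existing slot, and the record's fields stay the same.</Paragraph>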
    </Section>
  </Section>
  <Section position="7" start_page="335" end_page="335" type="metho">
    <SectionTitle>
[Diagram: normalized GN structure with nodes PREP, N, ART, GN ...]
</SectionTitle>
    <Paragraph position="0"> The normalized form covers both NPs and PPs.</Paragraph>
  </Section>
  <Section position="8" start_page="335" end_page="336" type="metho">
    <SectionTitle>
4.4 CONSIDERATIONS FOR TRANSLATION
</SectionTitle>
    <Paragraph position="0"> The goal of the system, and perhaps of MT in general, must be to carry over the information content from SL to TL, to produce output acceptable in terms of TL conventions, and to respect the style of the text type. Treating a well-defined sublanguage seems to improve an MT system's ability to meet these requirements. In fact, the sublanguage itself suggests strategies for dealing with some of the classical translation problems in MT, such as (1) lexical ambiguity, (2) translation of prepositions, and (3) treatment of coordination.</Paragraph>
    <Paragraph position="1"> Two well-known lexical problems in computational linguistics are homograph resolution and polysemy disambiguation. Given the small number of possible syntactic structures in the sublanguage, the few homographs found in the corpus do not present any problems for analysis. Likewise, the limited semantic domain of the sublanguage completely eliminates multiple word senses, so that the transfer of lexical meanings is basically a one-to-one mapping. Therefore, with the nouns serving as the major carriers of the textual meaning, lexical transfer ensures that the information content of the text is carried over.</Paragraph>
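    <Paragraph>A one-to-one lexical transfer table can be built so that the sublanguage assumption is checked rather than silently trusted. In this Python sketch, building the table fails if any source lemma is paired with two different translations. The function name and the error behaviour are assumptions for illustration. The four equivalences are read off the German and French versions of Figure 1.

```python
def build_transfer_dictionary(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Build a German-to-French lexical transfer table, enforcing one sense per entry.

    Raises ValueError if a source lemma is paired with two different translations,
    i.e. if the one-to-one assumption for the sublanguage is violated.
    """
    table: dict[str, str] = {}
    for source, target in pairs:
        if table.setdefault(source, target) != target:
            raise ValueError(f"{source!r} has more than one translation")
    return table

# Equivalences taken from the German and French versions of Figure 1.
DE_FR = build_transfer_dictionary([
    ("Sekretariat", "secrétariat"),
    ("Sektionschef", "chef de section"),
    ("Korrespondenz", "correspondance"),
    ("Bericht", "rapport"),
])
print([DE_FR[w] for w in ["Bericht", "Sekretariat"]])  # ['rapport', 'secrétariat']
```

Outside a restricted domain, such a check would fail quickly. Inside the sublanguage it is expected to hold, which is what makes plain dictionary lookup sufficient for lexical transfer.</Paragraph>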
    <Paragraph position="2"> Because the types of nouns occurring in the sublanguage are restricted and repetitive, and the number of prepositions governed by any given noun is small (max. 3 in the corpus), a limited noun-focused approach to the translation of prepositions can be adopted. In such an approach, it is the particular noun or noun class, rather than general semantic features, that determines the translation of prepositions. At present, the information relevant to the correct translation of prepositions is attached to individual noun entries in the transfer dictionary; semantic noun subclassification similar to that in other sublanguage research (Sager, 1982) is being investigated.</Paragraph>
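    <Paragraph>Attaching preposition translations to noun entries can be sketched in Python as a nested lookup. Figure 1 gives a direct case: German "nach Diktat und Vorlage" becomes French "sous dictée ou d'après manuscrits". The same preposition "nach" therefore needs a different translation depending on the noun it governs. The entry format, `TRANSFER_DE_FR` and `translate_pp` are hypothetical.

```python
# Each noun entry carries its own preposition translations (at most a few per noun).
TRANSFER_DE_FR: dict[str, dict] = {
    "Diktat":  {"noun": "dictée",    "prep": {"nach": "sous"}},
    "Vorlage": {"noun": "manuscrit", "prep": {"nach": "d'après"}},
}

def translate_pp(prep: str, noun: str, dictionary: dict[str, dict]) -> tuple[str, str]:
    """Translate a German PP; the governed noun, not the preposition, picks the result."""
    entry = dictionary[noun]
    return entry["prep"][prep], entry["noun"]

print(translate_pp("nach", "Diktat", TRANSFER_DE_FR))   # ('sous', 'dictée')
print(translate_pp("nach", "Vorlage", TRANSFER_DE_FR))  # ("d'après", 'manuscrit')
```

The Italian version of Figure 1 uses "secondo" for both nouns. A German-Italian dictionary would therefore hold the same preposition in both entries, while the lookup procedure stays unchanged.</Paragraph>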
    <Section position="1" start_page="336" end_page="336" type="sub_section">
      <SectionTitle>
4.4.3 Coordination
</SectionTitle>
      <Paragraph position="0"> With SL and TLs exhibiting parallel surface syntactic structures, inherent ambiguities of scope carry over from SL to TL, so the analysis of coordination can remain shallow. Conjunctions and intrasentential punctuation are defined functionally as coordinators, yielding, in keeping with the flat tree representation, a structure such as the one shown below.</Paragraph>
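      <Paragraph>Functionally defined coordinators and a flat coordination node can be sketched in Python. Conjuncts and coordinators stay sisters in surface order, and no scope is assigned. This is a minimal sketch. The coordinator inventory and the labels `COORD` and `CO` are hypothetical, and GN is used as in the text for both NPs and PPs. The constituents come from the German text of Figure 1.

```python
# Hypothetical coordinator inventory: conjunctions plus intrasentential punctuation.
COORDINATORS = {",", "und", "oder", "und/oder"}

def flat_coordination(constituents: list[str]) -> tuple[str, list[tuple[str, str]]]:
    """Build a flat COORD node whose children keep surface order.

    Coordinators become CO leaves and everything else a GN; no scope is computed,
    so any scope ambiguity is passed on unchanged to the target language.
    """
    return ("COORD", [("CO" if c in COORDINATORS else "GN", c) for c in constituents])

# Whether "nach Diktat" covers both conjuncts is not decided by the tree.
tree = flat_coordination(["Korrespondenzen", "und", "Berichten", "nach Diktat"])
print(tree)
```

Because French and Italian coordinate in the same linear way, generating the target string from this flat node reproduces the source's scope ambiguity instead of resolving it.</Paragraph>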
    </Section>
  </Section>
</Paper>