<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3001">
  <Title>ing translations:</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Strategy for Machine Translation with LTAGs
</SectionTitle>
    <Paragraph position="0"> *This research was partially funded by ARO grant DAAG29-84-K-0061, DARPA grant N00014-85-K0018, and NSF grant MCS-82-19196 at the University of Pennsylvania. We are indebted to Stuart Shieber for his valuable comments. We would also like to thank Marilyn Walker.</Paragraph>
    <Paragraph position="1"> 1 In this volume.</Paragraph>
    <Paragraph position="2"> The idea of using grammars written with "lexicalist" formalisms for machine translation is not new and has been exemplified by Kaplan et al. (1989) for LFG, Beaven et al. (1988) for UCG, Dorr (1989) for GB, and Arnold et al. (1986) for Eurotra. However, our approach is more radical in the sense that we associate with the lexical items structures that localize syntactic and semantic dependencies. This allows for the possibility that an explicit semantic representation level can be avoided.2 The claims about the advantages of an explicit semantic representation level need to be investigated again in the context of the approach proposed here. For example, many problems that are traditionally difficult for machine translation because of divergences between languages (Dorr 1989), whether categorial, thematic, conflational, structural or lexical, are not problems in the approach we suggest. Also, contrary to UCG but like LFG, we use grammars that have not been designed for the purpose of translation.</Paragraph>
    <Paragraph position="3"> The underlying formalism achieving the transfer of derivations is "Synchronous Tree-Adjoining Grammars" (as described in a companion paper by Shieber and Schabes [1990]).3 The strategy adopted for machine translation consists of matching the LTAG derivation of the source sentence to a target LTAG derivation by consulting a transfer lexicon.</Paragraph>
    <Paragraph position="4"> The transfer lexicon puts into correspondence a tree from the source grammar, instantiated by lexical insertion (all its nodes and their attributes), with a tree from the target grammar. Although the approach is not inherently directional, for convenience we will call the English and French grammars the source and target grammars, respectively.</Paragraph>
    <Paragraph position="5"> The translation process consists of three steps, in which the generation step is reduced to a trivial step. First, the source sentence is parsed according to the source grammar. Each elementary tree in the derivation is then considered with the features given from the derivation through unification. Second, the source derivation tree is transferred to a target derivation. This step maps each elementary tree in the source derivation tree to a tree in the target derivation tree by looking in the transfer lexicon. Finally, the target sentence is generated from the target derivation tree obtained in the previous step.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2 The formalism of Synchronous Tree-Adjoining Grammar
</SectionTitle>
      <Paragraph position="0"> does not prevent the construction of an explicit semantic representation. In fact, Shieber and Schabes (1990) show how to construct a semantic representation which is itself a TAG.</Paragraph>
      <Paragraph position="1"> 3 We assume that the reader is familiar with Tree-Adjoining Grammars. We refer the reader to Joshi (1987) for an introduction to TAGs, and to the companion paper for more details on synchronous TAGs.</Paragraph>
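The transfer step (the second of the three steps above) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each elementary tree is named by its lexical anchor, and attachments are keyed by node labels. The class `Deriv`, the table `TRANSFER` and the function `transfer` are hypothetical. Parsing and generation are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Deriv:
    # A derivation-tree node: a lexicalized elementary tree (named here by
    # its anchor) plus the subderivations attached at its nodes.
    tree: str
    attached: dict = field(default_factory=dict)  # node label -> Deriv


# Hypothetical transfer lexicon: source tree -> (paired target tree,
# links from source nodes to target nodes).
TRANSFER = {
    "misses": ("manque", {"NP0": "NP1", "NP1": "NP0", "S": "S"}),
    "apparently": ("apparemment", {}),
    "John": ("John", {}),
    "Mary": ("Mary", {}),
}


def transfer(src: Deriv) -> Deriv:
    # Replace each source elementary tree by its paired target tree;
    # whatever was attached at a source node is attached at the linked
    # target node, so the target derivation mirrors the source one.
    tgt_tree, links = TRANSFER[src.tree]
    return Deriv(tgt_tree, {links[node]: transfer(child)
                            for node, child in src.attached.items()})


# Assumed output of parsing "Apparently, John misses Mary".
source = Deriv("misses", {"NP0": Deriv("John"), "NP1": Deriv("Mary"),
                          "S": Deriv("apparently")})
target = transfer(source)
```

The recursion visits the source derivation tree node for node. Once the target derivation is built this way, generation can stay a trivial step.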
      <Paragraph position="3"> As an example, consider the fragment of the transfer lexicon given in Figure 1.</Paragraph>
      <Paragraph position="5"> [Figure 1: Fragment of the English-French transfer lexicon] The transfer lexicon consists of pairs of trees, one from the source language and one from the target language. Within a pair of trees, nodes may be linked (thick lines). Whenever, in a source tree t_source, adjunction or substitution is performed on a linked node n_source (linked, say, to n_target), the tree t_target paired with t_source operates on the linked node n_target. For example, suppose we start with the pair γ and we operate the pair α on the link from the English node NP0 to the French node NP1. This operation yields the derived pair α1.</Paragraph>
      <Paragraph position="7"> Then, if the pair β operates on the NP1-NP0 link in α1, the following pair α2 is generated.</Paragraph>
      <Paragraph position="9"> Finally, when the pair δ operates on the S-S link in α2, the pair α3 is generated.</Paragraph>
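The way a pair operates on a link can be illustrated with a toy synchronous derivation in Python. The tree encodings for γ, α, β and δ below are assumptions reconstructed from the example sentence. Node names, link indices and the helpers `T`, `operate`, `sync` and `words` are hypothetical. As a simplification, each link is consumed once it has been used.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    link: Optional[int] = None  # index shared with the linked node of the paired tree
    subst: bool = False         # substitution site
    foot: bool = False          # foot node of an auxiliary tree


def T(label, *kids, link=None, subst=False, foot=False):
    # Compact constructor for hand-written trees.
    return Node(label, list(kids), link, subst, foot)


def find(tree, pred):
    # Depth-first search for the first node satisfying pred.
    if pred(tree):
        return tree
    for kid in tree.children:
        hit = find(kid, pred)
        if hit is not None:
            return hit
    return None


def operate(tree, link, elem):
    # Substitute or adjoin a copy of elem at the node carrying this link,
    # then consume the link.
    site = find(tree, lambda n: n.link == link)
    new = copy.deepcopy(elem)
    if site.subst:
        site.label, site.children, site.subst = new.label, new.children, False
    else:
        foot = find(new, lambda n: n.foot)
        foot.children, foot.foot = site.children, False
        site.children = new.children
    site.link = None


def sync(pair, link, elem_pair):
    # One synchronous step: the same link drives the operation in both trees.
    operate(pair[0], link, elem_pair[0])
    operate(pair[1], link, elem_pair[1])


def words(tree):
    if not tree.children:
        return [tree.label]
    return [w for kid in tree.children for w in words(kid)]


gamma = (T("S", T("NP0", link=1, subst=True),
              T("VP", T("V", T("misses")), T("NP1", link=2, subst=True)), link=3),
         T("S", T("NP0", link=2, subst=True),
              T("VP", T("V", T("manque")),
                T("PP", T("P", T("à")), T("NP1", link=1, subst=True))), link=3))
alpha = (T("NP", T("John")), T("NP", T("John")))
beta = (T("NP", T("Mary")), T("NP", T("Mary")))
delta = (T("S", T("Adv", T("apparently")), T("S", foot=True)),
         T("S", T("Adv", T("apparemment")), T("S", foot=True)))

sync(gamma, 1, alpha)  # English NP0 / French NP1: derived pair alpha1
sync(gamma, 2, beta)   # English NP1 / French NP0: alpha2
sync(gamma, 3, delta)  # S-S link: alpha3
print(" ".join(words(gamma[0])))  # apparently John misses Mary
print(" ".join(words(gamma[1])))  # apparemment Mary manque à John
```

The French word order (Mary first, John inside the PP) comes only from the links in γ. No reordering rule is involved.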
      <Paragraph position="10"> The source sentence is parsed according to the source grammar; then the target derivation is generated by tracing the pairs stated in the transfer lexicon. The fragment of the transfer lexicon given in Figure 1 therefore enables us to translate: Apparently, John misses Mary ↔ Apparemment, Mary manque à John. In most cases, translation can be performed incrementally as the input string is being parsed. The aim of this paper is to show that the localization in LTAGs of syntactic dependencies (such as filler-gap) and of semantic dependencies (such as predicate-argument relations), combined with the lexicalized nature of LTAGs, makes them especially attractive for machine translation. We show how the transfer lexicon is stated. We motivate the need for mapping trees instantiated with words and with the values of their features obtained from the derivation tree corresponding to the parse of the source sentence. We also show that the transfer needs to be stated at different levels: tree families (trees associated with the same predicate), trees, and nodes, and therefore their attributes, since these are associated with a node. We show how not only subcategorization frames but also adjuncts are transferred, and how differences in syntactic and semantic properties are accounted for in terms of structural discrepancies. Then we illustrate how the extended domain of locality enables us to deal with these structural discrepancies in the process of machine translation.</Paragraph>
      <Paragraph position="11">  The transfer is stated between the English and French LTAG grammars in a lexicon. We rely on grammars built from a monolingual perspective, but the match between them can be one to many, or many to one.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Matching elementary trees
</SectionTitle>
      <Paragraph position="0"> Instead of matching words, we match structures in which words have already been lexically inserted.</Paragraph>
      <Paragraph position="1"> This provides interesting disambiguations that could not be obtained by a morphological match. For example, there is a single morphological English verb leave, but the structures associated with it disambiguate it between intransitive and transitive leave. Interestingly, these two predicates receive two different French translations:4</Paragraph>
      <Paragraph position="3"> The pairs α4 and α5 will correctly give the following translations: [...] John left Mary ↔ John a quitté Mary. By convention, in the elementary trees, the set of morphological inflections of a given word is written surrounded by backslashes. For example, \leave\ stands for {leave, leaves, left, ...}. For each word in a morphological set, attributes (such as mode and agreement) are also specified. When a word in a tree is not surrounded by backslashes, it stands for the inflected form and not for a morphological set.</Paragraph>
      <Paragraph position="4"> 4 We use standard TAG notation: '↓' stands for nodes to be substituted, '*' annotates the foot node of an auxiliary tree, and the indices shown on the nodes correspond to semantic functions. The trees are combined with adjunction and substitution. Our approach does not depend on the specific representation adopted in this paper. See Abeillé 1990 (b) for an alternate representation.</Paragraph>
      <Paragraph position="5"> Since lexical items appearing in the elementary structures can be inflected words or morphological sets, lexical items of the two languages are matched regardless of whether they exhibit the same morphological variations or not. For example, English adjectives lacking morphological variation appear as such in the syntactic and transfer lexicons, while their French counterparts are usually morphological sets.</Paragraph>
      <Paragraph position="6"> The word white is thus matched with \blanc\, standing for {blanc, blanche, blancs, blanches}.</Paragraph>
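Matching an inflected word with a morphological set can be sketched as follows. The sketch assumes that the French form is selected by agreement features supplied by the target derivation. `MorphSet`, `TRANSFER` and `realize` are hypothetical names. The (gender, number) indexing is a simplification of the attributes the paper attaches to each member of a set.

```python
from dataclasses import dataclass


@dataclass
class MorphSet:
    # A morphological set, written \blanc\ in the paper's notation:
    # the inflected forms of one word, indexed by (gender, number).
    lemma: str
    forms: dict


# Hypothetical transfer entry: the English adjective "white" (a single
# inflected word) matched with the French morphological set \blanc\.
BLANC = MorphSet("blanc", {("m", "sg"): "blanc", ("f", "sg"): "blanche",
                           ("m", "pl"): "blancs", ("f", "pl"): "blanches"})
TRANSFER = {"white": BLANC}


def realize(entry, feats):
    # An inflected word (a plain string) is used as it is; for a
    # morphological set, the form is picked with the agreement features
    # that the target derivation supplies (here from the modified noun).
    if isinstance(entry, str):
        return entry
    return entry.forms[(feats["gender"], feats["num"])]


# Agreement with a feminine plural French noun.
print(realize(TRANSFER["white"], {"gender": "f", "num": "pl"}))  # blanches
```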
      <Paragraph position="7"> Words that are not autonomous entries in the English syntactic lexicon (e.g., complementizers, light verbs, or parts of an idiomatic expression) are not considered autonomous entries in the transfer lexicon either. For example, no rule needs to match take or pay directly with faire, or give with pousser, in order to get the right light-verb/predicative-noun combinations in the following sentences:5 John took a walk ↔ John a fait une promenade (Danlos 1989); John pays court to Mary ↔ John fait la cour à Mary (Danlos 1989); John [...] ↔ Jean a poussé un cri. Some words that exist as autonomous entries in the English syntactic lexicon do not appear as entries in the transfer lexicon because their French counterpart is a morphological inflection, not a word. For example, the future auxiliaries will and shall are not translated as such. The tense feature they contribute is transferred (as are other syntactic features), and the French future-tense verb form is chosen.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Matching nodes
</SectionTitle>
      <Paragraph position="0"> Matching predicates of the two languages as a whole is not sufficient. Correspondences between their arguments must be stated too, as shown in the following examples. These examples also show that trees cannot be matched before lexical insertion has taken place, and therefore that the correspondences between nodes cannot be established on the basis of the subcategorization frame alone.</Paragraph>
      <Paragraph position="1"> Arguments are matched directly by the links between them. Adjuncts are matched indirectly, by the links on the nodes at which they adjoin. For example, in the following correspondence,</Paragraph>
      <Paragraph position="3"> the AP node in the English tree is linked to the V node of the French tree to account for: John is fond of music ↔ John aime la musique; John is very fond of music ↔ John aime beaucoup la musique. The adverb very is associated with an AP-type auxiliary tree, which is paired with a V-type auxiliary tree corresponding to the word beaucoup.</Paragraph>
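Below is a minimal Python sketch of how an adjunct follows the link on its adjunction site, using the fond/aime and very/beaucoup pairs. The classes `TreePair` and `AuxPair`, the link table and the foot-category check are assumptions made for illustration. They are not the paper's data structures.

```python
from dataclasses import dataclass


@dataclass
class TreePair:
    src: str     # anchor of the English elementary tree
    tgt: str     # anchor of the paired French elementary tree
    links: dict  # English node -> linked French node


@dataclass
class AuxPair:
    src: str       # English auxiliary tree anchor
    tgt: str       # French auxiliary tree anchor
    src_foot: str  # category of the English foot node
    tgt_foot: str  # category of the French foot node


# Hypothetical entries for "be fond of" / "aimer" and "very" / "beaucoup".
FOND = TreePair("fond", "aime", {"NP0": "NP0", "NP1": "NP1", "AP": "V"})
VERY = AuxPair("very", "beaucoup", "AP", "V")


def transfer_adjunct(host, site, adj):
    # The adjunct pair carries no link of its own: it follows the link on
    # the host node where it adjoins, and both foot categories must fit.
    target_site = host.links[site]
    if (adj.src_foot, adj.tgt_foot) != (site, target_site):
        raise ValueError(f"{adj.src} cannot adjoin at {site} -> {target_site}")
    return adj.tgt, target_site


print(transfer_adjunct(FOND, "AP", VERY))  # ('beaucoup', 'V')
```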
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Matching feature structures
</SectionTitle>
      <Paragraph position="0"> Some feature structures of the words appearing in the trees are transferred in the translation process, but with the values further specified by the derivation (and not with those from the lexical entry, which may be less specific). For example, fish can be either singular or plural and is therefore stated as such in the lexicon. However, it can get its number from the subject-verb agreement constraints, as in the following sentences: The fish swim in the pond ↔ Les poissons nagent dans l'étang (plural); The fish is good ↔ Le poisson est bon (singular). Agreement features of nouns are lexically matched only in the case of two morphological sets. When one (or both) of the entries is a single inflected word, the agreement features depend only on the lexical entry itself and are directly assigned in the transfer lexicon: \boy\, N [num = X] ↔ \garçon\, N [num = X]; luggage, N [num = sing] ↔ bagages, N [num = pl]. Because of these idiosyncrasies, agreement features of verbs are not matched. We will thus rightly have: My luggage is heavy (singular) ↔ Mes bagages sont lourds (plural), based on monolingual agreement constraints between subject and verb.</Paragraph>
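The contrast between shared variables (two morphological sets) and fixed values (single inflected words) can be sketched as a small matching step. `Var`, `RULES` and `transfer_features` are hypothetical. Real feature structures would be unified, not compared attribute by attribute as they are here.

```python
class Var:
    # A feature variable shared by the two sides of a transfer rule.
    def __init__(self, name):
        self.name = name


X = Var("X")

# Hypothetical transfer rules: (English features, French word, French features).
RULES = {
    "boy": ({"num": X}, "garçon", {"num": X}),               # two morphological sets
    "fish": ({"num": X}, "poisson", {"num": X}),             # two morphological sets
    "luggage": ({"num": "sing"}, "bagages", {"num": "pl"}),  # single inflected words
}


def transfer_features(word, derived):
    # Match the English side against the features the derivation supplies
    # (not the possibly underspecified lexical ones), then instantiate the
    # French side with the resulting bindings.
    src, tgt_word, tgt = RULES[word]
    binding = {}
    for feat, val in src.items():
        actual = derived[feat]
        if isinstance(val, Var):
            binding[val.name] = actual
        elif val != actual:
            raise ValueError(f"{word}: {feat}={actual} does not match {val}")
    return tgt_word, {f: binding[v.name] if isinstance(v, Var) else v
                      for f, v in tgt.items()}


print(transfer_features("fish", {"num": "pl"}))       # ('poisson', {'num': 'pl'})
print(transfer_features("luggage", {"num": "sing"}))  # ('bagages', {'num': 'pl'})
```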
      <Paragraph position="1"> Features assigned to the sentential root node (either from lexical insertion or from adjoined material) are transferred or not depending on whether they are assigned autonomously in the target language. The feature tense, for example, is usually transferred, but not the feature mode, because the latter depends on the verb of the matrix sentence if the sentence is embedded: Jean wants Marie to leave ↔ Jean veut que Marie parte (Danlos 1989).</Paragraph>
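Here is a sketch of the policy for root-node features, assuming only tense and mode are modeled. Tense is copied, while mode is recomputed on the French side from the matrix verb. `TRANSFERABLE`, `MODE_IMPOSED_BY` and `target_root_features` are hypothetical names. The feature values are illustrative.

```python
# Root-node features that French assigns autonomously and that can
# therefore be copied from the English derivation (assumption: only the
# two features discussed above are modeled).
TRANSFERABLE = {"tense"}

# Hypothetical table: mode imposed by a French matrix verb on its
# sentential complement (vouloir takes a subjunctive que-clause).
MODE_IMPOSED_BY = {"vouloir": "subjunctive"}


def target_root_features(src_feats, matrix_verb=None):
    feats = {f: v for f, v in src_feats.items() if f in TRANSFERABLE}
    # mode is never copied: an embedded clause takes it from the French
    # matrix verb; anything not listed (including root clauses) defaults
    # to indicative, which is an assumption of this sketch.
    feats["mode"] = MODE_IMPOSED_BY.get(matrix_verb, "indicative")
    return feats


# "Marie to leave" under wants/veut: infinitive in English, subjunctive in French.
print(target_root_features({"tense": "present", "mode": "infinitive"}, "vouloir"))
# {'tense': 'present', 'mode': 'subjunctive'}
```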
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Matching tree families
</SectionTitle>
      <Paragraph position="0"> In order to transfer both the predicate-argument relations and construction types such as question, passive, topicalization, etc., it is necessary to be able to refer to a specific tree in a tree family. This is done by matching the syntactic features by which the different trees are identified within a tree family, for example &lt;passive&gt;, &lt;relative, NPi&gt; or &lt;question, NPi&gt;.6 As has been noted, transitivity alternations exhibit striking differences in the two languages. The trees in the two families will not necessarily bear the same syntactic features, and corresponding tree families may not include the same number of trees.</Paragraph>
      <Paragraph position="1"> When a syntactic feature of a given tree family does not exist for the corresponding tree family in the target language, it is ignored. English trees for prepositional passives will thus be matched with their corresponding declarative trees in French (unless the English prepositional argument is matched with the French direct object): John was given a book by Mary ↔ Mary a donné un livre à Jean. Similarly, the feature &lt;question, NPi&gt; will be transferred, but not the feature differentiating between pied-piping and preposition stranding in English, since French always pied-pipes: Who did Mary give a book to? ↔ À qui Mary a-t-elle donné un livre? When a certain syntactic feature exists for both tree families in the two languages, but not for both lexical items, it is ignored as well: Advantage was taken of this affair by John ↔ *Parti a été tiré de cette affaire par Jean; Jean a tiré parti de cette affaire. Such idiosyncrasies are in fact expected and handled in our grammars, since they have both their constituent structures and their syntactic rules lexicalized (see Abeillé [1990 (a)] for a discussion of this topic).</Paragraph>
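Ignoring features that the target family (or the target lexical item) lacks amounts to intersecting feature sets, as in this Python sketch. The feature names, such as `question-NP2`, and the two French families are assumptions made for illustration. Only the trees a lexical item actually has are listed. The missing passive of tirer parti is therefore handled the same way as a missing family feature.

```python
# Hypothetical French tree families for particular lexical items: each tree
# is identified by its set of syntactic features (empty set = declarative).
DONNER = [frozenset(), frozenset({"passive"}), frozenset({"question-NP2"})]
TIRER_PARTI = [frozenset(), frozenset({"question-NP0"})]  # no passive tree


def match_tree(src_features, target_family):
    # Keep only the features the target family uses; the others (for
    # example preposition stranding, or a passive the French item lacks)
    # are ignored, which selects the corresponding French tree.
    known = set().union(*target_family)
    kept = frozenset(f for f in src_features if f in known)
    if kept not in target_family:
        raise LookupError(f"no target tree with features {sorted(kept)}")
    return kept


# Who did Mary give a book to?  (question on NP2, stranded preposition)
print(match_tree({"question-NP2", "stranded"}, DONNER))  # frozenset({'question-NP2'})
# Advantage was taken of this affair by John
print(match_tree({"passive"}, TIRER_PARTI))  # frozenset()
```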
      <Paragraph position="2"> 6 NPi refers to the noun phrase being extracted, with i usually 0 for the subject, 1 for the first object, etc.</Paragraph>
      <Paragraph position="3"> Units of an LTAG grammar have a large domain of locality. Discrepancies in the internal structures being matched are in fact expected by our strategy, and no special mechanism is required for them.</Paragraph>
      <Paragraph position="4"> 3.1 Discrepancies in constituent structures. It is not a problem when an elementary tree with a certain constituent structure translates into an elementary tree with a different constituent structure in the target language, provided they have a similar argument structure. For example: idiom ↔ verb; idiom ↔ different kind of idiom; verb ↔ light-verb combination; VP-adverb ↔ raising verb; S-adverb ↔ matrix clause ... as in: The baby just fell ↔ Le bébé vient de tomber (Kaplan et al. 1989); John is likely to come ↔ Il est probable que Jean viendra. Links provide for simultaneous adjunction (or substitution) of matching trees at the corresponding nodes. For example, in the pair α11, adjunction of an adjective (on N) in the English tree corresponds to an adjunction on the French VP: John gave a weak cough ↔ John toussa faiblement. Furthermore, elementary structures of the source language need not exist in the target language as elementary structures. For example, there is no French counterpart to the English verb-particle combination: John called Mary up ↔ John a appelé Mary.</Paragraph>
    </Section>
    <Section position="6" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Discrepancies in syntactic properties
</SectionTitle>
      <Paragraph position="0"> Some English predicates do not have the same number of arguments as their corresponding French ones. In such cases, the pair consists not of elementary trees but of derived trees of bounded size. Since the match is performed between derived trees, no new elementary trees are introduced in the grammars. This addition of pairs of bounded derived trees is the only change we have to make to the units of the original grammars.</Paragraph>
      <Paragraph position="1"> For example, the adverb hopefully has an S argument. Since there is no corresponding French adverb, the French verb espérer (which has two arguments, an NP and an S), combined with on, will be used: Hopefully, John will work ↔ On espère que Jean travaillera. In the pair α12, hopefully is paired with a derived tree corresponding to on espère: the English tree for hopefully is paired with the result of substituting on in the subject position of the tree for espérer. The right-hand tree in α12 is thus a derived tree. Agentless passive trees are matched with declarative trees by the same device: John was given a book ↔ On a donné un livre à John. Similar cases occur for verbs exhibiting the ergative alternation in one language but not in the other. In this case, a supplementary causative tree has to be used for the unaccusative language (see pair α13): The sun melts the snow ↔ *Le soleil fond la neige; Le soleil fait fondre la neige. The right-hand tree in α13 is again a derived tree. Multicomponent TAG (Joshi [1987]) can also be used for resolving certain other discrepancies. This device is not a new addition; it is already part of the Synchronous TAG framework.</Paragraph>
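The derived tree paired with hopefully can be built from existing French elementary trees by substitution alone, as in this sketch. Trees are encoded as (label, children) tuples, with a trailing ↓ marking substitution sites. The encodings of espérer, on and the complement clause, and the helpers `subst` and `words`, are hypothetical.

```python
def subst(tree, site, initial):
    # Return a copy of tree in which the substitution leaf `site`
    # is replaced by the initial tree `initial`.
    label, kids = tree
    if label == site and not kids:
        return initial
    return (label, [subst(k, site, initial) for k in kids])


def words(tree):
    # Frontier of the tree, skipping unfilled substitution sites.
    label, kids = tree
    if not kids:
        return [] if label.endswith("↓") else [label]
    return [w for k in kids for w in words(k)]


# Existing elementary trees of the French grammar (hypothetical encodings).
ESPERER = ("S", [("NP0↓", []), ("VP", [("V", [("espère", [])]), ("S1↓", [])])])
ON = ("NP", [("on", [])])

# Right-hand side of the pair for "hopefully": a derived tree of bounded
# size, obtained by substituting "on" at the subject node.
ON_ESPERE = subst(ESPERER, "NP0↓", ON)

# The clause modified by "hopefully" fills the complement slot S1.
clause = ("S1", [("Comp", [("que", [])]), ("NP", [("Jean", [])]),
                 ("VP", [("V", [("travaillera", [])])])])
print(" ".join(words(subst(ON_ESPERE, "S1↓", clause))))  # on espère que Jean travaillera
```

Because ON_ESPERE is computed from trees already in the grammar, the transfer lexicon gains a pair, but the French grammar gains no new elementary tree.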
      <Paragraph position="2"> Conclusion. By virtue of their extended domain of locality, Tree-Adjoining Grammars allow regular correspondences between larger structures to be stated without a mediating interlingual representation. The mapping of derivation trees from source to target languages, using the formalism of synchronous TAGs, makes it possible to state such direct correspondences. By doing so, we are able to match linguistic units with quite different internal structures. Furthermore, the fact that the grammars are lexicalized enables us to capture some idiosyncrasies of each language.</Paragraph>
      <Paragraph position="3"> The simplicity and effectiveness of the transfer rules in this approach show that lexicalized TAGs, with their extended domain of locality, are very well adapted to machine translation. A detailed discussion of this approach will be provided in an expanded version of this paper, which will include a discussion of the applicability of this method to other pairs of languages exhibiting phenomena that do not arise in the pair considered in this paper.</Paragraph>
    </Section>
  </Section>
</Paper>