<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2202"> <Title>A Model for Fine-Grained Alignment of Multilingual Texts</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Differences in Translations </SectionTitle> <Paragraph position="0"> In most cases, translations are not absolutely literal counterparts of their source texts. In order to avoid translationese, i. e. deviations from the norms of the target language, a skilled translator will apply certain mechanisms, which (Salkie, 2002) calls &quot;inventive translations&quot; and which need to be captured and systematised.</Paragraph> <Paragraph position="1"> The following section will give some examples2 of common discrepancies encountered between a source text and its translation. 2 As we work with English and German, all examples are taken from these two languages. They are taken from the Europarl corpus (see Section 4) and are abbreviated where necessary. Unfortunately, it is not easily discernible from the corpus data which language is the source language. Consequently, our use of the terms 'source', 'target', 'L1', and 'L2' does not admit of any conclusions as to whether one of the languages is the source language, and if so, which one.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Nominalisations </SectionTitle> <Paragraph position="0"> Quite frequently, verbal expressions in L1 are expressed by corresponding nominalisations in L2. This departure from the source text results in a completely different structure of the target sentence, as can be seen in (1) and (2), where the English verb harmonise is expressed as Harmonisierung in German. The argument of the English verb functioning as the grammatical subject is realised as a postnominal modifier in the German sentence.</Paragraph> <Paragraph position="1"> erforderlich.</Paragraph> <Paragraph position="2"> necessary.</Paragraph> <Paragraph position="3"> This case is particularly interesting, because it involves a case of modality. In the English sentence, the verb is modified by the modal auxiliary must. 
In order to express the modality in the German version, a different strategy is applied, namely the use of an adjective with modal meaning (erforderlich, 'necessary'). Consequently, there are two predications in the German sentence as opposed to only one predication in the English sentence.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Voice </SectionTitle> <Paragraph position="0"> A further way in which translations can differ from their source is the choice of active or passive voice. This is exemplified by (3) and (4). Here, the direct object of the English sentence corresponds to the grammatical subject of the German sentence, while the subject of the English sentence is realised as a prepositional phrase with durch in the German version.</Paragraph> <Paragraph position="1"> (3) The conclusions of the Theato report safeguard them perfectly.4</Paragraph> <Paragraph position="2"> 3 Europarl:de-en/ep-00-01-19.al, 489.</Paragraph> <Paragraph position="3"> 4 Europarl:de-en/ep-00-01-18.al, 749.</Paragraph> <Paragraph position="4"> (4) Durch</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Negation </SectionTitle> <Paragraph position="0"> Sometimes, a positive predicate expression is translated by negating its antonym. 
This is the case in (5) and (6): both sentences contain a negative statement, but while the negation is incorporated into the English adjective by means of the negative prefix in-, it is achieved syntactically in the German sentence.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.4 Information Structure </SectionTitle> <Paragraph position="0"> Sentences and their translations can be organised differently with regard to their information structure. Sentences (7) and (8) are a good example of this type of non-literal translation.</Paragraph> <Paragraph position="1"> The German sentence is rather inconspicuous, with the grammatical subject being a prototypical agent (wir, 'we'). In the English version, however, it is the means that is realised in subject position and thus perspectivised. The corresponding constituent in German (mit unserer Entschließung, 'with our motion') is but an adverbial. In English, the actual agent is not realised as such and can only be identified by a process of inference based on the presence of the possessive pronoun our. Thus, while being more or less equivalent in meaning, this sentence pair differs significantly in its overall organisation.</Paragraph> <Paragraph position="2"> 5 Europarl:de-en/ep-00-01-18.al, 2522.</Paragraph> <Paragraph position="3"> 6 Europarl:de-en/ep-00-01-18.al, 53.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Alignment Model </SectionTitle> <Paragraph position="0"> The alignment model we propose is based on the assumption that a representation of translational equivalence can best be approximated by aligning the elements of monolingual predicate-argument structures. 
Section 3.1 describes this layer of the model in detail and shows how some of the differences in translations described in Section 2 can be accommodated on such a level.</Paragraph> <Paragraph position="1"> We assume that the annotation model described here is an extension to linguistic data which are already annotated with phrase-structure trees, i. e. treebanks. Section 3.2 shows how the binding of predicates and arguments to syntactic nodes is modelled. Section 3.3 describes the details of the alignment layer and the tags used to mark particular kinds of alignments, thus accounting for some more of the differences shown in Section 2.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Predicates and Arguments </SectionTitle> <Paragraph position="0"> The predicate-argument structures used in our model consist solely of predicates and their arguments. Although there is usually more than one predicate in a sentence, no attempt is made to nest structures or to join the predications logically in any way. The idea is to make the predicate-argument structure as rich as is necessary to be able to align a sentence pair while keeping it as simple as possible so as not to make it too difficult to annotate. In the same vein, quantification, negation, and other operators are not annotated. In short, the predicate-argument structures are not supposed to capture the semantics of a sentence exhaustively in an interlingua-like fashion.</Paragraph> <Paragraph position="1"> To have clear-cut criteria for annotators to determine what a predicate is, we rely on the heuristic assumption that predicates are more likely to be expressed by tokens belonging to some word classes than by tokens belonging to others. Potential predicate expressions in this model are verbs, deverbal adjectives and nouns7 or other adjectives and nouns which show a syntactic subcategorisation pattern. 
The predicates are represented by the capitalised citation form of the lexical item (e. g. harmonise). They are assigned a class based on their syntactic form (v, n, a for 'verbal', 'nominal', and 'adjectival', respectively), and derivationally related predicates form a predicate group.</Paragraph> <Paragraph position="2"> 7 For all non-verbal predicate expressions for which a derivationally related verbal expression exists it is assumed that they are deverbal derivations, etymological counter-evidence notwithstanding.</Paragraph> <Paragraph position="3"> Arguments are given short intuitive role names (e. g. ent harmonised, i. e. the entity being harmonised) in order to facilitate the annotation process. These role names have to be used consistently only within a predicate group. If, for example, an argument of the predicate harmonise has been assigned the role ent harmonised and the annotator encounters a comparable role as argument to the predicate harmonisation, the same role name for this argument has to be used.8 The usefulness of such a structure can be shown by analysing the sentence pair (1) and (2) in Section 2.1. While the syntactic constructions differ considerably, the predicate-argument structure shows the correspondence quite clearly (see the annotated sentences in Figure 1; cf. footnote 9): in the English sentence, we find the predicate harmonise with its argument ent harmonised, which corresponds to the predicate harmonisierung and its argument harmonisiertes in the German sentence. The information that a predicate of the class v is aligned with a predicate of the class n can be used to query the corpus for this type of non-literal translation.</Paragraph> <Paragraph position="4"> The active vs. 
passive translation in sentences (3) and (4) is another phenomenon which is accommodated by a predicate-argument structure (Figure 2): the subject np502 in the English sentence corresponds to the passivised subject np502 (embedded in pp503) in the German sentence on the basis of having the same argument role (safeguarder vs. bewahrer) in a comparable predication.</Paragraph> <Paragraph position="5"> 8 Keeping the argument names consistent for all predicates within a group while differentiating the predicates on the basis of syntactic form are complementary principles, both of which are supposed to facilitate querying the corpus. The consistency of argument names within a group, for example, enables the researcher to analyse paradigmatically all realisations of an argument irrespective of the syntactic form of the predicate. At the same time, the differentiation of predicates makes possible a syntagmatic analysis of the differences of argument structures depending on the syntactic form of the predicate. 9 All figures are at the end of the paper.</Paragraph> <Paragraph position="6"> 10 See e. g. (Marcus et al., 1994).</Paragraph> <Paragraph position="7"> It is sometimes assumed that predicate-argument structure can be derived or recovered from constituent structure or functional tags such as subject and object.10 It is true that these annotation layers provide important heuristic clues for the identification of predicates and arguments and may eventually speed up the annotation process in a semi-automatic way. But, as the examples above have shown, predicate-argument structure goes beyond the assignment of phrasal categories and grammatical functions, because the grammatical category of predicate expressions and consequently the grammatical functions of their arguments can vary considerably. 
Also, the predicate-argument structure licenses the alignment relation by showing explicitly what it is based on.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Binding Layer </SectionTitle> <Paragraph position="0"> As mentioned above, we assume that the annotation model described here is used on top of syntactically annotated data. Consequently, all elements of the predicate-argument structure must be bound to elements of the phrasal structure (terminal or non-terminal nodes). These bindings are stored in a dedicated binding layer between the constituent layer and the predicate-argument layer.</Paragraph> <Paragraph position="1"> A problem arises when there is no direct correspondence between argument roles and constituents. For instance, this is the case whenever a noun is postmodified by a participle clause: in Figure 3, the argument role ent raised of the predicate raise is realised by np525, but the participle clause (ipa517) containing the predicate (raised6) needs to be excluded, because not excluding it would lead to recursion. Consequently, there is no simple way to link the argument role to its realisation in the tree.</Paragraph> <Paragraph position="2"> In these cases, the argument role is linked to the appropriate phrase (here: np525) and the constituent that contains the predicate (ipa517) is pruned out, which results in a discontinuous argument realisation. Thus, in general, the binding layer allows for complex bindings, with more than one node of the constituent structure to be included in and sub-nodes to be explicitly excluded from a binding to a predicate or argument.11 When an expected argument is absent on the phrasal level due to specific syntactic constructions, the binding of the predicate is tagged accordingly, thus accounting for the missing argument. 
For example, in passive constructions as in Table 1, the predicate binding is tagged as pv.</Paragraph> <Paragraph position="3"> 11 See the database documentation (Feddes, 2004) for a more detailed description of this mechanism.</Paragraph> <Paragraph position="4"> Other common examples are imperative constructions. Although information of this kind may possibly be derived from the constituent structure, it is explicitly recorded in the binding layer as it has a direct impact on the predicate-argument structure and thus might prove useful for the automatic extraction of valency patterns. Table 1. Sentence: wenn korrekt gedolmetscht wurde. Gloss: if correctly interpreted was. (Europarl:de-en/ep-00-01-18.al, 2532) Note that the passive tag can also be exploited in order to query for sentence pairs like (3) and (4) (in Section 2.2), where an active sentence is translated with a passive: it is straightforward to find those instances of aligned predicates where only one binding carries the passive tag.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Alignment Layer </SectionTitle> <Paragraph position="0"> On the alignment layer, the elements of a pair of predicate-argument structures are aligned with each other. Arguments are aligned on the basis of corresponding roles within the predications.</Paragraph> <Paragraph position="1"> Comparable to the tags used in the binding layer that account for specific constructions (see Section 3.2), the alignments may also be tagged with further information. These tags are used to classify types of non-literalness like those discussed in Sections 2.3 and 2.4 (cf. footnote 12). Sentences (5) and (6) are an example of a tagged alignment. As Section 2.3 has shown, negation may be incorporated in a predicate in L1, but not in L2. Since our predicate-argument structure does not include syntactic negation, this results in the alignment of a predicate in L1 with its logical opposite in L2. 
To account for this fact, predicate alignments of this kind are tagged as absolute opposites (abs-opp).</Paragraph> <Paragraph position="2"> Similarly, alignment tagging is applied when predications are in some way incompatible, as is the case with sentences (7) and (8) in Section 2.4. As can be seen in the aligned annotation (Figure 4), the different information structure of these sentences has caused the two corresponding argument roles of giver and mitgeber to be realised by two incompatible expressions representing different referents (np500 vs. wir5). In this case, the alignment between the incompatible arguments is tagged incomp.</Paragraph> <Paragraph position="3"> 12 The deviant translations described in Sections 2.1 and 2.2 are already represented via predicate class (see Section 3.1) and on the binding layer (see Section 3.2), respectively.</Paragraph> <Paragraph position="4"> If there is no corresponding predicate-argument structure in the other language (as e. g. the adjectival predicate in sentence (2)) or if an argument within a structure does not have a counterpart in the other language, there will be no alignment.</Paragraph> <Paragraph position="5"> Table 2 gives an overview of the annotation layers as described in this section.</Paragraph> <Paragraph position="6"> All elements of the alignment structure are supposed to mark explicitly the way they contribute to or distort the resulting translational equivalence of a sentence pair.13 First and foremost, if two elements are aligned to each other, this alignment is licensed by their having comparable roles in the predicate-argument structures. This is the default case. If, however, a particular alignment relation, either of predicates or of arguments, is deviant in some way, this deviance is explicitly marked and classified on the alignment layer.</Paragraph> </Section> </Section> </Paper>