<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2202"> <Title>A Model for Fine-Grained Alignment of Multilingual Texts</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> When building parallel linguistic resources, one of the most obvious problems to be solved is that of alignment. Usually, in sentence- or word-aligned corpora, alignments are unmarked relations between corresponding elements. They are unmarked because the kind of correspondence between two elements is either obvious or beyond classification. For example, in a sentence-aligned corpus, the n : m relations that hold between sentences express the fact that the propositions contained in n sentences in L1 are basically the same as the propositions in m sentences in L2 (lowest common denominator). No further information about the kind of correspondence could possibly be added at this level of granularity. On the other hand, in word-aligned corpora, words are usually either aligned as being &quot;lexically equivalent&quot; or not aligned at all.1 Although there are many shades of &quot;lexical equivalence&quot;, these are usually not explicitly categorised.</Paragraph> <Paragraph position="1"> We would like to thank our colleague Frank Schumacher for many valuable comments on this paper. 1 Cf. the approach described by Melamed (1998).</Paragraph> <Paragraph position="2"> As Hansen-Schirra and Neumann (2003) point out, for many research questions neither type of alignment is sufficient, since the most interesting phenomena are found on a level between these two extremes.</Paragraph> <Paragraph position="3"> We propose a more finely grained model of alignment based on monolingual predicate-argument structures. We assume that, while translations can be non-literal in a variety of ways, they must be based on similar predicates and arguments for some kind of translational equivalence to be achieved.
Furthermore, our model explicitly encodes the ways in which the two versions of a text deviate from each other. Salkie (2002) points out that the possibility of investigating which types of non-literal translation occur on a regular basis is one of the major benefits that linguists and translation theorists can draw from parallel corpora. In Section 2, we begin by describing some ways in which translations can deviate from one another. We then describe the alignment model, which is based on monolingual predicate-argument structures, in detail (Section 3). In Section 4, we conclude by introducing the parallel treebank project FuSe, which uses the model described in this paper to align German and English texts from the Europarl parallel corpus (Koehn, 2002).</Paragraph> </Section> </Paper>