<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2202">
  <Title>A Model for Fine-Grained Alignment of Multilingual Texts</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4 Application and Outlook
</SectionTitle>
    <Paragraph position="0"> The alignment model we have described is currently being used in a project to build a treebank of aligned parallel texts in English and German with the following linguistic levels: POS tags, constituent structure and functional relations, plus the predicate-argument structure and the alignment layer to "fuse" the two; hence our working title for the treebank, FuSe, which additionally stands for functional semantic annotation (Cyrus et al., 2003; Cyrus et al., 2004).</Paragraph>
    <Paragraph position="1"> Our data source, the Europarl corpus (Koehn, 2002), contains sentence-aligned proceedings of the European Parliament in eleven languages and thus offers ample opportunity for extending the treebank at a later stage [14]. [Footnote 13: Cf. the "translation network" described in (Santos, 2000) for a much more complex approach to describing translation in a formal way; this model, however, goes well beyond what we think is feasible when annotating large amounts of data.]</Paragraph>
    <Paragraph position="2"> For syntactic and functional annotation we basically adapt the TIGER annotation scheme (Albert et al., 2003), making adjustments where we deem appropriate and changes which become necessary when adapting to English an annotation scheme which was originally developed for German. We use Annotate for the semi-automatic assignment of POS tags, hierarchical structure, phrasal and functional tags (Brants, 1999; Plaehn, 1998a). Annotate stores all annotations in a relational database [15]. To stay consistent with this approach we have developed an extension to the Annotate database structure to model the predicate-argument layer and the binding layer.</Paragraph>
    <Paragraph position="3"> Due to the monolingual nature of the Annotate database structure, the alignment layer (Section 3.3) cannot be incorporated into it.</Paragraph>
    <Paragraph position="4"> Hence, additional types of databases are needed.</Paragraph>
    <Paragraph position="5"> For each language pair (currently English and German), an alignment database is defined which represents the alignment layer, thus fusing two extended Annotate databases. Additionally, an administrative database is needed to define sets of two Annotate databases and one alignment database. The final parallel treebank will be represented by the union of these sets (Feddes, 2004).</Paragraph>
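The relationship between the three kinds of databases can be illustrated with a minimal Python sketch. It is not the FuSe implementation: the class names (AnnotateDB, AlignmentDB, AdminDB), the node identifiers, and the representation of alignments as pairs of node ids are all hypothetical, chosen only to show how one alignment database fuses two monolingual databases and how the treebank arises as the union of the registered sets. The check that the two databases differ in language is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotateDB:
    """Hypothetical stand-in for one monolingual (extended) Annotate database."""
    name: str
    language: str


@dataclass
class AlignmentDB:
    """Hypothetical stand-in for the alignment layer of one language pair."""
    source: AnnotateDB
    target: AnnotateDB
    # Assumed representation: pairs of (source node id, target node id).
    links: set = field(default_factory=set)

    def align(self, source_node, target_node):
        """Record that a node in the source database aligns with a target node."""
        self.links.add((source_node, target_node))


@dataclass
class AdminDB:
    """Hypothetical stand-in for the administrative database, which defines
    sets of two Annotate databases and one alignment database."""
    sets: list = field(default_factory=list)

    def register(self, alignment):
        # Assumption of this sketch: an alignment always spans two languages.
        if alignment.source.language == alignment.target.language:
            raise ValueError("an alignment database fuses two different languages")
        self.sets.append((alignment.source, alignment.target, alignment))

    def treebank(self):
        """Return the parallel treebank as the union of all registered sets."""
        databases = set()
        links = set()
        for source, target, alignment in self.sets:
            databases.update({source, target})
            links.update(
                (source.name, s, target.name, t) for s, t in alignment.links
            )
        return databases, links


# Usage with made-up database names and node ids.
english = AnnotateDB("europarl_en", "en")
german = AnnotateDB("europarl_de", "de")
en_de = AlignmentDB(english, german)
en_de.align("s1_n500", "s1_n502")

admin = AdminDB()
admin.register(en_de)
databases, links = admin.treebank()
```

Keeping the alignment links outside the monolingual databases mirrors the constraint stated above: the Annotate structure stays unchanged, and adding a further language pair only requires another alignment database and another entry in the administrative database.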
    <Paragraph position="6"> While annotators use Annotate to enter phrasal and functional structures comfortably, the predicate-argument structures and alignments are currently entered into a structured text file which is then imported into the database. A graphical annotation tool for these layers is under development. It will make binding the predicate-argument structure to the constituent structure easier for the annotators and suggest argument roles based on previous decisions. Possibilities of semi-automatic methods to speed up the annotation and thus reduce the costs of building the treebank are currently being investigated [16]. [Footnote 14: There are a few drawbacks to Europarl, such as its limited register and the fact that it is not easily discernible which language is the source language. However, we believe that at this stage the easy accessibility, the amount of preprocessing and particularly the lack of copyright restrictions make up for these disadvantages.] [Footnote 15: For details about the Annotate database structure see (Plaehn, 1998b).] [Footnote 16: One track we follow is to investigate if it is feasible to ...]</Paragraph>
    <Paragraph position="7"> Still, quite a bit of manual work will remain. We believe, however, that the effort that goes into such a gold-standard parallel treebank is very much worthwhile since the treebank will eventually prove useful for a number of fields and can be exploited for numerous applications. To name but a few, translation studies and contrastive analyses will profit particularly from the explicit annotation of translational differences. NLP applications such as machine translation could, e.g., exploit the constituent structures of two languages which are mapped via the predicate-argument structure.</Paragraph>
    <Paragraph position="8"> Also, from the disambiguated predicates and their argument structures, a multilingual valency dictionary could be derived.</Paragraph>
  </Section>
</Paper>