<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0308">
  <Title>TREQ-AL: A word alignment system with limited language resources</Title>
  <Section position="2" start_page="0" end_page="1" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In (Tufis and Barbu, 2002; Tufis, 2002) we described at length our extractor of translation equivalents, called TREQ, which was designed to build translation dictionaries from parallel corpora. In (Ide et al., 2002) we described how this program is used for word clustering and for checking the validity of the cross-lingual links between the monolingual wordnets of the multilingual Balkanet lexical ontology (Stamatou et al., 2002). In this paper we describe the TREQ-AL system, which builds on TREQ and aims at generating a word-alignment map for a parallel text (a bitext). TREQ-AL was built in less than two weeks for the Shared Task proposed by the organizers of the workshop on &quot;Building and Using Parallel Texts: Data Driven Machine Translation and Beyond&quot; at the HLT-NAACL 2003 conference. It can be improved in several ways that became apparent when we analyzed the evaluation results. TREQ-AL does not need an a priori bilingual dictionary, since one is extracted automatically by TREQ. However, if such a dictionary is available, both TREQ and TREQ-AL can make use of it. This ability allows both systems to work in a bootstrapping mode, producing larger dictionaries and better alignments as they are used.</Paragraph>
    <Paragraph position="1"> Word alignment, as defined in the shared task (http://www.cs.unt.edu/~rada/wpt/index.html#shared), is different from, and harder than, the problem of translation equivalence as previously addressed. In a dictionary extraction task a translation pair is considered correct if there is at least one context in which it has been correctly observed. A pair that occurs multiple times counts only once in the final dictionary. This is in sharp contrast with the alignment task, where each occurrence of the same pair counts equally.</Paragraph>
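    <Paragraph> The contrast between counting a pair once and counting every occurrence can be sketched in a few lines of Python. This is only an illustration: the word pairs are invented toy data, not taken from the shared-task corpus, and the variable names are hypothetical rather than part of TREQ or TREQ-AL.

```python
from collections import Counter

# Toy (source word, target word) links observed in a bitext; invented data.
observed_links = [
    ("casa", "house"),
    ("casa", "house"),
    ("casa", "home"),
    ("si", "and"),
    ("si", "and"),
    ("si", "and"),
]

# Dictionary extraction: a pair enters the dictionary once,
# however many times it occurs in the corpus.
dictionary_entries = set(observed_links)

# Word alignment: every occurrence of a pair is a separate link to be scored.
link_occurrences = Counter(observed_links)

print(len(dictionary_entries))         # number of distinct pairs
print(sum(link_occurrences.values()))  # number of link occurrences
```

    In this toy case the dictionary holds three entries while the alignment contains six links, so a frequent pair that is aligned wrongly costs far more in the alignment task than in dictionary extraction.</Paragraph>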
    <Paragraph position="2"> Another feature that differentiates the two tasks is the status of function-word links. In extracting translation equivalents one is usually interested only in the major categories (open classes). In our case (because of the WordNet-centered approach of our current projects) we were especially interested in POS-preserving translation equivalents. However, since EuroWordNet and Balkanet allow cross-POS links to be defined, translation equivalents with different POS also became of interest (provided the categories are major ones). The word alignment task requires that each word (irrespective of its POS) or punctuation mark in both parts of the bitext be assigned a translation in the other part (or the null translation, where applicable).</Paragraph>
    <Paragraph position="3"> Finally, the evaluations of the two tasks, even if both use the same measures, such as precision and recall, have to be judged differently. Null alignments have no significance in a dictionary extraction task, while in a word alignment task they play an important role (in the Romanian-English gold standard data, null alignments represent 13.35% of the total number of links).</Paragraph>
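    <Paragraph> The effect of null alignments on the scores can be illustrated with a minimal sketch. It assumes that links are represented as (source position, target position) pairs, with None standing for the null side; this representation, the helper names and the toy data are assumptions made for illustration, and the code is not the scorer used in the shared task.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted link set against a gold link set."""
    correct = len(predicted.intersection(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def without_nulls(links):
    """Drop links in which either side is the null token (None)."""
    return {(s, t) for (s, t) in links if s is not None and t is not None}


# Toy sentence pair: the fourth source word has no translation (null link).
gold = {(1, 1), (2, 3), (3, 2), (4, None)}
# A system that never proposes null links.
predicted = {(1, 1), (2, 3), (3, 2)}

with_nulls = precision_recall(predicted, gold)
no_nulls = precision_recall(without_nulls(predicted), without_nulls(gold))

print(with_nulls)
print(no_nulls)
```

    Under these assumptions the system is scored as perfect when null links are ignored, but its recall drops to 0.75 once the gold-standard null link is counted, which is why the treatment of null alignments matters when alignment scores are interpreted.</Paragraph>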
  </Section>
</Paper>