<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1027">
  <Title>Translating Collocations for Use in Bilingual Lexicons</Title>
  <Section position="4" start_page="153" end_page="154" type="relat">
    <SectionTitle>
4. Related Work.
</SectionTitle>
    <Paragraph position="0"> The recent availability of large amounts of bilingual data has attracted interest in several areas, including sentence alignment [10], [2], [11], [1], [4], word alignment [6], alignment of groups of words [3], [7], and statistical translation [8]. Of these, aligning groups of words is most similar to the work reported here, although we consider a greater variety of groups. Other research using bilingual corpora is less closely related to ours. For example, some of it addresses word sense disambiguation in the source language by examining the different translations a word receives in the target language [9], [8].</Paragraph>
    <Paragraph position="1"> One line of research uses only statistical techniques for machine translation [8]. To translate sentences, Brown et al. combine a stochastic language model, based on techniques used in speech recognition [19], with translation probabilities estimated from the aligned corpus. The project produces high-quality translations for shorter sentences using little linguistic and no semantic information (see Berger et al., this volume, for the most recent results). They also align groups of words across languages in the course of translation. However, they are careful to point out that such groups may or may not coincide with constituent boundaries in the sentence. In contrast, our work aims to identify syntactically and semantically meaningful units and to provide their translations for use in a variety of bilingual applications. These units may be either constituents or flexible word pairs separated by intervening words. Thus, the goals of our research are somewhat different.</Paragraph>
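The combination of a language model with translation probabilities is usually stated as the noisy-channel decision rule of Brown et al. Given a source-language sentence $f$ (French, in their experiments), the system outputs the target-language sentence $\hat{e}$ that maximizes the product of two probabilities. The first is the language-model probability $\Pr(e)$. The second is the translation-model probability $\Pr(f \mid e)$, which is the factor estimated from the aligned corpus.

```latex
\hat{e} = \operatorname*{arg\,max}_{e} \Pr(e \mid f)
        = \operatorname*{arg\,max}_{e} \frac{\Pr(e)\,\Pr(f \mid e)}{\Pr(f)}
        = \operatorname*{arg\,max}_{e} \Pr(e)\,\Pr(f \mid e)
```

The middle step applies Bayes' rule. Dropping $\Pr(f)$ in the last step is safe because it does not depend on $e$ and so cannot change which sentence wins the maximization.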
    <Paragraph position="2"> Kupiec [3] describes a technique for finding noun phrase correspondences in bilingual corpora. First, as for Champollion, the bilingual corpus must be aligned sentence by sentence. Then, each side of the corpus is run separately through a part-of-speech tagger and a noun phrase recognizer. Finally, noun phrases are mapped to each other using an iterative reestimation algorithm. The technique has the limitations noted in [3], and it also handles only NPs. Collocations, however, have been shown to include parts of NPs, categories other than NPs (e.g., verb phrases), and flexible phrases. Such flexible phrases do not fall into a single category but involve words separated by an arbitrary number of other words, such as &quot;to take ... steps&quot; and &quot;to demonstrate ... support.&quot; In this work, as in earlier work [7], we address this full range of collocations.</Paragraph>
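The reestimation step can be illustrated with an IBM-Model-1-style expectation-maximization loop in which whole noun phrases, rather than single words, are the units being aligned. This is a swapped-in simplification, not the exact algorithm of [3]. The Python sketch assumes that sentence alignment, tagging, and NP recognition have already been done, so each aligned sentence pair arrives as two lists of NP strings. The function name `reestimate_np_correspondences` and this input format are hypothetical.

```python
from collections import defaultdict

# Hypothetical input format: one aligned sentence pair, reduced to the noun
# phrases found on each side (source NPs, target NPs).
Pair = tuple[list[str], list[str]]


def reestimate_np_correspondences(
    pairs: list[Pair], iterations: int = 10
) -> dict[tuple[str, str], float]:
    """Model-1-style EM over whole noun phrases (not Kupiec's exact method).

    Returns probs[(src, tgt)], an estimate of P(tgt NP | src NP).
    """
    # Uniform start: every co-occurring NP pair gets the same score.
    probs: dict[tuple[str, str], float] = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        counts: dict[tuple[str, str], float] = defaultdict(float)
        totals: dict[str, float] = defaultdict(float)
        # E-step: within each sentence pair, split one unit of count for
        # every source NP across the target NPs, in proportion to the
        # current scores.
        for src_nps, tgt_nps in pairs:
            for src in src_nps:
                norm = sum(probs[(src, tgt)] for tgt in tgt_nps)
                for tgt in tgt_nps:
                    frac = probs[(src, tgt)] / norm
                    counts[(src, tgt)] += frac
                    totals[src] += frac
        # M-step: turn expected counts into conditional probabilities.
        probs = defaultdict(
            float, {(s, t): c / totals[s] for (s, t), c in counts.items()}
        )
    return dict(probs)
```

Consider a toy input in which "the house" co-occurs with "la maison" in two sentence pairs but with "la voiture" in only one. On that input, one iteration gives P(la maison | the house) = 0.75, and further iterations push it toward 1. Because only co-occurring NP pairs are ever scored, the sketch shares the limitation discussed above: material that is not an NP never becomes a candidate.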
  </Section>
</Paper>