<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1004"> <Title>A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We present an algorithm for bilingual word alignment that extends previous work by treating multi-word candidates on a par with single words, and by combining some simple assumptions about the translation process to capture alignments for low-frequency words.</Paragraph> <Paragraph position="1"> Like most other alignment algorithms, it uses co-occurrence statistics as a basis, but it differs in the assumptions it makes about the translation process. The algorithm has been implemented in a modular system that allows the user to experiment with different combinations and variants of these assumptions. We give performance results from two evaluations, which compare well with results reported in the literature.</Paragraph> <Paragraph position="2"> Introduction In recent years much progress has been made in the area of bilingual alignment for the support of tasks such as machine translation, machine-aided translation, bilingual lexicography and terminology. For instance, Melamed (1997a) reports that his word-to-word model for translational equivalence produced lexicon entries with 99% precision and 46% recall when trained on 13 million words of the Hansard corpus, where recall was measured as the fraction of words from the bitext that were assigned some translation. Using the same model but less data (a French/English software manual of 400,000 words), Resnik and Melamed (1997) reported 94% precision with 30% recall.</Paragraph> <Paragraph position="3"> While these figures are indeed impressive, more telling figures can only be obtained by measuring the effect of the alignment system on some specific task.
Dagan and Church (1994) report that their Termight system helped double the speed at which terminology lists could be compiled at the AT&amp;T Business Translation Services.</Paragraph> <Paragraph position="4"> It is also clear that the usability of bilingual concordances would be greatly improved if the system could indicate both items of a translation pair and if phrases could be looked up with the same ease and precision as single words (Macklovitch and Hannan 1996).</Paragraph> <Paragraph position="5"> For the language pairs that are of particular interest to us, English vs. other Germanic languages, the ability to handle multi-word units adequately is crucial (cf. Jones and Alexa 1997). In English a large number of technical terms are multi-word compounds, while the corresponding terms in other Germanic languages are often single-word compounds. We illustrate with a few examples from an English/Swedish computer manual:</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> English Swedish </SectionTitle> <Paragraph position="0"> after all / när allt kommer omkring; in spite of / trots; in general / i allmänhet</Paragraph> </Section> </Section> </Paper>