File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/w02-1013_abstr.xml
Size: 1,037 bytes
Last Modified: 2025-10-06 13:42:37
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1013"> <Title>From Words to Corpora: Recognizing Translation</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper presents a technique for discovering translationally equivalent texts. It consists of a matching algorithm applied at two different levels of analysis and a well-founded similarity score. This approach can be applied to any multilingual corpus using any kind of translation lexicon; it is therefore adaptable to varying levels of multilingual resource availability. Experimental results are shown on two tasks: a search for matching thirty-word segments in a corpus where some segments are mutual translations, and classification of candidate pairs of web pages that may or may not be translations of each other. The latter results are competitive with previous, document-structure-based approaches to the same problem.</Paragraph> </Section> </Paper>