<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1413">
  <Title>Using the Web as a Bilingual Dictionary</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In the field of computational linguistics, the term 'bilingual text' is often used as a synonym for 'parallel text', that is, a pair of texts written in two different languages with the same semantic content. In Asian languages such as Japanese, Chinese and Korean, however, there are a large number of 'partially bilingual texts', in which monolingual text in an Asian language contains several sporadically interlaced English words, as follows:</Paragraph>
    <Paragraph position="2"> The above sentence is taken from a Japanese medical document and means &quot;Since glaucoma is now manageable if diagnosed early, macular degeneration is becoming a major cause of visual impairment in developed nations&quot;. Partially bilingual texts of this kind are typically found in technical documents, where the original English technical term is given (usually in parentheses) immediately after the first occurrence of the corresponding Japanese technical term. Even a reader who does not know Japanese can easily guess that 'a37a49a39a63a40a49a41' is the translation of 'macular degeneration'.</Paragraph>
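The "Japanese term (English term)" convention described above can be sketched as a simple pattern match. This is an illustrative assumption, not the extraction algorithm of this paper: the Japanese term is taken to be the run of katakana or CJK ideographs immediately before a parenthesis, and the English term is the Latin-alphabet text inside it. The name `extract_term_pairs`, the pattern `TERM_PAIR`, and the sample sentence (a made-up paraphrase of the glaucoma example, not the original sentence) are all hypothetical.

```python
import re

# Hypothetical heuristic, not the paper's extraction algorithm: the Japanese term is
# the run of katakana or CJK ideographs immediately before a parenthesis, and the
# English term is the Latin-alphabet text inside it. Both ASCII "()" and
# full-width "（）" parentheses are accepted.
TERM_PAIR = re.compile(
    r"([\u30a0-\u30ff\u4e00-\u9fff]+)"  # Japanese term: katakana / CJK ideographs
    r"\s*[(\uff08]\s*"                   # opening parenthesis (ASCII or full-width)
    r"([A-Za-z][A-Za-z \-]*[A-Za-z])"    # English term: letters, spaces, hyphens
    r"\s*[)\uff09]"                      # closing parenthesis (ASCII or full-width)
)


def extract_term_pairs(text):
    """Return (japanese_term, english_term) tuples in order of appearance."""
    return TERM_PAIR.findall(text)


# Made-up input loosely paraphrasing the glaucoma example.
sample = ("緑内障（glaucoma）は早期に診断すれば管理可能になったため、"
          "黄斑変性（macular degeneration）が先進国における視覚障害の主要な原因になりつつある。")
print(extract_term_pairs(sample))
# [('緑内障', 'glaucoma'), ('黄斑変性', 'macular degeneration')]
```

The heuristic is deliberately crude: it misses Japanese terms that contain hiragana and may absorb a preceding noun into the Japanese term, so its output is better treated as a list of translation candidates than as final answers.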
    <Paragraph position="3"> Partially bilingual texts can be used for machine translation and cross-language information retrieval, as well as for bilingual lexicon construction, because they provide not only a correspondence between Japanese and English terms but also the context in which the Japanese term is translated into the English term. For example, the Japanese word 'a40a8a41' can be translated into many English words, such as 'degeneration', 'denaturation', and 'conversion'. However, words in the surrounding Japanese context, such as 'a28a43a30 (disease)' and 'a3a10a50 (impairment)', can serve as informants that guide the selection of the most appropriate English word.</Paragraph>
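The idea of context words acting as informants can be illustrated with a simple overlap score. This is an assumption made for illustration, not a method proposed in this paper: each candidate translation is given a hypothetical profile of context words counted in partially bilingual texts, and the candidate whose profile best matches the words around the Japanese term is chosen. The function `select_translation` and every count in `TOY_PROFILES` are invented, and English glosses stand in for the Japanese context words.

```python
from collections import Counter


def select_translation(context_words, candidate_profiles):
    """Return the candidate whose context profile best matches context_words.

    candidate_profiles maps each English candidate to a Counter of the context
    words seen alongside that translation in partially bilingual texts.
    """
    def score(candidate):
        profile = candidate_profiles[candidate]
        # A Counter returns 0 for unseen words, so unmatched context adds nothing.
        return sum(profile[word] for word in context_words)

    # max() keeps the first candidate listed when scores tie.
    return max(candidate_profiles, key=score)


# Toy profiles invented for illustration; English glosses stand in for the
# Japanese context words.
TOY_PROFILES = {
    "degeneration": Counter({"disease": 5, "impairment": 3}),
    "denaturation": Counter({"protein": 4, "heat": 2}),
    "conversion": Counter({"data": 3, "format": 1}),
}

print(select_translation(["disease", "impairment"], TOY_PROFILES))  # degeneration
```

Summing raw counts is the simplest possible choice; a practical system would more likely normalize by profile size or weight informative context words more heavily.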
    <Paragraph position="4"> In this paper, we investigate the possibility of using web-sourced partially bilingual texts as a continually-updated, wide-coverage bilingual technical term dictionary.</Paragraph>
    <Paragraph position="5"> Extracting the English translation of a given Japanese technical term from the web on the fly is different from collecting an arbitrary number of pairs of English and Japanese technical terms.</Paragraph>
    <Paragraph position="6"> The former can be thought of as a form of example-based translation, while the latter is a means of bilingual lexicon construction.</Paragraph>
    <Paragraph position="7"> Internet portals are starting to provide on-line bilingual dictionary and translation services. However, technical terms and new words are unlikely to be well covered, because they are too specific or too new. The proposed term translation extractor could be a useful Internet tool for human translators, compensating for the weaknesses of existing on-line dictionaries and translation services. In the following sections, we first investigate how much coverage partially bilingual texts on the web provide, using a commercial technical term dictionary and an Internet search engine. We then present a simple algorithm for extracting English translation candidates for a given Japanese technical term. Finally, we report the results of a preliminary experiment and discuss future work.</Paragraph>
  </Section>
</Paper>