<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2080"> <Title>TRANSLATION AMBIGUITY RESOLUTION BASED ON TEXT CORPORA OF SOURCE AND TARGET LANGUAGES</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Recently, many kinds of natural language processing systems, such as machine translation systems, have been developed and put into practical use, but ambiguity resolution in translation and meaning interpretation is still the primary issue in such systems. These systems have conventionally adopted a rule-based disambiguation method, using linguistic restrictions described logically in the dictionary and grammar to select the suitable equivalent translation and meaning. Generally speaking, it is impossible to provide all the restrictions systematically in advance. Furthermore, such machine translation systems cannot select the most suitable equivalent translation if the input expression meets two or more restrictions, and they have difficulty accepting any input expression that meets no restrictions.</Paragraph> <Paragraph position="1"> In order to overcome these difficulties, the following methods have been proposed in recent years: 1. Example-Based Translation: the method based on translation examples (pairs of a source text and its translation) [Nagao 84, Sato 90, Sumita 90] 2. Statistics-Based Translation: the method using statistical or probabilistic information extracted from a bilingual corpus [Brown 90, Nomiyama 91] Still, each of them has inherent problems and is insufficient for ambiguity resolution. 
For example, both the example-based and the statistics-based translation methods need a large-scale database of translation examples, and it is difficult to collect a bilingual corpus of adequate size.</Paragraph> <Paragraph position="2"> In this paper, we propose a new method to select the suitable equivalent translation using statistical data extracted independently from source and target language texts [Muraki 91]. The statistical data used here are linguistic statistics representing the degree of dependency between pairs of expressions in each text, especially co-occurrence statistics, i.e., how frequently the expressions co-occur in the same sentence, the same paragraph or the same chapter of each text. The dependency relation in the source language is reflected in the translated text through a bilingual dictionary by selecting the equivalent translation which maximizes the co-occurrence statistics in both the source and the target language text. Moreover, the method also provides the means to compute the linguistic statistics on pairs of meaning expressions. We call this method for equivalent translation and meaning selection DMAX Criteria (Double Maximize Criteria based on Dual Corpora).</Paragraph> <Paragraph position="3"> First, in the second section, we comment on the characteristics and the limits of the conventional methods of ambiguity resolution in translation and meaning interpretation. Next, in the third section, we describe the details of DMAX Criteria for equivalent translation selection. Finally, we explain the means to compute the linguistic statistics on pairs of meaning expressions.</Paragraph> </Section> </Paper>