File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/88/c88-1016_intro.xml
Size: 2,395 bytes
Last Modified: 2025-10-06 14:04:39
<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1016"> <Title>A STATISTICAL APPROACH TO LANGUAGE TRANSLATION</Title> <Section position="4" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> In this paper we will outline an approach to automatic translation that utilizes techniques of statistical information extraction from large data bases. These self-organizing techniques have proven successful in the field of automatic speech recognition [1,2,3].</Paragraph> <Paragraph position="1"> Statistical approaches have also been used recently in lexicography [4] and natural language processing [3,5,6]. The idea of automatic translation by statistical (information theoretic) methods was proposed many years ago by Warren Weaver [7].</Paragraph> <Paragraph position="2"> As will be seen in the body of the paper, the suggested technique is based on the availability of pairs of large corresponding texts that are translations of each other. In particular, we have chosen to work with the English and French languages because we were able to obtain the bilingual Hansard corpus of proceedings of the Canadian parliament containing 30 million words of text [8]. We also prefer to apply our ideas initially to two languages whose word order is similar, a condition that French and English satisfy.</Paragraph> <Paragraph position="3"> Our approach eschews the use of an intermediate mechanism (language) that would encode the &quot;meaning&quot; of the source text. The proposal will seem especially radical since very little will be said about employment of conventional grammars. This omission, however, is not essential, and may only reflect our relative lack of tools as well as our uncertainty about the degree of grammar sophistication required. We are keeping an open mind! 
In what follows we will not be able to give actual results of French / English translation: our less than a year old project is not far enough along. Rather, we will outline our current thinking, sketch certain techniques, and substantiate our optimism by presenting some intermediate quantitative data. We wrote this somewhat speculative paper hoping to stimulate interest in applications of statistics to translation and to seek cooperation in achieving this difficult task.</Paragraph> </Section> </Paper>