<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1029"> <Title>The Automatic Component of the LINGSTAT Machine-Aided Translation System*</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> LINGSTAT is an interactive machine-aided translation system designed to increase the productivity of a translator. It is aimed both at experienced users whose goal is high-quality translation, and at inexperienced users with little knowledge of the source language whose goal is simply to extract information from foreign-language text. (For an introduction to the basic structure of LINGSTAT, see [1].) The first problem to be studied is Japanese-to-English translation with an emphasis on text from the domain of mergers and acquisitions, although recent evaluations have included general newspaper text as well. Work is also progressing on a Spanish-to-English system. The approach described below represents the current state of the Japanese system, and will be applied with minimal changes to Spanish.</Paragraph> <Paragraph position="1"> Due to the special difficulties presented by the Japanese writing system, previous versions of LINGSTAT have focused on developing tools for the lexical analysis of Japanese (such as tokenization of the Japanese character stream, morphological analysis, and katakana transliteration), and on providing the user access to lexical information (such as pronunciations, glosses, and definitions) via online lookup tools. In addition, a simple parser was incorporated to identify modifying phrases. No translation of the document was provided.
Instead, the user used the results of the above analyses and the online tools to construct a translation.</Paragraph> <Paragraph position="2"> In the newest version of LINGSTAT, the user is provided with a draft translation of the source document.</Paragraph> <Paragraph position="3"> *This work was sponsored by the Advanced Research Projects Agency under contract number J-FBI-91-239.</Paragraph> <Paragraph position="4"> For a source language similar to English, a starting point for such a draft might be a word-for-word translation, but because Japanese word order and sentence structure are so different from English, a more general framework has been constructed. The translation process in LINGSTAT consists of the following steps:
* Tokenization and morphological analysis
* Parsing
* Rearrangement of the source into English order
* Annotation and selection of glosses via an English language model
These modules are described in Section 2 below. Section 3 gives some preliminary results from the January 1994 evaluation, and Section 4 discusses some plans for future improvements.</Paragraph> </Section> </Paper>