<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-2043">
  <Title>AUTOMATIC TRANSLATION THROUGH UNDERSTANDING AND SUMMARIZING</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
AUTOMATIC TRANSLATION THROUGH UNDERSTANDING AND
SUMMARIZING
</SectionTitle>
    <Paragraph position="0">N. N. Leont'eva, Vsesojuznyj centr perevodov, Kržižanovskogo 14, 117218 Moskva, USSR. The French-to-Russian automatic translation system being developed in the All-Union Centre of Translation is conceived as part of a multifunctional information processing system, in the sense that it should be able to use approaches and methods proper to the information processing field, such as summarizing, abstracting, indexing, making inferences, etc. In such a system translation is realized through building a text information representation (IR). The task requires two types of analysis, linguistic analysis (LA) and information analysis (IA), working in interaction, the latter being, in particular, able to refer to the automatic thesaurus. The ultimate aim of LA is the building of a sentence semantic representation (SR). It is important that for each individual sentence its SR is constructed as a function of the IR of the whole text. (The current version of the system does not operate with the whole text but is limited, for each sentence, to its more or less immediate context.) Linguistic analysis calculates morphological structure for words, and syntactic and semantic structures for sentences. Each of these structures is determined by the appropriate language realities; remaining obscurities can be cleared only by referring to higher levels of analysis.</Paragraph>
    <Paragraph position="1">An SR built for an individual sentence without regard to other sentences' SR's is normally incomplete (deficient, ambiguous, incorrect, etc.). SR incompleteness is manifested by incompleteness of its units. The construction of text IR requires operations of comparison of different SR units as well as their comparison with thesaurus units. As a rule, incompleteness proper to SR's is cleared only partially, which calls for some external measures to ensure a formally correct structure ready for the synthesis of the output text. The general scheme of the system functioning runs as follows: input sentence → 1. analysis (LA) → initial SR → 2. reconstruction (LA-IA) → corrected SR → 3. summarizing (IA) → compressed SR → 4. synthesis → output sentence. Linguistic analysis contains a set of procedures aimed at creating initial SR's where all cases of incompleteness are exposed. Reconstruction compares SR's with each other and with the thesaurus and restores the missing parts of SR's. Summarizing means obtaining a kind of abstract from which all obscure and incomplete parts are removed so that only essential information is available.</Paragraph>
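    <Paragraph>The four-stage scheme above (analysis, reconstruction, summarizing, synthesis) can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the system described here: an SR is assumed to be a mapping from role names to values, None marks an incomplete unit, and every function name, role name and the thesaurus lookup is hypothetical.

```python
# Hypothetical toy model: an SR is a dict mapping role names to values,
# and None marks an incomplete unit. None of these names come from the paper.

def analyze(sentence_roles):
    """Stage 1 (LA): build an initial SR, leaving gaps exposed as None."""
    return dict(sentence_roles)


def reconstruct(sr, context_srs, thesaurus):
    """Stage 2 (LA-IA): fill gaps from neighbouring SRs, then the thesaurus."""
    corrected = dict(sr)
    for role, value in sr.items():
        if value is not None:
            continue
        # First look for the same role in the SRs of other sentences.
        for other in context_srs:
            if other.get(role) is not None:
                corrected[role] = other[role]
                break
        else:
            # Fall back to the thesaurus, which may also have no entry.
            corrected[role] = thesaurus.get(role)
    return corrected


def summarize(sr):
    """Stage 3 (IA): drop every unit that is still incomplete."""
    return {role: value for role, value in sr.items() if value is not None}


def synthesize(sr, order):
    """Stage 4: emit the surviving units in a fixed output order."""
    return " ".join(sr[role] for role in order if role in sr)


def translate(sentence_roles, context_srs, thesaurus, order):
    """Run the four stages in sequence on one sentence."""
    initial = analyze(sentence_roles)
    corrected = reconstruct(initial, context_srs, thesaurus)
    compressed = summarize(corrected)
    return synthesize(compressed, order)
```

In this sketch a gap that neither the context nor the thesaurus can fill is simply dropped at the summarizing stage, which mirrors the idea that the system outputs only what it could interpret.</Paragraph>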
    <Paragraph position="2">Information processing plays an important role in the realization of the scheme, as the system translates only what it comprehends; thus the result may be called not a literal but a &quot;digested&quot; translation. The information model of automatic translation is based on the properties of the coherent text. One of the main properties is that pieces of information essential for the text are repeated there in many ways and by various linguistic means. IA aims at identifying such information and making it the basis for SR reconstruction. The level of &quot;information noise&quot; in the synthesized text is expected to be lower than in the classical approach to AT (sentence-to-sentence translation through syntactic structures). The degree of abstracting (summarizing) can vary depending on the purpose: the system can be oriented at getting a translation proper, a detailed or a brief abstract, a summary, or, finally, a search pattern. The effect of such reproductions of the input text with subsiding detail is reminiscent of an echo, which gradually loses almost all original features while keeping the main pattern to the end: no degree of abstracting should affect the main contents of the document.</Paragraph>
    <Paragraph position="3">The information orientation of the system determines the choice of linguistic means of analysis, mainly the structure and units of syntactic and semantic representations. Two principles can be formulated: &quot;purity&quot; of means at each level of analysis, and possibilities of interaction between levels. The first principle makes it possible to use with maximum efficiency the laws specific to each level and to certify the formal correctness of the resulting structure. The second principle implies a kind of hierarchical organization of grammar: if a unit of one level cannot be interpreted at a higher level, it can be &quot;generalized&quot; (a lexeme can be generalized to a semantic class, a labeled relation can be replaced by a more general or even an unlabeled relation). Building of a structure at each level comprises at least two stages: creation of the initial structure, permitted to be incomplete and incorrect, and reconstruction of a more complete and correct structure, after an interpretation of the initial structure by means of the higher level (or levels).</Paragraph>
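    <Paragraph>The second principle (generalizing a unit that a higher level cannot interpret) might be sketched as a simple fallback lookup. The tables, class names and relation labels below are invented for illustration and are not taken from the system; None is assumed to stand for an unlabeled relation.

```python
# Hypothetical tables; the paper does not list actual classes or relations.
SEMANTIC_CLASS = {"thesaurus": "INFORMATION_TOOL", "abstract": "DOCUMENT"}
GENERAL_RELATION = {"cause": "dependency", "time": "circumstance"}


def generalize_node(lexeme, interpretable):
    """Keep the lexeme if the higher level can interpret it,
    otherwise replace it by its semantic class."""
    if lexeme in interpretable:
        return lexeme
    return SEMANTIC_CLASS.get(lexeme, "UNKNOWN")


def generalize_relation(label, interpretable):
    """Keep the label if it can be interpreted, otherwise fall back to
    a more general label, or to None for an unlabeled relation."""
    if label in interpretable:
        return label
    return GENERAL_RELATION.get(label)
```

The point of the sketch is only that a failed interpretation does not discard a unit outright: the unit is first weakened to something the higher level can still use.</Paragraph>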
    <Paragraph position="4">The division into levels is manifested not only by different means of analysis but also by the different nature of units: nodes and relations. Nodes of syntactic representation are words (difference of lexical meanings is disregarded), nodes of semantic representation are lexical meanings, and nodes of IR are notions having denotative status. Relations of syntactic structure are functional (from predicate to subject, from predicate to direct or indirect object, attributive relation, etc.). SR relations are of semantic nature (cause, time, patient, etc.); IR relations are mainly the same but vary in their information value: some appear inside a notion and are devaluated, others connect separate notions and acquire denotative status.</Paragraph>
    <Paragraph position="5">Units of translation are represented by units of IR having an explicit inner structure and liable to translation either as a whole or by parts. They are formed in the course of both linguistic and information analyses.</Paragraph>
  </Section>
</Paper>