<?xml version="1.0" standalone="yes"?>
<Paper uid="J85-1002">
  <Title>TAUM-AVIATION: Its Technical Features and Some Experimental Results</Title>
  <Section position="13" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6 EXPERIMENTAL RESULTS
6.1 COST-BENEFIT EVALUATION
</SectionTitle>
    <Paragraph position="0"> In 1981, the sponsor submitted TAUM-AVIATION to a cost-benefit evaluation, in order to determine whether the system was usable in a production environment. This evaluation, carried out by an independent consultant, is reported in Gervais (1980); we summarize only the main conclusions here.</Paragraph>
    <Paragraph position="1"> Raw machine output was deemed to have a degree of intelligibility, fidelity, and style that reached 80% of that of unrevised human translation (HT).</Paragraph>
    <Paragraph position="2"> Revised MT and revised HT have a comparable degree of quality, but revision costs are twice as high for MT; thus, globally, revised MT turns out to be more expensive than revised HT, as shown in Table 3. However, the evaluator's report notes that MT reduces by half the human time required in the translation/revision process. The direct costs of MT could probably be reduced to an acceptable level, for example by interfacing the system with a suitable word-processing environment and by reducing the percentage of sentences for which no translation is produced (Isabelle 1981).</Paragraph>
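The cost comparison above can be sketched with a small arithmetic example. All per-word figures below are invented for illustration (the actual numbers are in Gervais 1980, Table 3); the only constraints taken from the text are that revising MT costs twice as much as revising HT, and that revised MT nevertheless comes out more expensive overall.

```python
# Hypothetical per-word costs (in cents); all figures are invented.
ht_translation_cost = 10.0               # human translation
ht_revision_cost = 2.0                   # revision of HT
mt_machine_cost = 9.0                    # raw machine translation
mt_revision_cost = 2 * ht_revision_cost  # revision of MT costs twice as much

ht_total = ht_translation_cost + ht_revision_cost  # 12.0
mt_total = mt_machine_cost + mt_revision_cost      # 13.0

# Under these assumptions, revised MT is globally more expensive than
# revised HT, even though MT halves the *human* time involved.
print(ht_total, mt_total)
```

The point of the sketch is that halving human time and lowering total cost are distinct targets: doubled revision cost can erase the savings from cheap raw output.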
    <Paragraph position="3"> Cost-effective production would require the system to be applicable to at least 6 million words per year. In order to reach that target, the system would have to be extended to translate domains other than hydraulics. But the indirect costs involved in these extensions (e.g. dictionary development) are very high. Gervais concludes that it is impossible to assert that translation using TAUM-AVIATION would be globally cheaper than human translation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.2 TECHNICAL EVALUATION OF PERFORMANCE
</SectionTitle>
      <Paragraph position="0"> Cost-benefit evaluations are certainly necessary, but a single evaluation of this type tells one very little about how the system can be expected to perform on different texts, or after further investment. TAUM developed a methodology for analyzing the performance of an MT system through a systematic examination of its translation errors.</Paragraph>
      <Paragraph position="1"> The first step is to collect all the errors in the translation of the sample text; translators/revisors then have the responsibility of deciding what is to be counted as an error. A classification scheme for translation errors will include headings such as the following: incorrect TL equivalent for a word, incorrect word order, lack of an article, etc.</Paragraph>
      <Paragraph position="2"> In itself, an absolute number of such errors for a given text is not very revealing; but a comparison of the ratio of errors to word tokens in different texts, or at various stages of development of the system is an initial source of useful information.</Paragraph>
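The ratio described above can be sketched as follows; the error and token counts are invented for illustration, since the paper does not give the raw figures.

```python
def error_rate(num_errors, num_word_tokens):
    """Translation errors per 1,000 word tokens: the comparable
    figure, unlike an absolute error count."""
    return 1000.0 * num_errors / num_word_tokens

# Invented counts for two sample texts (or two development stages):
hydraulics = error_rate(240, 8000)
electronics = error_rate(450, 9000)

# The comparison, not either absolute number, is the useful signal.
print(hydraulics, electronics)
```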
      <Paragraph position="3"> Still, from the point of view of system development, these &amp;quot;surface&amp;quot; translation problems are merely symptoms of problems in some component of the system. To answer questions such as:
* how many of these problems have a known solution?
* how long would it take to correct them?
* how much better would the performance of the system be after n person-months of work?
* what should the priorities be?
it is necessary to identify, for each surface problem, one or several causes in the functioning of the system. For example, the fact that, in the translation of a given sentence, a French adjective is incorrectly inflected could be caused by one or more of the following factors:
* incorrect marking in a dictionary entry;
* a mistake in TL syntactic rules for agreement;
* incorrect scoping in SL analysis (e.g. giving the wrong bracketing to an ADJ NOUN NOUN sequence); or
* absence of the relevant marking in SL (e.g. when translating federal and provincial governments into French, should one pluralize the adjectives?).</Paragraph>
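A minimal sketch of such an error-classification grid, mapping each surface error type to its candidate causes in the system, might look as follows. The error labels and cause strings are invented for illustration; only the adjective-inflection example and its four candidate causes come from the text.

```python
# Hypothetical error grid: surface error type -> candidate causes,
# reflecting the internal organization of the system.
ERROR_GRID = {
    "incorrect_adjective_inflection": [
        "incorrect marking in a dictionary entry",
        "mistake in TL syntactic rules for agreement",
        "incorrect scoping in SL analysis",
        "absence of relevant marking in SL",
    ],
    "incorrect_word_order": [
        "mistake in TL syntactic rules",
        "incorrect scoping in SL analysis",
    ],
}

def candidate_causes(surface_error):
    """Return the causes to investigate for a given surface error."""
    return ERROR_GRID.get(surface_error, ["unclassified"])
```

Each surface error observed in the sample text would be looked up in the grid, and its exact cause confirmed by following execution traces, as described below.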
      <Paragraph position="4"> A sophisticated error classification grid was developed, so that the sources of translation errors could be investigated in a coherent and meaningful way. Basically, this error grid reflects the internal organization of the system, so that translation errors can be assigned a precise cause in the operation of the system.</Paragraph>
      <Paragraph position="5"> Once a coherent scheme is available, one can proceed with the classification of the translation errors found in the sample text. This classification process is difficult and tedious, but it is crucial that it be done with accuracy and consistency. Frequently, one has to follow &amp;quot;execution traces&amp;quot; to discover the exact source of a given error. The final step is to look at the possible remedies for the problems that have been identified. A careful examination of each problem source will reveal whether or not there is a known way of eliminating it, and if so, what amount of effort is needed.</Paragraph>
      <Paragraph position="6"> If this type of technical evaluation can be carried out at successive stages of development (with both old and new texts), one gets a clear picture of the evolution of the system. The figures obtained will reveal whether or not:
* there has been substantial progress compared to previous stages;
* an asymptote has been reached in the investment/improvement curve.</Paragraph>
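The asymptote test above can be sketched as a comparison of improvement per unit of effort across stages; all figures are invented for illustration.

```python
def improvement_per_effort(rate_before, rate_after, effort_months):
    """Reduction in error rate (per 1,000 tokens) per person-month."""
    return (rate_before - rate_after) / effort_months

# Invented (rate_before, rate_after, person-months) for three stages:
stages = [(50.0, 40.0, 5), (40.0, 36.0, 5), (36.0, 34.5, 5)]
gains = [improvement_per_effort(a, b, e) for a, b, e in stages]

# A steadily shrinking gain per person-month suggests the
# investment/improvement curve is approaching an asymptote.
print(gains)
```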
      <Paragraph position="7"> The same figures will also help determine development and research priorities.</Paragraph>
      <Paragraph position="8"> This sort of technical evaluation was applied twice to TAUM-AVIATION in the final year of the project; only a few person-months of development had been invested in the system between the two tests. The main goal was to see how well a system developed on the basis of corpora from the domain of hydraulics would fare in the domain of electronics. Some results were as follows: * In both tests, more than 70% of the failures were classified as having a known solution; the vast majority of these could be corrected in 12 person-months of work.</Paragraph>
      <Paragraph position="9"> * From a syntactic point of view, there was no notable difference between hydraulics and electronics. In fact, as a result of a minimal effort in correcting some problems discovered in the test on hydraulics, the overall performance of the parser turned out to be better in the electronics test.</Paragraph>
      <Paragraph position="10"> * As expected, there was a major dictionary problem in going from one domain to the other. Selectional restrictions as assigned for hydraulics worked so poorly that they did more harm than good to the final result.</Paragraph>
      <Paragraph position="11"> A definitive assessment of the linguistic and computational techniques on which TAUM-AVIATION is based would have required a few more applications of this evaluation/correction cycle.</Paragraph>
    </Section>
  </Section>
</Paper>