<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1015"> <Title>Probing the lexicon in evaluating commercial MT systems</Title> <Section position="2" start_page="0" end_page="112" type="intro"> <SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> The evaluation of machine translation (MT) systems has been a central research topic in recent years (cp. (Sparck-Jones and Galliers, 1995; King, 1996)). Many suggestions have focused on measuring translation quality (e.g. error classification in (Flanagan, 1994) or post-editing time in (Minnis, 1994)). These measures are time-consuming and difficult to apply. But translation quality rests on the linguistic competence of the MT system, which in turn is based first and foremost on grammatical coverage and lexicon size. Testing grammatical coverage can be done with a test suite (cp. (Nerbonne et al., 1993; Volk, 1995)). Here we advocate a probing method for determining the lexical coverage of commercial MT systems.</Paragraph>
<Paragraph position="1"> We have evaluated six MT systems that translate between English and German and are all positioned in the low-price market (under US$ 1500).</Paragraph>
<Paragraph position="2"> The overall goal of our evaluation was a comparison of these systems, resulting in recommendations on which system to use for which purpose. The evaluation consisted of compiling a list of criteria for self-evaluation and of running three experiments with external volunteers, mostly students from a local interpreter school. These experiments were performed to judge the information content of the translations, the translation quality, and the user-friendliness.</Paragraph>
<Paragraph position="3"> The list of criteria for self-evaluation covered technical, linguistic and ergonomic issues. 
As part of the linguistic evaluation we wanted to determine the lexical coverage of the MT systems, since only some of the systems provide figures on lexicon size in their documentation.</Paragraph>
<Paragraph position="4"> Many MT system evaluations in the past have been white-box evaluations performed by a testing team in cooperation with the developers (see (Falkedal, 1991) for a survey). But commercial MT systems can only be evaluated in a black-box setup, since the developers typically will not make the source code available, much less the linguistic source data (lexicon and grammar). Most of the evaluations described in the literature have centered around a single MT system; there are hardly any reports on comparative evaluations. A notable exception is (Rinsche, 1993), which compares SYSTRAN, LOGOS and METAL for German-English translation. She uses a test suite with 5000 words of authentic texts (from an introduction to Computer Science and from an official journal of the European Commission). The resulting translations are qualitatively evaluated for lexical, syntactic and semantic errors. The advantage of this approach is that words are evaluated in context. But the results of this study cannot be used for comparing lexicon sizes, since the number of error tokens is given rather than the number of error types. Furthermore, it is questionable whether a running text of 5000 words says much about lexicon size, since most of such a text consists of frequent closed-class words.</Paragraph>
<Paragraph position="5"> If we are mainly interested in lexicon size, this method has additional drawbacks. First, it is time-consuming to find out whether a word is translated correctly within running text. 
Second, it takes a lot of redundant translating to find missing lexical items.</Paragraph>
<Paragraph position="6"> Thus, if we want to compare the lexicon sizes of different MT systems, we have to find a way to determine lexical coverage by executing each system on selected lexical items. We therefore propose to use a special word list with words from different frequency ranges to probe the lexicon efficiently.</Paragraph> </Section> </Paper>
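The distinction between error tokens and error types drawn above can be made concrete in a small sketch. The word list below is hypothetical (not data from Rinsche's study): it only illustrates why token counts overstate lexicon gaps when one missing word recurs in running text.

```python
def error_counts(mistranslated_words):
    """Count error tokens (every occurrence) vs. error types (distinct words)."""
    tokens = len(mistranslated_words)
    types = len(set(mistranslated_words))
    return tokens, types

# Hypothetical error log from a running-text test: the same missing
# lexical item ("Rechner") is mistranslated three times.
errors = ["Rechner", "Amtsblatt", "Rechner", "Rechner"]
tokens, types = error_counts(errors)
# tokens counts 4 errors, but only 2 distinct lexical items are missing.
```

A token-based report would charge the system with four lexical errors, while only two dictionary entries are actually absent, which is why type counts are needed for size comparisons.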
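The probing idea, sampling test words from different frequency ranges of a ranked word list, can be sketched as follows. The band layout, sample sizes, and function name are illustrative assumptions, not the paper's actual procedure.

```python
import random

def build_probe_list(freq_ranked_words, bands=5, per_band=20, seed=0):
    """Sample probe words from each frequency band of a ranked word list.

    freq_ranked_words: words sorted from most to least frequent.
    Returns a list covering high-, mid-, and low-frequency ranges, so a
    lexicon can be probed without translating long running texts.
    """
    rng = random.Random(seed)  # fixed seed keeps the probe list reproducible
    band_size = len(freq_ranked_words) // bands
    probe = []
    for i in range(bands):
        band = freq_ranked_words[i * band_size:(i + 1) * band_size]
        probe.extend(rng.sample(band, min(per_band, len(band))))
    return probe
```

Feeding each probe word to an MT system and checking whether it is translated (rather than copied through untranslated) then gives a per-band coverage estimate that can be compared across systems.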