<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1016">
  <Title>Three Heads are Better than One</Title>
  <Section position="3" start_page="95" end_page="98" type="intro">
    <SectionTitle>
2 INTEGRATING MULTI-ENGINE OUTPUT
</SectionTitle>
    <Paragraph position="0"> ish/English), the lexicons used by the KBMT modules, a large set of user-generated bilingual glossaries as well as a gazetteer and a list of proper and organization names.</Paragraph>
    <Paragraph position="1"> The outputs from these engines (target language words and phrases) are recorded in a chart whose positions correspond to words in the source language input. As a result of the operation of each of the MT engines, new edges are added to the chart, each labeled with the translation of a region of the input string and indexed by this region's beginning and end positions. We will refer to all of these edges as components (as in "components of the translation") for the remainder of this article. The KBMT and EBMT engines also carry a quality score for each output element. The KBMT scores are produced based on whether any questionable heuristics were used in the source analysis or target generation. The EBMT scores are produced using a technique based on human judgements, as described in (Nirenburg et al., 1994a), submitted.</Paragraph>
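    <Paragraph> The chart described above can be pictured as a small data structure. The following Python sketch is illustrative only and is not the authors' implementation: the names Component, Chart, add and spanning, the engine label "GLOSS", and the sample entries are assumptions introduced here.
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """One chart edge: a candidate translation of source words start..end (inclusive)."""
    start: int          # position of the first source word covered
    end: int            # position of the last source word covered
    translation: str    # target-language word or phrase
    engine: str         # hypothetical engine label, e.g. "KBMT", "EBMT", "GLOSS"
    score: float        # quality score attached to this component

    @property
    def length(self) -> int:
        """Number of source words covered by this component."""
        return self.end - self.start + 1


@dataclass
class Chart:
    """Edges indexed by the (start, end) region of the input they cover."""
    edges: dict = field(default_factory=dict)

    def add(self, component: Component) -> None:
        """Record a new edge produced by one of the engines."""
        key = (component.start, component.end)
        self.edges.setdefault(key, []).append(component)

    def spanning(self, start: int, end: int) -> list:
        """Return every component that covers exactly start..end."""
        return self.edges.get((start, end), [])


# Illustrative entries only; positions follow the example sentence.
chart = Chart()
chart.add(Component(8, 8, "VIASA", "GLOSS", 2.1))
chart.add(Component(9, 10, "had", "GLOSS", 10.0))
print(chart.spanning(9, 10)[0].translation)
```
    Indexing edges by their (start, end) region makes the lookup "all components spanning exactly this region" a single dictionary access, which is the operation the chart walk needs.</Paragraph>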
    <Paragraph position="2"> Figure 1 presents a general view of the operation of our multi-engine MT system. The chart manager selects the overall best cover from the collection of candidate partial translations by normalizing each component's quality score (positive, with larger being better), and then selecting the best combination of components with the help of the chart walk algorithm. Figure 2 illustrates the result of this process on the example Spanish sentence "Al momento de su venta a Iberia, VIASA contaba con ocho aviones, que tenían en promedio 13 años de vuelo", which can be translated into English as "At the moment of its sale to Iberia, VIASA had eight airplanes, which had on average thirteen years of flight (time)". This is a sentence from one of the 1993 ARPA MT evaluation texts.</Paragraph>
    <Paragraph position="3"> For each component, the starting and ending positions in the chart, the corresponding source language words, and alternative translations are shown, as well as the engine and the engine-internal quality scores. Inspection of these translations shows numerous problems; for example, at position 12, "aviones" is translated, among other things, as "aircrafts". It must be remembered that these were generated automatically from an on-line dictionary, without any lexical feature marking or other human intervention. It is well known that such automatic methods are at the moment less than perfect, to say the least. In our current system, this is not a major problem, since the results go through a mandatory editing step, as described below.</Paragraph>
    <Section position="1" start_page="95" end_page="95" type="sub_section">
      <SectionTitle>
2.1 Normalizing the component scores
</SectionTitle>
      <Paragraph position="0"> The chart manager normalizes the internal scores to make them directly comparable. In the case of KBMT and EBMT, the pre-existing scores are modified, while lexical transfer results are scored based on the estimated reliability of individual databases, from 0.5 up to 15. Currently the KBMT scores are reduced by a constant, except for known erroneous output, which has its score set to zero. The internal EBMT scores range from 0 (perfect) to 10,000 (worthless), but they are nonlinear, so a region selected by a threshold is converted linearly into scores ranging from zero to a normalized maximum EBMT score. The normalization levels were empirically determined in the initial experiment by having several individuals judge the comparative average quality of the outputs in an actual translation run.</Paragraph>
      <Paragraph position="1"> In every case, the base score produced by the scoring functions is currently multiplied by the length of the candidate in words, on the assumption that longer items are better. We intend to test a variety of functions in order to find the right contribution of the length factor.</Paragraph>
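      <Paragraph> The normalization steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the system's code: every constant is a hypothetical placeholder (the text says only that the levels were set empirically), "reduced by a constant" is read here as subtraction with clamping at zero, and the function names are introduced for illustration.
```python
# All constants are hypothetical placeholders, not values from the paper.
KBMT_REDUCTION = 1.0      # assumed constant subtracted from KBMT scores
EBMT_THRESHOLD = 1000.0   # assumed cutoff on the 0 (perfect) to 10,000 (worthless) scale
EBMT_MAX = 10.0           # assumed normalized maximum EBMT score


def normalize_kbmt(raw: float, known_erroneous: bool = False) -> float:
    """Reduce a KBMT score by a constant; known erroneous output scores zero."""
    if known_erroneous:
        return 0.0
    # Clamping at zero is an assumption that keeps scores non-negative.
    return max(0.0, raw - KBMT_REDUCTION)


def normalize_ebmt(raw: float) -> float:
    """Map raw scores from 0 up to the threshold linearly onto EBMT_MAX down to 0."""
    if raw >= EBMT_THRESHOLD:
        return 0.0            # outside the selected region
    return EBMT_MAX * (1.0 - raw / EBMT_THRESHOLD)


def final_score(base: float, length_in_words: int) -> float:
    """Multiply the base score by the candidate length, favoring longer items."""
    return base * length_in_words


print(normalize_kbmt(5.0), normalize_ebmt(500.0), final_score(2.5, 2))
```
      Keeping the length factor in its own function makes it easy to swap in other functions of length, which the text names as future work.</Paragraph>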
    </Section>
    <Section position="2" start_page="95" end_page="98" type="sub_section">
      <SectionTitle>
2.2 The chart walk algorithm
</SectionTitle>
      <Paragraph position="0"> Figure 3 presents the chart walk algorithm used to produce a single, best, non-overlapping, contiguous combination (cover) of the available component translations, assuming correct component quality scores. The code is organized as a recursive divide-and-conquer procedure: to calculate the cover of a region of the input, it is repeatedly split into two parts, at each possible position. Each time, the best possible cover for each part is recursively found, and the two scores are combined to give a score for the chart walk containing the two best subwalks. These different splits are then compared with each other and with components from the chart spanning the whole region (if any), and the overall best result is used. The terminating step of this recursion is thus getting components from the chart.</Paragraph>
      <Paragraph position="3"> To find best walk on a region:
        if there is a stored result for this region
          then return it
        else begin
          get all primitive components for the region
          for each position p within the region begin
            split region into two parts at p
            find best walk for first part
            find best walk for second part
      Without dynamic programming, this would have a combinatorial time complexity. Dynamic programming utilizes a large array to store partial results, so that the best cover of any given subsequence is only computed once; the second time that a recursive call would compute the same result, it is retrieved from the array instead. This reduces the time complexity to O(n^3), and in practice it uses an insignificant part of total processing time.</Paragraph>
      <Paragraph position="4"> All possible combinations of components are compared: this is not a heuristic method, but an efficient exhaustive one. This is what assures that the chosen cover is optimal. This assumes, in addition to the scores actually being correct, that the scores are compositional, in the sense that the combined score for a set of components really represents their quality as a group. This might not be the case, for example, if gaps or overlaps are allowed in some cases (perhaps where they contain the same words in the same positions).</Paragraph>
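      <Paragraph> The recursive procedure with stored results can be sketched in Python as below. This is an illustrative sketch, not the authors' code: the input format (a dictionary from an inclusive (start, end) region to a list of (score, translation) pairs), the function names, and the sample entries "counted" and "with" with their scores are assumptions. Sub-walk scores are combined by the length-weighted average that this section describes, and functools.lru_cache stands in for the array of partial results.
```python
from functools import lru_cache


def best_walk(components: dict, n: int):
    """Return (score, cover) for the best cover of words 0..n-1, or None."""

    @lru_cache(maxsize=None)   # stored results: each region is solved once
    def walk(i: int, j: int):
        # Terminating step: single components spanning the whole region.
        candidates = [(s, ((i, j, t),)) for s, t in components.get((i, j), [])]
        # Split the region at every position and combine the best sub-walks.
        for p in range(i, j):
            left, right = walk(i, p), walk(p + 1, j)
            if left is None or right is None:
                continue       # one half cannot be covered at all
            len_l, len_r = p - i + 1, j - p
            score = (left[0] * len_l + right[0] * len_r) / (len_l + len_r)
            candidates.append((score, left[1] + right[1]))
        # Compare all splits and whole-region components; keep the best.
        return max(candidates, key=lambda c: c[0]) if candidates else None

    return walk(0, n - 1)


# Illustrative three-word input; scores and translations are made up.
components = {
    (0, 0): [(2.1, "VIASA")],
    (1, 1): [(5.0, "counted")],
    (2, 2): [(2.1, "with")],
    (1, 2): [(10.0, "had")],
}
score, cover = best_walk(components, 3)
print(round(score, 2), [t for _, _, t in cover])
```
      Because every split of every region is examined, the result is optimal with respect to the given scores, under the same compositionality assumption stated above.</Paragraph>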
      <Paragraph position="5"> We calculate the combined score for a sequence of components as the weighted average of their individual scores. Weighting by length is necessary so that the same components, when combined in a different order, produce the same combined scores. Otherwise the algorithm can produce inconsistent results. The chart walk algorithm can also be thought of as filling in the two-dimensional dynamic-programming array. Figure 4 shows an intermediate point in the filling of the array. In this figure, each element (i,j) is initially the best score of any single chart component covering the input region from word i to word j. Dashes indicate that no one component covers exactly that region. (In rows 1 through 7, the array has not yet been operated on, so it still shows its initial state.) After processing (see rows 9 through 22), each element is the score for the best set of components covering the input from word i to word j (the best cover for this substring).² (Only a truncated score is shown for each element in the figure, for readability. There is also a list of best components associated with each element.) The array is upper triangular since the starting position of a component i must be less than or equal to its ending position j. For any position, the score is calculated based on a combination of scores in the row to its left and in the column below it, versus the previous contents of the array cell for its position. So the array must be filled from the bottom up, and left to right. Intuitively, this is because larger regions must be built up from smaller regions within them.</Paragraph>
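      <Paragraph> The claim that length weighting makes the combined score independent of the order of combination can be checked with a short Python sketch. The function names and the three (score, length) pairs are illustrative assumptions; the unweighted variant is included only for contrast.
```python
def weighted(a, b):
    """Combine two (score, length) pairs by length-weighted average."""
    (sa, la), (sb, lb) = a, b
    return ((sa * la + sb * lb) / (la + lb), la + lb)


def unweighted(a, b):
    """Plain average of the two scores, ignoring length (for contrast)."""
    (sa, la), (sb, lb) = a, b
    return ((sa + sb) / 2, la + lb)


# Three adjacent single-word components with made-up scores.
x, y, z = (2.0, 1), (10.0, 1), (4.0, 1)

# Length-weighted: grouping (x y) z and x (y z) give the same score.
print(weighted(weighted(x, y), z)[0], weighted(x, weighted(y, z))[0])

# Unweighted: the two groupings disagree.
print(unweighted(unweighted(x, y), z)[0], unweighted(x, unweighted(y, z))[0])
```
      With the weighted form, the score of a cover depends only on which components it contains, so the value stored for a region does not depend on the split through which it was reached.</Paragraph>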
      <Paragraph position="6"> For example, to calculate element (8,10), we compute the length-weighted averages of the scores of the best walks over the pair of elements (8,8) and (9,10) versus the pair (8,9) and (10,10), and compare them with the scores of any single chart components going from 8 to 10 (there were none), and take the maximum. Referring to Figure 2 again, this corresponds to a choice between combining the translations of (8,8) VIASA and (9,10) contaba con versus combining the (not shown) translations of (8,9) VIASA contaba and (10,10) con. (This (8,9) element was itself previously built up from single word components.) Thus, we compare (2.1 + 10 × 2)/3 = 7.33 with (3.5 × 2 + 2.1)/3 = 3.0 and select the first, 7.33. The first wins because contaba con has a high score as an idiom from the glossary.</Paragraph>
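      <Paragraph> The comparison for element (8,10) can be replayed in a few lines of Python. The function name combine is an assumption introduced here. Recomputing from the scores as printed gives approximately 7.37 and 3.03 rather than the reported 7.33 and 3.0; a plausible reason, offered as an assumption, is that the printed scores are truncated for display.
```python
def combine(score_a: float, len_a: int, score_b: float, len_b: int) -> float:
    """Length-weighted average of two adjacent sub-walk scores."""
    return (score_a * len_a + score_b * len_b) / (len_a + len_b)


# Split 1: (8,8) "VIASA" with (9,10) "contaba con"
first = combine(2.1, 1, 10.0, 2)

# Split 2: (8,9) "VIASA contaba" with (10,10) "con"
second = combine(3.5, 2, 2.1, 1)

# No single component spans (8,10), so the best split wins.
best = max(first, second)
print(round(first, 2), round(second, 2), best == first)
```
      </Paragraph>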
      <Paragraph position="7"> Figure 5 shows the final array. When the element in the top-right corner is produced (5.78), the algorithm is finished, and the associated set of components is the final chart walk result shown in Figure 2. It may seem that the scores should increase towards the top-right corner. This has not generally been the case. While the system produces a number of high-scoring short components, many low-scoring components have to be included to span the entire input. Since the score is a weighted average, these low-scoring components pull the combined score down. A clear example can be seen at position (18,18), which has a score of 15. The scores above and to its right each average this 15 with a 5, for total values of 10.0 (all the lengths happen to be 1), and the score continues to decrease with distance from this point as one moves towards the final score, which does include the component for (18,18) in the cover.</Paragraph>
      <Paragraph position="8"> ² In the actual implementation, the initial components are not present yet in the array, since the presence of an element indicates that the computation has been carried out for this position. They are accessed from the chart data structure as needed, but are shown here as an aid to understanding.</Paragraph>
    </Section>
    <Section position="3" start_page="98" end_page="98" type="sub_section">
      <SectionTitle>
2.3 Reordering components
</SectionTitle>
      <Paragraph position="0"> The chart-oriented integration of MT engines does not easily support deviations from the linear order of the source text elements, as when discontinuous constituents translate contiguous strings or in the case of cross-component substring order differences. We use a language pair-dependent set of postprocessing rules to alleviate this (for example, by switching the order of adjacent single-word adjective and noun components).</Paragraph>
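      <Paragraph> One such postprocessing rule, the adjective and noun switch mentioned above, might look like the following Python sketch. It is a hypothetical illustration rather than the system's rule set: the (translation, tag) representation, the tag names "NOUN", "ADJ" and "DET", and the example phrase are assumptions.
```python
def swap_noun_adjective(components):
    """Reorder adjacent single-word noun + adjective components.

    Each component is a (translation, tag) pair in source order. Spanish
    typically places the adjective after the noun, so a single-word noun
    followed by a single-word adjective is emitted adjective first.
    """
    out = []
    pending = list(components)
    while pending:
        current = pending.pop(0)
        if (pending
                and current[1] == "NOUN" and pending[0][1] == "ADJ"
                and " " not in current[0] and " " not in pending[0][0]):
            out.append(pending.pop(0))   # emit the adjective first
        out.append(current)
    return out


# Made-up cover in source order, e.g. for "la casa blanca".
cover = [("the", "DET"), ("house", "NOUN"), ("white", "ADJ")]
print(" ".join(text for text, _ in swap_noun_adjective(cover)))
```
      Restricting the rule to single-word components follows the description in the text; multi-word components are left in place because their internal order was already fixed by the engine that produced them.</Paragraph>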
    </Section>
  </Section>
</Paper>