File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/93/j93-1006_concl.xml

Size: 4,483 bytes

Last Modified: 2025-10-06 13:57:01

<?xml version="1.0" standalone="yes"?>
<Paper uid="J93-1006">
  <Title>Text-Translation Alignment</Title>
  <Section position="8" start_page="140" end_page="141" type="concl">
    <SectionTitle>
6. Future Work
</SectionTitle>
    <Paragraph position="0"> For most practical purposes, the alignment algorithm we have described produces very satisfactory results, even when applied to relatively free translations. There are doubtless many places in which the algorithm itself could be improved. For example, it is clear that the present method of building the SAT favors associations between long sentences. This is not surprising, because long sentences contain more information, but we have not investigated the extent of this bias and therefore do not know whether it is appropriate.</Paragraph>
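The bias toward long sentences can be made concrete with a toy model. This sketch is an assumption for illustration only and is not how the paper builds the SAT. It scores a candidate sentence pair by counting how many known one-to-one word associations the pair contains, using the three everyday-word pairs named in this section. It then compares that raw count with a length-normalized variant. The normalization by the geometric mean of the sentence lengths is a hypothetical correction, not one the authors propose.

```python
# Toy model only: NOT the paper's SAT construction. It illustrates why a score
# that sums evidence from word associations tends to favor long sentences.
import math

# The three everyday-word associations mentioned in the text.
ANCHORS = {("only", "nur"), ("the", "die"), ("per", "pro")}


def raw_score(src: list[str], tgt: list[str]) -> int:
    """Count anchor pairs whose English member is in src and German member is in tgt."""
    src_words, tgt_words = set(src), set(tgt)
    return sum(1 for en, de in ANCHORS if en in src_words and de in tgt_words)


def length_normalized_score(src: list[str], tgt: list[str]) -> float:
    """Hypothetical correction: divide the raw count by the geometric mean of the lengths."""
    norm = math.sqrt(len(src) * len(tgt))
    return raw_score(src, tgt) / norm if norm else 0.0


short_en, short_de = "only one".split(), "nur eine".split()
long_en = "the rate per hour is only the base rate".split()
long_de = "die rate pro stunde ist nur die grundrate".split()

print(raw_score(short_en, short_de), raw_score(long_en, long_de))  # 1 3
print(length_normalized_score(short_en, short_de)
      > length_normalized_score(long_en, long_de))  # True
```

Under the raw count the long pair wins (3 against 1). After normalization the short pair scores higher (0.5 against roughly 0.35). Whether any such correction is appropriate is the question the authors leave open.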
    <Paragraph position="1"> The present algorithm rests on being able to identify one-to-one associations between certain words, notably technical terms and proper names. A brief inspection of Table 2 makes it clear that very few correspondences are noticed among everyday words and that, when they are, it is usually because those words also have precise technical uses. The very few exceptions include &quot;only&quot;/&quot;nur&quot; and &quot;the&quot;/&quot;die&quot;. The pair &quot;per&quot;/&quot;pro&quot; might also qualify, but if the languages afford any example of a scientific preposition, this is surely it. The most interesting further developments would loosen this dependence on one-to-one associations, both because doing so would present a very significant challenge and because we are convinced that our present method identifies essentially all the significant one-to-one associations.</Paragraph>
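The one-to-one associations this paragraph depends on can be illustrated with a co-occurrence score. The sketch assumes a Dice-style coefficient, 2c/(n_e + n_g). Here c is the number of sentence pairs containing both words, and n_e and n_g are the numbers of sentences containing each word. It also assumes the sentence pairs are already aligned, which is a simplification of the paper's setting. The helper names dice_table and best_partner and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_table(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Dice score for every (English, German) word pair that co-occurs at least once.

    Each word is counted at most once per sentence, so n_e and n_g count sentences.
    """
    n_e, n_g, both = Counter(), Counter(), Counter()
    for en, de in pairs:
        en_words, de_words = set(en.split()), set(de.split())
        n_e.update(en_words)
        n_g.update(de_words)
        both.update(product(en_words, de_words))  # every co-occurring word pair
    return {(e, g): 2 * c / (n_e[e] + n_g[g]) for (e, g), c in both.items()}


def best_partner(table: dict[tuple[str, str], float], word: str) -> tuple[str, float]:
    """Highest-scoring German partner for an English word; ties go to the alphabetically first."""
    candidates = [(g, score) for (e, g), score in table.items() if e == word]
    return min(candidates, key=lambda gs: (-gs[1], gs[0]))


# Toy aligned corpus (hypothetical).
pairs = [
    ("the enzyme binds", "das enzym bindet"),
    ("the enzyme is stable", "das enzym ist stabil"),
    ("the house is old", "das haus ist alt"),
    ("berlin is large", "berlin ist gross"),
]
table = dice_table(pairs)
print(best_partner(table, "enzyme"))  # ('enzym', 1.0)
print(table[("enzyme", "das")])       # 0.8
```

In this toy corpus the article pair the/das also scores 1.0, but only because every English "the" is rendered as "das". Real German text distributes the article over several forms, which splits the counts.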
    <Paragraph position="2"> There are two obvious kinds of looser associations that could be investigated.</Paragraph>
    <Paragraph position="3"> One would consist of connections between a single vocabulary item in one language and two or more in the other, or even between several items in one language and several in the other. The other would involve connections (one-one, one-many, or many-many) between phrases or recurring sequences.</Paragraph>
    <Paragraph position="4"> We have investigated the first of these enough to satisfy ourselves that there is latent information on one-to-many associations in the text, and that it can be revealed by suitable extensions of our methods. However, it is clear that the combinatorial problems associated with this approach are severe, and pursuing it would require considerable fine-tuning of the program and much more effective ways of indexing the most important data structures. The key to reducing the combinatorial explosion probably lies in using tables of similarities, such as those the present algorithm uses, to suggest which combinations of items are worth considering. If such an approach could be made efficient enough, it might even provide a superior way of solving the problem for which our heuristic methods of morphological analysis were introduced. Its advantage would be that it would not depend on words being formed by concatenation, and so it could also accommodate such phenomena as umlaut, ablaut, vowel harmony, and the nonconcatenative processes of Semitic morphology.</Paragraph>
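The pruning idea in this paragraph uses an existing similarity table to propose which combinations are worth scoring. It might be sketched as follows. This is a hypothetical illustration, not a method described in the paper. An English word survives only if its own Dice score with a German target reaches a threshold. Only groups of survivors are scored, and each group is treated as one item that occurs wherever any of its members occurs. Because the grouping looks only at where words occur and never at how they are spelled, an irregular pair such as knife/knives is handled exactly like a regular one. The function names, the threshold of 0.6, and the corpus are all assumptions.

```python
from itertools import combinations


def occurrences(sentences: list[str]) -> dict[str, set[int]]:
    """Map each word to the set of sentence indices in which it occurs."""
    occ: dict[str, set[int]] = {}
    for i, sentence in enumerate(sentences):
        for word in sentence.split():
            occ.setdefault(word, set()).add(i)
    return occ


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two occurrence sets."""
    return 2 * len(a.intersection(b)) / (len(a) + len(b))


def best_group(target: set[int], occ: dict[str, set[int]],
               threshold: float = 0.6, max_size: int = 3) -> tuple[float, tuple[str, ...]]:
    """Hypothetical one-to-many search for the word group that best matches target.

    1. Prune: keep only words whose own Dice score with target reaches threshold.
    2. Score each group of 1..max_size survivors; a group occurs wherever any member does.
    """
    survivors = sorted(w for w, o in occ.items() if dice(o, target) >= threshold)
    best: tuple[float, tuple[str, ...]] = (0.0, ())
    for size in range(1, max_size + 1):
        for group in combinations(survivors, size):
            union = set().union(*(occ[w] for w in group))
            score = dice(union, target)
            if score > best[0]:
                best = (score, group)
    return best


english = ["the knife is sharp", "the knives are new", "a knife cuts bread",
           "two knives lay here", "the bread is fresh", "the soup is hot"]
german = ["das messer ist scharf", "die messer sind neu", "ein messer schneidet brot",
          "zwei messer lagen hier", "das brot ist frisch", "die suppe ist heiss"]

occ_en, occ_de = occurrences(english), occurrences(german)
print(best_group(occ_de["messer"], occ_en))  # (1.0, ('knife', 'knives'))
```

Of the sixteen English word types in the toy corpus, only knife and knives pass the threshold. A single pair is therefore scored instead of every combination drawn from the vocabulary. For multiword correspondences one would presumably replace the union with an intersection, so that the group occurs only where all of its members do.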
    <Paragraph position="5"> The problems of treating recurring sequences are less severe. Data structures such as the Patricia tree (Knuth 1973, pp. 490-493) provide efficient means of identifying all such sequences and, once the sequences are identified, the data they provide could be added to the WAT much as we now add the results of morphological analysis. Needless to say, this would only allow for uninterrupted sequences. Any attempt to deal with discontinuous sequences would doubtless also involve great combinatorial problems. Computational Linguistics Volume 19, Number 1</Paragraph>
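The text cites the Patricia tree as one way to find recurring sequences. The sketch below swaps in a simpler structure, a word-level suffix array, which yields the same set of uninterrupted repeats, though far less efficiently as written. Once all suffixes of the token sequence are sorted, any sequence that occurs twice is a common prefix of two adjacent suffixes. The function names and the minimum length of two tokens are assumptions.

```python
def common_prefix_len(x: list[str], y: list[str]) -> int:
    """Number of leading tokens that x and y share."""
    n = 0
    for p, q in zip(x, y):
        if p != q:
            break
        n += 1
    return n


def recurring_sequences(tokens: list[str], min_len: int = 2) -> set[tuple[str, ...]]:
    """All uninterrupted token sequences of at least min_len tokens occurring two or more times."""
    # Suffix array: start positions sorted by the token sequence from each position onward.
    # Sorting by slices is quadratic in time and memory, which is adequate only for a sketch.
    order = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    found: set[tuple[str, ...]] = set()
    for a, b in zip(order, order[1:]):
        shared = common_prefix_len(tokens[a:], tokens[b:])
        # Every prefix of the shared run occurs at both a and b, hence recurs.
        for length in range(min_len, shared + 1):
            found.add(tuple(tokens[a:a + length]))
    return found


text = "the cell wall is thin and the cell wall is rigid".split()
for seq in sorted(recurring_sequences(text), key=lambda s: (-len(s), s)):
    print(" ".join(seq))
# the cell wall is
# cell wall is
# the cell wall
# cell wall
# the cell
# wall is
```

On a real text one would presumably insert a boundary token unique to each sentence so that no sequence spans two sentences. Each recurring sequence could then be treated as an additional token, much as the text proposes adding the results of morphological analysis to the WAT.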
    <Paragraph position="6"> These avenues for further development are intriguing and would surely lead to interesting results. But it is unlikely that they would lead to much better sets of associations among sentences than are to be found in the SATs that our present program produces, and it was mainly these results that we were interested in from the outset.</Paragraph>
    <Paragraph position="7"> The other avenues we have mentioned concern improvements in the WAT, which for us was always a secondary interest.</Paragraph>
  </Section>
</Paper>