<?xml version="1.0" standalone="yes"?> <Paper uid="E93-1054"> <Title>Text Alignment in a Tool for Translating Revised Documents</Title> <Section position="4" start_page="451" end_page="451" type="metho"> <SectionTitle> 4 Identifying the Revisions </SectionTitle> <Paragraph position="0"> Just as it must identify which portions of the SL text were omitted and which portions of the TL text were added in the process of translation, the tool needs to identify the differences between the two releases of the SL text. It needs to know which parts of the text remain the same and which parts are revisions.</Paragraph> <Paragraph position="1"> This requires an algorithm that can match segments of equivalent texts and can handle insertions and deletions. The algorithm that was developed for aligning paragraphs is a natural choice. It handles insertions and deletions successfully and it has certain other properties which make it extremely useful. Since it is based on length correspondence (rather than exact string comparison), it can align the two texts even when there are irrelevant structural differences between them. The idea is that since the two texts are written at different times and presumably by different writers, there can be formatting differences which complicate the task of identifying the changes. For this reason, a simple utility like 'diff' cannot be used. I found that treating this problem as a special case of alignment yields a much cleaner and simpler solution.</Paragraph> </Section> <Section position="5" start_page="451" end_page="451" type="metho"> <SectionTitle> 5 Constructing the Bilingual Draft </SectionTitle> <Paragraph position="0"> Once the correspondences between the old and the new versions and between the old version and its translation are obtained, the tool can construct the bilingual draft. In general, this is a very simple procedure. 
New text that appears only in the new version of the document is copied to the draft as is (in the SL). For text that has not been changed, the corresponding TL text is fetched from the translation and copied into the proper places in the draft. The final result is a bilingual version of the revised document that can be transformed into a full translation with minimal effort. Some complications may arise at this stage when certain factors combine. For example, if two SL sentences are translated by a single TL sentence and one of them is modified in the new release, it is probably not safe to use any of the translated material in the draft. In such cases, in addition to the revised text, the tool copies into the draft both the relevant text from the old version and the relevant translation and marks them appropriately. The translator can then decide whether there is a point in using any of the existing TL text in the final translation of the document.</Paragraph> </Section> </Paper>
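<!--
The draft construction of Section 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the names DraftEntry and build_draft, the shape of the two alignment inputs, and the choice to treat a revised segment that has its own one-to-one translation as plain new text are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DraftEntry:
    kind: str                              # "translated", "new", or "review"
    text: str                              # TL text if "translated", else SL text
    old_source: Optional[str] = None       # set only on "review" entries
    old_translation: Optional[str] = None  # set only on "review" entries


def build_draft(
    old_sl: List[str],
    new_sl: List[str],
    revisions: List[Tuple[Optional[int], bool]],
    beads: List[Tuple[List[int], str]],
) -> List[DraftEntry]:
    """Assumed inputs (hypothetical shapes):
    revisions[j] = (index of the old segment matched to new_sl[j] or None,
                    True if the text is unchanged);
    beads = list of (old SL segment indices, TL text translating them)."""
    # Map each old SL segment to the bead that holds its translation.
    bead_of = {i: b for b, (members, _) in enumerate(beads) for i in members}
    # Old segments that survive unchanged in the new release.
    unchanged = {old for old, same in revisions if old is not None and same}

    draft: List[DraftEntry] = []
    seen = set()  # beads whose TL text or old material is already in the draft
    for j, (old, same) in enumerate(revisions):
        bead = bead_of.get(old) if old is not None else None
        if bead is None:
            # Inserted text, or old text with no known translation: keep SL.
            draft.append(DraftEntry("new", new_sl[j]))
            continue
        members, tl_text = beads[bead]
        if all(i in unchanged for i in members):
            # Whole bead untouched: reuse its translation exactly once.
            if bead not in seen:
                draft.append(DraftEntry("translated", tl_text))
        elif len(members) == 1:
            # Revised segment with its own translation: treat as new text.
            draft.append(DraftEntry("new", new_sl[j]))
        else:
            # Many-to-one bead with a changed member: keep the SL text and
            # attach the old SL and TL material once, marked for review.
            entry = DraftEntry("review", new_sl[j])
            if bead not in seen:
                entry.old_source = " ".join(old_sl[i] for i in members)
                entry.old_translation = tl_text
            draft.append(entry)
        seen.add(bead)
    return draft
```

Entries of kind "review" correspond to the marked material that the translator inspects before deciding whether any of the existing TL text can be reused.
-->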