<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1155">
  <Title>A Flexible Example Annotation Schema: Translation Corresponding Tree Representation</Title>
  <Section position="2" start_page="1" end_page="1" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The construction of a bilingual knowledge base is critical to the development of example-based machine translation systems (Sato and Nagao, 1990). In the translation process, the application of bilingual examples concerns how examples are used to facilitate translation: an input sentence is decomposed into the format of the stored examples, and the source text is converted into target text on the basis of existing translations, by reference to the bilingual knowledge base. In theory, examples can be obtained from a bilingual corpus whose texts are aligned at the sentence level; in practice, we need an example base (or bilingual knowledge base; we use the two terms interchangeably) for convenient storage and retrieval of examples. The way the translation examples are actually stored is closely related to the problem of searching for matches. In structural example-based machine translation systems (Grishman, 1994; Meyers et al., 1998; Watanabe et al., 2000), examples in the knowledge base are normally annotated with their constituency structures (Kaji et al., 1992) or dependency structures (Matsumoto et al., 1993; Aramaki et al., 2001; Al-Adhaileh et al., 2002), which allows the correspondences between source and target sentences to be established at the structural level. All of these approaches annotate an example by means of a pair of analyzed structures, one for the sentence in each language, in which the correspondences between levels of the source and target structures are explicitly linked. However, we found that these approaches require bilingual examples that have 'parallel' translations or 'close' syntactic structures (Grishman, 1994), where the source and target sentences of a sentence pair have explicit correspondences. 
For example, in (Wu, 1995), the translation examples used for building the translation alignments are selected under strict constraints. As a result, these approaches indirectly limit the use of 'free translation' examples in the development of example-based machine translation systems. In practice, in most existing bilingual corpora the meaning of the source sentence is rendered in the target language fairly freely, rather than translated literally in a projective manner that stays as close to the source text as possible, particularly for language pairs that are structurally divergent, such as Portuguese and Chinese. As illustrated in Figure 1, the Portuguese sentence "Onde ficam as barracas de praia?" is translated as "Kayng Yi Shi Zai Na I ? (Where are the bathhouses?)" rather than rendered literally as "Sha Tan Zhang Peng Zai Na I ? (Where are the tents of beach?)". The translations of the source words "barracas" and "praia" do not explicitly appear in the target sentence. As a result, obtaining a fully aligned structural representation for such a sentence pair in a conventional alignment process may be problematic. However, we found that this type of example is very common. We investigated around 2100 bilingual examples extracted from the grammar book "Gramatica da Lingua Portuguesa" (Wang and Lu, 1999) and found that 63.4% of them belong to this case, in which the number of unmatched words is more than half the number of words in the source sentence. In this paper, we overcome the problem by designing a flexible representation schema called the Translation Corresponding Tree (TCT). 
We use the TCT as the basic structure for annotating the examples in the bilingual knowledge base of our Portuguese-to-Chinese example-based machine translation system.</Paragraph>
    <Paragraph position="1"> Figure 1 (caption fragment): ... the translations of some words in the Portuguese sentence do not appear in the target Chinese sentence.</Paragraph>
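    <Paragraph position="2"> The criterion used in the survey above (more than half of the source words left unmatched) could be checked along the following lines. This is only an illustrative sketch in Python, not the authors' procedure: the function names and the representation of an alignment as a set of indices of source words that have a counterpart in the target sentence are assumptions introduced here, and the sample alignment for the sentence of Figure 1 is invented for illustration.

```python
def unmatched_source_words(source_words, aligned_indices):
    """Return the source words that have no counterpart in the target sentence.

    aligned_indices is a (hypothetical) set of positions in source_words
    whose translation explicitly appears in the target sentence.
    """
    return [w for i, w in enumerate(source_words) if i not in aligned_indices]


def is_free_translation(source_words, aligned_indices):
    """True when more than half of the source words are unmatched."""
    unmatched = unmatched_source_words(source_words, aligned_indices)
    # "more than half" is a strict inequality: 2 * unmatched > total
    return 2 * len(unmatched) > len(source_words)


# The Portuguese sentence discussed in the text (punctuation left out).
source = ["Onde", "ficam", "as", "barracas", "de", "praia"]

# ASSUMPTION: only "Onde" and "ficam" are treated as matched here.
# The paper states only that "barracas" and "praia" are unmatched.
aligned = {0, 1}

print(unmatched_source_words(source, aligned))  # ['as', 'barracas', 'de', 'praia']
print(is_free_translation(source, aligned))     # True
```

 Under the assumed alignment, four of the six source words are unmatched, so the example falls into the case described in the text; a pair with exactly half of its source words unmatched would not.</Paragraph>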
  </Section>
</Paper>