<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1403"> <Title>Inducing Lexico-Structural Transfer Rules from Parsed Bi-texts</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> This paper describes a novel approach to inducing transfer rules from syntactic parses of bi-texts and available bilingual dictionaries. The approach induces transfer rules in four major steps, described in more detail below: (i) aligning the nodes of the parses; (ii) generating candidate rules from these alignments; (iii) ordering candidate rules by co-occurrence; and (iv) applying error-driven filtering to select the final set of rules.</Paragraph> <Paragraph position="1"> Our approach is based on lexico-structural transfer (Nasr et al., 1997), and in particular extends recent work on Korean-to-English transfer reported in (Han et al., 2000). Whereas Han et al. focus on high-quality domain-specific translation using handcrafted transfer rules, we instead focus on automating the acquisition of such rules.</Paragraph> <Paragraph position="2"> Our approach can be considered a generalization of syntactic approaches to example-based machine translation (EBMT) such as (Nagao, 1984; Sato and Nagao, 1990; Maruyama and Watanabe, 1992). While such approaches use syntactic transfer examples during the actual transfer of source parses, our approach instead uses syntactic transfer examples to induce general transfer rules that can be compiled into a transfer dictionary for use in the actual translation process. Our approach is similar to the recent work of (Meyers et al., 1998), where transfer rules are also derived after aligning the source and target nodes of corresponding parses. However, it differs from (Meyers et al., 1998) in several important respects. 
The first difference concerns the content of parses and the resulting transfer rules; in (Meyers et al., 1998), parses contain only lexical labels and syntactic roles (as arc labels), while our approach uses parses containing lexical labels, syntactic roles, and any other syntactic information provided by parsers (tense, number, person, etc.).</Paragraph> <Paragraph position="3"> The second difference concerns the node alignment; in (Meyers et al., 1998), the alignment of source and target nodes is designed to preserve node dominance in the source and target parses, while our approach imposes no such restriction. One reason for this difference is the different language pairs under study; (Meyers et al., 1998) deals with two languages that are closely related syntactically (Spanish and English), while we deal with two languages that are syntactically quite divergent, Korean and English (Dorr, 1994). The third difference is in the identification of transfer rule candidates; in (Meyers et al., 1998), the identification uses the exact tree fragments in the source and target parses that are delimited by the alignment, while we use all source and target tree sub-patterns matching a subset of the parse features that satisfy a customizable set of alignment constraints and attribute constraints. The fourth difference is in the level of abstraction of transfer rule candidates; in (Meyers et al., 1998), the source and target patterns of each transfer rule are fully lexicalized (except possibly the terminal nodes), while in our approach the nodes of transfer rules do not have to be lexicalized.</Paragraph> <Paragraph position="4"> Section 2 describes our approach to transfer rule induction and its integration with data preparation and evaluation. Section 3 describes the data preparation process and resulting data.</Paragraph> <Paragraph position="5"> Section 4 describes the transfer induction process in detail. 
Section 5 describes the results of our initial evaluation. Finally, Section 6 concludes with a discussion of future directions.</Paragraph> </Section> </Paper>