<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2139"> <Title>Deriving Transfer Rules from Dominance-Preserving Alignments</Title> <Section position="2" start_page="0" end_page="843" type="abstr"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Automatic acquisition of translation rules from parallel sentence-aligned text takes a variety of forms. Some machine translation (MT) systems treat aligned sentences as unstructured word sequences. Other systems, including our own ((Grishman, 1994) and (Meyers et al., 1996)), syntactically analyze (parse) sentences before acquiring transfer rules (cf. (Kaji et al., 1992), (Matsumoto et al., 1993), and (Kitamura and Matsumoto, 1995)). This has the advantage of acquiring structural as well as lexical correspondences. A syntactically analyzed, aligned corpus may serve as an example base for a form of example-based MT (cf. (Sato and Nagao, 1990), (Kaji et al., 1992), and (Furuse and Iida, 1994)).</Paragraph> <Paragraph position="1"> This paper 1 describes: (1) an efficient algorithm for aligning a pair of source/target language parse trees; and (2) a procedure for deriving transfer rules from this alignment. Each transfer rule consists of a pair of tree fragments derived by &quot;cutting up&quot; the source and target trees. A set of transfer rules whose left-hand sides match a source language parse tree is used to generate, from the corresponding set of right-hand sides, a target language parse tree that is a translation of the source tree. This technique resembles work on MT using synchronous Tree-Adjoining Grammars (cf. (Abeille et al., 1990)).</Paragraph> <Paragraph position="2"> The Proteus translation system learns transfer rules from pairs of aligned source and target regularized parses, Proteus's representation of predicate argument structure (cf. Figure 1). 2 Then it uses these transfer rules to map source language regularized parses generated by our source language parser into target language regularized parses. Finally, a generator converts target regularized parses into target language sentences. 1 We thank Cristina Olmeda Moreno for work on parsing our Spanish text.
This research was supported by</Paragraph> <Section position="1" start_page="0" end_page="843" type="sub_section"> <SectionTitle> National Science Foundation Grant IRI-9303013. </SectionTitle> <Paragraph position="0"> 2 Regularized parses (henceforth, &quot;parse trees&quot;) are like F-structures of Lexical Functional Grammar (LFG), except that a dependency structure is used.</Paragraph> <Paragraph position="1"> An alignment f is a 1-to-1 partial mapping from source nodes to target nodes. We consider only alignments that preserve the dominance relationship: if node a dominates node b in the source tree, then f(a) dominates f(b) in the target tree. In Figure 1, source nodes A,</Paragraph> <Paragraph position="2"> B, C and D map to the corresponding target nodes, marked with a prime, e.g., f(A) = A'.</Paragraph> <Paragraph position="3"> The alignment may be represented by the set {(A, A'), (B, B'), (C, C'), (D, D')}. We can assign a score to each alignment f, based on the (weighted) number of pairs in f; finding the best alignment translates into finding the alignment with the highest score. Our algorithms are based on (Farach et al., 1995) and related work.</Paragraph> <Paragraph position="4"> We needed efficient alignment algorithms because: (1) corpus-based training requires processing a large amount of text; and (2) an exhaustive search of all alignments is too computationally expensive for realistically sized parse trees.</Paragraph> <Paragraph position="5"> Eliminating dominance violations greatly reduced our search space. Similar work (e.g., (Matsumoto et al., 1993)) considers all possible matches. Although
our system cannot account for actual dominance violations in a given bitext, there are no such violations in our corpus, and many hypothetical cases can be avoided by adopting the appropriate grammar. Cases of adjuncts aligning with heads and vice versa are not dominance violations if we replace our dependency analysis with one in which internal nodes have category labels and the head constituents are marked by HEAD arcs, and if we assume the following Categorial Grammar (CG) style analyses. Suppose that verb (V1) maps to adverb (A'1) and adverb (A2) maps to verb (V'2), where A2 modifies V1 and A'1 modifies V'2. We assume the following structures: [VP [VP1 V1 ...] A2] and [VP [VP2 V'2 ...] A'1]. No dominance violation exists because no dominance relation holds between V1 and A2 or between V'2 and A'1.</Paragraph> <Paragraph position="6"> Y. Matsumoto (p.c.) notes that the subordinate clause of a source sentence may align with the main clause of a target language sentence and vice versa, e.g., X after Y aligns with Y' before X', where X, X', Y and Y' are all clauses. Assuming a CG style analysis, [S X [after Y]] aligns with [S Y' [before X']] with no dominance violations.</Paragraph> </Section> </Section> </Paper>