<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1067"> <Title>A Syntax-based Statistical Translation Model</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 4 Conclusion </SectionTitle> <Paragraph position="0"> We have presented a syntax-based translation model that statistically models the translation process from an English parse tree into a foreign-language sentence. The model can make use of syntactic information and performs better for language pairs with different word orders and case marking schema. We conducted a small-scale experiment comparing its performance with IBM Model 5 and obtained better alignment results.</Paragraph> <Paragraph position="1"> Appendix: An Efficient EM algorithm This appendix describes an efficient implementation of the EM algorithm for our translation model. The implementation uses a graph structure for a pair ⟨ε, f⟩. A graph node is either a major-node or a subnode. A major-node shows a pairing of a subtree of ε and a substring of f. A subnode shows a selection of a value ⟨ν, ρ, τ⟩ for the subtree-substring pair (Figure 3).</Paragraph> <Paragraph position="2"> Let f_k^l = f_k ... f_{k+l} denote a substring of f, and let f_1^L be the whole sentence, where L is the length of f. Each major-node ⟨ε_i, f_k^l⟩ connects to several ν-subnodes ⟨ν; ε_i, f_k^l⟩, showing which value of ν is selected. The arc between ⟨ε_i, f_k^l⟩ and ⟨ν; ε_i, f_k^l⟩ has weight P(ν|ε_i).</Paragraph> <Paragraph position="3"> A ν-subnode ⟨ν; ε_i, f_k^l⟩ connects to a final node with weight P(τ|ε_i) if ε_i is a terminal node in ε. If ε_i is a non-terminal node, a ν-subnode connects to several ρ-subnodes ⟨ρ, ν; ε_i, f_k^l⟩, showing a selection of a value ρ; the weight of the arc is P(ρ|ε_i). A ρ-subnode in turn connects to π-subnodes ⟨π; ρ, ν, ε_i, f_k^l⟩, where π specifies a particular way of partitioning f_k^l.</Paragraph> <Paragraph position="4"> A π-subnode ⟨π; ρ, ν, ε_i, f_k^l⟩ is then connected to the major-nodes that correspond to the children of ε_i and the substrings of f_k^l determined by ⟨ν, ρ, π⟩. A major-node can be connected from different π-subnodes. The arc weights between π-subnodes and major-nodes are always 1.0.</Paragraph> <Paragraph position="5"> A trace starting from the graph root, selecting one of the arcs from major-nodes, ν-subnodes, and ρ-subnodes, and all the arcs from π-subnodes, corresponds to a particular θ, and the product of the weights on the trace corresponds to P(θ|ε). Note that a trace forms a tree, making branches at the π-subnodes.</Paragraph> <Paragraph position="6"> We define an alpha probability and a beta probability for each major-node, in analogy with the measures used in the inside-outside algorithm for probabilistic context-free grammars (Baker, 1979).</Paragraph> <Paragraph position="7"> The alpha probability (outside probability) is the path probability from the graph root to the node, including the side branches of the node. The beta probability (inside probability) is the path probability below the node.</Paragraph>
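To make the alpha-beta bookkeeping concrete, the following is a minimal sketch (not the paper's implementation) of the inside-outside passes over such a pairing graph. The class names MajorNode and Choice are hypothetical: a Choice bundles one selection of ⟨ν, ρ⟩ (or ⟨ν, τ⟩ at a terminal node) together with the child major-nodes reached through the corresponding π-subnode, and its weight is the product of the arc weights P(ν|ε_i) and P(ρ|ε_i) (the arcs from π-subnodes to major-nodes have weight 1.0, so only the child list is kept). For brevity the sketch treats the graph as a tree; in the paper's graph major-nodes are shared across π-subnodes, which would additionally require memoising beta values and propagating alpha in topological order.

    # Minimal inside-outside sketch over a pairing graph (illustrative only).
    from dataclasses import dataclass, field
    from typing import List, Tuple


    @dataclass
    class Choice:
        # Weight of one selection at a major-node: P(nu|e_i) * P(rho|e_i)
        # for a non-terminal, or P(nu|e_i) * P(tau|e_i) at a terminal node.
        weight: float
        # Major-nodes reached through the corresponding pi-subnode
        # (the arcs from a pi-subnode to these nodes all have weight 1.0).
        children: List["MajorNode"] = field(default_factory=list)


    @dataclass
    class MajorNode:
        # Pairing of a subtree of e and a substring of f.
        choices: List[Choice] = field(default_factory=list)
        alpha: float = 0.0   # outside probability
        beta: float = 0.0    # inside probability


    def compute_beta(node: MajorNode) -> float:
        """Bottom-up pass: beta = sum over choices of weight * product of child betas."""
        node.beta = 0.0
        for c in node.choices:
            inside = c.weight
            for child in c.children:
                inside *= compute_beta(child)
            node.beta += inside
        return node.beta


    def compute_alpha(root: MajorNode) -> None:
        """Top-down pass: alpha = path probability from the root times the side branches
        (the betas of the sibling major-nodes under the same pi-subnode)."""
        root.alpha = 1.0
        stack = [root]
        while stack:
            node = stack.pop()
            for c in node.choices:
                for child in c.children:
                    side = 1.0
                    for sibling in c.children:
                        if sibling is not child:
                            side *= sibling.beta
                    child.alpha += node.alpha * c.weight * side
                    stack.append(child)


    def collect_counts(root: MajorNode) -> List[Tuple[Choice, float]]:
        """Posterior weight of every choice for this training pair:
        alpha(node) * weight * product of child betas, normalised by beta(root)."""
        total = root.beta
        posteriors = []
        stack = [root]
        while stack:
            node = stack.pop()
            for c in node.choices:
                inside = c.weight
                for child in c.children:
                    inside *= child.beta
                posteriors.append((c, node.alpha * inside / total))
                stack.extend(c.children)
        return posteriors


    if __name__ == "__main__":
        # Toy pairing graph with made-up weights: one non-terminal major-node whose
        # single pi-subnode branches into two terminal major-nodes.
        leaf1 = MajorNode(choices=[Choice(weight=0.4), Choice(weight=0.1)])
        leaf2 = MajorNode(choices=[Choice(weight=0.5)])
        root = MajorNode(choices=[Choice(weight=0.9, children=[leaf1, leaf2])])

        compute_beta(root)    # beta values bottom-up
        compute_alpha(root)   # alpha values top-down
        print("inside probability of the pair:", root.beta)
        for choice, posterior in collect_counts(root):
            print("choice weight", choice.weight, "-> posterior", round(posterior, 3))

The posteriors returned by collect_counts are the per-pair quantities that would be accumulated into the counts c(ν, N), c(ρ, R), and c(τ, T) described below.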
<Paragraph position="8"> Figure 4 shows the formulae for the alpha and beta probabilities. The counts c(ν, N), c(ρ, R), and c(τ, T) for each pair ⟨ε, f⟩ follow from these definitions and are also shown in the figure. These formulae replace step 3 (in Section 2.3) for each training pair, and the counts are used in step 4.</Paragraph> <Paragraph position="9"> The graph structure is generated by expanding the root node ⟨ε_1, f_1^L⟩. The beta probability for each node is first calculated bottom-up; the alpha probability for each node is then calculated top-down. Once the alpha and beta probabilities for each node are obtained, the counts are calculated as above and used to update the parameters.</Paragraph> <Paragraph position="10"> The complexity of this training algorithm is</Paragraph> </Section></Paper>