<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1091">
<Title>A Discriminative Global Training Algorithm for Statistical MT</Title>
<Section position="7" start_page="726" end_page="727" type="evalu">
<SectionTitle>5 Discussion and Future Work</SectionTitle>
<Paragraph position="0"> The work in this paper differs substantially from previous work in SMT based on the noisy-channel approach presented in (Brown et al., 1993).</Paragraph>
<Paragraph position="1"> While error-driven training techniques are commonly used to improve the performance of phrase-based translation systems (Chiang, 2005; Och, 2003), this paper presents a novel block sequence translation approach to SMT that is similar to sequential natural language annotation problems such as part-of-speech tagging or shallow parsing, both in modeling and in parameter training. Unlike earlier approaches to SMT training, which either rely heavily on domain knowledge or can handle only a small number of features, this approach treats the decoding process as a black box and can optimize tens of millions of parameters automatically, which makes it applicable to other problems as well. Our formulation is convex, which ensures that we are able to find the global optimum even for large-scale problems. The loss function in Eq. 4 may not be optimal, and different choices may lead to future improvements. Another important direction for performance improvement is to design methods that better approximate Eq. 6. Although at this stage the system performance is not yet better than that of previous approaches, good translation results are achieved on a standard translation task. While similar to (Tillmann and Zhang, 2005), the current procedure is more automated, with comparable performance. That earlier approach requires a decomposition of the decoding scheme into local decision steps, an inherent difficulty acknowledged in (Tillmann and Zhang, 2005). Since this limitation is not present in the current model, improved results may be obtained in the future. A perceptron-like algorithm that handles global features in the context of re-ranking is also presented in (Shen et al., 2004).</Paragraph>
<Paragraph position="2"> The computational requirements for the training algorithm in Table 2 can be significantly reduced.</Paragraph>
<Paragraph position="3"> While the global training approach presented in this paper is simple, after a certain number of iterations the alternatives that are being added to the relevant set differ very little from each other, slowing down training considerably, such that the set of possible block translations might not be fully explored. As mentioned in Section 2, the current approach is still able to handle real-valued features, e.g., the language model probability. This is important since the language model can be trained on a much larger monolingual corpus.</Paragraph>
</Section>
</Paper>
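
The discussion credits the convex formulation with guaranteeing a global optimum. As a minimal sketch of why that holds (this is not the paper's actual Eq. 4; convex_loss, sgd_step, gold_feats, alt_feats, and alt_costs are all hypothetical names), consider a margin-style criterion over translation alternatives: a maximum of affine functions of the weight vector w is convex in w, so any minimum reached by subgradient descent is global.

import numpy as np

def convex_loss(w, gold_feats, alt_feats, alt_costs):
    """Hinge-style loss: each alternative translation must be separated
    from the gold translation by a margin proportional to its cost.
    A maximum of affine functions of w, hence convex in w."""
    margins = alt_costs + alt_feats @ w - gold_feats @ w
    return max(0.0, float(margins.max()))

def sgd_step(w, gold_feats, alt_feats, alt_costs, lr=0.01):
    # Subgradient step on the worst-violating alternative only.
    margins = alt_costs + alt_feats @ w - gold_feats @ w
    k = int(np.argmax(margins))
    if margins[k] > 0.0:
        w = w - lr * (alt_feats[k] - gold_feats)
    return w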
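The slowdown described in the final paragraph, where newly decoded alternatives barely differ from those already collected, can be made concrete with the following hedged sketch of relevant-set style training. The decoder is treated as a black box; decode_top_alternatives and update are hypothetical stand-ins, and the exact-duplicate check is an assumption for illustration, not the paper's actual criterion.

def train_relevant_set(sentences, w, n_iters, decode_top_alternatives, update):
    """Iteratively grow a per-sentence set of translation alternatives
    and re-optimize the convex loss on it."""
    relevant = {s: [] for s in sentences}
    for it in range(n_iters):
        n_new = 0
        for s in sentences:
            for alt in decode_top_alternatives(s, w):
                # In later iterations most decoded alternatives are near
                # duplicates of existing ones: little is added per pass,
                # and parts of the block-translation space stay unexplored.
                if alt not in relevant[s]:
                    relevant[s].append(alt)
                    n_new += 1
        w = update(w, relevant)  # optimize the convex loss on the current set
        if n_new == 0:           # relevant set has stopped growing
            break
    return w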
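The closing point, that real-valued features such as the language model probability can be handled alongside the sparse binary features typical of tagging-style models, might look like the sketch below; the feature names are invented for illustration.

import math

def block_features(src_phrase, tgt_phrase, lm_prob):
    """Return a sparse feature map mixing binary identity features
    with a real-valued language-model feature (a log probability)."""
    return {
        ("block", src_phrase, tgt_phrase): 1.0,   # binary block-identity feature
        ("tgt_len",): float(len(tgt_phrase.split())),
        ("lm",): math.log(lm_prob),               # real-valued LM feature
    }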