<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1033">
<Title>A Hierarchical Phrase-Based Model for Statistical Machine Translation</Title>
<Section position="7" start_page="268" end_page="269" type="concl">
<SectionTitle> 6 Conclusion </SectionTitle>
<Paragraph position="0"> Hierarchical phrase pairs, which can be learned without any syntactically annotated training data, significantly improve translation accuracy compared with a state-of-the-art phrase-based system. They also facilitate the incorporation of syntactic information, which, however, did not provide a statistically significant gain.</Paragraph>
<Paragraph position="1"> Our primary goal for the future is to move towards a more syntactically motivated grammar, whether by automatic methods to induce syntactic categories, or by better integration of parsers trained on annotated data. This would potentially improve both accuracy and efficiency. Moreover, reducing the grammar size would allow more ambitious training settings. The maximum initial phrase length is currently 10; preliminary experiments show that increasing this limit to as high as 15 does improve accuracy, but requires more memory. On the other hand, we have successfully trained on almost 30M+30M words by tightening the initial phrase length limit for part of the data. Streamlining the grammar would allow further experimentation in these directions.</Paragraph>
<Paragraph position="2"> In any case, future improvements to this system will maintain the design philosophy proven here: ideas from syntax should be incorporated into statistical translation, but not in exchange for the strengths of the phrase-based approach.</Paragraph>
</Section>
</Paper>