<?xml version="1.0" standalone="yes"?> <Paper uid="J04-2003"> <Title>c(c) 2004 Association for Computational Linguistics Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information</Title> <Section position="10" start_page="201" end_page="202" type="concl"> <SectionTitle> 8. Conclusion </SectionTitle> <Paragraph position="0"> In this article we have proposed methods of incorporating morphological and syntactic information into systems for statistical machine translation. The overall goal was to improve translation quality and to reduce the amount of parallel text necessary to Table 11 Results for hierarchical lexicon model Nespole! &quot;Restructuring&quot; entails treatment of question inversion and separated verb prefixes as well as merging of phrases in both languages. The same conventional dictionary was used as in the experiments the Verbmobil. The language model was trained on a combination of the English parts of the Nespole! corpus and the Verbmobil corpus.</Paragraph> <Paragraph position="1"> Computational Linguistics Volume 30, Number 2 train the model parameters. Substantial improvements on the Verbmobil task and the Nespole! task were achieved.</Paragraph> <Paragraph position="2"> Some sentence-level restructuring transformations have been introduced which are motivated by knowledge about the sentence structure in the languages involved. These transformations aim at the assimilation of word orders in related sentences. A hierarchy of equivalence classes has been defined on the basis of morphological and syntactic information beyond the surface forms. The study of the effect of using information from either degree of abstraction led to the construction of hierarchical lexicon models, which combine different items of information in a log-linear way. The benefit from these combined models is twofold: First, the lexical coverage is improved, because the translation of unseen word forms can be derived by considering information from lower levels in the hierarchy. Second, category ambiguity can be resolved, because syntactical context information is made locally accessible by means of annotation with morpho-syntactic tags. As a side effect of the preparative work for setting up the underlying hierarchy of morpho-syntactic information, those pieces of information inherent in fully inflected word forms that are not relevant for translation are detected.</Paragraph> <Paragraph position="3"> A method for aligning corresponding readings in conventional dictionaries containing pairs of fully inflected word forms has been proposed. The approach uses information deduced from one language side to resolve category ambiguity in the corresponding entry in the other language. The resulting disambiguated dictionaries have proven to be better suited for improving the quality of machine translation, especially if they are used in combination with the hierarchical lexicon models. The amount of bilingual training data required to achieve an acceptable quality of machine translation has been systematically investigated. All the methods mentioned previously contribute to a better exploitation of the available bilingual data and thus to improving translation quality in frameworks with scarce resources. Three setups for training the parameters of the statistical lexicon on Verbmobil data have been examined: (1) Using the full 58,000 sentences comprising the bilingual training corpus, (2) restricting the corpus to 5,000 sentences, and (3) using only a conventional dictionary. 
<Paragraph position="3"> A method for aligning corresponding readings in conventional dictionaries containing pairs of fully inflected word forms has been proposed. The approach uses information deduced from one language side to resolve category ambiguity in the corresponding entry on the other side. The resulting disambiguated dictionaries have proven to be better suited for improving the quality of machine translation, especially when they are used in combination with the hierarchical lexicon models.</Paragraph>
<Paragraph position="4"> The amount of bilingual training data required to achieve an acceptable quality of machine translation has been systematically investigated. All the methods mentioned above contribute to a better exploitation of the available bilingual data and thus to improving translation quality in frameworks with scarce resources. Three setups for training the parameters of the statistical lexicon on Verbmobil data have been examined: (1) using the full bilingual training corpus of 58,000 sentences, (2) restricting the corpus to 5,000 sentences, and (3) using only a conventional dictionary. For each of these setups, a relative improvement in subjective sentence error rate of between 13% and 15% over the baseline could be obtained using combinations of the methods described in this article. The amount of bilingual training data could be reduced to less than 10% of the original corpus while losing only 1.6% in accuracy as measured by the subjective sentence error rate. On the Nespole! task, a relative improvement of 16.5% in subjective sentence error rate was also achieved.</Paragraph>
</Section>
</Paper>
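As a companion illustration of the dictionary disambiguation idea summarized above, the following Python sketch filters the English readings of an ambiguous German entry using an assumed cross-language tag-compatibility table. The entry, tagsets, and compatibility mapping are invented for illustration; the procedure described in the article is more elaborate.

# Minimal sketch of resolving category ambiguity in a conventional dictionary
# entry by projecting part-of-speech information across the language pair.
# Entries, tags, and the compatibility table are hypothetical.

# German entry with candidate POS readings from a morphological analyzer:
entry_de = ("Essen", [{"NN"}, {"VVINF"}])          # noun "meal" vs. infinitive "to eat"
readings_en = [("food", {"NN"}), ("eat", {"VB"})]  # English side with its own tags

# Assumed cross-language tag compatibility (German tag -> acceptable English tags).
COMPATIBLE = {
    "NN": {"NN", "NNS"},
    "VVINF": {"VB"},
}

def align_readings(de_tag_sets, en_readings):
    """Keep, for each German reading, only the English readings whose POS is
    compatible with the German tag; ambiguous entries are thereby split into
    disambiguated pairs."""
    aligned = []
    for de_tags in de_tag_sets:
        for en_word, en_tags in en_readings:
            if any(en_tag in COMPATIBLE.get(de_tag, set())
                   for de_tag in de_tags for en_tag in en_tags):
                aligned.append((de_tags, en_word))
    return aligned

print(align_readings(entry_de[1], readings_en))
# -> [({'NN'}, 'food'), ({'VVINF'}, 'eat')]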