<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1048">
<Title>Word Sense Disambiguation vs. Statistical Machine Translation</Title>
<Section position="9" start_page="392" end_page="393" type="concl">
<SectionTitle>8 Conclusion</SectionTitle>
<Paragraph position="0">The empirical study presented here argues that, at the very least, it will be quite difficult to use standard WSD models to obtain significant improvements to statistical MT systems, even when the WSD models are supervised. This casts significant doubt on a commonly held, but unproven, assumption to the contrary. We have presented an empirically based analysis of the reasons for this surprising finding.</Paragraph>
<Paragraph position="1">We have seen that one major factor is that the statistical MT model is sufficiently accurate that, within the training domain, even a state-of-the-art dedicated WSD model can improve on its lexical choice predictions in only a relatively small proportion of cases.</Paragraph>
<Paragraph position="2">A second major factor is that even when the dedicated WSD model does make better predictions, current statistical MT models are unable to exploit them. Under this interpretation of our results, current SMT architectures depend excessively on the language model. One could of course argue that drastically increasing the amount of training data for the language model might overcome this language model effect. Given the combinatorics of n-gram coverage, however, there is at present no way of telling whether the amount of data needed is realistic, particularly for translation across many different domains. On the other hand, if the SMT architecture cannot make use of WSD predictions even when they are in fact better than the SMT model's lexical choices, then perhaps an alternative model that strikes a different balance between adequacy and fluency is called for. Ultimately, after all, WSD is a method of compensating for sparse data. It may therefore be that the present inability of WSD models to improve the accuracy of SMT systems stems not from an inherent weakness of dedicated WSD models, but rather from limitations of present-day SMT architectures.</Paragraph>
<Paragraph position="3">To test this further, our experiments could be repeated with other statistical MT models. For example, the WSD model's predictions could be employed in a Bracketing ITG translation model such as those of Wu (1996) or Zens et al. (2004), or they could be incorporated as features for reranking in a maximum-entropy SMT model (Och and Ney, 2002), instead of being used to constrain the sentence translation hypotheses as done here. The preceding discussion suggests, however, that this would be unlikely to produce significantly different results, since the inherent problem of the "language model effect" would largely remain, causing sentence translations that include the WSD model's preferred lexical choices to be discounted. For similar reasons, we suspect our findings may also hold for more sophisticated statistical MT models that rely heavily on n-gram language models. A more grammatically structured statistical MT model that is less n-gram oriented, such as the ITG-based "grammatical channel" translation model (Wu and Wong, 1998), might make more effective use of the WSD model's predictions.</Paragraph>
</Section>
</Paper>
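To make the reranking alternative mentioned in the final paragraph concrete, the sketch below shows one way WSD predictions could enter a log-linear (maximum-entropy) reranker over an n-best translation list, in the spirit of Och and Ney (2002). This is a minimal illustration only: the feature names, the agreement heuristic, the placeholder weights, and the data structures are all assumptions, not the paper's actual system or any particular toolkit's API.

```python
# Hypothetical sketch: rerank an n-best list with a log-linear model that
# adds a WSD-agreement feature alongside the usual translation model (TM)
# and language model (LM) scores. All names and weights are illustrative.

from dataclasses import dataclass

@dataclass
class Candidate:
    tokens: list[str]     # candidate target-language translation
    tm_logprob: float     # translation model log-probability
    lm_logprob: float     # n-gram language model log-probability

def wsd_agreement(tokens: list[str], wsd_choices: dict[int, set[str]]) -> float:
    """Fraction of WSD-disambiguated source positions whose preferred
    target lexical choices appear in the candidate. `wsd_choices` maps a
    source position to the set of target words a (hypothetical) WSD model
    predicts for it."""
    if not wsd_choices:
        return 0.0
    vocab = set(tokens)
    hits = sum(1 for preds in wsd_choices.values() if preds & vocab)
    return hits / len(wsd_choices)

def rerank(nbest: list[Candidate],
           wsd_choices: dict[int, set[str]],
           w_tm: float = 1.0, w_lm: float = 1.0, w_wsd: float = 0.5):
    """Sort candidates by log-linear score. The feature weights would
    normally be tuned on held-out data (e.g. by minimum error rate
    training); the defaults here are placeholders."""
    def score(c: Candidate) -> float:
        return (w_tm * c.tm_logprob
                + w_lm * c.lm_logprob
                + w_wsd * wsd_agreement(c.tokens, wsd_choices))
    return sorted(nbest, key=score, reverse=True)

# Toy usage: the WSD model prefers "interest" for source position 3, so
# the second candidate is promoted despite its lower LM-plus-TM score.
nbest = [
    Candidate("the bank raised its curiosity rate".split(), -12.1, -9.3),
    Candidate("the bank raised its interest rate".split(), -12.4, -9.0),
]
best = rerank(nbest, wsd_choices={3: {"interest"}})[0]
print(" ".join(best.tokens))
```

The design point this sketch makes is the one the paragraph raises: unlike the constraint-based integration used in the paper's experiments, a soft feature lets the WSD evidence trade off against the language model rather than being vetoed by it outright, though the section argues the language model effect may still dominate in practice.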