<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1016"> <Title>Extending MT evaluation tools with translation complexity metrics</Title> <Section position="8" start_page="5" end_page="5" type="concl"> <SectionTitle> 7 Conclusion and future work </SectionTitle> <Paragraph position="0"> In this paper, we presented empirical evidence that the complexity of an MT task influences automated evaluation scores. We proposed a method for normalising the automated scores using a resource-light parameter, the average number of syllables per word (ASW), which approximates the translation complexity of a given text type reasonably accurately.</Paragraph> <Paragraph position="1"> The fact that the potential translation complexity of a text type can be accurately approximated by the ASW parameter admits an interesting linguistic interpretation. The relation between the length of a word and the number of its meanings in a dictionary is governed by Menzerath's law (Koehler, 1993: 49), which in its most general formulation states that there is a negative correlation between the length of a language construct and the size of its &quot;components&quot; (Menzerath, 1954; Hubey, 1999: 239). In this case, the size of a word's components can be interpreted as the number of its possible word senses. 
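To make the ASW parameter concrete, the following is a minimal sketch of how such a resource-light measure could be computed. The vowel-cluster syllable heuristic is an assumption for illustration, not the exact counting rule used in the paper:

```python
import re

def count_syllables(word: str) -> int:
    """Approximate the syllable count of a word as the number of
    vowel clusters. This is a crude, resource-light heuristic
    (an assumption here, not the paper's exact method)."""
    clusters = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(clusters))

def average_syllables_per_word(text: str) -> float:
    """Average number of syllables per word (ASW) over a text."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(count_syllables(w) for w in words) / len(words)

# A text dominated by short, polysemantic words yields a low ASW,
# while terminology-heavy text yields a noticeably higher value.
print(round(average_syllables_per_word("The cat sat on the mat"), 2))
```

Under this heuristic, a score normalised by ASW would penalise lexically simple text types less than terminology-dense ones, in line with the observation above.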
We suggest that the link between ASW and translation difficulty can be explained as follows: the presence of longer words with fewer senses requires more precise word sense disambiguation of the shorter, polysemantic words, so the disambiguation task becomes more demanding: very specific senses must be chosen and more precise (often terminological) translation equivalents must be used.</Paragraph> <Paragraph position="2"> Future work will involve empirical testing of this suggestion, as well as further experiments on improving the stability of the normalised scores by developing better normalisation methods. We will evaluate the proposed approach on larger corpora covering different genres, and will investigate other resource-light parameters, such as the type/token ratio of the source text or its unigram entropy, which may predict the complexity of the translated text more accurately. Another direction for future research is a comparison of the stability of evaluation scores on subsets of the evaluated data within one text type and across different text types.</Paragraph> </Section></Paper>